Music Emotion Classification With Neural Network Architecture And Librosa

Authors

  • Syed Aiyan, B.E. Student, Department of Computer Science and Engineering, ISL Engineering College, Hyderabad, India.
  • Syed Wajahat Ali, B.E. Student, Department of Computer Science and Engineering, ISL Engineering College, Hyderabad, India.
  • Mohammed Fardeen Younus, B.E. Student, Department of Computer Science and Engineering, ISL Engineering College, Hyderabad, India.
  • Mr. Mohammed Rahmat Ali, Assistant Professor, Department of Computer Science and Engineering, ISL Engineering College, Hyderabad, India.

DOI:

https://doi.org/10.63665/vq6wg888

Keywords:

CNN, MFCC, Pattern Recognition, Chroma Features, Intelligent Music Systems, Artificial Intelligence in Music

Abstract

The classification of musical emotions is essential for organizing, searching, and recommending music on modern platforms. Traditional models often rely on raw audio or textual features, which may not fully capture the rich emotional content embedded in music. To address this, we propose a Convolutional Neural Network (CNN)-based model combined with Librosa for feature extraction to classify musical emotions effectively. In the proposed approach, Librosa is used to extract meaningful audio features from music signals, including Mel-frequency cepstral coefficients (MFCCs), chroma features, spectral contrast, and tonnetz representations. These features provide a compact and informative representation of the audio, capturing timbral, harmonic, and rhythmic characteristics relevant to emotion recognition. The CNN model is then applied to learn hierarchical patterns from these extracted features. Convolutional layers automatically capture local correlations in the audio features, while pooling layers reduce dimensionality and highlight dominant emotional patterns. This deep learning framework eliminates the need for handcrafted feature combinations, allowing the model to generalize effectively across diverse music samples. By combining Librosa feature extraction with the pattern-learning capability of CNNs, the proposed system is able to capture complex emotional relationships in music. This approach offers a robust and scalable solution for automated music emotion classification, supporting applications such as music recommendation, playlist generation, and music analytics in real-world platforms.
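The feature-extraction stage the abstract describes could be sketched as follows with Librosa. The feature counts (13 MFCCs) and the one-second sine test signal are illustrative assumptions, not values taken from the paper:

```python
# Sketch of the Librosa feature-extraction stage: MFCC, chroma,
# spectral-contrast, and tonnetz features stacked into one matrix.
import numpy as np
import librosa

def extract_features(y: np.ndarray, sr: int) -> np.ndarray:
    """Stack the four feature families into one (bands x frames) matrix."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)        # timbre (13 bands)
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)          # harmony (12 pitch classes)
    contrast = librosa.feature.spectral_contrast(y=y, sr=sr)  # 7 spectral bands
    tonnetz = librosa.feature.tonnetz(y=y, sr=sr)             # 6 tonal-centroid dims
    # The tonnetz path uses a CQT internally, so frame counts can differ
    # slightly from the STFT-based features; trim to the shortest.
    n = min(f.shape[1] for f in (mfcc, chroma, contrast, tonnetz))
    return np.vstack([f[:, :n] for f in (mfcc, chroma, contrast, tonnetz)])

if __name__ == "__main__":
    sr = 22050
    y = 0.5 * np.sin(2 * np.pi * 440.0 * np.arange(sr) / sr)  # 1 s of A4
    features = extract_features(y, sr)
    print(features.shape)  # 13 + 12 + 7 + 6 = 38 feature bands
```

In practice each track would be loaded with `librosa.load` and the resulting matrix fed to the classifier, either frame by frame or averaged over time.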
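The convolution-plus-pooling pipeline the abstract outlines could be realized with a compact Keras model. The layer widths, the 38×128 input (feature bands × time frames), and the four emotion classes are assumptions for illustration; the paper does not fix these hyperparameters:

```python
# Minimal CNN sketch: conv layers learn local correlations in the feature
# matrix, pooling layers downsample and keep dominant patterns, and a
# softmax head maps to emotion classes.
from tensorflow import keras
from tensorflow.keras import layers

NUM_CLASSES = 4  # hypothetical emotion labels, e.g. happy/sad/calm/angry

model = keras.Sequential([
    layers.Input(shape=(38, 128, 1)),  # 38 feature bands x 128 time frames
    layers.Conv2D(16, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(2),
    layers.Conv2D(32, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(2),
    layers.GlobalAveragePooling2D(),   # collapse to one vector per clip
    layers.Dense(64, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```

Global average pooling keeps the head independent of the exact number of time frames surviving the pooling stages, which helps when clips vary in length.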




Published

2026-04-26

How to Cite

Music Emotion Classification With Neural Network Architecture And Librosa. (2026). International Journal of Multidisciplinary Engineering In Current Research, 11(4s), 93-100. https://doi.org/10.63665/vq6wg888