Investigation of Speech Emotion Recognition Techniques Utilizing Bidirectional LSTM and Attention Mechanism
DOI: https://doi.org/10.62381/I255906
Author(s)
Chengxia Li, Tiantian Liu
Affiliation(s)
School of Artificial Intelligence and Big Data, Henan University of Technology, Zhengzhou, Henan, China
Abstract
Conventional speech emotion recognition techniques depend on manually crafted acoustic features and shallow classifiers, resulting in limited feature representation ability and poor model generalization. This study develops a speech emotion recognition model that integrates bidirectional long short-term memory (Bi-LSTM) networks with an attention mechanism. The model first extracts multidimensional acoustic features from the speech signal, including MFCCs, the Mel-spectrogram, and the spectral centroid. It then employs a bidirectional LSTM layer to capture contextual dependencies within speech sequences and incorporates an attention mechanism to emphasize emotion-critical segments. A multi-task learning framework is established to jointly predict emotion category, speaking rate, and volume. Experiments show that the proposed model attains a validation accuracy of 95.28% on the five-class emotion recognition task, surpassing SVM, LSTM, and Bi-LSTM baselines. This work offers a feasible approach to speech emotion recognition in complex environments and helps enhance the emotional understanding capabilities of human-computer interaction systems.
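As a concrete illustration of the pipeline described in the abstract, the following is a minimal sketch, not the authors' implementation: it assumes librosa for extracting the MFCC, Mel-spectrogram, and spectral-centroid features and PyTorch for a Bi-LSTM encoder with attention pooling and three task heads (emotion, speaking rate, volume). Layer sizes, label-set sizes, and loss weights are illustrative assumptions, not values reported in the paper.

# Minimal sketch (not the authors' code): frame-level acoustic features,
# a Bi-LSTM encoder, additive attention pooling, and multi-task heads.
import librosa
import numpy as np
import torch
import torch.nn as nn


def extract_features(wav_path: str, sr: int = 16000, n_mfcc: int = 13) -> np.ndarray:
    """Stack MFCC, log-Mel-spectrogram, and spectral centroid per frame."""
    y, sr = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)            # (13, T)
    mel = librosa.power_to_db(
        librosa.feature.melspectrogram(y=y, sr=sr, n_mels=40))        # (40, T)
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)          # (1, T)
    return np.concatenate([mfcc, mel, centroid], axis=0).T            # (T, 54)


class BiLSTMAttention(nn.Module):
    """Bi-LSTM encoder + attention pooling + multi-task output heads."""

    def __init__(self, input_dim: int, hidden: int = 128,
                 n_emotions: int = 5, n_rate: int = 3, n_volume: int = 3):
        super().__init__()
        self.bilstm = nn.LSTM(input_dim, hidden, batch_first=True,
                              bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)                 # frame-level attention scores
        self.emotion_head = nn.Linear(2 * hidden, n_emotions)
        self.rate_head = nn.Linear(2 * hidden, n_rate)       # auxiliary speaking-rate task
        self.volume_head = nn.Linear(2 * hidden, n_volume)   # auxiliary volume task

    def forward(self, x):                                    # x: (B, T, D)
        h, _ = self.bilstm(x)                                # (B, T, 2H)
        weights = torch.softmax(self.attn(h), dim=1)         # (B, T, 1), over time
        context = (weights * h).sum(dim=1)                   # (B, 2H) attention pooling
        return (self.emotion_head(context),
                self.rate_head(context),
                self.volume_head(context))


if __name__ == "__main__":
    feats = torch.randn(4, 200, 54)          # dummy batch: 4 utterances, 200 frames
    model = BiLSTMAttention(input_dim=54)
    emo, rate, vol = model(feats)
    targets = (torch.randint(0, 5, (4,)),    # emotion labels
               torch.randint(0, 3, (4,)),    # speaking-rate labels
               torch.randint(0, 3, (4,)))    # volume labels
    # Multi-task loss: weighted sum of per-task cross-entropies (0.3 weights assumed).
    loss = (nn.functional.cross_entropy(emo, targets[0])
            + 0.3 * nn.functional.cross_entropy(rate, targets[1])
            + 0.3 * nn.functional.cross_entropy(vol, targets[2]))
    print(loss.item())

In this sketch the auxiliary speaking-rate and volume heads share the attention-pooled utterance representation with the emotion head, so the multi-task objective acts as a regularizer on the Bi-LSTM encoder.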
Keywords
Speech Emotion Recognition; Bidirectional Long Short-Term Memory Network; Attention Mechanism; Multi-Task Learning; Acoustic Feature Extraction