[1] M. Swain, A. Routray, and P. Kabisatpathy, "Databases, features and classifiers for speech emotion recognition: a review," International Journal of Speech Technology, vol. 21, pp. 93-120, 2018. DOI: https://doi.org/10.1007/s10772-018-9491-z
[3] M. B. Akçay and K. Oğuz, "Speech emotion recognition: Emotional models, databases, features, preprocessing methods, supporting modalities, and classifiers," Speech Communication, vol. 116, pp. 56-76, 2020. DOI: https://doi.org/10.1016/j.specom.2019.12.001
[4] B. Du, Q. Gao, and H. Ning, "Survey on Intelligent Speech Emotion Recognition," Forest Chemicals Review, pp. 230-260, 2021.
[5] A. B. A. Qayyum, A. Arefeen, and C. Shahnaz, "Convolutional neural network (CNN) based speech-emotion recognition," in 2019 IEEE International Conference on Signal Processing, Information, Communication & Systems (SPICSCON), 2019, pp. 122-125. DOI: 10.1109/SPICSCON48833.2019.9065172
[6] F. Makhmudov, A. Kutlimuratov, F. Akhmedov, M. S. Abdallah, and Y.-I. Cho, "Modeling Speech Emotion Recognition via Attention-Oriented Parallel CNN Encoders," Electronics, vol. 11, p. 4047, 2022.
[7] Z. Peng, Y. Lu, S. Pan, and Y. Liu, "Efficient speech emotion recognition using multi-scale cnn and attention," in ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 3020-3024.
[8] H. Zhang, R. Gou, J. Shang, F. Shen, Y. Wu, and G. Dai, "Pre-trained deep convolution neural network model with attention for speech emotion recognition," Frontiers in Physiology, vol. 12, p. 643202, 2021.
[9] G. Trigeorgis, F. Ringeval, R. Brueckner, E. Marchi, M. A. Nicolaou, B. Schuller, et al., "Adieu features? end-to-end speech emotion recognition using a deep convolutional recurrent network," in 2016 IEEE international conference on acoustics, speech and signal processing (ICASSP), 2016, pp. 5200-5204.
[10] W. Lim, D. Jang, and T. Lee, "Speech emotion recognition using convolutional and recurrent neural networks," in 2016 Asia-Pacific signal and information processing association annual summit and conference (APSIPA), 2016, pp. 1- DOI: 10.1109/APSIPA.2016.7820699
[11] C.-H. Wu and W.-B. Liang, "Emotion recognition of affective speech based on multiple classifiers using acoustic-prosodic information and semantic labels," IEEE Transactions on Affective Computing, vol. 2, pp. 10-21, 2010.
[12] F. Eyben, M. Wöllmer, A. Graves, B. Schuller, E. Douglas-Cowie, and R. Cowie, "On-line emotion recognition in a 3-D activation-valence-time continuum using acoustic and linguistic cues," Journal on Multimodal User Interfaces, vol. 3, pp. 7-19, 2010. DOI: https://doi.org/10.1007/s12193-009-0032-6
[13] L. Tian, J. Moore, and C. Lai, "Recognizing emotions in spoken dialogue with hierarchically fused acoustic and lexical features," in 2016 IEEE Spoken Language Technology Workshop (SLT), 2016, pp. 565-572.
[14] H. Kaya, D. Fedotov, A. Yesilkanat, O. Verkholyak, Y. Zhang, and A. Karpov, "LSTM Based Cross-corpus and Cross-task Acoustic Emotion Recognition," in Interspeech, 2018, pp. 521-525.
[15] C.-W. Huang and S. S. Narayanan, "Deep convolutional recurrent neural network with attention mechanism for robust speech emotion recognition," in 2017 IEEE international conference on multimedia and expo (ICME), 2017, pp. 583-588.
[16] S. Han, F. Leng, and Z. Jin, "Speech emotion recognition with a ResNet-CNN-Transformer parallel neural network," in 2021 International Conference on Communications, Information System and Computer Engineering (CISCE), 2021, pp. 803-807.
[17] W. Chen, X. Xing, X. Xu, J. Pang, and L. Du, "DST: Deformable Speech Transformer for Emotion Recognition," in ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023, pp. 1-5.
[18] F. Andayani, L. B. Theng, M. T. Tsun, and C. Chua, "Hybrid LSTM-transformer model for emotion recognition from speech audio files," IEEE Access, vol. 10, pp. 36018-36027, 2022.
[19] C. A. Kumar, A. D. Maharana, S. M. Krishnan, S. S. S. Hanuma, G. J. Lal, and V. Ravi, "Speech Emotion Recognition Using CNN-LSTM and Vision Transformer," in International Conference on Innovations in Bio-Inspired Computing and Applications, 2022, pp. 86-97. DOI: https://doi.org/10.1007/978-3-031-27499-2_8
[20] A. Dutt and P. Gader, "Wavelet Multiresolution Analysis Based Speech Emotion Recognition System Using 1D CNN LSTM Networks," IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2023.
[21] S. R. Livingstone and F. A. Russo, "The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English," PloS one, vol. 13, p. e0196391, 2018. DOI: https://doi.org/10.1371/journal.pone.0196391
[22] K. Dupuis and M. K. Pichora-Fuller, "Toronto emotional speech set (TESS) -Younger talker_Happy," 2010.
[23] B. Salian, O. Narvade, R. Tambewagh, and S. Bharne, "Speech Emotion Recognition using Time Distributed CNN and LSTM," in ITM Web of Conferences, 2021, p. 03006. DOI: https://doi.org/10.1051/itmconf/20214003006
[24] C. Luna-Jiménez, D. Griol, Z. Callejas, R. Kleinlein, J. M. Montero, and F. Fernández-Martínez, "Multimodal Emotion Recognition on RAVDESS Dataset Using Transfer Learning," Sensors, vol. 21, p. 7665, 2021. DOI: https://doi.org/10.3390/s21227665
[25] S. Tanberk and D. B. Tükel, "Ensemble Learning with CNN–LSTM Combination for Speech Emotion Recognition," in Proceedings of International Conference on Computing and Communication Networks, A. K. Bashir, G. Fortino, A. Khanna, and D. Gupta, Eds., Lecture Notes in Networks and Systems, vol. 394. Singapore: Springer, 2022. DOI: https://doi.org/10.1007/978-981-19-0604-6_5
[26] K. L. Lakshmi, P. Muthulakshmi, A. A. Nithya, et al., "Recognition of emotions in speech using deep CNN and RESNET," Soft Computing, 2023. DOI: https://doi.org/10.1007/s00500-023-07969-5
[27] D. Issa, M. Fatih Demirci, and A. Yazici, "Speech emotion recognition with deep convolutional neural networks," Biomed. Signal Process. Control, vol. 59, p. 101894, 2020. DOI: https://doi.org/10.1016/j.bspc.2020.101894
[28] R. Dangol, A. Alsadoon, P. W. C. Prasad, et al., "Speech Emotion Recognition Using Convolutional Neural Network and Long-Short Term Memory," Multimedia Tools and Applications, vol. 79, pp. 32917-32934, 2020. DOI: https://doi.org/10.1007/s11042-020-09693-w