[1] WANG Y J, ZHU J Q, WANG Z M, et al. Review of applications of natural language processing in text sentiment analysis[J]. Journal of Computer Applications, 2022, 42(4): 1011-1020. DOI: 10.11772/j.issn.1001-9081.2021071262.
[2] XU Y M, HU L, ZHAO J Y, et al. Technology application prospects and risk challenges of large language models[J]. Journal of Computer Applications, 2024, 40(5): 1-10. DOI: 10.11772/j.issn.1001-9081.2023060885.
[3] LI R N, WU Z Y, JIA J, et al. Towards discriminative representation learning for speech emotion recognition[C]//Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, August 10-16, 2019, Macao, China. California: International Joint Conferences on Artificial Intelligence Organization, 2019. DOI: 10.24963/ijcai.2019/703.
[4] HOLLER J, LEVINSON S C. Multimodal language processing in human communication[J]. Trends in Cognitive Sciences, 2019, 23(8): 639-652. DOI: 10.1016/j.tics.2019.05.006.
[5] GEETHA A V, MALA T, PRIYANKA D, et al. Multimodal emotion recognition with deep learning: advancements, challenges, and future directions[J]. Information Fusion, 2024, 105: 102218. DOI: 10.1016/j.inffus.2023.102218.
[6] RASIPURAM S, BHAT J H, MAITRA A. Multi-modal expression recognition in the wild using sequence modeling[C]//2020 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020). Buenos Aires, Argentina: IEEE, 2020: 629-631. DOI: 10.1109/FG47880.2020.00096.
[7] MIAO H T, ZHANG Y F, WANG D L, et al. Multi-output learning based on multimodal GCN and co-attention for image aesthetics and emotion analysis[J]. Mathematics, 2021, 9(12): 1437. DOI: 10.3390/math9121437.
[8] SINGH P, SRIVASTAVA R, RANA K P S, et al. A multimodal hierarchical approach to speech emotion recognition from audio and text[J]. Knowledge-Based Systems, 2021, 229: 107316. DOI: 10.1016/j.knosys.2021.107316.
[9] ZHANG Q Y, WEI Y K, HAN Z B, et al. Multimodal fusion on low-quality data: a comprehensive survey[EB/OL]. 2024: arXiv: 2404.18947. http://arxiv.org/abs/2404.18947
[10] ZHANG S Q, YANG Y J, CHEN C, et al.
Deep learning-based multimodal emotion recognition from audio, visual, and text modalities: a systematic review of recent advancements and future prospects[J]. Expert Systems with Applications, 2024, 237: 121692. DOI: 10.1016/j.eswa.2023.121692.
[11] DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[EB/OL]. 2018: arXiv: 1810.04805. http://arxiv.org/abs/1810.04805
[12] YANG K C, XU H, GAO K. CM-BERT: cross-modal BERT for text-audio sentiment analysis[C]//Proceedings of the 28th ACM International Conference on Multimedia. Seattle, WA, USA: ACM, 2020. DOI: 10.1145/3394171.3413690.
[13] GOODFELLOW I J, SHLENS J, SZEGEDY C. Explaining and harnessing adversarial examples[EB/OL]. 2014: arXiv: 1412.6572. http://arxiv.org/abs/1412.6572
[14] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[EB/OL]. 2017: arXiv: 1706.03762. http://arxiv.org/abs/1706.03762
[15] ZHAO Z T, ZHAN W H, DUAN H C, et al. Research on adversarial robustness of deep learning models based on SVD[J]. Computer Science, 2023, 50(10): 362-368. DOI: 10.11896/jsjkx.220800090.
[16] XIAN G M, ZHAO Z F, YANG X P. Aspect-level multimodal sentiment classification based on attention fusion network[J]. Computer Systems & Applications, 2024, 33(2): 94-104. DOI: 10.15888/j.cnki.csa.009385.
[17] ZADEH A, ZELLERS R, PINCUS E, et al. Multimodal sentiment intensity analysis in videos: facial gestures and verbal messages[J]. IEEE Intelligent Systems, 2016, 31(6): 82-88. DOI: 10.1109/MIS.2016.94.
[18] BAGHER ZADEH A, LIANG P P, PORIA S, et al. Multimodal language analysis in the wild: CMU-MOSEI dataset and interpretable dynamic fusion graph[C]//Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Melbourne, Australia. Stroudsburg, PA, USA: Association for Computational Linguistics, 2018. DOI: 10.18653/v1/p18-1208.
[19] ABDU S A, YOUSEF A H, SALEM A. Multimodal video sentiment analysis using deep learning approaches, a survey[J]. Information Fusion, 2021, 76: 204-226. DOI: 10.1016/j.inffus.2021.06.003.
[20] ZADEH A, CHEN M H, PORIA S, et al.
Tensor fusion network for multimodal sentiment analysis[C]//Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Copenhagen, Denmark. Stroudsburg, PA, USA: Association for Computational Linguistics, 2017. DOI: 10.18653/v1/d17-1115.
[21] LIU Z, SHEN Y, LAKSHMINARASIMHAN V B, et al. Efficient low-rank multimodal fusion with modality-specific factors[C]//Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Melbourne, Australia. Stroudsburg, PA, USA: Association for Computational Linguistics, 2018. DOI: 10.18653/v1/p18-1209.
[22] ZADEH A, LIANG P P, MAZUMDER N, et al. Memory fusion network for multi-view sequential learning[C]//Proceedings of the AAAI Conference on Artificial Intelligence, 2018. DOI: 10.1609/aaai.v32i1.12021.
[23] RAHMAN W, HASAN M K, LEE S W, et al. Integrating multimodal information in large pretrained transformers[C]//Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Online. Stroudsburg, PA, USA: Association for Computational Linguistics, 2020. DOI: 10.18653/v1/2020.acl-main.214.
[24] TSAI Y H H, LIANG P P, ZADEH A, et al. Learning factorized multimodal representations[C]//Proceedings of the 7th International Conference on Learning Representations. Appleton: ICLR, 2019: 1-20.
[25] HAZARIKA D, ZIMMERMANN R, PORIA S. MISA: modality-invariant and -specific representations for multimodal sentiment analysis[C]//Proceedings of the 28th ACM International Conference on Multimedia. Seattle, WA, USA: ACM, 2020. DOI: 10.1145/3394171.3413678.
[26] TSAI Y H H, BAI S J, LIANG P P, et al. Multimodal transformer for unaligned multimodal language sequences[C]//Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Florence, Italy. Stroudsburg, PA, USA: Association for Computational Linguistics, 2019. DOI: 10.18653/v1/p19-1656.
[27] ZHANG Q A, SHI L, LIU P Y, et al.
Retraction Note: ICDN: integrating consistency and difference networks by transformer for multimodal sentiment analysis[J]. Applied Intelligence, 2023, 53(16): 19808. DOI: 10.1007/s10489-023-04869-x.
[28] WANG X, MAO L, CHEN Q D, et al. Sentiment analysis integrating dynamic gradient and multi-view co-attention[J]. Journal of Frontiers of Computer Science and Technology, 2024, 18(5): 1328-1338. DOI: 10.3778/j.issn.1673-9418.2301042.
[29] YU W M, XU H, YUAN Z Q, et al. Learning modality-specific representations with self-supervised multi-task learning for multimodal sentiment analysis[C]//Proceedings of the AAAI Conference on Artificial Intelligence, 2021. DOI: 10.1609/aaai.v35i12.17289.
[30] SUN H, WANG H Y, LIU J Q, et al. CubeMLP: an MLP-based model for multimodal sentiment analysis and depression estimation[C]//Proceedings of the 30th ACM International Conference on Multimedia. Lisboa, Portugal: ACM, 2022. DOI: 10.1145/3503161.3548025.
[31] DING J, YANG L, LIN H F, et al. Emotion analysis based on multimodal heterogeneous dynamic fusion[J]. Journal of Chinese Information Processing, 2022, 36(5): 112-124. DOI: 10.3969/j.issn.1003-0077.2022.05.012.