AI-Enabled Multimodal Framework for Emotion Classification and Sentiment Analysis
Keywords:
multimodal sentiment analysis, emotion recognition, speech–text–vision fusion, cross-attention, self-supervised learning, fairness, calibration, deployment

Abstract
Human affect is inherently multimodal, expressed through prosody, lexical choice, facial dynamics, and body cues. Yet many deployed systems still rely on a single channel (usually text), limiting robustness in natural
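The cross-attention fusion named in the keywords can be sketched minimally as one modality's token sequence attending to another's. The snippet below is an illustrative sketch only, not the paper's implementation: it assumes plain scaled dot-product attention with text embeddings as queries and audio frames as keys/values, and all names and dimensions are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention: queries from one modality
    attend over keys/values from another modality."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)      # (T_q, T_k) similarity
    weights = softmax(scores, axis=-1)          # rows sum to 1
    return weights @ values                     # (T_q, d_v) fused features

# Hypothetical toy inputs: 5 text tokens and 8 audio frames, 16-dim each.
rng = np.random.default_rng(0)
text_emb = rng.normal(size=(5, 16))
audio_emb = rng.normal(size=(8, 16))

fused = cross_attention(text_emb, audio_emb, audio_emb)  # text attends to audio
print(fused.shape)  # one fused vector per text token
```

In a full model this block would typically be wrapped with learned projection matrices and stacked per layer; the sketch keeps only the attention arithmetic.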


