TeFNA: Text-centered Fusion Network with crossmodal attention for multimodal sentiment analysis

Changqin Huang, Junling Zhang, Xuemei Wu, Yi Wang, Ming Li, Xiaodi Huang

Research output: Contribution to journal › Article › peer-review

28 Citations (Scopus)

Abstract

Multimodal sentiment analysis (MSA), which goes beyond the analysis of text to include other modalities such as audio and visual data, has attracted significant attention. Effective fusion of sentiment information across multiple modalities is key to improving MSA performance. However, aligning multiple modalities during fusion poses challenges, such as preserving modality-specific information. This paper proposes a Text-centered Fusion Network with crossmodal Attention (TeFNA), a multimodal fusion network that uses crossmodal attention to model unaligned multimodal temporal information. In particular, TeFNA employs a Text-Centered Aligned fusion method (TCA) that takes the text modality as the primary modality to improve the representation of fused features. In addition, TeFNA maximizes the mutual information between modality pairs to retain task-related emotional information, thereby ensuring that the key information of each modality is preserved from input to fusion. The results of comprehensive experiments on the multimodal datasets CMU-MOSI and CMU-MOSEI show that the proposed model outperforms existing methods on most evaluation metrics.
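The abstract describes crossmodal attention in which the text modality serves as the query over the audio and visual streams. The following minimal PyTorch sketch illustrates that general idea only; it is not the authors' implementation, and the module name, dimensions, and the use of torch.nn.MultiheadAttention are assumptions for illustration.

# Illustrative sketch (not the authors' code): text-as-query crossmodal attention.
import torch
import torch.nn as nn

class TextCenteredCrossmodalAttention(nn.Module):
    def __init__(self, dim=128, heads=4):
        super().__init__()
        # Text queries attend separately to audio and visual keys/values.
        self.text_audio = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.text_visual = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, text, audio, visual):
        # Inputs: (batch, seq_len, dim); the modalities may be unaligned,
        # i.e. each may have a different sequence length.
        t_a, _ = self.text_audio(query=text, key=audio, value=audio)
        t_v, _ = self.text_visual(query=text, key=visual, value=visual)
        # Concatenate the two text-centered representations and project.
        return self.fuse(torch.cat([t_a, t_v], dim=-1))

# Example usage with toy tensors
if __name__ == "__main__":
    model = TextCenteredCrossmodalAttention()
    text = torch.randn(2, 20, 128)
    audio = torch.randn(2, 50, 128)
    visual = torch.randn(2, 30, 128)
    print(model(text, audio, visual).shape)  # torch.Size([2, 20, 128])

Because the text sequence supplies the queries, the fused output keeps the text time steps as its reference frame, which matches the text-centered fusion idea summarized above; the paper's additional mutual-information objective is not shown here.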
Original language: English
Article number: 110502
Number of pages: 10
Journal: Knowledge-Based Systems
Volume: 269
Early online date: 29 Mar 2023
DOIs
Publication status: Published - 07 Jun 2023
