TY - JOUR
T1 - Text-centered cross-sample fusion network for multimodal sentiment analysis
AU - Huang, Qionghao
AU - Chen, Jili
AU - Huang, Changqin
AU - Huang, Xiaodi
AU - Wang, Yi
PY - 2024/7/30
Y1 - 2024/7/30
N2 - Significant advancements in multimodal sentiment analysis have been achieved through cross-modal attention mechanisms (CMA). However, the importance of modality-specific information for distinguishing similar samples is often overlooked due to the inherent limitations of CMA. To address this issue, we propose a Text-centered Cross-sample Fusion Network (TeCaFN), which employs cross-sample fusion to perceive modality-specific information during modality fusion. Specifically, we develop a cross-sample fusion method that merges modalities from distinct samples. This method maintains detailed modality-specific information through adversarial training combined with a pairwise prediction task. Furthermore, a robust mechanism based on two-stage text-centric contrastive learning is developed to enhance the stability of cross-sample fusion learning. TeCaFN achieves state-of-the-art results on the CMU-MOSI, CMU-MOSEI, and UR-FUNNY datasets. Moreover, our ablation studies demonstrate the effectiveness of contrastive learning and adversarial training as components of TeCaFN in improving model performance. The code implementation of this paper is available at https://github.com/TheShy-Dream/MSA-TeCaFN.
AB - Significant advancements in multimodal sentiment analysis have been achieved through cross-modal attention mechanisms (CMA). However, the importance of modality-specific information for distinguishing similar samples is often overlooked due to the inherent limitations of CMA. To address this issue, we propose a Text-centered Cross-sample Fusion Network (TeCaFN), which employs cross-sample fusion to perceive modality-specific information during modality fusion. Specifically, we develop a cross-sample fusion method that merges modalities from distinct samples. This method maintains detailed modality-specific information through adversarial training combined with a pairwise prediction task. Furthermore, a robust mechanism based on two-stage text-centric contrastive learning is developed to enhance the stability of cross-sample fusion learning. TeCaFN achieves state-of-the-art results on the CMU-MOSI, CMU-MOSEI, and UR-FUNNY datasets. Moreover, our ablation studies demonstrate the effectiveness of contrastive learning and adversarial training as components of TeCaFN in improving model performance. The code implementation of this paper is available at https://github.com/TheShy-Dream/MSA-TeCaFN.
UR - http://www.scopus.com/inward/record.url?scp=85200231666&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85200231666&partnerID=8YFLogxK
U2 - 10.1007/s00530-024-01421-w
DO - 10.1007/s00530-024-01421-w
M3 - Article
SN - 0942-4962
VL - 30
SP - 1
EP - 19
JO - Multimedia Systems
JF - Multimedia Systems
IS - 4
M1 - 228
ER -