Aspect-level multimodal co-attention graph convolutional sentiment analysis model
Aspect-level multimodal sentiment analysis, which aims to predict the sentiment polarity of specific aspects mentioned in multimodal data, is gaining increasing attention. However, current methods often fail to adequately consider the directional role of aspect terms in context modeling and fine-grained alignment across modalities, which limits performance. To address these issues, we propose an aspect-level multimodal co-attention graph convolutional sentiment analysis model (AMCGC) that simultaneously models aspect-directed contextual semantic associations within each modality and fine-grained cross-modal alignment. To capture aspect-oriented local semantic relevance within each modality, AMCGC uses an orthogonally constrained self-attention mechanism to generate a semantic graph for each modality. Graph convolution over these graphs then yields a textual semantic graph representation and a visual semantic graph representation, both incorporating the aspect terms. Two gated local cross-modal interaction mechanisms, operating in opposite directions, progressively achieve fine-grained correlation alignment between the textual and visual semantic graph representations, narrowing the heterogeneity gap between modalities. Finally, an aspect mask selects the aspect node features from each modality's graph representation as the sentiment representation, and a cross-modal loss is introduced to reduce the discrepancy between the heterogeneous aspect features. Compared with nine other methods on two multimodal datasets, the proposed method improves accuracy over the second-best model by 1.76% on the Twitter-2015 dataset and by 1.19% on the Twitter-2017 dataset.
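The first stage described above can be illustrated with a minimal numpy sketch. This is not the authors' implementation; the function names, the dot-product attention form, and the Frobenius-norm orthogonality penalty (pushing the attention matrix toward `A @ A.T ≈ I`, so that nodes attend to distinct neighbours) are illustrative assumptions consistent with the abstract's description:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def semantic_graph(H, reg_weight=0.1):
    """Build a semantic graph over the n nodes of one modality.

    H: (n, d) node features (e.g. token or region embeddings).
    Returns the attention-based adjacency A (n, n) and an orthogonality
    penalty that would be added to the training loss to encourage
    distinct, sparse attention per node (illustrative form).
    """
    n, d = H.shape
    A = softmax(H @ H.T / np.sqrt(d))                 # self-attention scores as edges
    ortho = A @ A.T - np.eye(n)                        # deviation from orthogonality
    penalty = reg_weight * float(np.sum(ortho ** 2))   # Frobenius-norm penalty
    return A, penalty

def gcn_layer(A, H, W):
    """One graph-convolution layer: aggregate neighbours via A, project, ReLU."""
    return np.maximum(A @ H @ W, 0.0)
```

In the model, the same construction would be applied once to the textual nodes and once to the visual region nodes, producing the two semantic graph representations that the cross-modal stage later aligns.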
Ablation experiments on the orthogonality constraint, the cross-modal loss, and the cross-cooperative multimodal fusion validate the contribution of each component of the AMCGC model. The proposed AMCGC model better captures local semantic correlation within each modality and fine-grained alignment between modalities, improving the accuracy of aspect-level multimodal sentiment analysis.
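The cross-modal stage and the final aspect-based loss can likewise be sketched in numpy. Again this is an assumed, simplified form, not the paper's exact equations: the sigmoid gate, the concatenation-based gating weights `Wg`, mean-pooling over aspect nodes, and the squared-L2 cross-modal loss are illustrative choices matching the description above:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_cross_modal(H_txt, H_img, Wg):
    """One direction of gated local cross-modal interaction.

    A sigmoid gate, computed from both modalities, controls how much
    visual node information flows into each textual node; the model
    applies a mirrored gate in the image-to-text-aligned direction.
    H_txt, H_img: (n, d) aligned node features; Wg: (2d, d) gate weights.
    """
    gate = sigmoid(np.concatenate([H_txt, H_img], axis=-1) @ Wg)
    return H_txt + gate * H_img

def aspect_representation(H, aspect_mask):
    """Aspect mask: select aspect-node rows of H and mean-pool them."""
    return H[aspect_mask].mean(axis=0)

def cross_modal_loss(z_txt, z_img):
    """Squared L2 distance between the two modalities' aspect features,
    penalizing disagreement between heterogeneous aspect representations."""
    return float(np.sum((z_txt - z_img) ** 2))
```

Minimizing `cross_modal_loss` during training would pull the textual and visual aspect features toward a shared representation, which is the stated purpose of the cross-modal loss.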