Multi-modal multi-concept-based deep neural network for automatic image annotation

Haijiao Xu, Changqin Huang, Xiaodi Huang, Muxiong Huang

Research output: Contribution to journal › Article › peer-review

9 Citations (Scopus)

Abstract

Automatic Image Annotation (AIA) remains a challenge in computer vision with real-world applications, owing to the semantic gap between high-level semantic concepts and low-level visual appearances. Contextual tags attached to visual images and context semantics among semantic concepts can provide further semantic information to bridge this gap. To capture these semantic correlations effectively, we present a novel approach called Multi-modal Multi-concept-based Deep Neural Network (M2-DNN), which models the correlations of visual images, contextual tags, and multi-concept semantics. Unlike traditional AIA methods, our M2-DNN approach takes into account not only single-concept context semantics, but also multi-concept context semantics with abstract scenes. In our model, a multi-concept such as {"plane", "buildings"} is viewed as one holistic scene concept for concept learning. Specifically, we first construct a multi-modal Deep Neural Network (DNN) as a concept classifier for visual images and contextual tags, and then employ it to annotate unlabeled images. Second, real-world databases commonly include many difficult concepts that are hard to recognize, such as concepts with similar appearances, concepts with abstract scenes, and rare concepts. To recognize them effectively, we utilize multi-concept semantics inference and multi-modal correlation learning to refine semantic annotations. Finally, we estimate the most relevant labels for each unlabeled image through a new label decision strategy. Comprehensive experiments on two publicly available datasets show that our method performs favourably compared with several other state-of-the-art methods.
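
The idea of treating a multi-concept such as {"plane", "buildings"} as one holistic scene concept can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how multi-concepts are selected, so the co-occurrence count with a support threshold used here is an assumption, and the names `mine_multi_concepts` and `expand_labels` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def mine_multi_concepts(annotations, size=2, min_support=2):
    """Return label sets of `size` labels that co-occur in at least
    `min_support` images (assumed selection rule, not from the paper)."""
    counts = Counter()
    for labels in annotations:
        # Count each unordered label combination once per image.
        for combo in combinations(sorted(set(labels)), size):
            counts[frozenset(combo)] += 1
    return {combo for combo, n in counts.items() if n >= min_support}


def expand_labels(labels, multi_concepts):
    """Return the image's single labels followed by every mined
    multi-concept that is fully contained in its label set."""
    label_set = set(labels)
    scenes = sorted(tuple(sorted(mc)) for mc in multi_concepts if mc <= label_set)
    return sorted(label_set) + scenes


# Toy annotations; real datasets would supply these label lists.
annotations = [
    ["plane", "buildings", "sky"],
    ["plane", "buildings"],
    ["sky", "sea"],
]
multi_concepts = mine_multi_concepts(annotations)
expanded = expand_labels(annotations[0], multi_concepts)
```

With the toy data above, only the pair of "plane" and "buildings" reaches the support threshold, so the first image gains one extra scene-level target alongside its three single-concept labels. A classifier could then be trained on the expanded label set.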
Original language: English
Pages (from-to): 30651-30675
Number of pages: 25
Journal: Multimedia Tools and Applications
Volume: 78
Issue number: 21
Early online date: 24 Aug 2018
Publication status: Published - 2019
