TY - JOUR
T1 - Multi-modal multi-concept-based deep neural network for automatic image annotation
AU - Xu, Haijiao
AU - Huang, Changqin
AU - Huang, Xiaodi
AU - Huang, Muxiong
N1 - Includes bibliographical references.
PY - 2019
Y1 - 2019
AB - Automatic Image Annotation (AIA) remains a challenge in computer vision with real-world applications, owing to the semantic gap between high-level semantic concepts and low-level visual appearances. Contextual tags attached to visual images, together with context semantics among semantic concepts, can provide further semantic information to bridge this gap. To effectively capture these semantic correlations, we present a novel approach called the Multi-modal Multi-concept-based Deep Neural Network (M2-DNN), which models the correlations among visual images, contextual tags, and multi-concept semantics. Unlike traditional AIA methods, our M2-DNN approach takes into account not only single-concept context semantics but also multi-concept context semantics with abstract scenes. In our model, a multi-concept such as {"plane", "buildings"} is viewed as one holistic scene concept for concept learning. Specifically, we first construct a multi-modal Deep Neural Network (DNN) as a concept classifier for visual images and contextual tags, and then employ it to annotate unlabeled images. Second, real-world databases commonly include many concepts that are hard to recognize, such as concepts with similar appearances, concepts with abstract scenes, and rare concepts. To recognize these effectively, we use multi-concept semantics inference and multi-modal correlation learning to refine semantic annotations. Finally, we estimate the most relevant labels for each unlabeled image through a new label-decision strategy. Comprehensive experiments on two publicly available datasets show that our method performs favourably compared with several state-of-the-art methods.
KW - Automatic image annotation
KW - Deep neural network
KW - Multi-concept semantics
KW - Machine learning
DO - 10.1007/s11042-018-6555-7
M3 - Article
SN - 1380-7501
VL - 78
SP - 30651
EP - 30675
JO - Multimedia Tools and Applications
JF - Multimedia Tools and Applications
IS - 21
ER -