TY - JOUR
T1 - Specificity and latent correlation learning for action recognition using synthetic multi-view data from depth maps
AU - Liang, Bin
AU - Zheng, Lihong
N1 - Includes bibliographical references.
PY - 2017/12
Y1 - 2017/12
AB - This paper presents a novel approach to action recognition using synthetic multi-view data from depth maps. Specifically, multiple views are first generated by rotating 3D point clouds from depth maps. A pyramid multi-view depth motion template is then adopted for multi-view action representation, characterizing the multi-scale motion and shape patterns in 3D. Empirically, despite the view-specific information, the latent information between multiple views often provides important cues for action recognition. Concentrating on this observation and motivated by the success of the dictionary learning framework, this paper proposes to explicitly learn a view-specific dictionary (called specificity) for each view, and simultaneously learn a latent dictionary (called latent correlation) across multiple views. Thus, a novel method, specificity and latent correlation learning, is put forward to learn the specificity that captures the most discriminative features of each view, and learn the latent correlation that contributes the inherent 3D information to multiple views. In this way, a compact and discriminative dictionary is constructed by specificity and latent correlation for feature representation of actions. The proposed method is evaluated on the MSR Action3D, the MSR Gesture3D, the MSR Action Pairs, and the ChaLearn multi-modal data sets, consistently achieving promising results compared with the state-of-the-art methods based on depth data.
KW - Depth maps
KW - Multi-view action recognition
KW - Sparse coding
KW - Specificity and latent correlation learning
UR - http://www.scopus.com/inward/record.url?scp=85028453477&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85028453477&partnerID=8YFLogxK
U2 - 10.1109/TIP.2017.2740122
DO - 10.1109/TIP.2017.2740122
M3 - Article
C2 - 28816663
AN - SCOPUS:85028453477
SN - 1057-7149
VL - 26
SP - 5560
EP - 5574
JO - IEEE Transactions on Image Processing
JF - IEEE Transactions on Image Processing
IS - 12
M1 - 8010423
ER -