On space-time filtering framework for matching human actions across multiple viewpoints

Anwaar Ul-Haq, Xiaoxia Yin, Jing He, Yanchun Zhang

Research output: Contribution to journal › Article › peer-review

18 Citations (Scopus)


Space-time template matching is considered a promising approach for human action recognition. However, a major drawback of template-based methods is the computational overhead of matching in the spatial domain. Recently, space-time correlation-based action filters have been proposed for recognizing human actions in the frequency domain. These action filters reduce time complexity because Fourier transform-based matching is faster than spatial template matching. However, the utility of such action filters is limited by a number of factors: 1) an inability to deal with view variations, owing to an inherent lack of support for view invariance; 2) these filters can be trained for only one action class at a time, so a separate filter is required for each action class, which increases computational overhead; 3) these filters simply take the average of similar action instances and behave no better than average filters; and 4) slightly misaligned action data sets create problems because these filters are not shift-invariant. In this paper, we address these shortcomings by proposing an advanced space-time filtering framework for recognizing human actions despite large viewpoint variations. Rather than using crude intensity values, we use a 3D tensor structure at each pixel, which characterizes the most common local motion in action sequences. The discrete tensor Fourier transform is then applied to obtain frequency-domain representations. Next, we form view clusters from multiple-view action data and use space-time correlation filtering to obtain discriminative view representations. These representations are used in an innovative way to achieve action recognition despite viewpoint variations. Extensive experiments are performed on well-known multiple-view action data sets, including the IXMAS, WVU, and N-UCLA action data sets.
A detailed performance comparison with the existing view-invariant action recognition techniques indicates that our approach works equally well for RGB and RGB-D video data with increased accuracy and efficiency.
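The frequency-domain matching that the abstract builds on rests on the correlation theorem: correlating a space-time template with a video volume gives the same result as multiplying the volume's 3D Fourier transform by the conjugate of the template's transform and inverting. The sketch below illustrates only that generic idea, assuming raw intensity volumes of shape (frames, rows, cols) and circular (wrap-around) correlation. It is not the authors' filter, which operates on per-pixel 3D tensor features with a discrete tensor Fourier transform and view clusters. The helper names `pad_to`, `correlate_fft`, and `correlate_direct` are hypothetical.

```python
import numpy as np

# Assumption: plain intensity volumes and circular correlation.
# This illustrates the correlation theorem only, not the paper's tensor-based filter.


def pad_to(template, shape):
    """Zero-pad a (t, h, w) template into the corner of a volume of `shape`."""
    padded = np.zeros(shape, dtype=float)
    t, h, w = template.shape
    padded[:t, :h, :w] = template
    return padded


def correlate_fft(video, template):
    """Circular space-time cross-correlation c[s] = sum_x f[x] * v[x + s] via the 3D FFT."""
    f = pad_to(template, video.shape)
    # Correlation theorem: video spectrum times the conjugate template spectrum.
    spectrum = np.fft.fftn(video) * np.conj(np.fft.fftn(f))
    return np.real(np.fft.ifftn(spectrum))


def correlate_direct(video, template):
    """The same correlation evaluated shift by shift in the space-time domain (slow reference)."""
    f = pad_to(template, video.shape)
    out = np.empty(video.shape)
    for s in np.ndindex(*video.shape):
        # np.roll(video, -s)[x] equals video[(x + s) mod N] along each axis.
        shifted = np.roll(video, shift=tuple(-k for k in s), axis=(0, 1, 2))
        out[s] = np.sum(f * shifted)
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    template = rng.standard_normal((3, 4, 4))  # hypothetical action template
    video = np.zeros((8, 12, 12))
    video[2:5, 5:9, 3:7] = template  # embed the template at space-time offset (2, 5, 3)

    c_fft = correlate_fft(video, template)
    c_dir = correlate_direct(video, template)
    print(np.allclose(c_fft, c_dir))
    print(np.unravel_index(np.argmax(c_fft), c_fft.shape))
```

For a volume of N voxels, `correlate_direct` evaluates N products for each of N shifts (O(N²) work), whereas `correlate_fft` costs O(N log N). This is the kind of saving the abstract attributes to Fourier transform-based matching, and the correlation peak marks where the template best aligns in space and time.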
Original language: English
Pages (from-to): 1230-1242
Number of pages: 13
Journal: IEEE Transactions on Image Processing
Issue number: 3
Early online date: 23 Oct 2017
Publication status: Published - Mar 2018

