Audio-visual recognition system in compression domain

Yee Wan Wong, Kah Phooi Seng, Li-Minn Ang

Research output: Contribution to journal › Article

Abstract

This paper presents a highly efficient audio-visual recognition system that operates in the compression domain. For face recognition, the multiband feature fusion method selects the wavelet subbands that are invariant to illumination and facial expression variations. These subbands are extracted directly from the inverse quantization stage of the compression system. Because the inverse-quantized wavelet coefficients of the video are taken as the input, the inverse wavelet transform, which corresponds to image reconstruction, is omitted. As a result, the computational complexity is lower than that of a conventional video-based face recognition system. We also present a set of new face localization methods that localize the facial wavelet coefficients in the wavelet subband image. The dual optimal multiband feature fusion method is then used to fuse the two sets of wavelet coefficients and generate the visual scores. Experimental results show that, with low computational complexity, the proposed system achieves high recognition accuracy on the UNMC-VIER, CUAVE, and XM2VTS audio-visual databases.
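The idea of matching directly on wavelet subbands and then fusing per-subband scores can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the authors' method: the one-level Haar filter, the choice of the LL and LH subbands, the negative squared-distance score, the fixed fusion weights, and the function names `haar_subbands`, `fused_score`, and `identify` are all hypothetical. In a real compression-domain system the subbands would come from the decoder after inverse quantization; here `haar_subbands` only simulates that input.

```python
# Illustrative sketch only. The Haar filter, the subband choice, the distance
# score and the weighted-sum fusion are assumptions, not the paper's method.

def haar_subbands(image):
    """One-level 2-D Haar analysis of an even-sized grayscale image.

    `image` is a list of rows. This stands in for the coefficients a
    decoder would already hold after inverse quantization.
    """
    bands = {"LL": [], "LH": [], "HL": [], "HH": []}
    for r in range(0, len(image), 2):
        ll, lh, hl, hh = [], [], [], []
        for c in range(0, len(image[0]), 2):
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            ll.append((a + b + d + e) / 4.0)  # local average
            lh.append((a + b - d - e) / 4.0)  # top row minus bottom row
            hl.append((a - b + d - e) / 4.0)  # left column minus right column
            hh.append((a - b - d + e) / 4.0)  # diagonal difference
        bands["LL"].append(ll)
        bands["LH"].append(lh)
        bands["HL"].append(hl)
        bands["HH"].append(hh)
    return bands


def flatten(band):
    """Turn a 2-D subband into a flat feature vector."""
    return [value for row in band for value in row]


def distance_score(probe, reference):
    """Negative squared Euclidean distance, so a higher score is a better match."""
    return -sum((p - q) ** 2 for p, q in zip(probe, reference))


def fused_score(probe_bands, reference_bands, weights):
    """Weighted sum of per-subband scores (a simple stand-in for fusion)."""
    return sum(
        weight * distance_score(flatten(probe_bands[name]),
                                flatten(reference_bands[name]))
        for name, weight in weights.items()
    )


def identify(probe_bands, gallery, weights):
    """Return the gallery label whose subbands give the highest fused score."""
    return max(gallery,
               key=lambda label: fused_score(probe_bands, gallery[label], weights))
```

Note that no inverse wavelet transform appears anywhere in the matching path: features are read straight from the subbands, which is the source of the complexity saving the abstract describes. The weights passed to `identify`, for example `{"LL": 0.7, "LH": 0.3}`, are arbitrary placeholders.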
Original language: English
Pages (from-to): 637-646
Number of pages: 10
Journal: IEEE Transactions on Circuits and Systems for Video Technology
Volume: 21
Issue number: 5
DOI: 10.1109/TCSVT.2011.2129670
Publication status: Published - May 2011

Cite this

Wong, Yee Wan ; Seng, Kah Phooi ; Ang, Li-Minn. / Audio-visual recognition system in compression domain. In: IEEE Transactions on Circuits and Systems for Video Technology. 2011 ; Vol. 21, No. 5. pp. 637-646.
@article{8dd19e4312354e92adf6f02d3748fdd9,
title = "Audio-visual recognition system in compression domain",
abstract = "This paper presents a highly efficient audio-visual recognition system in compression domain. For face recognition systems, the multiband feature fusion method selects the wavelet subbands that are invariant to illumination and facial expression variations. These subbands will be extracted directly from the inverse quantization in the compression system. By taking the inverse quantized wavelet coefficient of the video as the input, the inverse wavelet transform which corresponds to image reconstruction is omitted. As a result, the computational complexity of the conventional video-based face recognition system is reduced. We also present a set of new face localization methods to localize the facial wavelet coefficients from the wavelet subband image. The dual optimal multiband feature fusion method is then used to fuse the two set of wavelet coefficients and generate the visual scores. Experimental results show that with low computational complexity, the proposed system achieves high recognition accuracy in UNMC-VIER, CUAVE, and XM2VTS audio-visual database.",
keywords = "Audio-visual recognition, computational complexity, face localization, face segmentation, video-based face recognition, wavelet transform",
author = "Wong, {Yee Wan} and Seng, {Kah Phooi} and Ang, Li-Minn",
year = "2011",
month = "5",
doi = "10.1109/TCSVT.2011.2129670",
language = "English",
volume = "21",
pages = "637--646",
journal = "IEEE Transactions on Circuits and Systems for Video Technology",
issn = "1051-8215",
publisher = "IEEE, Institute of Electrical and Electronics Engineers",
number = "5",

}

TY - JOUR

T1 - Audio-visual recognition system in compression domain

AU - Wong, Yee Wan

AU - Seng, Kah Phooi

AU - Ang, Li-Minn

PY - 2011/5

Y1 - 2011/5

AB - This paper presents a highly efficient audio-visual recognition system in compression domain. For face recognition systems, the multiband feature fusion method selects the wavelet subbands that are invariant to illumination and facial expression variations. These subbands will be extracted directly from the inverse quantization in the compression system. By taking the inverse quantized wavelet coefficient of the video as the input, the inverse wavelet transform which corresponds to image reconstruction is omitted. As a result, the computational complexity of the conventional video-based face recognition system is reduced. We also present a set of new face localization methods to localize the facial wavelet coefficients from the wavelet subband image. The dual optimal multiband feature fusion method is then used to fuse the two set of wavelet coefficients and generate the visual scores. Experimental results show that with low computational complexity, the proposed system achieves high recognition accuracy in UNMC-VIER, CUAVE, and XM2VTS audio-visual database.

KW - Audio-visual recognition

KW - computational complexity

KW - face localization

KW - face segmentation

KW - video-based face recognition

KW - wavelet transform

U2 - 10.1109/TCSVT.2011.2129670

DO - 10.1109/TCSVT.2011.2129670

M3 - Article

VL - 21

SP - 637

EP - 646

JO - IEEE Transactions on Circuits and Systems for Video Technology

JF - IEEE Transactions on Circuits and Systems for Video Technology

SN - 1051-8215

IS - 5

ER -