A new multi-purpose audio-visual UNMC-VIER database with multiple variabilities

Yee Wan Wong, Sue Inn Ch'ng, Kah Phooi Seng, Li-Minn Ang, Siew Wen Chin, Wei Jen Chew, King Hann Lim

    Research output: Contribution to journal › Article › peer-review

    13 Citations (Scopus)

    Abstract

    Audio-visual recognition systems are becoming popular because they overcome certain problems of traditional audio-only recognition systems. However, visual variations in video sequences can significantly degrade the recognition performance of such systems. The problem is further complicated when more than one visual variation occurs at the same time. Although several databases have been created in this area, none of them includes realistic visual variations in its video sequences. To facilitate the development of robust audio-visual recognition systems, the new audio-visual UNMC-VIER database has been created. This database contains various visual variations, including illumination, facial expression, head pose and image resolution variations. Its most distinctive aspect is that it includes more than one visual variation in the same video recording. For the audio part, the utterances are spoken at both slow and normal speech pace to improve the learning process of audio-visual speech recognition systems. Hence, this database is useful for the development of robust audio-visual person recognition, speech recognition and face recognition systems.
    Original language: English
    Pages (from-to): 1503-1510
    Number of pages: 8
    Journal: Pattern Recognition Letters
    Volume: 32
    Issue number: 13
    Early online date: Jun 2011
    Publication status: Published - 01 Oct 2011
