Audio-visual speech processing for human computer interaction

S.W. Chin, K.P. Seng, L.-M. Ang

    Research output: Book chapter/Published conference paper; Chapter (peer-reviewed)


    Abstract

    This chapter presents an audio-visual speech recognition (AVSR) system for human-computer interaction (HCI) built around three modules: (i) radial basis function neural network (RBF-NN) voice activity detection (VAD), (ii) watershed lips detection with H∞ lips tracking, and (iii) multi-stream audio-visual back-end processing. The importance of AVSR as a pipeline for HCI and the background of the respective modules are discussed first, followed by the design details of the overall proposed AVSR system. Compared with the conventional lips detection approach, which requires prerequisite skin/non-skin detection and face localization, the proposed watershed lips detection aided by H∞ lips tracking provides a potentially time-saving direct lips detection technique that makes those preliminary steps unnecessary. In addition, the proposed RBF-NN VAD offers better noise compensation and more precise speech localization than the conventional zero-crossing rate and short-term signal energy measures, which yields higher recognition performance through the audio modality. Lastly, the developed AVSR system, which integrates the audio and visual information as well as the temporal synchrony of the audio-visual data streams, is shown to obtain a significant improvement over unimodal speech recognition and over the decision and feature integration approaches. © Springer-Verlag Berlin Heidelberg 2012.
    Original language: English
    Title of host publication: Advances in robotics and virtual reality
    Editors: Tauseef Gulrez, Aboul Ella Hassanien
    Place of Publication: Berlin, Heidelberg
    Publisher: Springer
    Pages: 135-165
    Number of pages: 31
    Volume: 26
    Edition: 1
    ISBN (Electronic): 9783642233630
    ISBN (Print): 9783642233623
    Publication status: Published - 2012

    Publication series

    Name: Intelligent Systems Reference Library
    Publisher: Springer
    Volume: 26
    ISSN (Print): 1868-4394
