Audio and visual speech recognition recent trends

Lee Hao Wei, Kah Phooi Seng, Li-Minn Ang

    Research output: Book chapter/Published conference paper > Chapter (peer-reviewed) > peer-review

    1 Citation (Scopus)


    This chapter gives a brief introduction to the origins of audio-visual speech recognition and to the techniques researchers in the field commonly use. It reviews background theory on common feature extraction and classification methods for both audio and visual processing, with emphasis on Mel-Frequency Cepstral Coefficients and on contour/geometric-based lip feature extraction with the corresponding tracking methods (Yingjie, Haiyan, Yingjie, & Jinyang, 2011; Liu & Cheung, 2011). The proposed solution concepts include time derivatives of Mel-Frequency Cepstral Coefficients for audio feature extraction, chroma-colour-based (YCbCr) face segmentation, feature point extraction, a localized active contour tracking algorithm, and Hidden Markov Models incorporating the Viterbi algorithm. The chapter is intended to be informative for newcomers to speech processing, but it is not sufficient on its own for mastery of the field. The suggested additional reading should help readers reach that level more quickly.
    Original language: English
    Title of host publication: Intelligent image and video interpretation
    Subtitle of host publication: Algorithms and applications
    Editors: Jing Tian, Li Chen
    Place of Publication: Hershey, PA
    Publisher: Information Science Reference
    Number of pages: 45
    ISBN (Electronic): 9781466639591
    ISBN (Print): 9781466639584, 9781466639607
    Publication status: Published - 2013

