Bengali Audio-Visual Corpus for Visual Speech Recognition

  • Ashish Pondit (Creator)
  • Muhammad Eshaque Ali Rukon (Creator)
  • Anik Das (Creator)
  • Ashad Kabir (Supervisor)

Dataset

Description of Data

The BenAV dataset contains 26,300 utterances of a 50-word lexicon, recorded from 128 speakers (107 male and 21 female). On average, each word is spoken by 18 speakers (maximum 20, minimum 12, standard deviation 1.826). The total duration of the dataset is 7.3 hours. This is the first Bengali audio-visual dataset, and it can be used for research in areas including acoustic speech recognition and audio-visual speech recognition.
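The figures quoted in the description can be cross-checked with some simple arithmetic. The sketch below is only a rough sanity check: it assumes utterances are spread evenly across words and speakers, which the description does not state, and the function name `summarize` is made up for illustration.

```python
# Figures quoted in the dataset description.
WORDS = 50
UTTERANCES = 26_300
TOTAL_HOURS = 7.3
AVG_SPEAKERS_PER_WORD = 18


def summarize(words, utterances, total_hours, avg_speakers_per_word):
    """Derive rough per-utterance and per-word averages.

    Assumes an even spread of utterances over words and speakers
    (an assumption for illustration, not a stated property of the data).
    """
    total_seconds = total_hours * 3600
    per_word = utterances / words
    return {
        "mean_utterance_seconds": total_seconds / utterances,
        "utterances_per_word": per_word,
        "utterances_per_speaker_per_word": per_word / avg_speakers_per_word,
    }


stats = summarize(WORDS, UTTERANCES, TOTAL_HOURS, AVG_SPEAKERS_PER_WORD)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions, the mean utterance length comes out at roughly one second, which is plausible for a corpus of isolated words.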
Date made available: 10 Mar 2021
Publisher: Springer
Date of data production: 2021 -
  • BenAV: A Bengali audio-visual corpus for visual speech recognition

    Pondit, A., Rukon, M. E. A., Das, A. & Kabir, A., 07 Dec 2021, (E-pub ahead of print) Neural Information Processing: 28th International Conference, ICONIP 2021, Proceedings, Part II. Mantoro, T., Lee, M., Ayu, M. A., Wong, K. W. & Hidayanto, A. N. (eds.). Cham, Switzerland: Springer, Vol. 13109. p. 526-535 10 p. (Lecture Notes in Computer Science; vol. 13109).

    Research output: Book chapter/Published conference paper, Conference paper, peer-review
