Abstract
Visual speech recognition (VSR) is a challenging task. It has many applications, such as facilitating speech recognition when the acoustic data is noisy or missing and assisting hearing-impaired people. Modern VSR systems require a large amount of data to achieve good performance. Most popular VSR datasets are in English, and none are in Bengali. In this paper, we present a large-scale Bengali audio-visual dataset named "BenAV". To the best of our knowledge, BenAV is the first publicly available large-scale dataset in the Bengali language. BenAV contains a lexicon of 50 words from 128 speakers, with a total of 26,300 utterances. We also applied three existing deep-learning-based VSR models to provide baseline performance on the BenAV dataset. We ran extensive experiments on two different configurations of the dataset to study the robustness of these models and achieved 98.70% and 82.5% accuracy, respectively. We believe this research provides a basis for developing Bengali lip-reading systems and opens the door to further research on this topic.
Original language | English |
---|---|
Title of host publication | Neural Information Processing |
Subtitle of host publication | 28th International Conference, ICONIP 2021, Proceedings, Part II |
Editors | Teddy Mantoro, Minho Lee, Media Anugerah Ayu, Kok Wai Wong, Achmad Nizar Hidayanto |
Place of Publication | Cham, Switzerland |
Publisher | Springer |
Pages | 526-535 |
Number of pages | 10 |
Volume | 13109 |
ISBN (Electronic) | 9783030922702 |
ISBN (Print) | 9783030922696 |
DOIs | 10.1007/978-3-030-92270-2_45 |
Publication status | E-pub ahead of print - 07 Dec 2021 |
Event | The 28th International Conference on Neural Information Processing (ICONIP 2021): ICONIP 2021 - Online, Bali, Indonesia Duration: 08 Dec 2021 → 12 Dec 2021 https://web.archive.org/web/20220108032705/https://iconip2021.apnns.org/ (Conference website) |
Publication series
Name | Lecture Notes in Computer Science |
---|---|
Publisher | Springer, Cham |
Volume | 13109 |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Conference
Conference | The 28th International Conference on Neural Information Processing (ICONIP 2021) |
---|---|
Country/Territory | Indonesia |
City | Bali |
Period | 08/12/21 → 12/12/21 |
Other | The 28th International Conference on Neural Information Processing (ICONIP 2021) aims to provide a leading international forum for researchers, scientists, and industry professionals working in neuroscience, neural networks, deep learning, and related fields to share their new ideas, progress, and achievements through its regular sessions, special sessions, tutorials, and workshops. ICONIP 2021 was held online during December 8-12, 2021. |
Datasets
Bengali Audio-Visual Corpus for Visual Speech Recognition
Pondit, A. (Creator), Rukon, M. E. A. (Creator), Das, A. (Creator) & Kabir, A. (Supervisor), Springer, 10 Mar 2021
DOI: 10.1007/978-3-030-92270-2_45, https://github.com/AnikNicks/BenAV-A-New-Bengali-Audio-Visual-Corpus
Dataset