Abstract
Visual speech recognition (VSR) is a challenging task with many applications, such as facilitating speech recognition when the acoustic data is noisy or missing and assisting hearing-impaired people. Modern VSR systems require a large amount of data to achieve good performance. Popular VSR datasets are mostly available for the English language, and none is available in Bengali. In this paper, we present a large-scale Bengali audio-visual dataset named "BenAV". To the best of our knowledge, BenAV is the first publicly available large-scale audio-visual dataset in the Bengali language. BenAV contains a lexicon of 50 words from 128 speakers, with a total of 26,300 utterances. We also apply three existing deep learning based VSR models to provide baseline performance on the BenAV dataset. We run extensive experiments in two different configurations of the dataset to study the robustness of those models and achieve 98.70% and 82.5% accuracy, respectively. We believe that this research provides a basis for developing Bengali lip reading systems and opens the door to further research on this topic.
Original language | English |
---|---|
Title of host publication | The 28th International Conference on Neural Information Processing (ICONIP2021) |
Publisher | Springer |
Pages | 526-535 |
Number of pages | 10 |
Volume | 13109 |
ISBN (Electronic) | 9783030922702 |
ISBN (Print) | 9783030922696 |
Publication status | E-pub ahead of print - 07 Dec 2021 |
Event | The 28th International Conference on Neural Information Processing: ICONIP 2021 - Online. Duration: 08 Dec 2021 → 12 Dec 2021. https://iconip2021.apnns.org/ |
Publication series
Name | Lecture Notes in Computer Science (LNCS) |
---|---|
Publisher | Springer, Cham |
Conference
Conference | The 28th International Conference on Neural Information Processing |
---|---|
Period | 08/12/21 → 12/12/21 |