TY - JOUR
T1 - Multimodal marvels of deep learning in medical diagnosis using image, speech, and text
T2 - A comprehensive review of COVID-19 detection
AU - Islam, Md Shofiqul
AU - Hasan, Khondokar Fida
AU - Shajeeb, Hasibul Hossain
AU - Rana, Humayan Kabir
AU - Rahman, Md Saifur
AU - Hasan, Md Munirul
AU - Azad, A. K. M.
AU - Abdullah, Ibrahim
AU - Moni, Mohammad Ali
N1 - Publisher Copyright: © 2025 The Authors
PY - 2025/1
Y1 - 2025/1
N2 - This study presents a comprehensive review of the potential of multimodal deep learning (DL) in medical diagnosis, using COVID-19 as a case example. Motivated by the success of artificial intelligence applications during the COVID-19 pandemic, this research aims to uncover the capabilities of DL in disease screening, prediction, and classification, and to derive insights that enhance the resilience, sustainability, and inclusiveness of science, technology, and innovation systems. Adopting a systematic approach, we investigate the fundamental methodologies, data sources, preprocessing steps, and challenges encountered in various studies and implementations. We explore the architecture of deep learning models, emphasising their data-specific structures and underlying algorithms. Subsequently, we compare different deep learning strategies utilised in COVID-19 analysis, evaluating them based on methodology, data, performance, and prerequisites for future research. By examining diverse data types and diagnostic modalities, this research contributes to scientific understanding and knowledge of the multimodal application of DL and its effectiveness in diagnosis. We have implemented and analysed 11 deep learning models using COVID-19 image, text, and speech (i.e., cough) data. Our analysis revealed that the MobileNet model achieved the highest accuracy of 99.97% for COVID-19 image data and 93.73% for speech data (i.e., cough). However, the BiGRU model demonstrated superior performance in COVID-19 text classification with an accuracy of 99.89%. The broader implications of this research suggest potential benefits for other domains and disciplines that could leverage deep learning techniques for image, text, and speech analysis.
AB - This study presents a comprehensive review of the potential of multimodal deep learning (DL) in medical diagnosis, using COVID-19 as a case example. Motivated by the success of artificial intelligence applications during the COVID-19 pandemic, this research aims to uncover the capabilities of DL in disease screening, prediction, and classification, and to derive insights that enhance the resilience, sustainability, and inclusiveness of science, technology, and innovation systems. Adopting a systematic approach, we investigate the fundamental methodologies, data sources, preprocessing steps, and challenges encountered in various studies and implementations. We explore the architecture of deep learning models, emphasising their data-specific structures and underlying algorithms. Subsequently, we compare different deep learning strategies utilised in COVID-19 analysis, evaluating them based on methodology, data, performance, and prerequisites for future research. By examining diverse data types and diagnostic modalities, this research contributes to scientific understanding and knowledge of the multimodal application of DL and its effectiveness in diagnosis. We have implemented and analysed 11 deep learning models using COVID-19 image, text, and speech (i.e., cough) data. Our analysis revealed that the MobileNet model achieved the highest accuracy of 99.97% for COVID-19 image data and 93.73% for speech data (i.e., cough). However, the BiGRU model demonstrated superior performance in COVID-19 text classification with an accuracy of 99.89%. The broader implications of this research suggest potential benefits for other domains and disciplines that could leverage deep learning techniques for image, text, and speech analysis.
KW - Deep learning
KW - Image processing
KW - Medical diagnosis
KW - Speech
KW - Text
UR - http://www.scopus.com/inward/record.url?scp=85216630223&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85216630223&partnerID=8YFLogxK
U2 - 10.1016/j.aiopen.2025.01.003
DO - 10.1016/j.aiopen.2025.01.003
M3 - Review article
SN - 2666-6510
VL - 6
SP - 12
EP - 44
JO - AI Open
JF - AI Open
ER -