Transmitting entire video and audio sequences over an internal or external network for audio-visual recognition over internet protocol is inefficient, especially when only a selected portion of that data is actually used in the recognition process. In this paper, we therefore propose an efficient method of implementing audio-visual recognition over internet protocol in which only the extracted audio-visual features are transmitted. To extract robust features from the video sequence, a multiband curvelet-based technique is employed at the client, whereas a late multi-modal fusion scheme using a radial basis function (RBF) neural network is employed at the server to perform recognition across both modalities. The proposed audio-visual recognition system is implemented on several standard audio-visual databases to showcase the efficiency of the system. © 2012 ICST Institute for Computer Science, Social Informatics and Telecommunications Engineering.
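To make the server-side idea concrete, the following is a minimal sketch of late fusion through a single RBF layer, written in plain Python. It is not the paper's implementation: the function names, the two-class setup, the choice of one center per class, the identity output weights, and the kernel width are all hypothetical and chosen only for illustration. It assumes each modality has already produced a per-class score vector from the features received from the client.

```python
import math


def rbf_activations(x, centers, width):
    """Gaussian RBF response of the input vector x to each center."""
    return [
        math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / (2.0 * width ** 2))
        for c in centers
    ]


def fuse_late(audio_scores, visual_scores, centers, out_weights, width=0.5):
    """Late fusion: concatenate the per-modality scores, pass them through
    an RBF hidden layer, and apply a linear output layer.

    Returns (index of the winning class, list of output values).
    """
    x = list(audio_scores) + list(visual_scores)
    hidden = rbf_activations(x, centers, width)
    outputs = [sum(w * h for w, h in zip(row, hidden)) for row in out_weights]
    winner = max(range(len(outputs)), key=outputs.__getitem__)
    return winner, outputs


# Hypothetical two-class configuration (an assumption, not from the paper):
# one center per class, located where both modalities fully agree on that
# class, with identity output weights. A real system would learn these.
CENTERS = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
OUT_WEIGHTS = [[1.0, 0.0], [0.0, 1.0]]

# Audio is confident in class 0; video is only mildly in favour of class 0.
label, outputs = fuse_late([0.9, 0.1], [0.6, 0.4], CENTERS, OUT_WEIGHTS)
print(label, outputs)
```

In this sketch the decision is made on the combined score vector rather than on either modality alone, which is the sense in which the fusion is "late". In practice the centers, widths, and output weights would be trained on data.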
|Number of pages|7|
|Journal|Lecture Notes of the Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering|
|Publication status|Published - 2012|