TY - JOUR
T1 - Adaptive fusion of human visual sensitive features for surveillance video summarization
AU - Salehin, Md Musfequs
AU - Paul, Manoranjan
N1 - Includes bibliographical references.
PY - 2017/5/1
Y1 - 2017/5/1
AB - Surveillance video cameras capture large amounts of continuous video every day. Manually identifying significant events in this huge volume of video data for analysis or investigation is laborious and tedious. Existing approaches sometimes neglect key frames with significant visual content and/or select unimportant frames with little or no activity. To solve this problem, this paper proposes a video summarization technique that combines three multimodal human visual sensitive features: foreground objects, motion information, and visual saliency. In a video stream, foreground objects are among the most important elements, as they contain more detailed information and play a major role in important events. Motion is another stimulus that significantly attracts human visual attention. To capture it, motion information is calculated in the spatial domain as well as the frequency domain. Spatial motion information can select object motion accurately; however, it is sensitive to illumination changes. Frequency motion information, on the other hand, is robust to illumination changes, although it is easily affected by noise. Therefore, motion information in both the spatial and the frequency domains is employed. Furthermore, the visual attention cue is a sensitive feature to measure the indication of a user's attraction label for determining key frames. As these features do not perform very well individually, they are combined to obtain better results. For this purpose, an adaptive linear weighted fusion scheme is proposed that combines the features to rank video frames for summarization. Experimental results reveal that the proposed method outperforms state-of-the-art methods.
UR - http://www.scopus.com/inward/record.url?scp=85018373568&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85018373568&partnerID=8YFLogxK
U2 - 10.1364/JOSAA.34.000814
DO - 10.1364/JOSAA.34.000814
M3 - Article
C2 - 28463326
AN - SCOPUS:85018373568
SN - 1084-7529
VL - 34
SP - 814
EP - 826
JO - Journal of the Optical Society of America A: Optics, Image Science, and Vision
JF - Journal of the Optical Society of America A: Optics, Image Science, and Vision
IS - 5
ER -