Spatial and motion saliency prediction method using eye tracker data for video summarization

Manoranjan Paul, Md Musfequs Salehin

Research output: Contribution to journal › Article › peer-review

12 Citations (Scopus)


Video summarization is the process of extracting the most significant content of a video and representing it in a concise form. Existing video summarization methods do not achieve satisfactory results for videos with camera movement and significant illumination changes. To solve these problems, this paper proposes a new video summarization framework based on eye tracker data, as human eyes can track moving objects accurately in these cases. Smooth pursuit is the state of eye movement in which a viewer follows a moving object in a video. This motivates us to implement a new method to distinguish smooth pursuit from other types of gaze points, such as fixations and saccades. Smooth pursuit provides only the location of moving objects in a video frame; it indicates neither whether the located moving objects are attractive to viewers (i.e. salient regions) nor the amount of motion of those objects. The amount of salient regions and the amount of object motion are two important features for measuring the viewer's attention level when determining the key frames for video summarization. To find the most attractive objects, a new spatial saliency prediction method is also proposed, which constructs a saliency map around each smooth pursuit gaze point based on the human visual field, namely the foveal, parafoveal, and perifoveal regions. To quantify the amount of object motion, the total distance between the current and the previous gaze points of viewers during smooth pursuit is measured as a motion saliency score. The motivation is that the movement of eye gaze is related to the motion of the objects during smooth pursuit. Finally, the spatial and motion saliency maps are combined to obtain an aggregated saliency score for each frame, and a set of key frames is selected based on a user-selected or system-default skimming ratio. The proposed method is implemented on the Office video dataset that contains videos with camera movements and illuminatio
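The scoring steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no numeric details, so the region radii and weights, the min-max normalization, the equal-weight sum of the two scores, and all function names below are assumptions made for illustration. The sketch also assumes the gaze points passed in have already been classified as smooth pursuit.

```python
import math

# Hypothetical (radius in pixels, weight) pairs for the foveal, parafoveal,
# and perifoveal regions; the abstract names the regions but gives no values.
DEFAULT_REGIONS = [(30.0, 1.0), (80.0, 0.5), (150.0, 0.25)]


def spatial_saliency_map(gaze_points, width, height, regions=DEFAULT_REGIONS):
    """Add a concentric, region-weighted disc around each smooth pursuit gaze point."""
    sal = [[0.0] * width for _ in range(height)]
    for gx, gy in gaze_points:
        for y in range(height):
            for x in range(width):
                d = math.hypot(x - gx, y - gy)
                # Use the weight of the innermost region containing the pixel.
                for radius, weight in regions:
                    if d <= radius:
                        sal[y][x] += weight
                        break
    return sal


def motion_saliency_score(prev_points, curr_points):
    """Total distance between each viewer's previous and current gaze points."""
    return sum(
        math.hypot(cx - px, cy - py)
        for (px, py), (cx, cy) in zip(prev_points, curr_points)
    )


def normalize(scores):
    """Min-max normalize to [0, 1] (an assumed choice, not from the abstract)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def select_key_frames(spatial_scores, motion_scores, skimming_ratio=0.2):
    """Combine per-frame scores and keep the top fraction, in temporal order."""
    combined = [
        s + m for s, m in zip(normalize(spatial_scores), normalize(motion_scores))
    ]
    k = max(1, math.ceil(skimming_ratio * len(combined)))
    top = sorted(range(len(combined)), key=lambda i: combined[i], reverse=True)[:k]
    return sorted(top)
```

A per-frame spatial score could then be taken as, for example, the sum of the saliency map for that frame; how the paper reduces the map to a single score is not stated in the abstract.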
Original language: English
Pages (from-to): 1856-1867
Number of pages: 12
Journal: IEEE Transactions on Circuits and Systems for Video Technology
Issue number: 6
Early online date: Jun 2018
Publication status: Published - 07 Jun 2018

