Human pose based video compression via forward-referencing using deep learning

S. M. Ataul Karim Rajin, Manzur Murshed, Manoranjan Paul, Shyh Wei Teng, Jiangang Ma

Research output: Book chapter/Published conference paper › Conference paper › peer-review

1 Citation (Scopus)

Abstract

To exploit the high temporal correlation between video frames of the same scene, the current frame is predicted from already-encoded reference frames using block-based motion estimation and compensation techniques. While this approach can efficiently exploit the translational motion of moving objects, it is susceptible to other types of affine motion and to object occlusion/deocclusion. Recently, deep learning has been used to model the high-level structure of human pose in specific actions from short videos and then to generate virtual frames at future time instants by predicting the pose with a generative adversarial network (GAN). Modelling the high-level structure of human pose can therefore exploit semantic correlation by predicting human actions and determining their trajectories. Video surveillance applications stand to benefit, as stored 'big' surveillance data can be compressed by estimating human pose trajectories and generating future frames through semantic correlation. This paper explores a new approach to video coding that models human pose from the already-encoded frames and uses the frame generated at the current time as an additional forward-referencing frame. The proposed approach is expected to overcome the limitations of traditional backward-referencing frames by predicting the blocks containing moving objects with lower residuals. Our experimental results show that, compared to standard video coding, the proposed approach achieves on average up to 2.83 dB PSNR gain and 25.93% bitrate savings for high-motion video sequences.
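The abstract describes a per-block choice between conventional backward-referencing prediction and a pose-generated forward-referencing frame, keeping whichever yields the lower residual. The sketch below is a minimal illustration of that mode-selection idea only, not the authors' implementation: the block size, the SAD cost, the full-search motion estimation, and the `generate_forward_reference` stub (which merely repeats the last decoded frame) are all assumptions standing in for the GAN-based pose predictor and the codec internals used in the paper.

```python
import numpy as np

BLOCK = 16  # block size; a common choice in block-based codecs (assumption)

def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return int(np.abs(a.astype(np.int32) - b.astype(np.int32)).sum())

def best_backward_block(cur, ref, y, x, search=8):
    """Full-search motion estimation: best-matching block in the backward reference."""
    h, w = ref.shape
    tgt = cur[y:y + BLOCK, x:x + BLOCK]
    best_cost, best_blk = None, None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            yy, xx = y + dy, x + dx
            if 0 <= yy <= h - BLOCK and 0 <= xx <= w - BLOCK:
                cand = ref[yy:yy + BLOCK, xx:xx + BLOCK]
                cost = sad(tgt, cand)
                if best_cost is None or cost < best_cost:
                    best_cost, best_blk = cost, cand
    return best_cost, best_blk

def generate_forward_reference(decoded_frames):
    """Hypothetical stand-in for the paper's GAN-based pose predictor:
    here it simply repeats the last decoded frame."""
    return decoded_frames[-1]

def select_predictors(cur, backward_ref, forward_ref):
    """Per block, keep whichever predictor (backward motion compensation or
    the forward-referencing frame) yields the lower residual."""
    h, w = cur.shape
    pred = backward_ref.copy()  # frame edges default to the backward reference
    modes = {}
    for y in range(0, h - BLOCK + 1, BLOCK):
        for x in range(0, w - BLOCK + 1, BLOCK):
            tgt = cur[y:y + BLOCK, x:x + BLOCK]
            bw_cost, bw_blk = best_backward_block(cur, backward_ref, y, x)
            fw_blk = forward_ref[y:y + BLOCK, x:x + BLOCK]
            fw_cost = sad(tgt, fw_blk)
            if fw_cost < bw_cost:
                pred[y:y + BLOCK, x:x + BLOCK] = fw_blk
                modes[(y, x)] = 'forward'
            else:
                pred[y:y + BLOCK, x:x + BLOCK] = bw_blk
                modes[(y, x)] = 'backward'
    return pred, modes

# Toy usage with random luma frames standing in for decoded video.
rng = np.random.default_rng(0)
frames = [rng.integers(0, 256, (64, 64), dtype=np.uint8) for _ in range(3)]
fwd = generate_forward_reference(frames[:2])
pred, modes = select_predictors(frames[2], frames[1], fwd)
print(sum(m == 'forward' for m in modes.values()), 'blocks chose the forward reference')
```

In the paper, the forward reference is synthesised by a GAN from pose trajectories estimated on already-encoded frames; the copy-last-frame stub above only marks where that model would plug in, and the per-block residual comparison is what lets the codec fall back to conventional backward referencing wherever pose prediction fails.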

Original language: English
Title of host publication: 2022 IEEE International Conference on Visual Communications and Image Processing (VCIP)
Place of Publication: United States
Publisher: IEEE
Number of pages: 5
ISBN (Electronic): 9781665475921
ISBN (Print): 9781665475938
DOIs
Publication status: E-pub ahead of print - 16 Jan 2023
Event: 2022 IEEE International Conference on Visual Communications and Image Processing (VCIP) - Virtual, Suzhou, China
Duration: 13 Dec 2022 – 16 Dec 2022
https://ieeexplore-ieee-org.ezproxy.csu.edu.au/xpl/conhome/10008391/proceeding (Proceedings)
https://web.archive.org/web/20221114191144/http://vcip2022.org/ (Conference website)

Publication series

Name: 2022 IEEE International Conference on Visual Communications and Image Processing, VCIP 2022

Conference

Conference: 2022 IEEE International Conference on Visual Communications and Image Processing (VCIP)
Country/Territory: China
City: Suzhou
Period: 13/12/22 – 16/12/22
Other: The IEEE Visual Communications and Image Processing (VCIP) Conference, sponsored by the IEEE Circuits and Systems Society, was held in Suzhou, China, during December 13–16, 2022. VCIP is the oldest conference in the field and one of the flagship conferences of the IEEE CAS Visual Signal Processing and Communications Technical Committee. Since 1986, VCIP has served as a premier forum for the exchange of fundamental and applied research in visual communications and image processing.
VCIP has a long tradition of showcasing pioneering technologies in visual communication and processing, and many landmark papers first appeared at VCIP. VCIP 2022 carries on this tradition, disseminating the state of the art in visual communication technology and brainstorming and envisioning the future of visual communication technology and applications. The main theme is new media, including VR, point cloud capture and playback, and new visual processing tools, including deep learning for intelligence distilling in visual information pre- and post-processing, such as de-blurring, super-resolution, 3D understanding, and content-based image enhancement. High-quality papers will be recommended to TCSVT for journal extension.
