Efficient Video Coding and Quality Assessment by Exploiting Human Visual Features

Pallab Podder

Research output: Thesis, Doctoral Thesis

Abstract

The growing diffusion of high-resolution cameras and the ongoing reproduction of high-resolution videos are sharply increasing data volumes. Transmitting or storing this volume of data over the limited bandwidth of a video communication medium requires compression without a noticeable impact on quality. The most recent compression standard, High Efficiency Video Coding (HEVC), reduces the bit-rate requirement by approximately 50% compared with its predecessor, H.264. In return, the computational complexity of HEVC has increased several-fold, which prevents many electronic devices with limited processing and computational resources from using a number of its features in real time. To address this limitation, a number of research works have successfully decreased the encoding time of the HEVC reference test model (HM). However, these approaches come at the cost of reduced coding quality.
The contributions of this thesis therefore focus on reducing the computational complexity of the HM while providing relatively improved coding quality by exploiting human visual features. Since human vision is sensitive to motion and to salient regions in a video, different motion and saliency features have been combined to develop a number of novel coding frameworks. Experimental results show a coding quality improvement of 0.12 dB in Bjontegaard Delta peak signal-to-noise ratio (BD-PSNR), together with a 32% average reduction in the encoding time of the HM, for a wide range of texture sequences. By incorporating an additional motion estimation mode, namely the pattern mode, and by developing a coding framework for depth sequences that is independent of the texture sequences, the proposed approach also improves BD-PSNR by 0.10 dB with a 30% average reduction in the encoding time of the HM.
Multiview video uses both texture and depth video information to give a more realistic view of a scene, and its quality is usually assessed with full-reference objective metrics such as the PSNR or the structural similarity index (SSIM). Subjective studies, however, can yield valuable data for evaluating how well objective methods meet the ultimate goal of matching human perception. The most widely used subjective estimator, the mean opinion score (MOS), is often biased by numerous factors that may undesirably influence the effectiveness of the assessment. To address this limitation, a novel no-reference subjective quality assessment metric (QMET) has been developed by studying how the human eye browses videos and by exploiting a number of quality correlation features. Test results show that the proposed QMET performs better than the MOS in assessing different aspects of coded video quality for a wide range of single-view video content. For free viewpoint video, where the reference frame is not available, the QMET could also distinguish different quality levels better than the MOS.
Taken together, the improvement in rate-distortion performance, the reduction in computational complexity and the quality assessment results could allow a number of electronic devices with limited processing, storage and computational resources to use different features of the HEVC with relatively improved coding quality.
Original language: English
Qualification: Doctor of Philosophy
Awarding Institution
  • Charles Sturt University
Supervisors/Advisors
  • Paul, Manoranjan, Principal Supervisor
  • Chakraborty, Subrata, Principal Supervisor, External person
Award date: 06 Nov 2017
Publication status: Published - 06 Nov 2017

Grant Number

  • DP130103670
