Visual Attention Guided Adaptive Quantization for x265 using Deep Learning

This is a Master's thesis from KTH/School of Electrical Engineering and Computer Science (EECS)

Abstract: Video-on-demand streaming is rising drastically in popularity, bringing new challenges to the video coding field. There is a need for new video coding techniques that improve performance and reduce bitrates. One of the most promising areas of research is perceptual video coding, where attributes of the human visual system are considered in order to minimize visual redundancy. Visual attention allows humans to focus on only a small region at a time, guided by various cues, and deep neural networks have made it possible to build high-accuracy models of this behavior. The purpose of this study is therefore to investigate how adaptive quantization (AQ) based on a deep visual attention model can be used to improve subjective video quality at low bitrates. A deep visual attention model was integrated into the x265 encoder to control how bits are distributed at the frame level by adaptively setting the quantization parameter. The effect on subjective video quality was evaluated through A/B testing, in which the solution was compared to one of the standard AQ methods in x265. The results show that the ROI-based AQ was perceived to be of better quality in one out of ten cases. The results can partly be explained by certain methodological choices, but they also highlight a need for more research on how to use visual attention modeling in more complex real-world streaming scenarios, in order to make streaming content more accessible and reduce bitrates.
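
The general idea of attention-guided adaptive quantization can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the block size, the linear saliency-to-offset mapping, and the `max_offset` value are all hypothetical choices made for illustration. The sketch assumes a saliency map with values in [0, 1] and lowers the quantization parameter (QP) in salient blocks while raising it elsewhere.

```python
def block_means(saliency, block):
    """Average a 2-D saliency map (list of rows, values in [0, 1]) over block x block tiles."""
    rows, cols = len(saliency), len(saliency[0])
    means = []
    for r0 in range(0, rows, block):
        row_means = []
        for c0 in range(0, cols, block):
            # Collect the tile's values, truncating tiles at the frame border.
            tile = [saliency[r][c]
                    for r in range(r0, min(r0 + block, rows))
                    for c in range(c0, min(c0 + block, cols))]
            row_means.append(sum(tile) / len(tile))
        means.append(row_means)
    return means


def qp_offsets(saliency, block=2, max_offset=6.0):
    """Map per-block mean saliency to a QP offset in [-max_offset, +max_offset].

    Salient blocks get a negative offset (finer quantization, more bits);
    non-salient blocks get a positive offset (coarser quantization, fewer bits).
    The linear mapping and max_offset are illustrative assumptions.
    """
    return [[max_offset * (1.0 - 2.0 * m) for m in row]
            for row in block_means(saliency, block)]


def apply_offsets(base_qp, offsets, qp_min=0, qp_max=51):
    """Add offsets to a base QP and clamp to the 0..51 range used for 8-bit HEVC."""
    return [[max(qp_min, min(qp_max, round(base_qp + o))) for o in row]
            for row in offsets]
```

For a 4x4 saliency map whose top-left 2x2 quadrant is fully salient (1.0) and whose remaining samples are 0.0, `apply_offsets(30, qp_offsets(saliency))` yields `[[24, 36], [36, 36]]`: the attended block is quantized more finely than the base QP of 30, and the rest more coarsely. A real encoder integration would work on the encoder's own block grid and rate control rather than on a toy grid like this.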
