Grupo de Tratamiento de Imágenes

"Complexity and Quality Optimization for Multi-View plus Depth Video Coding"

Gianluca Cernigliaro

E.T.S. Ing. Telecomunicación, Universidad Politécnica de Madrid, July 2019, "Sobresaliente".

Ph.D. thesis Director: Fernando Jaureguizar.

3D video, Free Viewpoint TV, and other three-dimensional imaging systems have been, and remain, an emerging trend in digital video technology. Multi-View plus Depth (MVD) is one of the most common 3D video representations. An MVD scene is recorded from several viewpoints, capturing it from a wide range of directions. For each viewpoint, two video components are captured: the scene texture, represented as a traditional 2D video with the usual color components (RGB or similar), and the scene geometry, represented as a gray-level image called a depth map, which encodes the distance of the scene objects from the viewpoint. Thanks to the multiple texture and depth representations, a 3D scene can be fully reconstructed, giving the user a sense of immersion.

As with earlier imaging technologies, compression is one of the most important steps of the digital video pipeline, and 3D video has likewise created a need to encode the scene information efficiently. Because an MVD scenario involves a growing amount of data due to the multiple viewpoints, and also includes new information such as the depth maps, encoding techniques have evolved to limit the impact of this data increase and to adapt to the characteristics of depth. The work presented in this thesis focuses on adapting traditional compression methods based on AVC/H.264 to the MVD environment. It aims to reduce the computational load, which rises dramatically with the number of video representations, and also to increase the rate-distortion efficiency of the encoding process, focusing on the quality of the 3D video rendered from the multiple texture and depth representations.

The first area of research has been the reduction of the computational load of the Mode Decision (MD) stage, one of the most computationally expensive stages of the encoding process. The geometry information provided by the depth maps has been exploited to predict the geometry and motion of the objects in the scene. In addition, analyzing the depth to gain knowledge of the motion in the scene has provided an understanding of how the motion information of the texture and depth components is correlated. The work has then focused on reducing the computational load of depth map compression, this time involving both MD and Motion Estimation (ME), by exploiting the correlation between the motion of the texture and that of the depth. The computational load has been considerably reduced in the compression of both texture and depth maps, with reductions in encoding time of up to 40% for the texture and up to 58% for the depth, compared with the full search of modes and motion vectors of a traditional AVC/H.264 encoder. In both cases, the quality loss has been negligible.

However, reducing the computational load has not been the only goal of the work presented in this thesis. A relatively novel area has also been explored, introducing new perceptual encoding paradigms for the compression of the depth. The last part of this thesis applies perceptual methodologies, widely exploited in traditional 2D video compression, to the compression of the depth. The depth is used only for 3D reconstruction purposes, such as the generation of synthetic views, and since it is never shown to the audience, compression artifacts affect only the reconstructed representations. The perceptual work in this thesis has therefore focused on adapting traditional 2D perceptual compression techniques to the MVD representation, optimizing the perceptual quality of the synthetic views. The performance of the proposed perceptual techniques applied to depth compression has been evaluated using perceptual quality metrics, reaching a bit-rate reduction of up to 13% with an improvement of up to 0.3 dB according to the Bjøntegaard measurements.
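
For readers unfamiliar with the Bjøntegaard measurements mentioned above, the following is a minimal sketch of how a Bjøntegaard delta rate is commonly computed. It is an illustration under stated assumptions, not the evaluation code used in the thesis: the function names (`bd_rate`, `lagrange_eval`, `mean_log_rate`) are hypothetical, the rate and quality values in the usage lines are made up, and each rate-distortion curve is assumed to have exactly four points with distinct quality values, so that the usual cubic fit of log-rate against quality reduces to exact interpolation.

```python
import math


def lagrange_eval(xs, ys, x):
    """Evaluate at x the polynomial that interpolates the points (xs, ys)."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def mean_log_rate(quality, rate, lo, hi):
    """Average of the cubic log-rate curve over the quality interval [lo, hi]."""
    log_rate = [math.log(r) for r in rate]

    def curve(q):
        return lagrange_eval(quality, log_rate, q)

    # Simpson's rule is exact for cubic polynomials, so a single panel suffices.
    integral = (hi - lo) / 6.0 * (curve(lo) + 4.0 * curve((lo + hi) / 2.0) + curve(hi))
    return integral / (hi - lo)


def bd_rate(rate_ref, quality_ref, rate_test, quality_test):
    """Average bit-rate difference (percent) of the test curve vs. the reference.

    Negative values mean the test encoder needs fewer bits for the same quality.
    Assumption: four points per curve, as in the common Bjontegaard setup.
    """
    for values in (rate_ref, quality_ref, rate_test, quality_test):
        if len(values) != 4:
            raise ValueError("this sketch expects exactly four points per curve")

    # Compare the curves only over the quality range they share.
    lo = max(min(quality_ref), min(quality_test))
    hi = min(max(quality_ref), max(quality_test))
    if hi <= lo:
        raise ValueError("the two curves do not overlap in quality")

    avg_ref = mean_log_rate(quality_ref, rate_ref, lo, hi)
    avg_test = mean_log_rate(quality_test, rate_test, lo, hi)
    return (math.exp(avg_test - avg_ref) - 1.0) * 100.0


# Made-up example data: bit-rates in kbit/s and quality scores in dB.
reference_rate = [100.0, 200.0, 400.0, 800.0]
reference_quality = [30.0, 33.0, 36.0, 38.0]
test_rate = [0.9 * r for r in reference_rate]
delta = bd_rate(reference_rate, reference_quality, test_rate, reference_quality)
```

In this made-up case the test curve uses 10% fewer bits at every quality level, so `delta` comes out at about -10. The quality axis can be PSNR or any other metric reported in dB; the sketch makes no assumption about which perceptual metrics the thesis used.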
