Ph.D. thesis César Díaz

"Design and Optimization of Protection Strategies Based on the Pro-MPEG COP3 Codes for Time-sensitive Multimedia Streams"

César Díaz

E.T.S. Ing. Telecomunicación, Universidad Politécnica de Madrid, Jun 2017, "Cum Laude".

Ph.D. thesis director: Julián Cabrera Quesada.

In recent years, an ever-increasing share of video transmission services has been delivered over IP networks, which offer numerous advantages: ubiquity, easy service integration and synchronization, far greater scope for interactivity, and many more. However, IP networks are not particularly well suited to video distribution. QoS management mechanisms are therefore commonly employed to increase their reliability, particularly in server-driven, time-sensitive transmission scenarios. In these scenarios, the Pro-MPEG COP3 codes are widely used because they can cope with packet loss bursts and have very low complexity. However, their performance degrades as the packet loss rate increases. Protection strategies that are aware of the uneven relevance of the different packets in the video data stream, on the other hand, typically perform better, since they distribute and use the available protection resources more effectively. However, their complexity usually exceeds that of the Pro-MPEG COP3 codes.
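As a rough illustration of why this family of codes is both simple and burst-tolerant, the sketch below builds XOR parity packets over the columns of an L x D packet matrix and repairs any column that has lost exactly one packet. It is a minimal sketch under stated assumptions, not the standard's packet format: payloads are assumed to have equal length, only column parity is computed, and the function names `column_fec` and `recover_columns` are hypothetical.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length payloads."""
    return bytes(x ^ y for x, y in zip(a, b))


def column_fec(packets, L, D):
    """Return L parity packets, one per column of an L-column, D-row matrix.

    `packets` holds L * D equal-length payloads in sending order (row by row),
    so column c contains packets c, c + L, c + 2L, ...
    """
    assert len(packets) == L * D
    return [reduce(xor_bytes, packets[c::L]) for c in range(L)]


def recover_columns(received, fec, L, D):
    """Repair every column with exactly one missing packet (marked None)."""
    repaired = list(received)
    for c in range(L):
        idx = range(c, L * D, L)
        lost = [i for i in idx if repaired[i] is None]
        if len(lost) == 1:
            survivors = [repaired[i] for i in idx if repaired[i] is not None]
            # XOR of the parity packet with all survivors yields the lost one.
            repaired[lost[0]] = reduce(xor_bytes, survivors, fec[c])
    return repaired


if __name__ == "__main__":
    L, D = 4, 3
    packets = [bytes([i] * 8) for i in range(L * D)]
    fec = column_fec(packets, L, D)

    # A burst of L consecutive losses hits each column once, so it is repairable.
    received = list(packets)
    for i in range(4, 8):
        received[i] = None
    print(recover_columns(received, fec, L, D) == packets)
```

Two losses in the same column cannot be repaired by a single parity packet, which is consistent with the performance drop at higher loss rates mentioned above.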

This thesis examines strategies to improve the performance of the Pro-MPEG COP3 codes without compromising their simplicity, while preserving their encoding and decoding procedures. These strategies aim to generalize and optimize the codes, among other ways by enabling them to protect the data packet stream unequally.

First, we propose two extensions to the standard Pro-MPEG COP3 codes. The first is an equal error protection (EEP) scheme that allows up to three interleaving depths to be used, further improving channel adaptation and error decorrelation. The second is a low-complexity unequal error protection (UEP) framework that allocates the available protection resources unequally among different sets of data packets according to their relevance. To do so, the proposed approach uses not just one matrix per protection block, as in the standard Pro-MPEG COP3 codes, but several matrices of different dimensions. This extension makes it possible to apply different code rates to data units of different importance.
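The effect of using several matrices per protection block can be seen with a little arithmetic. The sketch below assumes, for illustration only, that each matrix of L columns and D rows is protected by column parity alone, so it carries L * D data packets and L parity packets; the helper names `code_rate` and `block_summary` and the example dimensions are hypothetical and not taken from the thesis.

```python
from fractions import Fraction


def code_rate(L, D):
    """Code rate of one matrix under the column-parity-only assumption."""
    return Fraction(L * D, L * D + L)  # simplifies to D / (D + 1)


def block_summary(matrices):
    """Return (data packets, parity packets, overall code rate) for a block."""
    data = sum(L * D for L, D in matrices)
    parity = sum(L for L, _ in matrices)
    return data, parity, Fraction(data, data + parity)


if __name__ == "__main__":
    # Equal protection: one 8 x 5 matrix for all 40 packets.
    print(block_summary([(8, 5)]))

    # Unequal protection: 8 important packets in a shallow 4 x 2 matrix,
    # 32 less important packets in a deep 4 x 8 matrix.
    print(code_rate(4, 2), code_rate(4, 8))
    print(block_summary([(4, 2), (4, 8)]))
```

In this toy case both configurations spend the same 8 parity packets on 40 data packets, but the two-matrix layout gives the important packets a rate of 2/3 and the rest a rate of 8/9, instead of a uniform 5/6.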

Then, we present a procedure for selecting suitable configurations to protect the data when using the proposed UEP framework. This optimization strategy hybridizes two well-known metaheuristics, simulated annealing (SA) and tabu search (TS), whose core procedures are modified to fit the characteristics of the considered scenario and thus find near-optimal solutions under strict time restrictions. The strategy takes as input the importance of the video data packets and the available resources, and finds near-optimal protection configurations in real time, expressed as the number of matrices to use and their dimensions.
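To give a feel for how the two metaheuristics can be combined, the sketch below runs a generic simulated annealing loop whose candidate moves are filtered by a short tabu list of recently visited configurations. It is not the modified procedure developed in the thesis: the cost function is a toy proxy (importance weight times matrix depth, plus a penalty for exceeding the parity budget), and all names, parameters, and the one-parity-packet-per-column assumption are hypothetical.

```python
import math
import random
from collections import deque


def fec_packets(sizes, depths):
    """Parity packets needed if class k uses columns of depth depths[k]."""
    return sum(math.ceil(n / d) for n, d in zip(sizes, depths))


def cost(weights, sizes, depths, budget):
    """Toy proxy: deeper columns mean weaker protection; overspending is penalized."""
    overflow = max(0, fec_packets(sizes, depths) - budget)
    return sum(w * d for w, d in zip(weights, depths)) + 1000 * overflow


def sa_tabu(weights, sizes, budget, d_max=20, iters=2000,
            t0=10.0, alpha=0.995, tabu_len=20, seed=0):
    """Simulated annealing over column depths with a tabu list of recent states."""
    rng = random.Random(seed)
    f = lambda depths: cost(weights, sizes, depths, budget)
    current = tuple(d_max for _ in sizes)  # weakest protection, lowest overhead
    best = current
    tabu = deque([current], maxlen=tabu_len)
    t = t0
    for _ in range(iters):
        # Neighbor: change the depth of one packet class by one row.
        k = rng.randrange(len(sizes))
        d = min(d_max, max(1, current[k] + rng.choice((-1, 1))))
        cand = current[:k] + (d,) + current[k + 1:]
        if cand not in tabu:  # tabu search: skip recently visited states
            delta = f(cand) - f(current)
            # Annealing: always accept improvements, sometimes accept worse moves.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current = cand
                tabu.append(cand)
                if f(current) < f(best):
                    best = current
        t *= alpha  # geometric cooling
    return best


if __name__ == "__main__":
    weights, sizes, budget = (5, 2, 1), (20, 40, 60), 16
    best = sa_tabu(weights, sizes, budget)
    print(best, fec_packets(sizes, best), cost(weights, sizes, best, budget))
```

The fixed iteration count stands in for the strict time restrictions mentioned above: the search returns the best configuration found so far when the budget of iterations runs out.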

Finally, we introduce a full-context-aware, yet lightweight, distortion model. The proposed approach estimates the contribution of any given video packet to the overall expected distortion of the sequence, and thus its actual relevance, by taking into account upper-level characteristics of the encoded video streams and the behavior of the communication channel. This contribution is estimated considering not only the image degradation that the loss of the packet would introduce in the sequence and the likelihood of that loss occurring, but also the effect that the potential loss of data in reference frames/slices has on the actual importance of that packet.
