Ph.D. thesis: Carlos Cortés

 

"Interaction in Social Extended Reality: A Quality of Experience Approach" 

Carlos Cortés

E.T.S. Ing. Telecomunicación, Universidad Politécnica de Madrid, June 2024, "Cum Laude".

Ph.D. Thesis Advisors: Narciso García and Pablo Pérez.

 


 

The rise of immersive technologies has increased the number of use cases that adopt them within the telecommunications area, such as industrial training, multimedia content consumption, and tele-training. Among all immersive technologies, eXtended Reality (XR) through Head-Mounted Displays (HMDs) is the focus of most current developments. Specifically, the Social XR paradigm frames the use of immersive technologies in a multi-user or social context. Two factors are decisive for using immersive technology in communications use cases: the possibility of making users believe that they have been transported to another place (sensation of presence), and the possibility of richer interaction, both by allowing movement through space (6 degrees of freedom) and by enabling more natural ways of interacting. Such improvements are ultimately improvements in user experience (UX), so UX evaluation is crucial for effective XR development. In a telecommunications context, this is known as quality of experience (QoE) evaluation.

In the initial stages of the thesis, the focus was primarily on exploring possible areas of scientific contribution. The first significant area that emerged was the proposal of a methodology for evaluating the QoE of immersive environments based on 360-degree video. To this end, an inter-laboratory experiment was conducted within the Video Quality Experts Group (VQEG) of the International Telecommunication Union (ITU). As a result of this experiment, the ITU-T P.919 Recommendation was published.

As the thesis progressed, another key area of exploration was the development and evaluation of natural user interfaces (NUIs) in the context of industrial training. Within a public-private partnership, we developed a training environment for fiber optic review with specific object manipulation requirements. This part of the thesis presents NUI-based manipulation solutions together with their subjective evaluation by subject matter experts. These contributions confirm that such natural interfaces enable training that reduces cost and environmental impact while maintaining high user satisfaction.

While developing interaction for Social XR, we identified that delay appeared to be a key element in guaranteeing QoE. Therefore, the third area of scientific contribution focused on investigating the impact of latency in the different processing loops within the Social XR domain. Here the thesis presents two major contributions. The first focuses on the different delays perceptible by users and how each affects them differently. Within this same contribution, a processing framework common to different existing Social XR systems is presented, followed by a review of studies that identify allowable latencies in different use cases involving XR communication. Using these values, a QoE prediction model adapted from an ITU recommendation is presented, designed to remain flexible to new use cases.


The second major contribution presents three novel QoE studies investigating the impact of delays on environment updates, self-view perception, and video conferencing within Social XR environments.

This doctoral thesis has significantly advanced our understanding of immersive video-based environments. We can now effectively assess the QoE within these environments using novel methods. Furthermore, the thesis explores the development of natural interfaces for interaction in XR, allowing us to evaluate XR interaction environments from a QoE perspective. This includes pinpointing the impact and location of delays within Social XR systems. By understanding how different delay values influence UX for various use cases, we can establish acceptable delay thresholds for optimal QoE in video-based Social XR.


