Ph.D. thesis José Luis Herrera

 


 

GTI Data   

 

Open databases and software created by the GTI, together with supplementary material for papers.

 

Databases  


SportCLIP (2025): Multi-sport dataset for text-guided video summarization.
Ficosa (2024): The FNTVD dataset has been generated using Ficosa's recording car.
MATDAT (2023):  More than 90K labeled images of martial arts tricking.
SEAW – DATASET (2022): 3 stereoscopic contents in 4K resolution at 30 fps.
UPM-GTI-Face dataset (2022): 11 different subjects captured in 4K, under 2 scenarios, and 2 face mask conditions.
LaSoDa (2022): 60 annotated images from soccer matches in five stadiums with different characteristics and light conditions.
PIROPO Database (2021): People in Indoor ROoms with Perspective and Omnidirectional cameras.
EVENT-CLASS (2021): High-quality 360-degree videos in the context of tele-education.
Parking Lot Occupancy Database (2020)
Nighttime Vehicle Detection database (NVD) (2019)
Hand gesture dataset (2019): Multi-modal Leap Motion dataset for Hand Gesture Recognition.
ViCoCoS-3D (2016): VideoConference Common Scenes in 3D.
LASIESTA database (2016): More than 20 sequences to test moving object detection and tracking algorithms.
Hand gesture database (2015): Hand-gesture database composed of high-resolution color images acquired with the Senz3D sensor.
HRRFaceD database (2014): Face database composed of high-resolution images acquired with Microsoft Kinect 2 (second generation).
Lab database (2012): Set of 6 sequences to test moving object detection strategies.
Vehicle image database (2012): More than 7000 images of vehicles and roads.

 

Software  


Empowering Computer Vision in Higher Education (2024): A Novel Tool for Enhancing Video Coding Comprehension.
Engaging students in audiovisual coding through interactive MATLAB GUIs (2024)

TOP-Former: A Multi-Agent Transformer Approach for the Team Orienteering Problem (2023)

Solving Routing Problems for Multiple Cooperative Unmanned Aerial Vehicles using Transformer Networks (2023)
Vision Transformers and Traditional Convolutional Neural Networks for Face Recognition Tasks (2023)
Faster GSAC-DNN (2023): A Deep Learning Approach to Nighttime Vehicle Detection Using a Fast Grid of Spatial Aware Classifiers.
SETForSeQ (2020): Subjective Evaluation Tool for Foreground Segmentation Quality. 
SMV Player for Oculus Rift (2016)

Bag-D3P (2016): Face recognition using depth information.
TSLAB (2015): Tool for Semiautomatic LABeling.
 

   

Supplementary material  


Soccer line mark segmentation and classification with stochastic watershed transform (2022)
A fully automatic method for segmentation of soccer playing fields (2022)
Grass band detection in soccer images for improved image registration (2022)
Evaluating the Influence of the HMD, Usability, and Fatigue in 360VR Video Quality Assessments (2020)
Automatic soccer field of play registration (2020)   
Augmented reality tool for the situational awareness improvement of UAV operators (2017)
Detection of static moving objects using multiple nonparametric background-foreground models on a Finite State Machine (2015)
Real-time nonparametric background subtraction with tracking-based foreground update (2015)  
Camera localization using trajectories and maps (2014)


"2D to 3D Image and Video Conversion using Machine Learning" 

José Luis Herrera

E.T.S. Ing. Telecomunicación, Universidad Politécnica de Madrid, March 2021, graded "Sobresaliente" (Outstanding).

Ph.D. Thesis Advisors: Narciso García and Carlos Roberto del Blanco.

Although the concept of three-dimensional (3D) image and video was introduced many years ago, the number of available 3D displays and players has grown significantly only in recent years. Nevertheless, the amount of 3D content has not increased at the same pace, creating a gap between 3D supply and demand. To reduce this gap, many algorithms have appeared that perform 2D-to-3D image and video conversion. While many of these techniques require several images of the same scene to perform the conversion, the most recent family, machine learning-based algorithms, is not restricted by this limitation and can compute the 3D image from a single view of the scene. Machine learning-based methods require databases of 2D and 3D images to learn how to perform this conversion. Since the number of available datasets has recently increased significantly, these algorithms have become very popular. However, the quality achieved by current 2D-to-3D conversion techniques is still far from fully satisfactory, and they need to be improved before they can be used to produce good-quality 3D content.

This thesis proposes two systems for 2D-to-3D conversion, one for images and another for video, both belonging to the machine learning family of methods. With respect to the image conversion system, a new approach is proposed that makes the algorithm more robust and adaptive to different types of scenarios by using a combination of feature descriptors. At the same time, the proposed clustering of the dataset makes the solution faster and more efficient when dealing with large datasets. The system also learns to automatically adapt the values of the different parameters involved in the conversion, resulting in a fully automatic solution.
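The retrieval idea behind this kind of image conversion can be sketched in a few lines: describe each training image with a global descriptor, cluster the dataset so a query only searches its nearest cluster, and fuse the depth maps of the closest training images. The sketch below is illustrative only: the intensity-histogram descriptor, the plain k-means with farthest-point initialisation, and all function names are hypothetical stand-ins for the combination of feature descriptors and the learned parameter adaptation used in the thesis.

```python
import numpy as np

def gist_like_descriptor(img, bins=16):
    # Hypothetical global descriptor: an intensity histogram, standing in
    # for the combination of feature descriptors used in the thesis.
    hist, _ = np.histogram(img, bins=bins, range=(0.0, 1.0), density=True)
    return hist

def cluster_dataset(descriptors, k=2, iters=10, seed=0):
    # Plain k-means with farthest-point initialisation, so that a query
    # only needs to search the images of its nearest cluster.
    rng = np.random.default_rng(seed)
    centers = [descriptors[rng.integers(len(descriptors))]]
    for _ in range(k - 1):
        dists = np.min(
            [np.linalg.norm(descriptors - c, axis=1) for c in centers], axis=0)
        centers.append(descriptors[np.argmax(dists)])
    centers = np.stack(centers)
    for _ in range(iters):
        labels = np.argmin(
            np.linalg.norm(descriptors[:, None] - centers[None], axis=2), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = descriptors[labels == j].mean(axis=0)
    return centers, labels

def estimate_depth(query_img, images, depths, centers, labels, n_neighbors=2):
    # Retrieve the nearest cluster, find the closest images inside it,
    # and fuse their depth maps as the depth estimate for the query.
    q = gist_like_descriptor(query_img)
    c = np.argmin(np.linalg.norm(centers - q, axis=1))
    idx = np.where(labels == c)[0]
    d = np.linalg.norm(
        np.stack([gist_like_descriptor(images[i]) for i in idx]) - q, axis=1)
    nearest = idx[np.argsort(d)[:n_neighbors]]
    return np.mean([depths[i] for i in nearest], axis=0)
```

In practice the fused depth would be further refined per pixel; here the point is only the descriptor-cluster-retrieve-fuse structure that makes large datasets tractable.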

With respect to video conversion, the learning-based approach for images is extended to video sequences. The algorithm is divided into three main parts. First, a depth estimate of the background is computed using the previously presented approach. Then, the foreground is segmented by analyzing the optical flow, so that the different objects can be managed individually. Finally, the background depth estimate is combined with the foreground information and filtered to obtain the final depth estimation.
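The three-stage pipeline (background depth, foreground segmentation, combination and filtering) can be sketched as below. Every component is a simplified stand-in: a row-based depth prior replaces the retrieval-based background estimate, a frame-difference mask replaces the optical-flow segmentation, and an exponential temporal filter replaces the final filtering; the function names are hypothetical.

```python
import numpy as np

def background_depth(h, w):
    # Stand-in background prior: depth decreases with image row
    # (lower rows assumed closer to the camera).
    return np.tile(np.linspace(1.0, 0.0, h)[:, None], (1, w))

def foreground_mask(prev_frame, frame, thresh=0.1):
    # Stand-in for optical-flow analysis: pixels with a large temporal
    # change are treated as moving foreground.
    return np.abs(frame - prev_frame) > thresh

def fuse_and_filter(depth_seq, alpha=0.5):
    # Exponential temporal filter to stabilise the depth estimates.
    out, acc = [], depth_seq[0]
    for d in depth_seq:
        acc = alpha * d + (1 - alpha) * acc
        out.append(acc)
    return out

def video_to_depth(frames):
    h, w = frames[0].shape
    bg = background_depth(h, w)
    depths = [bg.copy()]
    for prev, cur in zip(frames, frames[1:]):
        mask = foreground_mask(prev, cur)
        d = bg.copy()
        if mask.any():
            # Each moving region takes the background depth at its lowest
            # detected pixel, so objects pop out as flat layers.
            d[mask] = bg[np.max(np.where(mask)[0]), 0]
        depths.append(d)
    return fuse_and_filter(depths)
```

The value of the structure is that foreground objects are handled individually on top of a stable background estimate, and the temporal filter suppresses frame-to-frame flicker in the resulting depth sequence.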

The developed algorithms have been tested on several publicly available datasets of 3D images and video sequences.
