Ph.D. thesis Tomás Mantecón

 

Research  

 

GTI Data   

 

Open databases created and software developed by the GTI, as well as supplementary material for papers.

 

Databases  


SportCLIP (2025): Multi-sport dataset for text-guided video summarization.
Ficosa (2024): The FNTVD dataset has been generated using Ficosa's recording car.
MATDAT (2023): More than 90K labeled images of martial arts tricking.
SEAW – DATASET (2022): 3 stereoscopic contents in 4K resolution at 30 fps.
UPM-GTI-Face dataset (2022): 11 different subjects captured in 4K, under 2 scenarios, and 2 face mask conditions.
LaSoDa (2022): 60 annotated images from soccer matches in five stadiums with different characteristics and lighting conditions.
PIROPO Database (2021): People in Indoor ROoms with Perspective and Omnidirectional cameras.
EVENT-CLASS (2021): High-quality 360-degree videos in the context of tele-education.
Parking Lot Occupancy Database (2020)
Nighttime Vehicle Detection database (NVD) (2019)
Hand gesture dataset (2019): Multi-modal Leap Motion dataset for Hand Gesture Recognition.
ViCoCoS-3D (2016): VideoConference Common Scenes in 3D.
LASIESTA database (2016): More than 20 sequences to test moving object detection and tracking algorithms.
Hand gesture database (2015): Hand-gesture database composed of high-resolution color images acquired with the Senz3D sensor.
HRRFaceD database (2014): Face database composed of high-resolution images acquired with the Microsoft Kinect 2 (second generation).
Lab database (2012): Set of 6 sequences to test moving object detection strategies.
Vehicle image database (2012): More than 7000 images of vehicles and roads.

 

Software  


Empowering Computer Vision in Higher Education (2024): A Novel Tool for Enhancing Video Coding Comprehension.
Engaging students in audiovisual coding through interactive MATLAB GUIs (2024)

TOP-Former: A Multi-Agent Transformer Approach for the Team Orienteering Problem (2023)

Solving Routing Problems for Multiple Cooperative Unmanned Aerial Vehicles using Transformer Networks (2023)
Vision Transformers and Traditional Convolutional Neural Networks for Face Recognition Tasks (2023)
Faster GSAC-DNN (2023): A Deep Learning Approach to Nighttime Vehicle Detection Using a Fast Grid of Spatial Aware Classifiers.
SETForSeQ (2020): Subjective Evaluation Tool for Foreground Segmentation Quality. 
SMV Player for Oculus Rift (2016)

Bag-D3P (2016): Face recognition using depth information.
TSLAB (2015): Tool for Semiautomatic LABeling.
 

   

Supplementary material  


Soccer line mark segmentation and classification with stochastic watershed transform (2022)
A fully automatic method for segmentation of soccer playing fields (2022)
Grass band detection in soccer images for improved image registration (2022)
Evaluating the Influence of the HMD, Usability, and Fatigue in 360VR Video Quality Assessments (2020)
Automatic soccer field of play registration (2020)   
Augmented reality tool for the situational awareness improvement of UAV operators (2017)
Detection of static moving objects using multiple nonparametric background-foreground models on a Finite State Machine (2015)
Real-time nonparametric background subtraction with tracking-based foreground update (2015)  
Camera localization using trajectories and maps (2014)

 


"Advanced face and gesture recognition for visual HMI" 

Tomás Mantecón

E.T.S. Ing. Telecomunicación, Universidad Politécnica de Madrid, November 2018, "Cum Laude".

Ph.D. thesis directors: Fernando Jaureguizar Núñez and Carlos Roberto del Blanco Adán.

In the last few years, many solutions have been proposed to enable a more natural and intuitive human-machine interaction, thanks to the advent of new devices that improve on the interaction offered by keyboards and mice. Different systems have been designed that use different parts of the body, such as the hands or the voice, to offer a human-machine interaction as similar as possible to the interaction between humans. Of special interest are systems based on hand gestures and visual information, since they are non-intrusive (the user wears no sensor), unlike alternatives such as inertial sensors. In addition, new authentication systems are required for these interaction mechanisms to replace passwords typed on a keyboard, such as fingerprint recognition, iris identification, or face recognition. The growing number of cameras in surveillance environments and embedded in electronic devices (mobile phones, tablets, TVs, etc.) has awakened interest in face recognition systems based on visual imagery, since no additional sensor is required for authentication.

This thesis proposes new solutions for both face and hand gesture recognition using visual information. Regarding face recognition, three solutions have been proposed, based on the design of feature descriptors adapted to the characteristics of the human face in high-resolution depth images. They allow faces to be recognized from different perspectives, unlike most existing works, which only accept frontal faces. Depth information also makes identity theft more difficult, since a 3D model of the face would be needed to fool the identification. Two new databases have been created and made publicly available to properly evaluate the system, since no high-resolution image databases of faces were available.
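To make this kind of pipeline concrete, the following Python sketch builds a simple block-histogram descriptor from a cropped face depth image and identifies a subject by nearest-neighbor matching against a gallery. It is only an illustration under assumed choices: the grid layout, histogram bins, and L2 distance are stand-ins, not the face-adapted descriptors proposed in the thesis.

```python
import numpy as np

def depth_face_descriptor(depth, grid=(8, 8), bins=16):
    """Toy descriptor for a cropped face depth image (H x W, in millimetres).

    The image is depth-normalized, split into a grid of blocks, and each block
    is summarized by a histogram of normalized depth values. This is only a
    stand-in for the face-adapted descriptors described in the thesis.
    """
    d = depth.astype(np.float32)
    valid = d > 0                                  # 0 is treated as missing depth
    if valid.any():
        d = (d - d[valid].min()) / (d[valid].max() - d[valid].min() + 1e-6)
    h, w = d.shape
    feats = []
    for i in range(grid[0]):
        for j in range(grid[1]):
            block = d[i * h // grid[0]:(i + 1) * h // grid[0],
                      j * w // grid[1]:(j + 1) * w // grid[1]]
            hist, _ = np.histogram(block, bins=bins, range=(0.0, 1.0))
            feats.append(hist / (hist.sum() + 1e-6))
    return np.concatenate(feats)

def identify(probe_desc, gallery):
    """Return the identity of the closest gallery descriptor (L2 distance)."""
    names = list(gallery)
    dists = [np.linalg.norm(probe_desc - gallery[n]) for n in names]
    return names[int(np.argmin(dists))]

# Usage with synthetic depth maps (real data would come from a depth sensor).
rng = np.random.default_rng(0)
gallery = {f"subject_{k}": depth_face_descriptor(rng.integers(500, 900, (256, 256)))
           for k in range(3)}
probe = rng.integers(500, 900, (256, 256))
print(identify(depth_face_descriptor(probe), gallery))
```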

With respect to hand gesture recognition, novel solutions are proposed to recognize both static and dynamic hand gestures, including new, highly discriminative descriptors specially designed for depth information. These descriptors have been combined with dimensionality reduction techniques to reduce memory requirements and favor real-time operation. The proposed systems have been integrated into an Airbus demonstrator as part of the SAVIER project; the demonstrator implements hand-gesture-based human-machine interaction for a ground control station that commands unmanned aerial vehicles. New databases composed of depth and infrared imagery have been created and made publicly available to properly evaluate the system performance. Download here.
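As a hedged illustration of the overall gesture recognition pipeline (descriptor, dimensionality reduction, classifier), the sketch below uses generic scikit-learn components on synthetic descriptor vectors: PCA stands in for the dimensionality reduction step and a linear SVM for the gesture classifier. None of these are the specific descriptors or models developed in the thesis.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic stand-ins for per-frame depth descriptors of static hand gestures:
# 300 samples, 1024-dimensional descriptors, 5 gesture classes.
rng = np.random.default_rng(42)
n_samples, n_dims, n_classes = 300, 1024, 5
labels = rng.integers(0, n_classes, n_samples)
# Give each class a slightly different mean so the toy problem is learnable.
descriptors = rng.normal(size=(n_samples, n_dims)) + labels[:, None] * 0.1

# Dimensionality reduction (PCA here) shrinks the descriptor before the
# classifier, lowering memory use and per-frame classification cost, in the
# spirit of the real-time constraint mentioned in the abstract.
model = make_pipeline(StandardScaler(), PCA(n_components=64), LinearSVC(dual=False))
model.fit(descriptors[:250], labels[:250])
print("held-out accuracy:", model.score(descriptors[250:], labels[250:]))
```

Compressing a 1024-dimensional descriptor to 64 components before classification is what keeps the per-frame memory footprint and classification cost low; the same trade-off motivates the dimensionality reduction step described in the abstract.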