Ph.D. thesis: Ana Maqueda

 

Research  

 

GTI Data   

 

Open databases created and software developed by the GTI, and supplementary material for papers.

 

Databases  


SportCLIP (2025): Multi-sport dataset for text-guided video summarization.
Ficosa (2024): The FNTVD dataset has been generated using Ficosa's recording car.
MATDAT (2023): More than 90K labeled images of martial arts tricking.
SEAW – DATASET (2022): 3 stereoscopic contents in 4K resolution at 30 fps.
UPM-GTI-Face dataset (2022): 11 different subjects captured in 4K, under 2 scenarios, and 2 face mask conditions.
LaSoDa (2022): 60 annotated images from soccer matches in five stadiums with different characteristics and light conditions.
PIROPO Database (2021): People in Indoor ROoms with Perspective and Omnidirectional cameras.
EVENT-CLASS (2021): High-quality 360-degree videos in the context of tele-education.
Parking Lot Occupancy Database (2020)
Nighttime Vehicle Detection database (NVD) (2019)
Hand gesture dataset (2019): Multi-modal Leap Motion dataset for Hand Gesture Recognition.
ViCoCoS-3D (2016): VideoConference Common Scenes in 3D.
LASIESTA database (2016): More than 20 sequences to test moving object detection and tracking algorithms.
Hand gesture database (2015): Hand-gesture database composed of high-resolution color images acquired with the Senz3D sensor.
HRRFaceD database (2014): Face database composed of high-resolution images acquired with the Microsoft Kinect 2 (second generation).
Lab database (2012): Set of 6 sequences to test moving object detection strategies.
Vehicle image database (2012): More than 7000 images of vehicles and roads.

 

Software  


Empowering Computer Vision in Higher Education (2024): A Novel Tool for Enhancing Video Coding Comprehension.
Engaging students in audiovisual coding through interactive MATLAB GUIs (2024)

TOP-Former: A Multi-Agent Transformer Approach for the Team Orienteering Problem (2023)

Solving Routing Problems for Multiple Cooperative Unmanned Aerial Vehicles using Transformer Networks (2023)
Vision Transformers and Traditional Convolutional Neural Networks for Face Recognition Tasks (2023)
Faster GSAC-DNN (2023): A Deep Learning Approach to Nighttime Vehicle Detection Using a Fast Grid of Spatial Aware Classifiers.
SETForSeQ (2020): Subjective Evaluation Tool for Foreground Segmentation Quality. 
SMV Player for Oculus Rift (2016)

Bag-D3P (2016): Face recognition using depth information.
TSLAB (2015): Tool for Semiautomatic LABeling.
 

   

Supplementary material  


Soccer line mark segmentation and classification with stochastic watershed transform (2022)
A fully automatic method for segmentation of soccer playing fields (2022)
Grass band detection in soccer images for improved image registration (2022)
Evaluating the Influence of the HMD, Usability, and Fatigue in 360VR Video Quality Assessments (2020)
Automatic soccer field of play registration (2020)   
Augmented reality tool for the situational awareness improvement of UAV operators (2017)
Detection of static moving objects using multiple nonparametric background-foreground models on a Finite State Machine (2015)
Real-time nonparametric background subtraction with tracking-based foreground update (2015)  
Camera localization using trajectories and maps (2014)


"From traditional multi-stage machine learning to end-to-end deep learning for computer vision applications" 

Ana Maqueda

E.T.S. Ing. Telecomunicación, Universidad Politécnica de Madrid, Sept 2018, "Cum Laude".

Ph.D. thesis directors: Narciso García Santos and Carlos Roberto del Blanco Adán.

The renaissance of Deep Neural Networks in the era of big data, along with the use of high-performance hardware that reduces computational time, has changed the paradigm of machine learning, especially in the field of computer vision. Whereas systems based on traditional machine learning rely on multiple stages and hand-crafted features to gain insight into the problem, Convolutional Neural Networks automatically learn the features that maximize accuracy directly from raw images in an end-to-end manner. The purpose of this dissertation is to show the gap between traditional multi-stage learning systems and end-to-end deep learning systems, addressing different applications for a qualitative comparison.

First, an expert-knowledge recognition system has been developed to deal with dynamic hand gestures. The key aspects of this system are its hand-crafted image and video descriptors, together with the pipeline of the whole system. These descriptors have been designed to cope with the difficulties of vision-based approaches, such as illumination changes, intra-class and inter-class variance, and multiple scales. The design of the multiple stages of the system solves the intermediate steps that are necessary to successfully apply these descriptors. Since the proposed hand-gesture recognition system has been designed for a human-computer interface, it comprises detection and tracking stages to localize the object of interest, and a recognition stage to categorize the performed gesture.
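As an illustration of this kind of multi-stage design, the sketch below shows a minimal detection, tracking, and recognition pipeline built on a hand-crafted gradient-orientation descriptor. It is not the thesis code: the function names, the simple intensity-based hand detector, and the thresholds are hypothetical, and it assumes grayscale frames normalized to [0, 1].

# Hedged sketch of a traditional multi-stage pipeline (illustrative, not the thesis code).
import numpy as np

def detect_hand(frame, low=0.3, high=0.8):
    """Stage 1 - detection: locate the hand with a simple intensity mask (hypothetical)."""
    ys, xs = np.nonzero((frame > low) & (frame < high))
    if xs.size == 0:
        return None
    return xs.min(), ys.min(), xs.max(), ys.max()  # bounding box (x0, y0, x1, y1)

def track_hand(prev_box, new_box, alpha=0.6):
    """Stage 2 - tracking: temporally smooth the detected bounding box."""
    if prev_box is None or new_box is None:
        return new_box
    return tuple(int(alpha * n + (1 - alpha) * p) for n, p in zip(new_box, prev_box))

def orientation_descriptor(patch, n_bins=9):
    """Hand-crafted descriptor: a coarse histogram of gradient orientations."""
    gy, gx = np.gradient(patch.astype(float))
    hist, _ = np.histogram(np.arctan2(gy, gx), bins=n_bins,
                           range=(-np.pi, np.pi), weights=np.hypot(gx, gy))
    return hist / (hist.sum() + 1e-8)

def recognize(descriptor, templates):
    """Stage 3 - recognition: nearest-template gesture classification."""
    return min(templates, key=lambda g: np.linalg.norm(descriptor - templates[g]))

Each stage must be designed, tuned, and validated separately, which is precisely the multi-stage character described above.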

Second, deep learning (DL) approaches have been proposed for different computer vision applications. Research efforts have focused on building these end-to-end systems to overcome the weaknesses present in traditional learning. Unlike the previous approach, they need neither multiple stages to perform the target task nor feature engineering. Their architecture designs depend on the task to be solved, its complexity, and the amount of available data. These guidelines have been applied to common vision-based applications, such as vehicle detection and hand-gesture recognition, but also to more challenging situations, such as robotics applications.
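For contrast with the multi-stage sketch above, the following is a minimal end-to-end example in PyTorch (illustrative, not any of the thesis architectures): raw images go in, class scores come out, and a single optimization step updates the learned feature extractor and the classifier jointly. The layer sizes and the dummy data are hypothetical.

# Hedged sketch of an end-to-end CNN (illustrative, not the thesis architecture).
import torch
import torch.nn as nn

class TinyEndToEndCNN(nn.Module):
    def __init__(self, n_classes=5):
        super().__init__()
        self.features = nn.Sequential(            # learned feature extractor (no hand-crafting)
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(          # learned decision stage
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, n_classes),
        )

    def forward(self, x):                         # raw pixels in, class scores out
        return self.classifier(self.features(x))

model = TinyEndToEndCNN(n_classes=5)
images = torch.randn(8, 3, 64, 64)                # dummy batch of raw RGB images
labels = torch.randint(0, 5, (8,))
loss = nn.CrossEntropyLoss()(model(images), labels)
loss.backward()                                   # gradients flow through every stage at once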