End of Degree Projects in GTI

 

Research  

 

GTI Data   

 

Open databases and software developed by the GTI, and supplementary material for papers.

 

Databases  


SportCLIP (2025): Multi-sport dataset for text-guided video summarization.
Ficosa (2024): The FNTVD dataset, generated using Ficosa's recording car.
MATDAT (2023):  More than 90K labeled images of martial arts tricking.
SEAW – DATASET (2022): 3 stereoscopic contents in 4K resolution at 30 fps.
UPM-GTI-Face dataset (2022): 11 different subjects captured in 4K, under 2 scenarios, and 2 face mask conditions.
LaSoDa (2022): 60 annotated images from soccer matches in five stadiums with different characteristics and light conditions.
PIROPO Database (2021): People in Indoor ROoms with Perspective and Omnidirectional cameras.
EVENT-CLASS (2021): High-quality 360-degree videos in the context of tele-education.
Parking Lot Occupancy Database (2020)
Nighttime Vehicle Detection database (NVD) (2019)
Hand gesture dataset (2019): Multi-modal Leap Motion dataset for Hand Gesture Recognition.
ViCoCoS-3D (2016): VideoConference Common Scenes in 3D.
LASIESTA database (2016): More than 20 sequences to test moving object detection and tracking algorithms.
Hand gesture database (2015): Hand-gesture database composed of high-resolution color images acquired with the Senz3D sensor.
HRRFaceD database (2014): Face database composed of high-resolution images acquired with Microsoft Kinect 2 (second generation).
Lab database (2012): Set of 6 sequences to test moving object detection strategies.
Vehicle image database (2012): More than 7000 images of vehicles and roads.

 

Software  


Empowering Computer Vision in Higher Education (2024): A Novel Tool for Enhancing Video Coding Comprehension.
Engaging students in audiovisual coding through interactive MATLAB GUIs (2024)

TOP-Former: A Multi-Agent Transformer Approach for the Team Orienteering Problem (2023)

Solving Routing Problems for Multiple Cooperative Unmanned Aerial Vehicles using Transformer Networks (2023)
Vision Transformers and Traditional Convolutional Neural Networks for Face Recognition Tasks (2023)
Faster GSAC-DNN (2023): A Deep Learning Approach to Nighttime Vehicle Detection Using a Fast Grid of Spatial Aware Classifiers.
SETForSeQ (2020): Subjective Evaluation Tool for Foreground Segmentation Quality. 
SMV Player for Oculus Rift (2016)

Bag-D3P (2016): Face recognition using depth information.
TSLAB (2015): Tool for Semiautomatic LABeling.
 

   

Supplementary material  


Soccer line mark segmentation and classification with stochastic watershed transform (2022)
A fully automatic method for segmentation of soccer playing fields (2022)
Grass band detection in soccer images for improved image registration (2022)
Evaluating the Influence of the HMD, Usability, and Fatigue in 360VR Video Quality Assessments (2020)
Automatic soccer field of play registration (2020)   
Augmented reality tool for the situational awareness improvement of UAV operators (2017)
Detection of static moving objects using multiple nonparametric background-foreground models on a Finite State Machine (2015)
Real-time nonparametric background subtraction with tracking-based foreground update (2015)  
Camera localization using trajectories and maps (2014)

 


 

 

End of Degree Projects in GTI 

 

Development of virtual reality 3D graphics visualization tools   

Miguel Rodríguez Millán

E.T.S. Ing. Telecomunicación, Universidad Politécnica de Madrid, July 2016, "Sobresaliente (Outstanding), 9.5".

Advisor: Rafael Pagés Scasso.

Overseer: Francisco Morán Burgos

The number and diversity of applications that use 3D models have grown in recent years, driven mainly by advances in computing and by improved 3D modelling and graphics visualization techniques.

As an alternative to traditional visualization systems, Head Mounted Display (HMD) devices have become highly popular. These devices let users view images on displays placed close to the eyes while tracking their movement and generating a view that corresponds to their position, which creates a far more immersive experience.

This Final Degree Project focuses on the implementation of an application with an interface that lets users load their own 3D models in OBJ format and visualize them through a virtual reality device, specifically the Oculus Rift. To this end, the different development options were studied, and the Unity game engine, programmed in C#, was selected as the most convenient.

The developed system lets the user experience a virtual reality scene after selecting the 3D model to load. The user can interact with this synthetic environment by viewing the model from different angles or even walking virtually around the object, all in a highly immersive way.
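
As a rough illustration of what loading a model in OBJ format involves, the following Python sketch reads the vertex (`v`) and face (`f`) records of an OBJ file and triangulates the faces. It is not the project's code, which was written in C# for Unity; the function names `parse_obj` and `triangulate` and the sample data are hypothetical, and materials, normals and texture coordinates are ignored.

```python
def parse_obj(text):
    """Parse the vertex ('v') and face ('f') records of a Wavefront OBJ string."""
    vertices = []  # list of (x, y, z) tuples
    faces = []     # list of tuples of 0-based vertex indices
    for line in text.splitlines():
        parts = line.split("#", 1)[0].split()  # drop comments, then tokenize
        if not parts:
            continue
        if parts[0] == "v":
            vertices.append(tuple(float(c) for c in parts[1:4]))
        elif parts[0] == "f":
            face = []
            for token in parts[1:]:
                # Tokens look like "i", "i/t", "i//n" or "i/t/n"; keep the vertex index.
                index = int(token.split("/")[0])
                # OBJ indices are 1-based; negative ones count back from the last vertex read.
                face.append(index - 1 if index > 0 else len(vertices) + index)
            faces.append(tuple(face))
    return vertices, faces


def triangulate(face):
    """Split a convex polygon into a triangle fan, the form a renderer usually needs."""
    return [(face[0], face[i], face[i + 1]) for i in range(1, len(face) - 1)]


# Hypothetical sample: a unit square in the z = 0 plane.
SAMPLE_OBJ = """
# unit square
v 0 0 0
v 1 0 0
v 1 1 0
v 0 1 0
f 1/1 2/2 3/3 4/4
"""

vertices, faces = parse_obj(SAMPLE_OBJ)
triangles = [t for f in faces for t in triangulate(f)]
```

Here the single quadrilateral face becomes two triangles that share the first vertex. The fan triangulation is only valid for convex faces, which is an assumption of this sketch.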

  

Study and implementation of strategies for geo-registration of aerial images   

Alfredo Tendero Casanova

E.T.S. Ing. Telecomunicación, Universidad Politécnica de Madrid, July 2016, "Sobresaliente (Outstanding), 10".

Advisor: Carlos Cuevas.

This work presents a comparative study of the strategies currently used for image registration, in particular for geo-registration (the registration of aerial images onto the surface of the earth), and identifies the most promising methods according to their robustness, computational efficiency and usability.

The study shows that the registration strategies most commonly used in this field are based on matching certain image features, called keypoints. This thesis studies the Scale-Invariant Feature Transform (SIFT) algorithm, as it is the most robust to scale, rotation and lighting changes.

Finally, based on this strategy, an application has been developed to register simulated flight sequences, under certain conditions, on georeferenced image areas. The results obtained are discussed and analyzed.
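
To make the keypoint-based idea concrete, the sketch below covers only the last step of such a pipeline: once keypoints in the aerial image have been matched to keypoints in the georeferenced image, a transform is fitted to the matched pairs. It assumes the matches are already given (SIFT detection and matching are not shown) and fits a similarity transform (scale, rotation, translation) by least squares. This is a simplification chosen for brevity; the thesis does not state which transform model it uses, and the function names and sample points are hypothetical.

```python
import cmath
import math


def estimate_similarity(src, dst):
    """Least-squares fit of dst ~ a * src + b, with 2D points as complex numbers.

    The complex factor a encodes scale (modulus) and rotation (phase);
    b is the translation.
    """
    p = [complex(x, y) for x, y in src]
    q = [complex(x, y) for x, y in dst]
    p_mean = sum(p) / len(p)
    q_mean = sum(q) / len(q)
    num = sum((qi - q_mean) * (pi - p_mean).conjugate() for pi, qi in zip(p, q))
    den = sum(abs(pi - p_mean) ** 2 for pi in p)
    a = num / den
    b = q_mean - a * p_mean
    return a, b


def apply_similarity(a, b, point):
    """Map one (x, y) point with the fitted transform."""
    z = a * complex(*point) + b
    return (z.real, z.imag)


# Hypothetical matched keypoints: aerial image -> georeferenced image.
aerial_pts = [(0, 0), (1, 0), (0, 1), (1, 1)]
map_pts = [(3, 4), (3, 6), (1, 4), (1, 6)]

a, b = estimate_similarity(aerial_pts, map_pts)
scale = abs(a)
angle_deg = math.degrees(cmath.phase(a))
```

For these sample points the fit recovers a scale of 2, a rotation of 90 degrees and a translation of (3, 4). With real matches, which contain outliers, a robust estimator would normally be wrapped around a fit like this one.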

 

Study and implementation of strategies for automatic discrimination of abandoned and subtracted objects in video surveillance environments   

Lara Muñoz Sánchez

E.T.S. Ing. Telecomunicación, Universidad Politécnica de Madrid, July 2016, "Sobresaliente (Outstanding), 10".

Advisor: Carlos Cuevas.

In this Final Degree Project, different strategies for classifying abandoned and subtracted (removed) objects are analyzed. These strategies correspond to the third stage of many of the works that carry out this type of classification, which is applied after background subtraction and after the detection of stationary foreground objects in the scene. The first stage, background subtraction, separates every pixel belonging to the foreground from the background, that is, not only moving pixels but also stationary ones.

The second stage analyzes the stability of the changes produced in the pixels and classifies the foreground into pixels belonging to moving objects and pixels belonging to stationary foreground objects. These stationary objects are image regions that have changed (temporarily or permanently), which happens both when an object is abandoned and when an object is removed from the scene. For this reason, the previously mentioned works add a further stage to discriminate between the two situations. An analysis of the most relevant proposals of recent years has been performed.
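
The first two stages can be sketched in a few lines of Python. This is a deliberately simplified illustration, not any of the analyzed methods: it assumes a fixed grayscale background image instead of an adaptive background model, and it measures stability with a per-pixel counter of consecutive unchanged frames. The function name and all thresholds are hypothetical.

```python
def classify_foreground(frames, background, fg_threshold=30,
                        stable_threshold=10, min_stable_frames=3):
    """Label each pixel of the last frame as 'background', 'moving' or 'stationary'."""
    height, width = len(background), len(background[0])
    stable_count = [[0] * width for _ in range(height)]
    previous = frames[0]
    for frame in frames[1:]:
        for y in range(height):
            for x in range(width):
                # Stage 1: foreground means "differs enough from the background".
                is_foreground = abs(frame[y][x] - background[y][x]) > fg_threshold
                # Stage 2: the counter grows only while a foreground pixel keeps its value.
                is_stable = abs(frame[y][x] - previous[y][x]) <= stable_threshold
                if is_foreground and is_stable:
                    stable_count[y][x] += 1
                else:
                    stable_count[y][x] = 0
        previous = frame
    last = frames[-1]
    labels = []
    for y in range(height):
        row = []
        for x in range(width):
            if abs(last[y][x] - background[y][x]) <= fg_threshold:
                row.append("background")
            elif stable_count[y][x] >= min_stable_frames:
                row.append("stationary")
            else:
                row.append("moving")
        labels.append(row)
    return labels


# Hypothetical 1x3 sequence: pixel 0 never changes, pixel 1 changes once and
# then stays put, pixel 2 keeps changing from frame to frame.
background = [[10, 10, 10]]
frames = [
    [[10, 10, 10]],
    [[10, 200, 100]],
    [[10, 200, 200]],
    [[10, 200, 100]],
    [[10, 200, 200]],
]
labels = classify_foreground(frames, background)
```

Note that the "stationary" label alone does not say whether something was left in the scene or taken from it: in both cases the pixel differs from the background and then stops changing.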

Based on this analysis, different strategies covering the main types of criteria typically used to classify stationary objects have been implemented. Finally, the main advantages and disadvantages of each implemented algorithm have been studied, with the aim of providing enough information to select the strategies most appropriate to the characteristics of the analyzed sequences.
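
As an illustration of what one such criterion might look like, the sketch below uses an edge-based rule: if the outline of the stationary region is sharper in the current frame than in the background image, an object is assumed to be present (abandoned); otherwise it is assumed to have been taken away (removed). Whether this particular rule is among the strategies implemented in the project is not stated in the abstract; the rule, the function names and the sample images are assumptions made for illustration.

```python
def boundary_contrast(image, region):
    """Sum of absolute intensity steps across the region boundary (4-neighbourhood)."""
    height, width = len(image), len(image[0])
    total = 0
    for (y, x) in region:
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            inside_image = 0 <= ny < height and 0 <= nx < width
            if inside_image and (ny, nx) not in region:
                total += abs(image[y][x] - image[ny][nx])
    return total


def classify_stationary_region(frame, background, region):
    """Return 'abandoned' if the region outline is sharper in the frame than in the background."""
    in_frame = boundary_contrast(frame, region)
    in_background = boundary_contrast(background, region)
    return "abandoned" if in_frame > in_background else "removed"


# Hypothetical 3x3 grayscale images: an empty scene and the same scene with an object.
empty_scene = [[50, 50, 50], [50, 50, 50], [50, 50, 50]]
scene_with_object = [[50, 50, 50], [50, 200, 50], [50, 50, 50]]
region = {(1, 1)}  # pixels (row, column) flagged as stationary foreground

left_behind = classify_stationary_region(scene_with_object, empty_scene, region)
taken_away = classify_stationary_region(empty_scene, scene_with_object, region)
```

A rule of this kind depends on the object contrasting with its surroundings, which is one reason the suitability of each criterion varies with the characteristics of the sequence.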