Lecture by Shrikanth Narayanan



"Behavioral Signal Processing: Enabling human-centered behavioral informatics" by Shrikanth Narayanan

Room: B-223

12:00 p.m.

Escuela Técnica Superior de Ingenieros de Telecomunicación (ETSIT)

Universidad Politécnica de Madrid (UPM)

Monday, 27 June 2016

Shrikanth (Shri) Narayanan

University of Southern California, Los Angeles, CA

Signal Analysis and Interpretation Laboratory

http://sail.usc.edu

The confluence of sensing, communication and computing technologies is allowing capture of and access to data, in diverse forms and modalities, in ways that were unimaginable even a few years ago. These include data that afford the analysis and interpretation of multimodal cues of verbal and non-verbal human behavior to facilitate human behavioral research and its translational applications. These cues carry crucial information not only about a person's intent, identity and traits but also about underlying attitudes and emotions. Automatically capturing these cues, although vastly challenging, offers the promise not just of efficient data processing but also of tools for discovery that enable hitherto unimagined scientific insights, and of means for supporting diagnostics and interventions.

Recent computational approaches that make judicious use of both data and knowledge have yielded significant advances in this regard, for example in deriving rich, context-aware information from multimodal signal sources including human speech, language, and videos of behavior. These are even complemented and integrated with data about human brain and body physiology. This talk will focus on some of the advances and challenges in gathering such data and creating algorithms for machine processing of such cues. It will highlight some of our ongoing efforts in Behavioral Signal Processing (BSP), that is, technology and algorithms for quantitatively and objectively understanding typical, atypical and distressed human behavior, with a specific focus on communicative, affective and social behavior. The talk will illustrate Behavioral Informatics applications of these techniques that contribute to quantifying higher-level, often subjectively described, human behavior in a domain-sensitive fashion. Examples will be drawn from mental health and well-being realms such as Autism Spectrum Disorders, couple therapy, depression and addiction counseling.

Biography of the Speaker

Shrikanth (Shri) Narayanan is the Andrew J. Viterbi Professor of Engineering at the University of Southern California, where he is Professor of Electrical Engineering, and jointly in Computer Science, Linguistics, Psychology, Neuroscience and Pediatrics, and Director of the Ming Hsieh Institute. Prior to USC he was with AT&T Bell Labs and AT&T Research. His research focuses on human-centered information processing and communication technologies. He is a Fellow of the Acoustical Society of America, IEEE, and the American Association for the Advancement of Science (AAAS).

Shri Narayanan is Editor in Chief for the IEEE Journal on Selected Topics in Signal Processing, an Editor for the Computer Speech and Language Journal and an Associate Editor for the IEEE Transactions on Affective Computing, the Journal of the Acoustical Society of America, and the APSIPA Transactions on Signal and Information Processing, having previously served as an Associate Editor for the IEEE Transactions on Speech and Audio Processing (2000-2004), the IEEE Signal Processing Magazine (2005-2008) and the IEEE Transactions on Multimedia (2008-2012). He is a recipient of several honors including the 2015 Engineers Council's Distinguished Educator Award and the 2005 and 2009 Best Transactions Paper awards from the IEEE Signal Processing Society; he served as the Society's Distinguished Lecturer for 2010-11 and as an ISCA Distinguished Lecturer for 2015-16. With his students, he has received a number of best paper awards, including a 2014 Ten-year Technical Impact Award from ACM ICMI and Interspeech Challenges in 2009 (Emotion classification), 2011 (Speaker state classification), 2012 (Speaker trait classification), 2013 (Paralinguistics/Social Signals), 2014 (Paralinguistics/Cognitive Load) and 2015 (Non-nativeness detection). He has published over 650 papers and has been granted 17 U.S. patents.
