Olympic Highlights Dataset

 


Description


Olympic Highlights is a publicly available benchmark of full-length broadcast track-and-field competition videos with frame-level temporal annotations and athlete-identity labels. It is designed to support reproducible evaluation of identity-aware sports video summarization systems.

The dataset consists of 20 videos spanning four events: High Jump, Javelin, Long Jump, and Pole Vault, with five videos per event. Each video is a full broadcast recording of approximately 30 minutes (roughly 10 hours of footage in total), collected from publicly available Olympic and World Athletics broadcasts on YouTube.

Each video is annotated at the frame level with three segment types:

  • Highlight (HL): Frames covering the core athletic action/attempt and its immediate outcome.
  • Non-Highlight (NHL): Background footage between highlight events (transitions, crowd shots, etc.).
  • Uncertainty (UN): Boundary frames immediately surrounding each highlight, accounting for natural ambiguity in annotation boundaries (approximately 15 frames before and 30 frames after each highlight).

In addition, every highlight segment is labeled with the athlete responsible for the action, enabling evaluation of identity-aware summarization systems.

  

Contents


The dataset is distributed as a ZIP archive with the following structure:

OlympicHighlights/
|-- videos.csv
|-- High Jump/
|   |-- high_jump_1.csv ... high_jump_5.csv
|-- Javelin/
|   |-- javelin_1.csv ... javelin_5.csv
|-- Long Jump/
|   |-- long_jump_1.csv ... long_jump_5.csv
`-- Pole Vault/
    `-- pole_vault_1.csv ... pole_vault_5.csv

 

Dataset Statistics


Table 1 summarizes the ground-truth annotation statistics for every video in the dataset. For each segment type, the table reports the average event duration (Avg, in seconds), the total cumulative duration (Tot, in minutes), and the percentage of video time occupied by that category.

Table 1: Ground-truth statistics for the Olympic Highlights dataset (20 videos; 5 per event).


 

Ground Truth File Format


Each ground truth file (e.g., high_jump_1.csv) contains frame-level annotations in CSV format with four columns:

  • Event type: one of Highlight, Uncertainty, or Not a highlight.
  • First frame: 0-based index of the first frame of the segment.
  • Last frame: index of the last frame of the segment (inclusive).
  • Num. frames: number of frames in the segment (Last frame − First frame + 1).

The segments are listed in chronological order and together span the full video from frame 0 to the last frame. Frame indices assume a fixed 30 fps frame rate.
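Under the fixed 30 fps assumption, frame indices convert directly to timestamps. A minimal sketch (the helper names are illustrative, not part of the dataset):

```python
FPS = 30  # fixed frame rate assumed by the annotations


def frame_to_seconds(frame_idx: int, fps: int = FPS) -> float:
    """Convert a 0-based frame index to a timestamp in seconds."""
    return frame_idx / fps


def segment_duration(first: int, last: int, fps: int = FPS) -> float:
    """Duration in seconds of an inclusive [first, last] frame segment."""
    return (last - first + 1) / fps


# Example: a highlight spanning frames 6103-6298
print(frame_to_seconds(6103))        # ~203.4 s into the video
print(segment_duration(6103, 6298))  # 196 frames, ~6.53 s
```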

Example:

Event type, First frame, Last frame, Num. frames

Not a highlight, 0, 6087, 6088

Uncertainty, 6088, 6102, 15

Highlight, 6103, 6298, 196

Uncertainty, 6299, 6328, 30

Not a highlight, 6329, 6386, 58

...
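Files in this format can be parsed with the Python standard library alone. The sketch below (illustrative, not distributed with the dataset) reads the segments and verifies that they are contiguous from frame 0 with consistent frame counts:

```python
import csv
import io


def load_segments(f):
    """Parse a ground-truth CSV into (event_type, first, last, n) tuples."""
    reader = csv.reader(f, skipinitialspace=True)
    next(reader)  # skip header: "Event type, First frame, Last frame, Num. frames"
    return [(event_type, int(first), int(last), int(n))
            for event_type, first, last, n in reader]


def check_contiguous(segments):
    """Verify segments start at frame 0, are back-to-back, and n == last - first + 1."""
    expected_first = 0
    for event_type, first, last, n in segments:
        assert first == expected_first, f"gap before frame {first}"
        assert n == last - first + 1, f"bad frame count in {event_type} segment"
        expected_first = last + 1


# Demo on the example rows above:
sample = """Event type, First frame, Last frame, Num. frames
Not a highlight, 0, 6087, 6088
Uncertainty, 6088, 6102, 15
Highlight, 6103, 6298, 196
Uncertainty, 6299, 6328, 30
"""
segs = load_segments(io.StringIO(sample))
check_contiguous(segs)
print(len(segs))  # 4
```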

 

Video Links


The videos are not distributed directly due to copyright. The file videos.csv (included in this archive) lists the YouTube URL for each of the 20 videos. All videos are from the World Athletics and European Athletics YouTube channels and were publicly available at the time of annotation.

videos.csv columns: sport, ground_truth_file, url.
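With those three columns, mapping each ground-truth file to its source video is a one-liner (a sketch; the sample row below is hypothetical, not actual dataset contents):

```python
import csv
import io


def load_video_index(f):
    """Map each ground_truth_file to its (sport, url) pair."""
    return {row["ground_truth_file"]: (row["sport"], row["url"])
            for row in csv.DictReader(f, skipinitialspace=True)}


# Hypothetical row for illustration only:
sample = """sport,ground_truth_file,url
High Jump,high_jump_1.csv,https://www.youtube.com/watch?v=XXXX
"""
index = load_video_index(io.StringIO(sample))
print(index["high_jump_1.csv"][0])  # High Jump
```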

  

 

Evaluation Protocol


The dataset was used in the associated paper to evaluate identity-aware highlight selection at the event level. A predicted highlight segment is counted as a true positive if:

  1. Its temporal Intersection-over-Union (IoU) with a ground-truth highlight is ≥ 0.3, and
  2. It is attributed to the correct athlete, and
  3. No other prediction has already been matched to that ground-truth event (one-to-one matching).

Recall, precision, and F-score are computed per video, then averaged per event and across the full dataset. Uncertainty frames are excluded from frame-level evaluation.
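The three matching criteria can be sketched as follows. This is an illustrative implementation, not the authors' released code; segments are assumed to be (first_frame, last_frame, athlete_id) tuples:

```python
def temporal_iou(a, b):
    """IoU of two inclusive frame intervals given as (first, last)."""
    inter = min(a[1], b[1]) - max(a[0], b[0]) + 1
    if inter <= 0:
        return 0.0
    union = (a[1] - a[0] + 1) + (b[1] - b[0] + 1) - inter
    return inter / union


def match_predictions(preds, gts, iou_thr=0.3):
    """Greedy one-to-one matching: a prediction is a true positive if its
    temporal IoU with an unmatched ground-truth highlight is >= iou_thr
    and the athlete identity agrees."""
    matched = set()
    tp = 0
    for p_first, p_last, p_athlete in preds:
        for i, (g_first, g_last, g_athlete) in enumerate(gts):
            if i in matched:
                continue  # one-to-one: each GT event matches at most once
            if (p_athlete == g_athlete
                    and temporal_iou((p_first, p_last), (g_first, g_last)) >= iou_thr):
                matched.add(i)
                tp += 1
                break
    fp = len(preds) - tp
    fn = len(gts) - tp
    precision = tp / (tp + fp) if preds else 0.0
    recall = tp / (tp + fn) if gts else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return precision, recall, f_score


# Demo with hypothetical segments:
gts = [(6103, 6298, "athlete_A")]
preds = [(6100, 6300, "athlete_A")]
print(match_predictions(preds, gts))  # (1.0, 1.0, 1.0)
```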

Citation


If you use this dataset in your research, please cite:

M. Rodrigo, C. Cuevas, and N. García, “Automatic Sports Video Summarization with Identity-Aware Highlight Selection,” Image and Vision Computing, under review.

@article{rodrigo2025pvs,
  title   = {Automatic Sports Video Summarization with Identity-Aware Highlight Selection},
  author  = {Rodrigo, Marcos and Cuevas, Carlos and Garc{\'i}a, Narciso},
  journal = {Image and Vision Computing},
  note    = {Under review}
}

Code and additional resources are available at https://github.com/MarcosRodrigoT/PVS.