Action/Event Detection
License: Unknown


Automatically recognizing and localizing a large number of action categories in videos in the wild is of significant importance for video understanding and multimedia event detection. The THUMOS workshop and challenge aims to explore new challenges and approaches for large-scale action recognition with a large number of classes from open-source videos in a realistic setting.

Most existing action recognition datasets are composed of videos that have been manually trimmed to bound the action of interest. This has been identified as a considerable limitation, because it poorly matches how action recognition is applied in practical settings. Therefore, THUMOS 2014 will conduct the challenge on temporally untrimmed videos. Participants may train their methods using trimmed clips but will be required to test their systems on untrimmed data.
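
To illustrate the gap between trimmed training clips and untrimmed test videos, the sketch below shows one common baseline idea: cutting an untrimmed video into fixed-length candidate windows that a clip-level model could then score. This is not a method prescribed by the challenge; the function name `sliding_windows` and the window and stride values are hypothetical, chosen only for illustration.

```python
from typing import Iterator, Tuple


def sliding_windows(
    duration: float, window: float, stride: float
) -> Iterator[Tuple[float, float]]:
    """Yield (start, end) candidate segments, in seconds, over an untrimmed video.

    Hypothetical helper: the last windows are clipped to the video duration.
    """
    if duration < 0 or window <= 0 or stride <= 0:
        raise ValueError("duration must be >= 0; window and stride must be > 0")
    i = 0
    # Use an integer counter to avoid accumulating floating-point error.
    while i * stride < duration:
        start = i * stride
        yield (start, min(start + window, duration))
        i += 1


if __name__ == "__main__":
    # Assumed example: a 10-second video, 4-second windows, 2-second stride.
    for segment in sliding_windows(10.0, 4.0, 2.0):
        print(segment)
```

Each candidate window would then be classified by a model trained on trimmed clips; how the per-window scores are merged into detections is left open here.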

THUMOS14 includes 1,010 videos in the validation set and 1,574 videos in the test set, with 20 action classes. Temporal annotations of actions are provided for 200 videos in the validation set and 212 videos in the test set.

A new forward-looking dataset containing over 254 hours of video data and 25 million frames with the following components is made available under this challenge:

  • Training Set: Over 13,000 temporally trimmed videos from 101 action classes.
  • Validation Set: Over 1,000 temporally untrimmed videos with temporal annotations of actions.
  • Background Set: Over 2,500 relevant videos guaranteed not to include any instance of the 101 actions.
  • Test Set: Over 1,500 temporally untrimmed videos with withheld ground truth.
  • Spatio-temporal Annotations: Bounding box annotation for 24 action classes.

All videos are collected from YouTube, and their pre-extracted low-level features (Improved Dense Trajectory Features) are made available.

The entries to the challenge will be evaluated using the new THUMOS 2014 Dataset in two tasks:

  1. Action Recognition: accepts submissions for whole-clip action recognition over 101 classes.
  2. Temporal Action Detection: accepts submissions on action recognition and temporal localization on 20 action classes.
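
Temporal localization is usually judged by how well a predicted time interval overlaps an annotated one. The sketch below computes temporal intersection-over-union (IoU), a widely used overlap measure, for two segments. It is an illustration only, not the challenge's official scoring code; the function name `temporal_iou` and the `(start, end)` representation in seconds are assumptions.

```python
from typing import Tuple

Segment = Tuple[float, float]  # assumed representation: (start, end) in seconds


def temporal_iou(pred: Segment, gt: Segment) -> float:
    """Return the temporal IoU of a predicted and a ground-truth segment."""
    p_start, p_end = pred
    g_start, g_end = gt
    # Overlap is zero when the segments are disjoint or only touch.
    intersection = max(0.0, min(p_end, g_end) - max(p_start, g_start))
    union = (p_end - p_start) + (g_end - g_start) - intersection
    return intersection / union if union > 0 else 0.0


if __name__ == "__main__":
    # Assumed example: prediction 10-20 s against annotation 15-25 s.
    print(temporal_iou((10.0, 20.0), (15.0, 25.0)))
```

A detection is typically counted as correct when its class label matches and its overlap with an annotated instance exceeds a chosen threshold.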

For more details, please see the Evaluation Setup document or the released resources.


The password of THUMOS14 is "THUMOS14_REGISTERED".


If you make use of the data and resources shared for the competition, e.g., extracted low-level features or the attribute lexicon, or want to cite the THUMOS challenge, please use the following reference:

   @misc{THUMOS14,
   author = "Jiang, Y.-G. and Liu, J. and Roshan Zamir, A. and Toderici, G. and Laptev, I. and Shah, M. and Sukthankar, R.",
   title = "{THUMOS} Challenge: Action Recognition with a Large Number of Classes",
   howpublished = "\url{}",
   year = {2014}}

The UCF101 dataset can be cited as:

   @inproceedings{UCF101,
   author = {Soomro, K. and Roshan Zamir, A. and Shah, M.},
   booktitle = {CRCV-TR-12-01},
   title = {{UCF101}: A Dataset of 101 Human Actions Classes From Videos in The Wild},
   year = {2012}}
National Institute for Research in Digital Science and Technology