Action/Event Detection
License: Unknown


Automatically recognizing and localizing a large number of action categories from videos in the wild is of significant importance for video understanding and multimedia event detection. The THUMOS workshop and challenge aim to explore new challenges and approaches for large-scale action recognition with a large number of classes from open-source videos in a realistic setting.

Most of the existing action recognition datasets are composed of videos that have been manually trimmed to bound the action of interest. This has been identified as a considerable limitation because it poorly matches how action recognition is applied in practical settings. Therefore, THUMOS 2015 will conduct the challenge on temporally untrimmed videos. Participants may train their methods using trimmed clips but will be required to test their systems on untrimmed data.

A new forward-looking dataset containing over 430 hours of video data and 45 million frames (70% larger than THUMOS'14) is made available under this challenge, with the following components:

  • Training Set: over 13,000 temporally trimmed videos from 101 action classes.
  • Validation Set: over 2,100 temporally untrimmed videos with temporal annotations of actions.
  • Background Set: approximately 3,000 relevant videos guaranteed to not include any instance of the 101 actions.
  • Test Set: over 5,600 temporally untrimmed videos with withheld ground truth.

All videos are collected from YouTube. The challenge will evaluate the success of the proposed methods based on their performance on the new THUMOS 2015 dataset in two tasks:

  1. Action Classification: this task accepts submissions for whole-clip action classification on 101 action classes.
  2. Temporal Action Localization: this task accepts submissions on action recognition and temporal localization on a subset of 20 action classes.
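To make the temporal localization task more concrete, the sketch below computes the temporal intersection-over-union between a predicted action segment and an annotated one, a common way to measure how well two time intervals overlap. This is only an illustration under stated assumptions: the text above does not say which overlap measure or threshold the challenge uses (the official protocol is defined in the Evaluation Setup document), and the `Segment` type and `temporal_iou` function are hypothetical names introduced here.

```python
from typing import Tuple

# Hypothetical representation of an action segment: (start_seconds, end_seconds).
Segment = Tuple[float, float]


def temporal_iou(pred: Segment, truth: Segment) -> float:
    """Return the intersection-over-union of two time intervals, in [0, 1]."""
    pred_start, pred_end = pred
    truth_start, truth_end = truth
    # Length of the overlap; zero when the segments are disjoint.
    intersection = max(0.0, min(pred_end, truth_end) - max(pred_start, truth_start))
    # Combined length covered by both segments, counting the overlap once.
    union = (pred_end - pred_start) + (truth_end - truth_start) - intersection
    return intersection / union if union > 0 else 0.0


if __name__ == "__main__":
    # A prediction of 2s-6s against an annotation of 4s-8s overlaps for 2s
    # out of a 6s union.
    print(temporal_iou((2.0, 6.0), (4.0, 8.0)))
```

A detection would typically be counted as correct when its label matches and this overlap exceeds a chosen threshold; the threshold itself is an evaluation-setup detail not given here.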

Participants may submit either a notebook paper that briefly describes their system or a research paper detailing their approach. All submitted results will be summarized during the workshop and included in the workshop/conference proceedings. Additionally, the top performers will be invited to give oral presentations, and the remaining entrants are encouraged to present their work in the poster session.

For more details, please see the Evaluation Setup document or the released resources.


The password of THUMOS15 is "THUMOS15_challenge_REGISTERED".


If you make use of the data and resources shared for the competition, e.g., annotations or the attribute lexicon, or want to cite the THUMOS challenge, please use the following references:

  title={The THUMOS challenge on action recognition for videos “in the wild”},
  author={Idrees, H. and Zamir, A. R. and Jiang, Y. and Gorban, A. and Laptev, I. and Sukthankar, R. and Shah, M.},
  journal={Computer Vision and Image Understanding},

  author = "Gorban, A. and Idrees, H. and Jiang, Y.-G. and Roshan Zamir, A. and Laptev, I. and Shah, M. and Sukthankar, R.",
  title = "{THUMOS} Challenge: Action Recognition with a Large Number of Classes",
  howpublished = "\url{}",
  year = {2015}}

The UCF101 dataset can be cited as:

   author = {Soomro, K. and Roshan Zamir, A. and Shah, M.},
   booktitle = {CRCV-TR-12-01},
   title = {{UCF101}: A Dataset of 101 Human Actions Classes From Videos in The Wild},
   year = {2012}}