NII Okutama-Action
2D Box
Person | Autonomous Driving | ...
License: Unknown

Overview

We present Okutama-Action, a new video dataset for aerial-view concurrent human action detection. It consists of 43 minute-long, fully annotated sequences with 12 action classes. Okutama-Action features many challenges missing in current datasets, including dynamic transition of actions, significant changes in scale and aspect ratio, abrupt camera movement, and multi-labeled actors. As a result, our dataset is more challenging than existing ones and will help push the field forward to enable real-world applications.

Highlights

- An aerial-view dataset that contains representative samples of actions in real-world airborne scenarios.
- Dynamic transition of actions: in each video, up to 9 actors sequentially perform a diverse set of actions.
- The real-world challenge of multi-labeled actors, where an actor performs more than one action at the same time.
- A significant increase over previous datasets in the number of actors and concurrent actions (up to 10 actors/actions), as well as in video resolution (3840x2160) and sequence length (one minute on average).
- The dataset can be used for multiple tasks: (1) pedestrian detection, (2) spatio-temporal action detection, and (3) multi-human tracking (under development).

Dataset Structure

Video names: each video name consists of 3 integers separated by dots. From left to right, they are:

- Drone number. Each scenario, with the exception of one, was captured by 2 drones (with different configurations) at the same time.
- Part of the day. "1" indicates morning and "2" indicates noon.
- Scenario number.

Hence, a pair of videos with the same last two integers shows the same scenario captured by drones with different configurations.

Labels: each line contains 10+ columns, separated by spaces (see the parsing sketch below). The columns are:

- Track ID. All rows with the same ID belong to the same person for 180 frames; the person then gets a new ID for the next 180 frames. We will soon release an update to make the IDs consistent.
- xmin. The top-left x-coordinate of the bounding box.
- ymin. The top-left y-coordinate of the bounding box.
- xmax. The bottom-right x-coordinate of the bounding box.
- ymax. The bottom-right y-coordinate of the bounding box.
- frame. The frame that this annotation represents.
- lost. If 1, the annotation is outside of the view screen.
- occluded. If 1, the annotation is occluded.
- generated. If 1, the annotation was automatically interpolated.
- label. The label for this annotation, enclosed in quotation marks. This field is always "Person".
- (+) actions. Each column after this is an action.

There are two label files for each video: one for single-action detection and one for multi-action detection. Note that the labels for single-action detection have been created from the multi-action detection labels (for more details, please refer to our publication). For the pedestrian detection task, the columns describing the actions should be ignored.

If you find this dataset useful, please cite the following paper:

Okutama-Action: An Aerial View Video Dataset for Concurrent Human Action Detection
M. Barekatain, M. Martí, H. Shih, S. Murray, K. Nakayama, Y. Matsuo, and H. Prendinger
arXiv:1706.03038, 2017
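To make the naming convention and the label format concrete, below is a minimal parsing sketch in Python based only on the column layout described above. The helper names (parse_video_name, parse_label_line) and any file-name details are illustrative assumptions, not part of the official release.

```python
# Minimal parsing sketch for Okutama-Action annotations (assumptions noted above).
from dataclasses import dataclass
from typing import List


@dataclass
class VideoName:
    drone: int        # drone number
    part_of_day: int  # 1 = morning, 2 = noon
    scenario: int     # scenario number


def parse_video_name(stem: str) -> VideoName:
    """Parse a video name such as '1.2.5' (drone.part_of_day.scenario)."""
    drone, part_of_day, scenario = (int(x) for x in stem.split("."))
    return VideoName(drone, part_of_day, scenario)


@dataclass
class Annotation:
    track_id: int        # same person for 180 frames, then a new ID
    xmin: float
    ymin: float
    xmax: float
    ymax: float
    frame: int
    lost: bool           # outside of the view screen
    occluded: bool
    generated: bool      # automatically interpolated
    label: str           # always "Person"
    actions: List[str]   # empty for pure pedestrian-detection use


def parse_label_line(line: str) -> Annotation:
    """Parse one space-separated annotation line into an Annotation."""
    cols = line.split()
    return Annotation(
        track_id=int(cols[0]),
        xmin=float(cols[1]), ymin=float(cols[2]),
        xmax=float(cols[3]), ymax=float(cols[4]),
        frame=int(cols[5]),
        lost=cols[6] == "1",
        occluded=cols[7] == "1",
        generated=cols[8] == "1",
        label=cols[9].strip('"'),
        actions=[a.strip('"') for a in cols[10:]],
    )
```

For the pedestrian detection task, only the bounding-box and frame fields are needed and the actions columns can simply be dropped, as noted above.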

Data Summary

Data format: image
Data volume: --
File size: --
