2D Polygon
License: Unknown


The ADE20K dataset covers a wide range of scenes and object categories with dense, detailed annotations for scene parsing: scenes, objects, parts of objects, and in some cases even parts of parts. A scene parsing benchmark with 150 object and stuff classes is built upon ADE20K, and several segmentation baseline models are evaluated on it. A novel network design called the Cascade Segmentation Module is proposed to parse a scene into stuff, objects, and object parts in a cascade, improving over the baselines. We further show that the trained scene parsing networks can lead to applications such as image content removal and scene synthesis.

Data Collection

Each folder contains images separated by scene category (the same scene categories as the Places Database). For each image, the object and part segmentations are stored in two different PNG files. All object and part instances are annotated separately.

For each image there are the following files:

*.jpg: RGB image.

*_seg.png: object segmentation mask. This image contains the object class segmentation masks and also separates each class into instances. The R and G channels encode the object class masks. The B channel encodes the object instance masks. The function loadAde20K.m extracts both masks.
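
The channel layout described above can be sketched in Python. This is a minimal illustration, not the dataset's official loader: the text only says that R and G carry the class and B carries the instance, so the exact formula used here, class = (R // 10) * 256 + G, and the re-indexing of distinct B values to 0, 1, 2, ... are assumptions that should be checked against loadAde20K.m. The function names `decode_class` and `decode_seg` are hypothetical.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each in 0..255


def decode_class(r: int, g: int) -> int:
    """Class index from the R and G channels.

    ASSUMPTION: class = (R // 10) * 256 + G. Verify against loadAde20K.m.
    """
    return (r // 10) * 256 + g


def decode_seg(
    rows: Sequence[Sequence[Pixel]],
) -> Tuple[List[List[int]], List[List[int]]]:
    """Return (class_mask, instance_mask) for an image given as rows of RGB pixels."""
    # ASSUMPTION: instance ids are the rank of each distinct B value,
    # so they come out as 0, 1, 2, ... in increasing order of B.
    b_values = sorted({b for row in rows for (_, _, b) in row})
    b_rank = {b: i for i, b in enumerate(b_values)}

    class_mask = [[decode_class(r, g) for (r, g, _) in row] for row in rows]
    instance_mask = [[b_rank[b] for (_, _, b) in row] for row in rows]
    return class_mask, instance_mask


# Made-up 2x2 image, for illustration only (not taken from the dataset).
toy_rows = [
    [(0, 0, 0), (10, 5, 40)],
    [(10, 5, 80), (20, 1, 120)],
]
toy_classes, toy_instances = decode_seg(toy_rows)
```

The sketch works on plain nested lists so that it has no dependencies; with real images the pixels would come from an image library and the same arithmetic would normally be applied to whole arrays at once.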

*_seg_parts_N.png: parts segmentation mask, where N is a number (1, 2, 3, ...) indicating the level in the part hierarchy. Parts are organized in a tree: objects are composed of parts, parts can be composed of parts, and parts of parts can have parts too. The level N indicates the depth in the part tree, with N=1 corresponding to parts of objects. All part segmentations use the same encoding as the object segmentation masks: classes are coded in the R and G channels and instances in the B channel. Use the function loadAde20K.m to extract the part segmentation masks and to separate instances of the same class.

*_.txt: text file describing the content of each image (objects and parts). This information is redundant with the other files, but the file additionally contains information about object attributes. The function loadAde20K.m also parses the content of this file. Each line in the text file contains: column 1 = instance number, column 2 = part level (0 for objects), column 3 = occluded (1 for true), column 4 = class name (parsed using WordNet), column 5 = original raw name (might provide a more detailed categorization), column 6 = comma-separated list of attributes.
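
The six-column layout above maps naturally onto a small record type. The sketch below is an illustration under stated assumptions: the column delimiter is not given in this description, so `#` is only a default guess exposed as a parameter, and the attribute list is assumed to be optionally wrapped in double quotes. The names `Annotation` and `parse_line` are hypothetical, and the sample line is made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Annotation:
    instance: int          # column 1: instance number
    part_level: int        # column 2: 0 for objects, 1 and up for parts
    occluded: bool         # column 3: 1 means occluded
    class_name: str        # column 4: class name parsed using WordNet
    raw_name: str          # column 5: original raw name
    attributes: List[str]  # column 6: comma-separated attributes


def parse_line(line: str, sep: str = "#") -> Annotation:
    """Parse one annotation line.

    ASSUMPTION: columns are separated by `sep`; the real delimiter is not
    stated in the description, so check it against an actual file.
    """
    cols = [c.strip() for c in line.split(sep)]
    if len(cols) != 6:
        raise ValueError(f"expected 6 columns, got {len(cols)}: {line!r}")
    # ASSUMPTION: the attribute list may be wrapped in double quotes.
    attrs = [a.strip() for a in cols[5].strip('"').split(",") if a.strip()]
    return Annotation(
        instance=int(cols[0]),
        part_level=int(cols[1]),
        occluded=cols[2] == "1",
        class_name=cols[3],
        raw_name=cols[4],
        attributes=attrs,
    )


# Made-up example line, for illustration only.
example = parse_line('002 # 1 # 1 # door # double door # "open, wooden"')
```

Filtering the parsed records on `part_level == 0` would keep only whole objects, while `part_level == 1` would select the parts of objects.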

Data Annotation

The following example has two part levels. The first segmentation shows the object masks. The second segmentation corresponds to object parts (body parts, mug parts, table parts, ...). The third segmentation shows parts of the heads (eyes, mouth, nose, ...):


Matlab file: index_ade20k_2015.mat

filename: cell array of length N=22210 with the image file names.

folder: cell array of length N with the image folder names.

scene: cell array of length N providing the scene name (same classes as the Places database) for each image.

objectnames: cell array of length C with the object class names.

wordnet_found: array of length C indicating whether each object name was found in WordNet.

wordnet_hypernym: cell array of length C. WordNet hypernyms for each object name.

wordnet_gloss: cell array of length C. WordNet definition.

objectcounts: array of length C with the number of instances for each object class.

objectPresence: array of size [C, N] with the object counts per image. objectPresence(c,i)=n if image i contains n instances of object class c.

objectIsPart: array of size [C, N] counting how many times an object is a part in each image. objectIsPart(c,i)=m if in image i object class c is a part of another object m times. For objects, objectIsPart(c,i)=0, and for parts we will find objectIsPart(c,i) ≈ objectPresence(c,i).

proportionClassIsPart: array of length C with the proportion of times that class c behaves as a part. If proportionClassIsPart(c)=0, then class c is a main object (e.g., car, chair, ...). See below for a discussion on the utility of this variable.
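
How the per-class fields relate to the per-image arrays can be sketched as follows. The description does not say how objectcounts and proportionClassIsPart are computed, so this example assumes that objectcounts(c) is the sum of objectPresence(c,:) over images and that proportionClassIsPart(c) is the sum of objectIsPart(c,:) divided by that count. The function names and the toy numbers are hypothetical and are not taken from index_ade20k_2015.mat.

```python
from typing import List

Matrix = List[List[int]]  # shape [C, N]: one row per class, one column per image


def object_counts(presence: Matrix) -> List[int]:
    """ASSUMED definition: total instances of each class over all images."""
    return [sum(row) for row in presence]


def proportion_class_is_part(presence: Matrix, is_part: Matrix) -> List[float]:
    """ASSUMED definition: fraction of a class's instances that occur as parts."""
    proportions = []
    for presence_row, part_row in zip(presence, is_part):
        total = sum(presence_row)
        # A class with no instances gets 0.0 to avoid dividing by zero.
        proportions.append(sum(part_row) / total if total else 0.0)
    return proportions


# Toy index with C=3 classes and N=2 images (made-up numbers).
toy_presence = [[2, 1], [0, 4], [1, 1]]
toy_is_part = [[0, 0], [0, 4], [1, 0]]

toy_counts = object_counts(toy_presence)
toy_proportions = proportion_class_is_part(toy_presence, toy_is_part)
```

Under these assumptions, the first toy class never appears as a part and would be treated as a main object, the second always appears as a part, and the third is split between the two roles.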


  title={Scene parsing through ADE20K dataset},
  author={Zhou, Bolei and Zhao, Hang and Puig, Xavier and Fidler, Sanja and Barriuso, Adela and Torralba, Antonio},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
MIT Computer Science & Artificial Intelligence Lab