OOD Detection
License: Custom


The NICO dataset is designed specifically for Non-I.I.D. or OOD (Out-of-Distribution) image classification. It simulates a real-world setting in which the testing distribution may shift arbitrarily from the training distribution, violating the traditional I.I.D. assumption of most machine learning methods. The research directions the dataset can support include, but are not limited to, transfer learning and domain adaptation (when the testing distribution is known) and stable learning and domain generalization (when the testing distribution is unknown).
The basic idea behind the dataset is to label each image with both a main concept (e.g. dog) and the context (e.g. on grass) in which the visual concept appears. By adjusting the proportions of different contexts in the training and testing data, one can flexibly control the degree of distribution shift and study different kinds of Non-I.I.D. settings.
To date, there are two superclasses, Animal and Vehicle, with 10 classes under Animal and 9 under Vehicle. Each class has 9 or 10 contexts. The average number of images per context ranges from 83 to 215, and the average number of images per class is about 1300 (similar to ImageNet). In total, NICO contains 19 classes, 188 contexts and nearly 25,000 images. The current version is already sufficient to train deep convolutional networks (e.g. ResNet-18) from scratch. The scale is still increasing, and the hierarchical structure makes the dataset easy to expand.
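The context-proportion idea above can be sketched in a few lines of Python. This is a minimal illustration, not part of the released dataset tooling: the class and context names and the `split_by_context` helper are assumptions for demonstration.

```python
import random
from collections import defaultdict

# Hypothetical records: each image is labeled with (class, context).
# The labels below are illustrative, not the full NICO label set.
images = [("dog", ctx) for ctx in ["on grass"] * 80 + ["in water"] * 20] \
       + [("dog", "in cage")] * 30

def split_by_context(images, train_ratio):
    """Split images so each context contributes a different share to the
    training set, inducing a controlled shift between train and test."""
    by_context = defaultdict(list)
    for label, ctx in images:
        by_context[ctx].append((label, ctx))
    train, test = [], []
    for ctx, items in by_context.items():
        random.shuffle(items)
        k = int(len(items) * train_ratio.get(ctx, 0.5))
        train.extend(items[:k])
        test.extend(items[k:])
    return train, test

# Heavily favor "on grass" in training and hold out "in cage" entirely,
# so the test distribution differs sharply from the training one.
train, test = split_by_context(
    images, {"on grass": 0.9, "in water": 0.5, "in cage": 0.0}
)
```

Setting a context's ratio to 0.0 reserves it for testing only, which corresponds to the hardest setting where the test context never appears during training.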

Data Collection

Referring to ImageNet, MSCOCO and other classical datasets, we first fix two superclasses: Animal and Vehicle. For each superclass, we select classes from the 272 candidates in MSCOCO, with the criterion that the selected classes within a superclass should have large inter-class differences. For context selection, we use the YFCC100m browser to derive the list of tags that frequently co-occur with a given concept (i.e. class label). We then filter out tags that occur with only a few concepts. Finally, we manually screen the remaining tags and keep those consistent with our definition of contexts (i.e. object attributes, backgrounds and scenes).
After obtaining the conceptual and contextual tags, we concatenate a given conceptual tag with each of its contextual tags to form a query, submit the query to the Google and Bing image search APIs, and collect the top-ranked images as candidates. Finally, in the screening phase, we select images for the final dataset according to the following criteria:

  • The content of an image should correctly reflect its concept and context.
  • Given a class, the number of images in each context should be adequate and as balanced as possible across contexts.
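The query-construction step described above can be sketched as follows. The tag lists and the `build_queries` helper are hypothetical placeholders standing in for the screened YFCC100m tags and the actual search-API calls:

```python
# Illustrative concept and context tags; the real lists come from the
# YFCC100m co-occurrence statistics after manual screening.
concept_tags = ["dog", "horse"]
context_tags = {
    "dog": ["on grass", "in water", "running"],
    "horse": ["on beach", "in forest"],
}

def build_queries(concepts, contexts):
    """Concatenate each concept with each of its contexts to form
    image-search queries, e.g. 'dog on grass'."""
    return [f"{c} {ctx}" for c in concepts for ctx in contexts[c]]

queries = build_queries(concept_tags, context_tags)
# Each query would then be sent to an image search API and the
# top-ranked results collected as candidate images for screening.
```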

Note that we do not perform image registration or filter by object centralization, so the selected images are more realistic and "in the wild" than those in ImageNet.


Citation

@article{he2021towards,
  title={Towards Non-IID Image Classification: A Dataset and Baselines},
  author={He, Yue and Shen, Zheyan and Cui, Peng},
  journal={Pattern Recognition},
  year={2021}
}



Lab of Media and Network, Department of Computer Science and Technology, Tsinghua University