Overview
The NICO dataset is designed specifically for Non-I.I.D. or OOD (Out-of-Distribution) image
classification. It simulates a real-world setting in which the testing distribution may shift
arbitrarily from the training distribution, violating the traditional I.I.D. hypothesis underlying
most ML methods. Typical research directions the dataset supports include, but are not limited to,
transfer learning or domain adaptation (when the testing distribution is known) and stable learning
or domain generalization (when the testing distribution is unknown).
The basic idea behind the dataset is to label each image with both a main concept (e.g. dog) and
the context (e.g. on grass) in which the visual concept appears. By adjusting the proportions of
different contexts in the training and testing data, one can flexibly control the degree of
distribution shift and study various kinds of Non-I.I.D. settings.
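For instance, one simple way to realize such a Non-I.I.D. split is to oversample one context for training and another for testing. The Python sketch below illustrates the idea; the function name, proportions and toy data are ours, not an official NICO split protocol:

```python
import random
from collections import defaultdict

def split_by_context(samples, train_props, test_props, n_train, n_test, seed=0):
    """Draw train/test sets whose context proportions differ.

    `samples` is a list of (image_id, context) pairs for one class; the
    *_props dicts map context -> target proportion. Illustrative only.
    """
    rng = random.Random(seed)
    by_ctx = defaultdict(list)
    for img, ctx in samples:
        by_ctx[ctx].append(img)

    def draw(props, n):
        picked = []
        for ctx, p in props.items():
            pool = by_ctx[ctx]
            # round() may slightly over/under-shoot n; fine for a sketch
            picked += rng.sample(pool, min(round(p * n), len(pool)))
        return picked

    # note: a real protocol would also keep train and test disjoint
    return draw(train_props, n_train), draw(test_props, n_test)

# Toy example: "dog" appears mostly "on grass" in training
# but mostly "in water" in testing.
dog = [(f"img_{i}", "on grass") for i in range(100)] \
    + [(f"img_{i}", "in water") for i in range(100, 200)]
train, test = split_by_context(
    dog,
    train_props={"on grass": 0.8, "in water": 0.2},
    test_props={"on grass": 0.2, "in water": 0.8},
    n_train=100, n_test=50)
print(len(train), len(test))  # 100 50
```

Raising the gap between the training and testing proportions makes the shift harsher, which is how different Non-I.I.D. settings can be instantiated from the same pool of images.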
To date, there are two superclasses: Animal and Vehicle, with 10 classes for Animal and 9 for
Vehicle. Each class has 9 or 10 contexts. The average number of images per context ranges from
83 to 215, and the average number of images per class is about 1300 (similar to ImageNet).
In total, NICO contains 19 classes, 188 contexts and nearly 25,000 images. The current version is
already sufficient to train deep convolutional networks (e.g. ResNet-18) from scratch. The scale
is still increasing, and the hierarchical structure makes the dataset easy to expand.
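The hierarchy also keeps programmatic access simple. As a minimal sketch, assuming the data were unpacked into a superclass/class/context directory layout (this layout is our assumption, not specified here), the dataset could be indexed like this:

```python
from pathlib import Path
from collections import Counter

# Assumed layout (our assumption, not specified in this document):
#   NICO/<superclass>/<class>/<context>/<image>.jpg
root = Path("NICO")
records = []  # (superclass, class, context, path)
for img in root.glob("*/*/*/*.jpg"):
    superclass, cls, context = img.parts[-4], img.parts[-3], img.parts[-2]
    records.append((superclass, cls, context, img))

per_class = Counter((s, c) for s, c, _, _ in records)
per_context = Counter((s, c, ctx) for s, c, ctx, _ in records)
print(f"{len(records)} images, {len(per_class)} classes, "
      f"{len(per_context)} (class, context) pairs")
```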
Data Collection
Following ImageNet, MSCOCO and other classical datasets, we first fix two
superclasses: Animal and Vehicle. For each superclass, we select classes from
the 272 candidates in MSCOCO, with the criterion that the selected classes in a
superclass should have large inter-class differences. For context selection, we
exploit the YFCC100M browser and first derive the list of tags that frequently
co-occur with a given concept (i.e. class label). We then filter out the tags
that occur in only a few concepts. Finally, we manually screen all tags and
select the ones that are consistent with our definition of contexts (i.e.
object attributes or backgrounds and scenes).
After obtaining the conceptual and contextual tags, we concatenate a given
conceptual tag with each of its contextual tags to form a query, submit the
query to the Google and Bing image search APIs, and collect the top-ranked
images as candidates (a minimal sketch of this query construction follows the
criteria below). Finally, in the screening phase, we select images into the
final dataset according to the following criteria:
- The content of an image should correctly reflect its concept and context.
- Given a class, the number of images in each context should be adequate and as balanced as possible across contexts.
Note that we do not conduct image registration or filter by object centralization, so the selected images are more realistic and "in the wild" than those in ImageNet.
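The query construction mentioned above amounts to simple string concatenation; the concept/context lists here are illustrative, not the full NICO tag set:

```python
# Illustrative concept/context lists; not the full NICO tag set.
concepts = {"dog": ["on grass", "in water", "running"],
            "boat": ["at sea", "on beach", "wooden"]}

queries = [f"{concept} {context}"
           for concept, contexts in concepts.items()
           for context in contexts]
# Each query (e.g. "dog on grass") is submitted to the Google and Bing
# image search APIs, and the top-ranked results are kept as candidates
# for the manual screening described above.
print(queries)
```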
Citation
@article{he2020towards,
  title={Towards Non-IID Image Classification: A Dataset and Baselines},
  author={He, Yue and Shen, Zheyan and Cui, Peng},
  journal={Pattern Recognition},
  pages={107383},
  year={2020},
  publisher={Elsevier}
}