Smart Retailing
License: Custom


We introduce RP2K, a new large-scale retail product dataset for fine-grained image classification. Unlike previous datasets, which focus on relatively few products, RP2K contains more than 500,000 images of retail products on shelves, covering 2,000 different products. The dataset aims to advance research in retail object recognition, which has broad applications such as automatic shelf auditing and image-based product information retrieval.

Our dataset has the following properties: (1) It is by far the largest dataset in terms of product categories. (2) All images are captured manually in physical retail stores under natural lighting, matching the conditions of real applications. (3) We provide rich annotations for each object, including its size, shape and flavor/scent. We believe our dataset could benefit both computer vision research and the retail industry.

Overview information of the RP2K dataset


Categorized information of the RP2K dataset


Data Collection

Pipeline of our data collection process. Our photo collectors first visited over 500 different retail stores and collected over 10k high-resolution shelf images. We then used a pre-trained detection model to extract the bounding boxes of potential objects of interest. After that, our human annotators discarded the incorrect bounding boxes, including heavily occluded objects and objects that are not valid retail products. The remaining images were labeled by the annotators.
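The collection steps described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' tooling: the names `Candidate`, `propose_candidates`, `keep_valid` and `fake_detector` are hypothetical, the box convention (x_min, y_min, x_max, y_max) is assumed, and the detector is a stand-in that returns fixed boxes instead of running a real model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Assumed convention: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[int, int, int, int]


@dataclass
class Candidate:
    """One detected box on a shelf image, plus hypothetical review flags."""
    shelf_image_id: str
    box: Box
    occluded: bool = False     # set by a human reviewer
    is_product: bool = True    # set by a human reviewer


def propose_candidates(shelf_image_id: str,
                       detector: Callable[[str], List[Box]]) -> List[Candidate]:
    """Step 2: run a pre-trained detector and wrap each box as a candidate."""
    return [Candidate(shelf_image_id, box) for box in detector(shelf_image_id)]


def keep_valid(candidates: List[Candidate]) -> List[Candidate]:
    """Step 3: drop boxes flagged as heavily occluded or not a retail product."""
    return [c for c in candidates if c.is_product and not c.occluded]


def fake_detector(shelf_image_id: str) -> List[Box]:
    """Stand-in for the detection model; returns made-up boxes."""
    return [(0, 0, 50, 120), (60, 0, 110, 120), (120, 0, 170, 120)]


candidates = propose_candidates("shelf_0001", fake_detector)
candidates[1].occluded = True      # simulated reviewer decision
candidates[2].is_product = False   # simulated reviewer decision
kept = keep_valid(candidates)
print([c.box for c in kept])  # [(0, 0, 50, 120)]
```

Only the candidates that survive `keep_valid` would go on to the final labeling step.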


Data Preview

Label Distribution

Data Annotation

Our dataset contains two components: the original shelf images and the individual object images cropped from the shelf images. The shelf images are labeled with the shelf type, store ID, and a list of bounding boxes of objects of interest. For each object image cropped from a bounding box, we provide rich annotations, including the SKU ID, product name, brand, product type, shape, size, flavor/scent and a reference to the bounding box in the corresponding shelf image. Fig. 5 shows some sample attributes of the object images. Note that some attributes may not be applicable to particular products.
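One way to picture the two components and how they link together is the sketch below. The class and field names (`ShelfImage`, `ObjectImage`, `box_index`, and so on) are assumptions chosen for illustration and are not the dataset's actual file schema; the sample values are made-up placeholders. Attributes that may not apply to a product are modeled as optional.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Assumed convention: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[int, int, int, int]


@dataclass
class ShelfImage:
    """Shelf-level annotation: shelf type, store ID and object boxes."""
    shelf_image_id: str
    shelf_type: str
    store_id: str
    boxes: List[Box] = field(default_factory=list)


@dataclass
class ObjectImage:
    """Object-level annotation for one crop taken from a shelf image."""
    sku_id: str
    product_name: str
    brand: str
    product_type: str
    shape: str
    shelf_image_id: str                    # which shelf image the crop came from
    box_index: int                         # position in ShelfImage.boxes
    size: Optional[str] = None             # may not apply to every product
    flavor_or_scent: Optional[str] = None  # may not apply to every product


def resolve_box(obj: ObjectImage, shelves: Dict[str, ShelfImage]) -> Box:
    """Follow the bounding-box reference back to the shelf image."""
    return shelves[obj.shelf_image_id].boxes[obj.box_index]


# Placeholder values, for illustration only.
shelf = ShelfImage("shelf_0001", "beverage shelf", "store_042",
                   boxes=[(0, 0, 50, 120), (60, 0, 110, 120)])
obj = ObjectImage(sku_id="sku_000123", product_name="example drink",
                  brand="example brand", product_type="non-alcoholic drinks",
                  shape="bottle", shelf_image_id="shelf_0001", box_index=1,
                  size="500 ml")
shelves = {shelf.shelf_image_id: shelf}
print(resolve_box(obj, shelves))  # (60, 0, 110, 120)
```

Storing a reference rather than copying the box into each object record keeps the shelf image as the single source of truth for geometry.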

We also provide a meta-category label for each object image, in two different ways. The first categorizes objects by product type, which reflects the placement of the products, i.e., products of the same type are usually placed on the same or nearby shelves. We include 6 meta-categories by product type: dairy, liquor, beer, cosmetics, non-alcoholic drinks and seasoning.

The second categorization is by product shape. We include 7 shapes (bottle, can, box, bag, jar, handled bottle and pack), which cover all shapes that appear in our dataset. These 7 shapes are also used in training our pre-annotation detector. Sample images for the different meta-categories are shown in Fig. 4.

Besides these two meta-categorization methods, our rich labels allow users to evaluate their algorithms at a customized fine-grained level.
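As a rough illustration of evaluating at a customized granularity, the sketch below scores the same predictions once at the SKU level and once after mapping every SKU to a coarser group (here, shape). The function `accuracy_at_level`, the SKU identifiers and the SKU-to-shape mapping are hypothetical and made up for the example; they are not part of the released code.

```python
from typing import Dict, Hashable, Sequence


def accuracy_at_level(y_true: Sequence[str],
                      y_pred: Sequence[str],
                      sku_to_group: Dict[str, Hashable]) -> float:
    """Fraction of predictions that land in the same group as the true SKU."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    hits = sum(sku_to_group[t] == sku_to_group[p]
               for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


# Made-up labels: two bottles and one can.
sku_to_shape = {"sku_a": "bottle", "sku_b": "bottle", "sku_c": "can"}
# Identity mapping gives plain SKU-level accuracy.
sku_to_sku = {sku: sku for sku in sku_to_shape}

y_true = ["sku_a", "sku_b", "sku_c", "sku_c"]
y_pred = ["sku_b", "sku_b", "sku_c", "sku_a"]

sku_acc = accuracy_at_level(y_true, y_pred, sku_to_sku)
shape_acc = accuracy_at_level(y_true, y_pred, sku_to_shape)
print(sku_acc, shape_acc)  # 0.5 0.75
```

Any attribute, or a tuple of attributes such as (brand, product type), could serve as the group key in the same way.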

Data Format

Sample images from our dataset. Precise retail product recognition on shelves is considered highly challenging because (a) products from the same line may come in different sizes, which usually look similar but have different prices, and the image size does not reflect the real size of the product; (b) manufacturers usually make multiple flavors for one product line, whose appearances differ only subtly on the labels; (c) product images may be captured at different camera angles depending on where the product is placed on the shelf, and the image can also be stretched due to camera distortion.


This dataset and the code packages are free for academic use. You can run them at your own risk. For other purposes, please contact the corresponding author, Jingtian Peng (pjt@pinlandata.com).

 title={RP2K: A Large-Scale Retail Product Dataset for Fine-Grained Image Classification},
 author={Peng, Jingtian and Xiao, Chang and Wei, Xun and Li, Yifan},
 journal={arXiv preprint arXiv:2006.12634},



Pinlan is an AI item recognition expert. Our products combine the cognitive recognition capabilities of AI, the strong computing power of Cloud, and the edge support of IoT.
Founded in 2011, Testin is an enterprise service platform driven by artificial intelligence technology. Testin Cloud Test provides cloud testing services, AI data annotation services, and security services for more than one million companies and developers worldwide. Its mission is to support the intelligent transformation of industry: amid the global wave of industrial upgrading, Testin Cloud Test accelerates the intelligent, digital, and technological transformation of enterprises by sharing tools, technology, talent, and services, supporting the intelligent upgrading and commercialization of enterprises across industries.