Overview
We present a diverse dataset for autonomous driving that captures a wide range of driving scenarios across several cities, including Beijing, Shenzhen and Xi’an, and so reflects the domain diversity among them. The dataset is open to the public, and anyone interested in training models on it is welcome to use it. We believe it will be useful in many areas of computer vision and autonomous driving.
Data Collection
We collect the data in several cities, including Beijing, Shenzhen and Xi’an. The collection covers multiple regions and various driving conditions, including day, night, dawn, dusk and sunny weather. We sample the raw data at different frequencies depending on the velocity of the collection vehicle. Our annotators then discard files whose scene duplicates that of another file, and annotate the remaining files.
Data Preview
Label Distribution
Data Annotation
Based on the operational scene, we define the classes 'Adult', 'Child', 'Car', 'Cyclist', 'Unknown', 'Tricycle', 'Barrier', 'Bicycle', 'Bicycles', 'Bus', 'Truck', 'Motorcycle', 'Motorcyclist' and 'Animal'. Only the first eight classes appear in the dataset so far.
For each object in a point cloud frame, we provide rich annotations: the object type, the location of the box center, the dimensions of the box and the rotation angle of the object's heading.
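To illustrate how these four annotation fields define a box, the sketch below computes the eight corners of a 3D box from its center, dimensions and heading angle. The `Box3D` class, its field names and the conventions used (heading in radians about the vertical z axis, box length along the heading direction) are assumptions made for illustration, not the dataset's documented definition.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Box3D:
    """Hypothetical container for one annotated object (names are illustrative)."""
    cls: str          # object type, e.g. 'Car'
    center: tuple     # (x, y, z) of the box center
    size: tuple       # (h, w, l) of the box
    heading: float    # ASSUMED: rotation about the z axis, in radians


def box_corners(box):
    """Return an (8, 3) array with the corners of the box."""
    h, w, l = box.size
    # Corner offsets in the box frame: x along length, y along width, z along height.
    x = np.array([1, 1, -1, -1, 1, 1, -1, -1]) * l / 2.0
    y = np.array([1, -1, -1, 1, 1, -1, -1, 1]) * w / 2.0
    z = np.array([-1, -1, -1, -1, 1, 1, 1, 1]) * h / 2.0
    c, s = np.cos(box.heading), np.sin(box.heading)
    # Rotation about the z axis by the heading angle.
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return (rot @ np.vstack([x, y, z])).T + np.asarray(box.center, dtype=float)
```

If the dataset measures the heading about a different axis or in degrees, only the rotation matrix needs to change.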
Data Format
Our dataset contains two parts: the point cloud files and their annotations.
Each point cloud is composed of the point clouds from three lidars, mounted on the top, left and right of the car respectively. Each label file contains many rows, and each row describes one 3D box.
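As a sketch of how one label row might be parsed: the statistics code in this README reads the class name from field 0, the box size (h, w, l) from fields 8 to 10 and the box z coordinate from field 13 of each space-separated row. The function below extracts only those fields. The function name and the returned dictionary layout are illustrative assumptions, and the meaning of the remaining fields is not specified here.

```python
def parse_label_row(row):
    """Parse one label row into the fields used by the statistics code.

    Only fields 0, 8-10 and 13 are read; the other fields are left
    uninterpreted because their meaning is not documented here.
    """
    fields = row.split()
    if len(fields) < 14:
        raise ValueError("expected at least 14 fields, got %d" % len(fields))
    return {
        "cls": fields[0],                                    # object type
        "size_hwl": tuple(float(v) for v in fields[8:11]),   # h, w, l
        "z": float(fields[13]),                              # box z coordinate
    }


# The values in this row are made up for illustration only.
example = parse_label_row("Car 0 0 0 0 0 0 0 1.5 1.8 4.2 10.0 -2.0 0.3 1.57")
```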
Instruction
1. Read a point cloud file
import numpy as np
## Each point has four float32 attributes: x, y, z and intensity.
## The intensity of every point is forcibly set to zero.
## pc_path is the path to one point cloud file.
points = np.fromfile(pc_path, dtype=np.float32).reshape(-1, 4)
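A slightly more defensive variant of the read above is sketched below. It checks that the file size is a whole number of four-float32 records before reshaping, and returns only the x, y, z columns, since the intensity is stated to be zero. The function name `load_xyz` is an assumption for illustration.

```python
import os

import numpy as np


def load_xyz(pc_path):
    """Load a point cloud file and return an (N, 3) float32 array of x, y, z."""
    record_bytes = 4 * np.dtype(np.float32).itemsize  # x, y, z, intensity
    n_bytes = os.path.getsize(pc_path)
    if n_bytes % record_bytes != 0:
        raise ValueError(
            "%s: size %d is not a multiple of %d bytes" % (pc_path, n_bytes, record_bytes)
        )
    points = np.fromfile(pc_path, dtype=np.float32).reshape(-1, 4)
    # Drop the intensity column, which carries no information in this dataset.
    return points[:, :3]
```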
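2. Dataset statistics
import os
import numpy as np

def count_labels(l_path, cls):
    """
    Count the bounding boxes of class `cls` in the label files under `l_path`.
    size: h, w, l of each matching box
    z_axis: the z coordinate of each matching bounding box
    """
    size = []
    z_axis = []
    for l in os.listdir(l_path):
        # os.path.join works whether or not l_path ends with a path separator.
        with open(os.path.join(l_path, l)) as f:
            label_lines = f.readlines()
        for label_line in label_lines:
            label_line = label_line.split()
            # Skip blank lines, then match on the class name in field 0.
            if label_line and label_line[0] == cls:
                size.append([float(label_line[8]), float(label_line[9]), float(label_line[10])])
                z_axis.append(float(label_line[13]))
    np_size = np.array(size)
    np_z_axis = np.array(z_axis)  # collected for further statistics; unused here
    return np_size.shape[0]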
import matplotlib.pyplot as plt

def visu_class(class_path):
    """Plot a bar chart of the number of bounding boxes per class."""
    plt.figure(figsize=(10, 10), dpi=80)
    class_ls = ['Bus', 'Dontcare', 'Barrier', 'Animal', 'Bicycles', 'Cyclist', 'Car', 'Child', 'Adult', 'Truck', 'Motorcycle', 'Bicycle', 'Motorcyclist', 'Tricycle']
    N = len(class_ls)
    objects_num = []
    for c in class_ls:
        objects_num.append(count_labels(class_path, c))
    values = tuple(objects_num)
    index = np.arange(N)
    width = 0.45
    # Build the legend text: one "class: count" line per class.
    label_content = ""
    for c, n in zip(class_ls, objects_num):
        label_content += "%s: %d" % (c, n) + "\n"
    print(label_content)
    print(index, values)
    p2 = plt.bar(index, values, width, label=label_content, color="blue")
    plt.xlabel('class')
    plt.ylabel('number of bounding boxes')
    plt.title('Bounding box distribution')
    plt.xticks(index, class_ls)
    plt.legend(loc="upper right")
    plt.show()
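Because `count_labels` re-reads every label file once per class, `visu_class` scans the label directory 14 times. A single-pass alternative is sketched below using `collections.Counter`. The function name `count_all_labels` is an assumption for illustration, and the class name is taken from the first field of each row, as in the code above.

```python
import os
from collections import Counter


def count_all_labels(label_dir):
    """Count bounding boxes per class over all label files, reading each file once."""
    counts = Counter()
    for name in sorted(os.listdir(label_dir)):
        path = os.path.join(label_dir, name)
        if not os.path.isfile(path):
            continue  # ignore sub-directories
        with open(path) as f:
            for line in f:
                fields = line.split()
                if fields:  # skip blank lines
                    counts[fields[0]] += 1
    return counts
```

The returned counter can be indexed by class name, and a class that never occurs simply yields 0.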
Citation
This dataset is free for academic use. Use it at your own risk. For other purposes, please contact the corresponding author, Lichao Wang (wanglichao@neolix.cn).
@misc{2011.13528,
Author = {Lichao Wang and Lanxin Lei and Hongli Song and Weibao Wang},
Title = {The NEOLIX Open Dataset for Autonomous Driving},
Year = {2020},
Eprint = {arXiv:2011.13528},
}