Overview
We present a large-scale dataset based on the KITTI Vision Benchmark and use all sequences provided by the odometry task. We provide dense annotations for each individual scan of sequences 00-10, which enables the use of multiple sequential scans for semantic scene interpretation, such as semantic segmentation and semantic scene completion.
The remaining sequences, i.e., sequences 11-21, are used as a test set showing a large variety of challenging traffic situations and environment types. Labels for the test set are not provided; instead, we use an evaluation service that scores submissions and provides test set results.
Classes
The dataset contains 28 classes, including classes distinguishing non-moving and moving objects. Overall, our classes cover traffic participants, but also functional classes for ground, such as parking areas and sidewalks.
Folder structure and format
Semantic Segmentation and Panoptic Segmentation
For each scan XXXXXX.bin of the velodyne folder in the sequence folder of the original KITTI Odometry Benchmark, we provide a file XXXXXX.label in the labels folder that contains a label for each point in binary format. The label is a 32-bit unsigned integer (aka uint32_t) for each point, where the lower 16 bits correspond to the semantic label. The upper 16 bits encode the instance id, which is temporally consistent over the whole sequence, i.e., the same object in two different scans gets the same id. This also holds for moving cars, as well as for static objects seen again after loop closures.
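As a minimal sketch of how these files can be read (assuming the usual KITTI point cloud layout of four float32 values per point; the file paths are purely illustrative), the semantic label and the instance id can be split off with numpy:

import numpy as np

# Illustrative paths; each scan is assumed to be N x 4 float32 (x, y, z, remission).
scan = np.fromfile("sequences/00/velodyne/000000.bin", dtype=np.float32).reshape(-1, 4)
labels = np.fromfile("sequences/00/labels/000000.label", dtype=np.uint32)
assert labels.shape[0] == scan.shape[0]  # one label per point

semantic_label = labels & 0xFFFF  # lower 16 bits: semantic class
instance_id = labels >> 16        # upper 16 bits: temporally consistent instance id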
We furthermore provide the poses.txt file that contains the poses, estimated by a surfel-based SLAM approach (SuMa), which we used to annotate the data.
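For illustration, the poses can be read as sketched below; this sketch assumes the KITTI odometry convention of one row-major 3x4 transformation matrix (12 values) per line and is not part of the development kit:

import numpy as np

def read_poses(path):
    # Assumption: each line holds 12 values of a row-major 3x4 pose matrix.
    poses = []
    with open(path) as f:
        for line in f:
            values = np.array(line.split(), dtype=np.float64)
            pose = np.eye(4, dtype=np.float64)
            pose[:3, :4] = values.reshape(3, 4)  # embed as homogeneous 4x4 pose
            poses.append(pose)
    return poses

poses = read_poses("sequences/00/poses.txt")  # illustrative path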
Semantic Scene Completion
For each scan XXXXXX.bin of the velodyne folder in the sequence folder of the original KITTI Odometry Benchmark, we provide in the voxel folder:
- a file XXXXXX.bin in a packed binary format that contains for each voxel a flag indicating whether that voxel is occupied by laser measurements. This is the input to the semantic scene completion task and corresponds to the voxelization of a single LiDAR scan.
- a file XXXXXX.label that contains for each voxel of the completed scene a label in binary format. The label is a 16-bit unsigned integer (aka uint16_t) for each voxel.
- a file XXXXXX.invalid in a packed binary format that contains for each voxel a flag indicating whether that voxel is considered invalid, i.e., the voxel is never directly seen from any of the positions used to generate the voxels. These voxels are not considered in the evaluation.
- a file XXXXXX.occluded in a packed binary format that contains for each voxel a flag that specifies whether this voxel is either occupied by LiDAR measurements or occluded by a voxel in the line of sight of all poses used to generate the completed scene.
The label files are only given for the training data and must be predicted for the test set.
To allow a higher compression rate, we store the binary flags in a custom format, where we store the flags as bit flags, i.e., each byte of the file corresponds to 8 voxels in the unpacked voxel grid. Please see the development kit for further information on how to efficiently read these files using numpy.
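As an example of reading these files (the paths, the bit order, and the grid shape used here are assumptions made only for illustration; the development kit remains the reference), numpy's unpackbits turns each byte back into 8 per-voxel flags:

import numpy as np

def read_voxel_flags(path):
    # .bin, .invalid, and .occluded store one bit per voxel, 8 voxels per byte.
    # The bit order within each byte is assumed to match np.unpackbits' default.
    packed = np.fromfile(path, dtype=np.uint8)
    return np.unpackbits(packed)  # one 0/1 flag per voxel

def read_voxel_labels(path):
    # .label stores one uint16 label per voxel of the completed scene.
    return np.fromfile(path, dtype=np.uint16)

# Illustrative usage; the 256 x 256 x 32 grid shape is an assumption, not part of this description.
occupancy = read_voxel_flags("sequences/00/voxels/000000.bin").reshape(256, 256, 32)
labels = read_voxel_labels("sequences/00/voxels/000000.label").reshape(256, 256, 32)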
See also our development kit for further information on the labels and on how to read them using Python. The development kit also provides tools for visualizing the point clouds.
Citation
Please use the following citation when referencing the dataset:
@inproceedings{behley2019iccv,
author = {J. Behley and M. Garbade and A. Milioto and J. Quenzel and S. Behnke and C. Stachniss and J. Gall},
title = {{SemanticKITTI: A Dataset for Semantic Scene Understanding of LiDAR Sequences}},
booktitle = {Proc.~of the IEEE/CVF International Conf.~on Computer Vision (ICCV)},
year = {2019}
}
Please also cite the original KITTI Vision Benchmark:
@inproceedings{geiger2012cvpr,
author = {A. Geiger and P. Lenz and R. Urtasun},
title = {{Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite}},
booktitle = {Proc.~of the IEEE Conf.~on Computer Vision and Pattern Recognition (CVPR)},
pages = {3354--3361},
year = {2012}
}