nuScenes
3D Semantic Segmentation | 3D Box Tracking
Urban | Autonomous Driving
License: CC BY-NC-SA 4.0

Overview

The nuScenes dataset is a large-scale autonomous driving dataset with 3D object annotations. It features:

● Full sensor suite (1x LIDAR, 5x RADAR, 6x camera, IMU, GPS)

● 1000 scenes of 20s each

● 1,400,000 camera images

● 390,000 lidar sweeps

● Two diverse cities: Boston and Singapore

● Left- versus right-hand traffic

● Detailed map information

● 1.4M 3D bounding boxes manually annotated for 23 object classes

● Attributes such as visibility, activity and pose

New: 1.1B lidar points manually annotated for 32 classes

New: Explore nuScenes on SiaSearch

● Free for non-commercial use

● For a commercial license contact nuScenes@motional.com

Data Collection

Scene planning

For the nuScenes dataset we collect approximately 15h of driving data in Boston and Singapore. For the full nuScenes dataset, we publish data from Boston Seaport and Singapore’s One North, Queenstown and Holland Village districts. Driving routes are carefully chosen to capture challenging scenarios. We aim for a diverse set of locations, times and weather conditions. To balance the class frequency distribution, we include more scenes with rare classes (such as bicycles). Using these criteria, we manually select 1000 scenes of 20s duration each. These scenes are carefully annotated by human experts. The annotator instructions can be found in the devkit repository.

Car setup

We use two Renault Zoe cars with an identical sensor layout to drive in Boston and Singapore. The data was gathered from a research platform and is not indicative of the setup used in Motional products. Please refer to the above figure for the placement of the sensors. We release data from the following sensors:

  • 1x spinning LIDAR:
    • 20Hz capture frequency
    • 32 channels
    • 360° Horizontal FOV, +10° to -30° Vertical FOV
    • 80m-100m Range, Usable returns up to 70 meters, ± 2 cm accuracy
    • Up to ~1.39 Million Points per Second
  • 5x long-range RADAR sensors:
    • 77GHz
    • 13Hz capture frequency
    • Independently measures distance and velocity in one cycle using Frequency Modulated Continuous Wave
    • Up to 250m distance
    • Velocity accuracy of ±0.1 km/h
  • 6x cameras:
    • 12Hz capture frequency
    • 1/1.8'' CMOS sensor of 1600x1200 resolution
    • Bayer8 format for 1 byte per pixel encoding
    • 1600x900 ROI is cropped from the original resolution to reduce processing and transmission bandwidth
    • Auto exposure with exposure time limited to a maximum of 20 ms
    • Images are unpacked to BGR format and compressed to JPEG
    • See camera orientation and overlap in the figure below.
[Figure: camera orientation and overlap]
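The specifications above can be turned into a rough, back-of-the-envelope check. The short sketch below derives the maximum number of points in one lidar sweep and the raw camera bandwidth saved by the 1600x900 crop. It assumes 1 byte per pixel (the Bayer8 encoding listed above) and ignores JPEG compression and any transport overhead. The constant and function names are illustrative and are not part of any nuScenes tooling.

```python
# Back-of-the-envelope data rates for the sensor specifications listed above.
# Assumptions: raw Bayer8 images (1 byte per pixel), no compression, no overhead.

LIDAR_POINTS_PER_SECOND = 1_390_000  # "up to ~1.39 million points per second"
LIDAR_HZ = 20
CAMERA_HZ = 12
NUM_CAMERAS = 6
BYTES_PER_PIXEL = 1  # Bayer8


def points_per_sweep(points_per_second: int, hz: int) -> float:
    """Upper bound on the number of points in a single lidar sweep."""
    return points_per_second / hz


def raw_camera_rate_mb_s(width: int, height: int) -> float:
    """Raw image bandwidth of all cameras combined, in megabytes per second."""
    bytes_per_frame = width * height * BYTES_PER_PIXEL
    return bytes_per_frame * CAMERA_HZ * NUM_CAMERAS / 1e6


if __name__ == "__main__":
    print(f"points per sweep: {points_per_sweep(LIDAR_POINTS_PER_SECOND, LIDAR_HZ):,.0f}")
    full = raw_camera_rate_mb_s(1600, 1200)
    roi = raw_camera_rate_mb_s(1600, 900)
    print(f"full sensor: {full:.2f} MB/s, ROI: {roi:.2f} MB/s, saved: {1 - roi / full:.0%}")
```

Under these assumptions, a sweep holds at most about 69,500 points. The crop cuts raw camera bandwidth by a quarter, from 138.24 MB/s to 103.68 MB/s across all six cameras.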

Sensor calibration

To achieve a high-quality multi-sensor dataset, it is essential to calibrate the extrinsics and intrinsics of every sensor. We express extrinsic coordinates relative to the ego frame, i.e. the midpoint of the rear vehicle axle. The most relevant steps are described below:

  • LIDAR extrinsics:

    We use a laser liner to accurately measure the relative location of the LIDAR to the ego frame.

  • Camera extrinsics:

    We place a cube-shaped calibration target in front of the camera and LIDAR sensors. The calibration target consists of three orthogonal planes with known patterns. After detecting the patterns we compute the transformation matrix from camera to LIDAR by aligning the planes of the calibration target. Given the LIDAR to ego frame transformation computed above, we can then compute the camera to ego frame transformation and the resulting extrinsic parameters.

  • RADAR extrinsics:

    We mount the radar in a horizontal position. Then we collect radar measurements by driving in an urban environment. After filtering radar returns for moving objects, we calibrate the yaw angle using a brute force approach to minimize the compensated range rates for static objects.

  • Camera intrinsic calibration:

    We use a calibration target board with a known set of patterns to infer the intrinsic and distortion parameters of the camera.
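The RADAR yaw calibration step can be illustrated with a simplified sketch. For a static target, the measured range rate should be fully explained by ego motion. The yaw that drives the compensated range rates toward zero is therefore taken as the mounting yaw.

The sketch rests on several assumptions, none of which are specified above:
- pure forward ego motion at a known speed;
- a radar located at the ego origin (no lever arm);
- returns already filtered down to static objects;
- a sum-of-squares objective.

The function names are hypothetical.

```python
import numpy as np


def compensated_range_rates(range_rate, azimuth, ego_speed, yaw):
    """Range rate left over after removing the ego-motion contribution.

    A static target seen at `azimuth` (radar frame) by a radar mounted with `yaw`
    (relative to the ego x-axis) lies at ego-frame angle azimuth + yaw. Pure forward
    ego motion at `ego_speed` then yields a range rate of
    -ego_speed * cos(azimuth + yaw), so adding that term back should leave ~0.
    """
    return range_rate + ego_speed * np.cos(azimuth + yaw)


def calibrate_yaw(range_rate, azimuth, ego_speed, search_deg=10.0, step_deg=0.01):
    """Brute-force grid search for the yaw (radians) minimizing squared residuals."""
    candidates = np.deg2rad(np.arange(-search_deg, search_deg + step_deg, step_deg))
    costs = [np.sum(compensated_range_rates(range_rate, azimuth, ego_speed, y) ** 2)
             for y in candidates]
    return candidates[int(np.argmin(costs))]


if __name__ == "__main__":
    # Synthetic static-object returns from a radar mounted with a 2.5 degree yaw.
    rng = np.random.default_rng(0)
    true_yaw = np.deg2rad(2.5)
    azimuth = rng.uniform(-np.pi / 3, np.pi / 3, size=500)
    ego_speed = rng.uniform(5.0, 15.0, size=500)
    range_rate = -ego_speed * np.cos(azimuth + true_yaw) + rng.normal(0.0, 0.05, 500)
    print(f"estimated yaw: {np.rad2deg(calibrate_yaw(range_rate, azimuth, ego_speed)):.2f} deg")
```

A grid search is used here because the source describes a brute-force approach. It also avoids local minima at the cost of a fixed angular resolution (`step_deg`).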

Sensor synchronization

In order to achieve good cross-modality data alignment between the LIDAR and the cameras, the exposure of a camera is triggered when the top LIDAR sweeps across the center of the camera’s FOV. The timestamp of an image is the exposure trigger time, and the timestamp of a LIDAR scan is the time when the full rotation of the current LIDAR frame is completed. Given that the camera’s exposure time is nearly instantaneous, this method generally yields good data alignment. Note that the cameras run at 12Hz while the LIDAR runs at 20Hz. The 12 camera exposures per second are spread as evenly as possible across the 20 LIDAR scans, so not all LIDAR scans have a corresponding camera frame. Reducing the frame rate of the cameras to 12Hz helps to reduce the compute, bandwidth and storage requirements of the perception system.
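One way to spread 12 exposures as evenly as possible over 20 scans is to assign exposure k to the scan nearest to k * 20 / 12. This sketch illustrates that idea only; the actual trigger logic is not described here. It also ignores the per-camera timing offset within a sweep described above.

```python
def camera_scan_indices(camera_hz: int = 12, lidar_hz: int = 20) -> list:
    """Lidar scans (0 .. lidar_hz-1) within one second that receive a camera exposure.

    Exposure k is assigned to the scan nearest to k * lidar_hz / camera_hz,
    rounding halves up with integer arithmetic.
    """
    return [(2 * k * lidar_hz + camera_hz) // (2 * camera_hz) for k in range(camera_hz)]


if __name__ == "__main__":
    with_camera = camera_scan_indices()
    print(with_camera)  # [0, 2, 3, 5, 7, 8, 10, 12, 13, 15, 17, 18]
    print(sorted(set(range(20)) - set(with_camera)))  # [1, 4, 6, 9, 11, 14, 16, 19]
```

With this schedule, 8 of the 20 scans in each second have no camera frame.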

Privacy protection

It is our priority to protect the privacy of third parties. For this purpose we use state-of-the-art object detection techniques to detect license plates and faces. We aim for high recall and remove false positives that do not overlap with the reprojections of the known person and car boxes. Finally, we use the output of the object detectors to blur faces and license plates in the images of nuScenes.
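The false-positive filter described above can be sketched as an overlap test. Each face or license-plate detection is compared with the known person and car boxes reprojected into the same image. This is a minimal sketch, assuming axis-aligned pixel boxes and treating any positive intersection as overlap. The actual overlap criterion and the blurring step are not specified here.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def intersection_area(a: Box, b: Box) -> float:
    """Area of the overlap between two axis-aligned boxes (0 if disjoint)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0) * max(h, 0)


def filter_detections(detections: List[Box], reprojected: List[Box]) -> List[Box]:
    """Keep face/plate detections that overlap at least one reprojected 3D box."""
    return [d for d in detections
            if any(intersection_area(d, r) > 0 for r in reprojected)]


if __name__ == "__main__":
    faces = [(100, 50, 120, 70), (400, 400, 420, 420)]
    people = [(90, 40, 160, 300)]  # a pedestrian cuboid projected into the image
    print(filter_detections(faces, people))  # [(100, 50, 120, 70)]
```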

Data Annotation

After collecting the driving data, we sample well-synchronized keyframes (image, LIDAR, RADAR) at 2Hz and send them to our annotation partner Scale for annotation. Using expert annotators and multiple validation steps, we achieve highly accurate annotations. All objects in the nuScenes dataset come with a semantic category, as well as a 3D bounding box and attributes for each frame they occur in. Compared to 2D bounding boxes, this allows us to accurately infer an object’s position and orientation in space.
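The 2Hz keyframe rate ties the headline numbers together. The sketch below computes nominal counts under the simplifying assumption that every scene lasts exactly 20s. The published figures in the overview (390,000 lidar sweeps, 1,400,000 camera images) are slightly lower or rounded, presumably because real scenes are not exactly 20s long.

```python
NUM_SCENES = 1000
SCENE_SECONDS = 20  # assumed exact; real scenes are only approximately 20s
KEYFRAME_HZ = 2
LIDAR_HZ = 20
CAMERA_HZ = 12
NUM_CAMERAS = 6


def nominal_counts(num_scenes: int = NUM_SCENES, seconds: int = SCENE_SECONDS) -> dict:
    """Nominal frame counts if every scene lasted exactly `seconds`."""
    total_seconds = num_scenes * seconds
    return {
        "keyframes": total_seconds * KEYFRAME_HZ,
        "lidar_sweeps": total_seconds * LIDAR_HZ,
        "camera_images": total_seconds * CAMERA_HZ * NUM_CAMERAS,
    }


if __name__ == "__main__":
    for name, count in nominal_counts().items():
        print(f"{name}: {count:,}")
```

This gives 40,000 keyframes, i.e. 40 per scene and one every 10 lidar sweeps, alongside nominal totals of 400,000 lidar sweeps and 1,440,000 camera images.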

We provide ground truth labels for 23 object classes. For a detailed definition of every class and example images, please see the annotator instructions. For the full nuScenes dataset we provide annotations for the following categories (excl. test set):

For nuScenes-lidarseg, we annotate every point in the lidar point cloud with a semantic label. In addition to the 23 foreground classes (things) from nuScenes, we have included 9 background classes (stuff). For a detailed definition of every class and example images, please see the annotator instructions for nuScenes and nuScenes-lidarseg. We provide annotations for the following categories (excl. test set):

Category | nuScenes cuboids | Cuboid ratio | Lidarseg points | Point ratio
animal | 787 | 0.07% | 5,385 | 0.01%
human.pedestrian.adult | 208,240 | 17.86% | 2,156,470 | 2.73%
human.pedestrian.child | 2,066 | 0.18% | 9,655 | 0.01%
human.pedestrian.construction_worker | 9,161 | 0.79% | 139,443 | 0.18%
human.pedestrian.personal_mobility | 395 | 0.03% | 8,723 | 0.01%
human.pedestrian.police_officer | 727 | 0.06% | 9,159 | 0.01%
human.pedestrian.stroller | 1,072 | 0.09% | 8,809 | 0.01%
human.pedestrian.wheelchair | 503 | 0.04% | 12,168 | 0.02%
movable_object.barrier | 152,087 | 13.04% | 9,305,106 | 11.79%
movable_object.debris | 3,016 | 0.26% | 66,861 | 0.08%
movable_object.pushable_pullable | 24,605 | 2.11% | 718,641 | 0.91%
movable_object.trafficcone | 97,959 | 8.40% | 736,239 | 0.93%
static_object.bicycle_rack * | 2,713 | 0.23% | 163,126 | 0.21%
vehicle.bicycle | 11,859 | 1.02% | 141,351 | 0.18%
vehicle.bus.bendy | 1,820 | 0.16% | 357,463 | 0.45%
vehicle.bus.rigid | 14,501 | 1.24% | 4,247,297 | 5.38%
vehicle.car | 493,322 | 42.30% | 38,104,219 | 48.27%
vehicle.construction | 14,671 | 1.26% | 1,514,414 | 1.92%
vehicle.emergency.ambulance | 49 | 0.00% | 2,218 | 0.00%
vehicle.emergency.police | 638 | 0.05% | 59,590 | 0.08%
vehicle.motorcycle | 12,617 | 1.08% | 427,391 | 0.54%
vehicle.trailer | 24,860 | 2.13% | 4,907,511 | 6.22%
vehicle.truck | 88,519 | 7.59% | 15,841,384 | 20.07%
Total (things) | 1,166,187 | 100.00% | 78,942,623 | 100.00%
flat.driveable_surface | - | - | 316,958,899 | 28.64%
flat.other | - | - | 8,559,216 | 0.77%
flat.sidewalk | - | - | 70,197,461 | 6.34%
flat.terrain | - | - | 70,289,730 | 6.35%
static.manmade | - | - | 178,178,063 | 16.10%
static.other | - | - | 817,150 | 0.07%
static.vegetation | - | - | 122,581,273 | 11.08%
vehicle.ego | - | - | 337,070,621 | 30.46%
noise | - | - | 2,061,156 | 0.19%
Total (stuff) | - | - | 1,106,713,569 | 100.00%

Ratios are computed within each group: thing classes against "Total (things)" and stuff classes against "Total (stuff)".

* Note that the static_object.bicycle_rack category can include bicycles that are not annotated individually. We use it to ignore large groups of shared bicycles during training to avoid biasing our object detector towards these less interesting bicycles.
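The point ratios in the table are computed within each group rather than over all points. The sketch below checks that the stuff classes sum to their group total. It then reproduces two ratios and adds the two group totals. It uses only numbers from the table.

```python
THING_POINTS_TOTAL = 78_942_623      # "Total (things)" row
STUFF_POINTS_TOTAL = 1_106_713_569   # "Total (stuff)" row

STUFF_POINTS = {
    "flat.driveable_surface": 316_958_899,
    "flat.other": 8_559_216,
    "flat.sidewalk": 70_197_461,
    "flat.terrain": 70_289_730,
    "static.manmade": 178_178_063,
    "static.other": 817_150,
    "static.vegetation": 122_581_273,
    "vehicle.ego": 337_070_621,
    "noise": 2_061_156,
}


def percent(count: int, total: int) -> float:
    """Share of `count` in `total`, in percent, rounded to two decimals like the table."""
    return round(100.0 * count / total, 2)


if __name__ == "__main__":
    assert sum(STUFF_POINTS.values()) == STUFF_POINTS_TOTAL
    print(percent(38_104_219, THING_POINTS_TOTAL))                   # vehicle.car: 48.27
    print(percent(STUFF_POINTS["vehicle.ego"], STUFF_POINTS_TOTAL))  # vehicle.ego: 30.46
    print(f"{THING_POINTS_TOTAL + STUFF_POINTS_TOTAL:,}")            # 1,185,656,192
```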

Furthermore, certain classes in nuScenes have special attributes:

Attribute | Annotations
vehicle.moving | 149,203
vehicle.stopped | 65,975
vehicle.parked | 420,226
cycle.with_rider | 7,331
cycle.without_rider | 17,345
pedestrian.sitting_lying_down | 13,939
pedestrian.standing | 46,530
pedestrian.moving | 157,444
Total | 877,993

Data Format

This document describes the database schema used in nuScenes. All annotations and metadata (including calibration, maps, vehicle coordinates etc.) are stored in a relational database. The database tables are listed below. Every row can be identified by its unique primary key token. Foreign keys such as sample_token may be used to link to the token of the table sample. Please refer to the tutorial for an introduction to the most important database tables.
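Once each table is indexed by its token, following a foreign key amounts to a dictionary lookup. The sketch below uses plain Python dicts with made-up tokens and fields. For reference, the nuScenes devkit offers its own `NuScenes.get(table_name, token)` lookup over the JSON tables.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def index_by_token(rows: List[Record]) -> Dict[str, Record]:
    """Map each record's primary key ("token") to the record itself."""
    return {row["token"]: row for row in rows}


def get(tables: Dict[str, Dict[str, Record]], table_name: str, token: str) -> Record:
    """Look up a record by table name and token; raises KeyError if absent."""
    return tables[table_name][token]


if __name__ == "__main__":
    # Made-up records; real tables contain many more fields and rows.
    raw = {
        "scene": [{"token": "sc1", "name": "scene-0001"}],
        "sample": [{"token": "s1", "scene_token": "sc1"}],
        "sample_annotation": [{"token": "a1", "sample_token": "s1"}],
    }
    tables = {name: index_by_token(rows) for name, rows in raw.items()}

    ann = get(tables, "sample_annotation", "a1")
    sample = get(tables, "sample", ann["sample_token"])  # follow sample_token
    scene = get(tables, "scene", sample["scene_token"])  # follow scene_token
    print(scene["name"])  # scene-0001
```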

attribute

An attribute is a property of an instance that can change while the category remains the same. Example: a vehicle being parked/stopped/moving, and whether or not a bicycle has a rider.

attribute {
   "token":                   <str> -- Unique record identifier.
   "name":                    <str> -- Attribute name.
   "description":             <str> -- Attribute description.
}

calibrated_sensor

Definition of a particular sensor (lidar/radar/camera) as calibrated on a particular vehicle. All extrinsic parameters are given with respect to the ego vehicle body frame. All camera images come undistorted and rectified.

calibrated_sensor {
   "token":                   <str> -- Unique record identifier.
   "sensor_token":            <str> -- Foreign key pointing to the sensor type.
   "translation":             <float> [3] -- Coordinate system origin in meters: x, y, z.
   "rotation":                <float> [4] -- Coordinate system orientation as quaternion: w, x, y, z.
   "camera_intrinsic":        <float> [3, 3] -- Intrinsic camera calibration. Empty for sensors that are not cameras.
}
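The calibrated_sensor fields are enough to move points from a sensor into the ego frame and to project camera-frame points into pixels. This is a minimal sketch with two assumptions:
- `translation` and `rotation` place the sensor frame inside the ego frame, so `p_ego = R p_sensor + t`.
- `camera_intrinsic` is a standard 3x3 pinhole matrix.

The function names are illustrative. The official devkit relies on the pyquaternion package for rotations; the conversion is written out here so the block stands alone.

```python
import numpy as np


def quaternion_to_matrix(q):
    """Rotation matrix for a unit quaternion in nuScenes order (w, x, y, z)."""
    w, x, y, z = np.asarray(q, dtype=float) / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])


def sensor_to_ego(points, translation, rotation):
    """Transform (N, 3) points from the sensor frame into the ego frame."""
    R = quaternion_to_matrix(rotation)
    return np.asarray(points, dtype=float) @ R.T + np.asarray(translation, dtype=float)


def project_to_image(points_cam, camera_intrinsic):
    """Pinhole projection of (N, 3) camera-frame points (z > 0) to (N, 2) pixels.

    Images are already undistorted, so no distortion model is applied.
    """
    K = np.asarray(camera_intrinsic, dtype=float)
    uvw = np.asarray(points_cam, dtype=float) @ K.T
    return uvw[:, :2] / uvw[:, 2:3]


if __name__ == "__main__":
    yaw_90 = [np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)]  # 90 degrees about z
    print(sensor_to_ego([[1.0, 0.0, 0.0]], [1.0, 0.0, 1.5], yaw_90))
    K = [[1000.0, 0.0, 800.0], [0.0, 1000.0, 450.0], [0.0, 0.0, 1.0]]  # made-up intrinsics
    print(project_to_image([[1.0, 0.5, 10.0]], K))
```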

category

Taxonomy of object categories (e.g. vehicle, human). Subcategories are delineated by a period (e.g. human.pedestrian.adult).

category {
   "token":                   <str> -- Unique record identifier.
   "name":                    <str> -- Category name. Subcategories indicated by period.
   "description":             <str> -- Category description.
   "index":                   <int> -- The index of the label used for efficiency reasons in the .bin label files of nuScenes-lidarseg. This field did not exist previously.
}

ego_pose

Ego vehicle pose at a particular timestamp, given with respect to the global coordinate system of the log's map. The ego_pose is the output of a lidar map-based localization algorithm described in our paper. The localization is 2-dimensional in the x-y plane.

ego_pose {
   "token":                   <str> -- Unique record identifier.
   "translation":             <float> [3] -- Coordinate system origin in meters: x, y, z. Note that z is always 0.
   "rotation":                <float> [4] -- Coordinate system orientation as quaternion: w, x, y, z.
   "timestamp":               <int> -- Unix time stamp.
}
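ego_pose is used in a similar way to map ego-frame points into the global map frame. Because the localization is 2-D, the sketch below assumes the rotation is essentially a yaw about z and works in x-y only. That simplification and the function names are assumptions of this example.

```python
import numpy as np


def yaw_from_quaternion(q):
    """Heading (rotation about z) in radians from a (w, x, y, z) quaternion."""
    w, x, y, z = q
    return np.arctan2(2.0 * (w * z + x * y), 1.0 - 2.0 * (y * y + z * z))


def ego_to_global_2d(points_xy, translation, rotation):
    """Map (N, 2) ego-frame x-y points into the map's global x-y frame."""
    yaw = yaw_from_quaternion(rotation)
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, -s], [s, c]])
    return np.asarray(points_xy, dtype=float) @ R.T + np.asarray(translation[:2], dtype=float)


if __name__ == "__main__":
    pose_rotation = [np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)]  # heading 90 degrees
    pose_translation = [10.0, 20.0, 0.0]  # z is always 0 for ego poses
    print(ego_to_global_2d([[1.0, 0.0]], pose_translation, pose_rotation))
```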

instance

An object instance, e.g. a particular vehicle. This table is an enumeration of all object instances we observed. Note that instances are not tracked across scenes.

instance {
   "token":                   <str> -- Unique record identifier.
   "category_token":          <str> -- Foreign key pointing to the object category.
   "nbr_annotations":         <int> -- Number of annotations of this instance.
   "first_annotation_token":  <str> -- Foreign key. Points to the first annotation of this instance.
   "last_annotation_token":   <str> -- Foreign key. Points to the last annotation of this instance.
}

lidarseg

Mapping between nuScenes-lidarseg annotations and sample_datas corresponding to the lidar pointcloud associated with a keyframe.

lidarseg {
   "token":                   <str> -- Unique record identifier.
   "filename":                <str> -- The name of the .bin files containing the nuScenes-lidarseg labels. These are numpy arrays of uint8 stored in binary format using numpy.
   "sample_data_token":       <str> -- Foreign key. Sample_data corresponding to the annotated lidar pointcloud with is_key_frame=True.
}
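Reading a nuScenes-lidarseg label file only needs `numpy.fromfile` with `dtype=numpy.uint8`. Each value can then be mapped to a name through the category table's index field.

In this sketch the label file is a fake one written to a temporary directory. The two category records, with indices 0 and 17, are hypothetical placeholders; real code should build the mapping from the category table. The labels are assumed to follow the point order of the referenced lidar sweep.

```python
import os
import tempfile

import numpy as np


def load_lidarseg_labels(path: str) -> np.ndarray:
    """Read a nuScenes-lidarseg .bin file: one uint8 class index per lidar point."""
    return np.fromfile(path, dtype=np.uint8)


def label_names(labels: np.ndarray, index_to_name: dict) -> list:
    """Translate class indices (the category table's "index" field) into names."""
    return [index_to_name[int(i)] for i in labels]


if __name__ == "__main__":
    # Hypothetical category records; real ones come from the category table.
    categories = [{"name": "noise", "index": 0}, {"name": "vehicle.car", "index": 17}]
    index_to_name = {c["index"]: c["name"] for c in categories}

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example_lidarseg.bin")
        np.array([17, 17, 0], dtype=np.uint8).tofile(path)  # fake label file
        labels = load_lidarseg_labels(path)
    print(label_names(labels, index_to_name))  # ['vehicle.car', 'vehicle.car', 'noise']
```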

log

Information about the log from which the data was extracted.

log {
   "token":                   <str> -- Unique record identifier.
   "logfile":                 <str> -- Log file name.
   "vehicle":                 <str> -- Vehicle name.
   "date_captured":           <str> -- Date (YYYY-MM-DD).
   "location":                <str> -- Area where log was captured, e.g. singapore-onenorth.
}

map

Map data that is stored as binary semantic masks from a top-down view.

map {
   "token":                   <str> -- Unique record identifier.
   "log_tokens":              <str> [n] -- Foreign keys.
   "category":                <str> -- Map category, currently only semantic_prior for drivable surface and sidewalk.
   "filename":                <str> -- Relative path to the file with the map mask.
}

sample

A sample is an annotated keyframe at 2 Hz. The data is collected at (approximately) the same timestamp as part of a single LIDAR sweep.

sample {
   "token":                   <str> -- Unique record identifier.
   "timestamp":               <int> -- Unix time stamp.
   "scene_token":             <str> -- Foreign key pointing to the scene.
   "next":                    <str> -- Foreign key. Sample that follows this in time. Empty if end of scene.
   "prev":                    <str> -- Foreign key. Sample that precedes this in time. Empty if start of scene.
}

sample_annotation

A bounding box defining the position of an object seen in a sample. All location data is given with respect to the global coordinate system.

sample_annotation {
   "token":                   <str> -- Unique record identifier.
   "sample_token":            <str> -- Foreign key. NOTE: this points to a sample NOT a sample_data since annotations are done on the sample level taking all relevant sample_data into account.
   "instance_token":          <str> -- Foreign key. Which object instance is this annotating. An instance can have multiple annotations over time.
   "attribute_tokens":        <str> [n] -- Foreign keys. List of attributes for this annotation. Attributes can change over time, so they belong here, not in the instance table.
   "visibility_token":        <str> -- Foreign key. Visibility may also change over time. If no visibility is annotated, the token is an empty string.
   "translation":             <float> [3] -- Bounding box location in meters as center_x, center_y, center_z.
   "size":                    <float> [3] -- Bounding box size in meters as width, length, height.
   "rotation":                <float> [4] -- Bounding box orientation as quaternion: w, x, y, z.
   "num_lidar_pts":           <int> -- Number of lidar points in this box. Points are counted during the lidar sweep identified with this sample.
   "num_radar_pts":           <int> -- Number of radar points in this box. Points are counted during the radar sweep identified with this sample. This number is summed across all radar sensors without any invalid point filtering.
   "next":                    <str> -- Foreign key. Sample annotation from the same object instance that follows this in time. Empty if this is the last annotation for this object.
   "prev":                    <str> -- Foreign key. Sample annotation from the same object instance that precedes this in time. Empty if this is the first annotation for this object.
}
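The translation, size and rotation fields fully determine a cuboid. A minimal sketch that recovers its eight corners in global coordinates follows. It assumes length runs along the box heading (local x) and width along local y, which is the convention the nuScenes devkit uses for its boxes. The quaternion helper repeats the (w, x, y, z) conversion so the block stands alone, and the function names are illustrative.

```python
import itertools

import numpy as np


def quaternion_to_matrix(q):
    """Rotation matrix for a unit quaternion in (w, x, y, z) order."""
    w, x, y, z = np.asarray(q, dtype=float) / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])


def box_corners(translation, size, rotation):
    """Return the 8 corners, shape (8, 3), of a sample_annotation box.

    `size` is (width, length, height); length is assumed to run along the box's
    local x axis (its heading), width along y and height along z.
    """
    w, l, h = size
    signs = np.array(list(itertools.product((-1.0, 1.0), repeat=3)))
    local = signs * np.array([l / 2.0, w / 2.0, h / 2.0])
    return local @ quaternion_to_matrix(rotation).T + np.asarray(translation, dtype=float)


if __name__ == "__main__":
    corners = box_corners([10.0, 5.0, 1.0], [2.0, 4.5, 1.6], [1.0, 0.0, 0.0, 0.0])
    print(corners.min(axis=0), corners.max(axis=0))
```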

sample_data

A sensor data record, e.g. an image, point cloud or radar return. For sample_data with is_key_frame=True, the timestamps should be very close to the sample it points to. For non-keyframes, the sample_data points to the sample that follows closest in time.

sample_data {
   "token":                   <str> -- Unique record identifier.
   "sample_token":            <str> -- Foreign key. Sample to which this sample_data is associated.
   "ego_pose_token":          <str> -- Foreign key.
   "calibrated_sensor_token": <str> -- Foreign key.
   "filename":                <str> -- Relative path to data-blob on disk.
   "fileformat":              <str> -- Data file format.
   "width":                   <int> -- If the sample data is an image, this is the image width in pixels.
   "height":                  <int> -- If the sample data is an image, this is the image height in pixels.
   "timestamp":               <int> -- Unix time stamp.
   "is_key_frame":            <bool> -- True if sample_data is part of key_frame, else False.
   "next":                    <str> -- Foreign key. Sample data from the same sensor that follows this in time. Empty if end of scene.
   "prev":                    <str> -- Foreign key. Sample data from the same sensor that precedes this in time. Empty if start of scene.
}
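A non-keyframe sample_data points to the sample that follows closest in time. This rule can be sketched as a binary search over sorted sample timestamps. The timestamps below are illustrative microsecond values. Assigning data recorded after the last sample to that last sample is an assumption of this sketch.

```python
import bisect
from typing import List


def following_sample_index(sample_timestamps: List[int], t: int) -> int:
    """Index of the first sample at or after time t (timestamps sorted ascending).

    Data recorded after the last sample is assigned to the last sample
    (an assumption of this sketch).
    """
    i = bisect.bisect_left(sample_timestamps, t)
    return min(i, len(sample_timestamps) - 1)


if __name__ == "__main__":
    samples = [0, 500_000, 1_000_000]  # 2 Hz keyframes, microseconds (illustrative)
    for t in (120_000, 500_000, 730_000):
        print(t, "->", following_sample_index(samples, t))
```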

scene

A scene is a 20s long sequence of consecutive frames extracted from a log. Multiple scenes can come from the same log. Note that object identities (instance tokens) are not preserved across scenes.

scene {
   "token":                   <str> -- Unique record identifier.
   "name":                    <str> -- Short string identifier.
   "description":             <str> -- Longer description of the scene.
   "log_token":               <str> -- Foreign key. Points to log from where the data was extracted.
   "nbr_samples":             <int> -- Number of samples in this scene.
   "first_sample_token":      <str> -- Foreign key. Points to the first sample in scene.
   "last_sample_token":       <str> -- Foreign key. Points to the last sample in scene.
}
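A scene stores only its first and last sample tokens. The samples in between are reached by following each sample's next key, much like a linked list. The sketch below uses made-up tokens. The same pattern applies to sample_annotation and sample_data through their own next/prev keys.

```python
from typing import Dict, List


def scene_sample_tokens(scene: dict, samples: Dict[str, dict]) -> List[str]:
    """Walk a scene's samples in time order by following each sample's "next" key."""
    tokens = []
    token = scene["first_sample_token"]
    while token:  # "next" is an empty string at the end of the scene
        tokens.append(token)
        token = samples[token]["next"]
    return tokens


if __name__ == "__main__":
    samples = {
        "a": {"next": "b", "prev": ""},
        "b": {"next": "c", "prev": "a"},
        "c": {"next": "", "prev": "b"},
    }
    scene = {"first_sample_token": "a", "last_sample_token": "c", "nbr_samples": 3}
    tokens = scene_sample_tokens(scene, samples)
    print(tokens)  # ['a', 'b', 'c']
    assert tokens[-1] == scene["last_sample_token"] and len(tokens) == scene["nbr_samples"]
```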

sensor

A specific sensor type.

sensor {
   "token":                   <str> -- Unique record identifier.
   "channel":                 <str> -- Sensor channel name.
   "modality":                <str> {camera, lidar, radar} -- Sensor modality. Supports category(ies) in brackets.
}

visibility

The visibility of an instance is the fraction of the annotation visible in all 6 images, binned into 4 bins: 0-40%, 40-60%, 60-80% and 80-100%.

visibility {
   "token":                   <str> -- Unique record identifier.
   "level":                   <str> -- Visibility level.
   "description":             <str> -- Description of visibility level.
}
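Mapping a visible fraction to one of the four bins is a simple threshold lookup. The text above does not say which bin a boundary value such as exactly 40% falls into. The half-open intervals used here are therefore an assumption, as are the label strings.

```python
def visibility_bin(fraction: float) -> str:
    """Map a visible fraction in [0, 1] to one of the four visibility bins.

    Half-open intervals are assumed: exactly 0.4 falls into "40-60%".
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    for upper, label in ((0.4, "0-40%"), (0.6, "40-60%"), (0.8, "60-80%")):
        if fraction < upper:
            return label
    return "80-100%"


if __name__ == "__main__":
    for f in (0.1, 0.4, 0.75, 1.0):
        print(f, visibility_bin(f))
```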

Tutorials

We provide a number of tutorials for nuScenes as interactive Jupyter Notebooks in the devkit. The tutorials are shown here as static pages for users who do not want to download the dataset. These tutorials cover the basic usage of nuScenes, nuScenes-lidarseg, the map and CAN bus expansions, as well as the prediction challenge. Alternatively, you can run the tutorials interactively on Colab.

Lidarseg

In the first nuScenes release, bounding boxes (cuboids) are used to represent 3D objects. While useful in many cases, cuboids cannot capture the fine shape details of articulated objects. nuScenes-lidarseg, short for lidar semantic segmentation, offers a higher level of granularity. Every single lidar point in the 40,000 keyframes of the nuScenes dataset is annotated with a semantic label, an astonishing 1,400,000,000 lidar points annotated with one of 32 labels. In addition to the 23 foreground classes (things) from nuScenes, we have included 9 background classes (stuff). For a detailed definition of every class and example images, please see the annotator instructions for nuScenes and nuScenes-lidarseg.

The taxonomy of nuScenes-lidarseg is compatible with the rest of nuScenes and nuImages, thus enabling a wide range of research across multiple sensor modalities. This is a major step forward for industry and academia alike, as it allows researchers to study and quantify novel problems such as lidar point cloud segmentation, foreground extraction, sensor calibration and mapping using point-level semantics. In the future, we plan to organize various public challenges around these tasks.

nuScenes-lidarseg is standing on the shoulders of giants. The academic SemanticKITTI dataset annotates the famous KITTI dataset with lidar segmentation labels for 28 classes. KITTI primarily consists of suburban streets with low traffic density and less challenging traffic situations. Its annotations only cover the field of view of the front camera, rather than the entire 360 degree view. Furthermore, it does not contain radar and is strictly for non-commercial use. nuScenes set out to improve on these aspects, featuring dense data from urban and suburban scenes in Singapore and Boston. It is a multimodal dataset that covers the entire 360 degree view and can be used by commercial entities. Following the initial announcement of nuScenes-lidarseg in October 2019, we have seen a number of other lidar segmentation datasets emerge, such as Hesai's PandaSet, and we look forward to more companies sharing their data with the community.

Just like nuScenes, the nuScenes-lidarseg annotations are free to use strictly for non-commercial purposes. Non-commercial means not primarily intended for or directed towards commercial advantage or monetary compensation. Examples of non-commercial use include but are not limited to personal use, educational use, such as in schools, academies, universities etc., and some research use. If you intend to use the nuScenes dataset for commercial purposes, we encourage you to contact us for commercial licensing options by sending an email to nuScenes@motional.com.

We hope that this dataset will allow researchers across the world to go even further in the quest to develop safe autonomous driving technology.

Citation

Please use the following citation when referencing the dataset:

@ARTICLE{nuscenes2019,
  title={nuScenes: A multimodal dataset for autonomous driving},
  author={Holger Caesar and Varun Bankiti and Alex H. Lang and Sourabh Vora and
          Venice Erin Liong and Qiang Xu and Anush Krishnan and Yu Pan and
          Giancarlo Baldan and Oscar Beijbom},
  journal={arXiv preprint arXiv:1903.11027},
  year={2019}
}

License

CC BY-NC-SA 4.0

Data Summary
Data format: Point Cloud, Image
Data volume: 1400K
File size: 547.98GB
Publisher: Motional
We’re making driverless vehicles a safe, reliable, and accessible reality.
Annotator: Scale AI, Inc
Trusted by world class companies, Scale delivers high quality training data for AI applications such as self-driving cars, mapping, AR/VR, robotics, and more
数据集反馈
| 1445 | 数据量 1400K | 大小 547.98GB
nuScenes
3D Semantic Segmentation 3D Box Tracking
Urban | Autonomous Driving
许可协议: CC BY-NC-SA 4.0

Overview

The nuScenes dataset is a large-scale autonomous driving dataset with 3d object annotations. It features:

● Full sensor suite (1x LIDAR, 5x RADAR, 6x camera, IMU, GPS)

● 1000 scenes of 20s each

● 1,400,000 camera images

● 390,000 lidar sweeps

● Two diverse cities: Boston and Singapore

● Left versus right hand traffic

● Detailed map information

● 1.4M 3D bounding boxes manually annotated for 23 object classes

● Attributes such as visibility, activity and pose

New: 1.1B lidar points manually annotated for 32 classes

New: Explore nuScenes on SiaSearch

● Free to use for non-commercial use

● For a commercial license contact nuScenes@motional.com

Data Collection

Scene planning

For the nuScenes dataset we collect approximately 15h of driving data in Boston and Singapore. For the full nuScenes dataset, we publish data from Boston Seaport and Singapore’s One North, Queenstown and Holland Village districts. Driving routes are carefully chosen to capture challenging scenarios. We aim for a diverse set of locations, times and weather conditions. To balance the class frequency distribution, we include more scenes with rare classes (such as bicycles). Using these criteria, we manually select 1000 scenes of 20s duration each. These scenes are carefully annotated using human experts. The annotator instructions can be found in the devkit repository.

Car setup

We use two Renault Zoe cars with an identical sensor layout to drive in Boston and Singapore. The data was gathered from a research platform and is not indicative of the setup used in Motional products. Please refer to the above figure for the placement of the sensors. We release data from the following sensors:

data
  • 1x spinning LIDAR:
    • 20Hz capture frequency
    • 32 channels
    • 360° Horizontal FOV, +10° to -30° Vertical FOV
    • 80m-100m Range, Usable returns up to 70 meters, ± 2 cm accuracy
    • Up to ~1.39 Million Points per Second
  • 5x long range RADAR sensor:
    • 77GHz
    • 13Hz capture frequency
    • Independently measures distance and velocity in one cycle using Frequency Modulated Continuous Wave
    • Up to 250m distance
    • Velocity accuracy of ±0.1 km/h
  • 6x camera:
    • 12Hz capture frequency
    • 1/1.8'' CMOS sensor of 1600x1200 resolution
    • Bayer8 format for 1 byte per pixel encoding
    • 1600x900 ROI is cropped from the original resolution to reduce processing and transmission bandwidth
    • Auto exposure with exposure time limited to the maximum of 20 ms
    • Images are unpacked to BGR format and compressed to JPEG
    • See camera orientation and overlap in the figure below.
camera

Sensor calibration

To achieve a high quality multi-sensor dataset, it is essential to calibrate the extrinsics and intrinsics of every sensor. We express extrinsic coordinates relative to the ego frame, i.e. the midpoint of the rear vehicle axle. The most relevant steps are described below:

  • LIDAR extrinsics:

    We use a laser liner to accurately measure the relative location of the LIDAR to the ego frame.

  • Camera extrinsics:

    We place a cube-shaped calibration target in front of the camera and LIDAR sensors. The calibration target consists of three orthogonal planes with known patterns. After detecting the patterns we compute the transformation matrix from camera to LIDAR by aligning the planes of the calibration target. Given the LIDAR to ego frame transformation computed above, we can then compute the camera to ego frame transformation and the resulting extrinsic parameters.

  • RADAR extrinsics

    We mount the radar in a horizontal position. Then we collect radar measurements by driving in an urban environment. After filtering radar returns for moving objects, we calibrate the yaw angle using a brute force approach to minimize the compensated range rates for static objects.

  • Camera intrinsic calibration

    We use a calibration target board with a known set of patterns to infer the intrinsic and distortion parameters of the camera.

Sensor synchronization

In order to achieve good cross-modality data alignment between the LIDAR and the cameras, the exposure of a camera is triggered when the top LIDAR sweeps across the center of the camera’s FOV. The timestamp of the image is the exposure trigger time; and the timestamp of the LIDAR scan is the time when the full rotation of the current LIDAR frame is achieved. Given that the camera’s exposure time is nearly instantaneous, this method generally yields good data alignment. Note that the cameras run at 12Hz while the LIDAR runs at 20Hz. The 12 camera exposures are spread as evenly as possible across the 20 LIDAR scans, so not all LIDAR scans have a corresponding camera frame. Reducing the frame rate of the cameras to 12Hz helps to reduce the compute, bandwidth and storage requirement of the perception system.

Privacy protection

It is our priority to protect the privacy of third parties. For this purpose we use state-of-the-art object detection techniques to detect license plates and faces. We aim for a high recall and remove false positives that do not overlap with the reprojections of the known person and car boxes. Eventually we use the output of the object detectors to blur faces and license plates in the images of nuScenes.

Data Preview

Label Distribution

Data Annotation

After collecting the driving data, we sample well synchronized keyframes (image, LIDAR, RADAR) at 2Hz and send them to our annotation partner Scale for annotation. Using expert annotators and multiple validation steps, we achieve highly accurate annotations. All objects in the nuScenes dataset come with a semantic category, as well as a a 3D bounding box and attributes for each frame they occur in. Compared to 2D bounding boxes, this allows us to accurately infer an object’s position and orientation in space.

We provide ground truth labels for 23 object classes. For a detailed definition of every class and example images, please see the annotator instructions. For the full nuScenes dataset we provide annotations for the following categories (excl. test set):

For nuScenes-lidarseg, we annotate every point in the lidar pointcloud with a semantic label. In addition to the 23 foreground classes (things) from nuScenes, we have included 9 background classes (stuff). For a detailed definition of every class and example images, please see the annotator instructions for nuScenes and nuScenes-lidarseg. We provide annotations for the following categories (excl. test set):

Category nuScenes cuboids Cuboid ratio Lidarseg points Point ratio
animal 787 0.07% 5,385 0.01%
human.pedestrian.adult 208,240 17.86% 2,156,470 2.73%
human.pedestrian.child 2,066 0.18% 9,655 0.01%
human.pedestrian.construction_worker 9,161 0.79% 139,443 0.18%
human.pedestrian.personal_mobility 395 0.03% 8,723 0.01%
human.pedestrian.police_officer 727 0.06% 9,159 0.01%
human.pedestrian.stroller 1,072 0.09% 8,809 0.01%
human.pedestrian.wheelchair 503 0.04% 12,168 0.02%
movable_object.barrier 152,087 13.04% 9,305,106 11.79%
movable_object.debris 3,016 0.26% 66,861 0.08%
movable_object.pushable_pullable 24,605 2.11% 718,641 0.91%
movable_object.trafficcone 97,959 8.40% 736,239 0.93%
static_object.bicycle_rack * 2,713 0.23% 163,126 0.21%
vehicle.bicycle 11,859 1.02% 141,351 0.18%
vehicle.bus.bendy 1,820 0.16% 357,463 0.45%
vehicle.bus.rigid 14,501 1.24% 4,247,297 5.38%
vehicle.car 493,322 42.30% 38,104,219 48.27%
vehicle.construction 14,671 1.26% 1,514,414 1.92%
vehicle.emergency.ambulance 49 0.00% 2,218 0.00%
vehicle.emergency.police 638 0.05% 59,590 0.08%
vehicle.motorcycle 12,617 1.08% 427,391 0.54%
vehicle.trailer 24,860 2.13% 4,907,511 6.22%
vehicle.truck 88,519 7.59% 15,841,384 20.07%
Total 1,166,187 100.00% 78,942,623 100.00%
flat.driveable_surface - - 316,958,899 28.64%
flat.other - - 8,559,216 0.77%
flat.sidewalk - - 70,197,461 6.34%
flat.terrain - - 70,289,730 6.35%
static.manmade - - 178,178,063 16.10%
static.other - - 817,150 0.07%
static.vegetation - - 122,581,273 11.08%
vehicle.ego - - 337,070,621 30.46%
noise - - 2,061,156 0.19%
Total - - 1,106,713,569 100.00%

* Note that the static_object.bicycle_rack category can include bicycles that are not annotated individually. We use it to ignore large groups of shared bicycles during training to avoid biasing our object detector towards these less interesting bicycles.

Furthermore certain classes in nuScenes have special attributes:

Attribute Annotations
vehicle.moving 149,203
vehicle.stopped 65,975
vehicle.parked 420,226
cycle.with_rider 7,331
cycle.without_rider 17,345
pedestrian.sitting_lying_down 13,939
pedestrian.standing 46,530
pedestrian.moving 157,444
Total 877,993

Data Format

This document describes the database schema used in nuScenes. All annotations and meta data (including calibration, maps, vehicle coordinates etc.) are covered in a relational database. The database tables are listed below. Every row can be identified by its unique primary key token. Foreign keys such as sample_token may be used to link to the token of the table sample. Please refer to the tutorial for an introduction to the most important database tables.


attribute

An attribute is a property of an instance that can change while the category remains the same. Example: a vehicle being parked/stopped/moving, and whether or not a bicycle has a rider.

attribute {
   "token":                   <str> -- Unique record identifier.
   "name":                    <str> -- Attribute name.
   "description":             <str> -- Attribute description.
}

calibrated_sensor

Definition of a particular sensor (lidar/radar/camera) as calibrated on a particular vehicle. All extrinsic parameters are given with respect to the ego vehicle body frame. All camera images come undistorted and rectified.

calibrated_sensor {
   "token":                   <str> -- Unique record identifier.
   "sensor_token":            <str> -- Foreign key pointing to the sensor type.
   "translation":             <float> [3] -- Coordinate system origin in meters: x, y, z.
   "rotation":                <float> [4] -- Coordinate system orientation as quaternion: w, x, y, z.
   "camera_intrinsic":        <float> [3, 3] -- Intrinsic camera calibration. Empty for sensors that are not cameras.
}
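As a sketch of how these fields are typically used: a point in a sensor's frame maps into the ego frame as R·p + t, with R built from the (w, x, y, z) quaternion, and, since images are undistorted and rectified, a point already expressed in a camera's frame (assumed z-forward) can be projected with the plain pinhole model using camera_intrinsic. The function names and all numbers below are hypothetical, for illustration only.

```python
from typing import List, Sequence, Tuple


def quat_to_matrix(q: Sequence[float]) -> List[List[float]]:
    """3x3 rotation matrix from a unit quaternion in (w, x, y, z) order."""
    w, x, y, z = q
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]


def mat_vec(m: Sequence[Sequence[float]], v: Sequence[float]) -> List[float]:
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def sensor_to_ego(point, translation, rotation) -> List[float]:
    """Apply a calibrated_sensor extrinsic: p_ego = R(rotation) @ p_sensor + translation."""
    rotated = mat_vec(quat_to_matrix(rotation), point)
    return [rotated[i] + translation[i] for i in range(3)]


def project_to_image(point_cam, camera_intrinsic) -> Tuple[float, float]:
    """Pinhole projection of a point given in the camera frame (z pointing forward)."""
    u, v, w = mat_vec(camera_intrinsic, point_cam)
    if w <= 0:
        raise ValueError("point lies behind the camera")
    return u / w, v / w


# Hypothetical calibration: sensor yawed 90 degrees about z, mounted 1 m forward, 1.5 m up.
half = 2 ** 0.5 / 2
print(sensor_to_ego([1.0, 0.0, 0.0], [1.0, 0.0, 1.5], [half, 0.0, 0.0, half]))  # ~[1.0, 1.0, 1.5]

K = [[1000.0, 0.0, 800.0], [0.0, 1000.0, 450.0], [0.0, 0.0, 1.0]]  # hypothetical intrinsics
print(project_to_image([0.5, -0.2, 10.0], K))  # (850.0, 430.0)
```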

category

Taxonomy of object categories (e.g. vehicle, human). Subcategories are delineated by a period (e.g. human.pedestrian.adult).

category {
   "token":                   <str> -- Unique record identifier.
   "name":                    <str> -- Category name. Subcategories indicated by period.
   "description":             <str> -- Category description.
   "index":                   <int> -- The index of the label used for efficiency reasons in the .bin label files of nuScenes-lidarseg. This field did not exist previously.
}

ego_pose

Ego vehicle pose at a particular timestamp, given with respect to the global coordinate system of the log's map. The ego_pose is the output of a lidar map-based localization algorithm described in our paper. The localization is 2-dimensional in the x-y plane.

ego_pose {
   "token":                   <str> -- Unique record identifier.
   "translation":             <float> [3] -- Coordinate system origin in meters: x, y, z. Note that z is always 0.
   "rotation":                <float> [4] -- Coordinate system orientation as quaternion: w, x, y, z.
   "timestamp":               <int> -- Unix time stamp.
}
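The same quaternion convention applies to ego poses. Below is a sketch, assuming unit quaternions in (w, x, y, z) order, that maps an ego-frame point into the global frame by rotating with the Hamilton product q·(0, p)·q* and adding the translation. It also extracts the heading (yaw) angle, presumably the main rotational quantity of interest given that the localization is 2-dimensional. The helper names and the pose values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Quat = Tuple[float, float, float, float]


def quat_mul(a: Sequence[float], b: Sequence[float]) -> Quat:
    """Hamilton product a * b of quaternions in (w, x, y, z) order."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def rotate(q: Sequence[float], v: Sequence[float]) -> List[float]:
    """Rotate vector v by unit quaternion q via q * (0, v) * conj(q)."""
    conj = (q[0], -q[1], -q[2], -q[3])
    _, x, y, z = quat_mul(quat_mul(q, (0.0, *v)), conj)
    return [x, y, z]


def ego_to_global(point_ego: Sequence[float], ego_pose: dict) -> List[float]:
    """p_global = R(ego_pose.rotation) @ p_ego + ego_pose.translation."""
    r = rotate(ego_pose["rotation"], point_ego)
    return [r[i] + ego_pose["translation"][i] for i in range(3)]


def yaw(q: Sequence[float]) -> float:
    """Heading about the z axis in radians, from a (w, x, y, z) quaternion."""
    w, x, y, z = q
    return math.atan2(2 * (w * z + x * y), 1 - 2 * (y * y + z * z))


# Hypothetical pose: vehicle at (100, 200, 0) facing +y (yaw of 90 degrees).
half = math.sqrt(0.5)
pose = {"translation": [100.0, 200.0, 0.0], "rotation": [half, 0.0, 0.0, half]}
print(ego_to_global([2.0, 0.0, 0.0], pose))  # approximately [100.0, 202.0, 0.0]
print(math.degrees(yaw(pose["rotation"])))    # approximately 90.0
```

Chaining this with a sensor-to-ego extrinsic gives the full sensor-to-global transform for a given sample_data.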

instance

An object instance, e.g. a particular vehicle. This table is an enumeration of all object instances we observed. Note that instances are not tracked across scenes.

instance {
   "token":                   <str> -- Unique record identifier.
   "category_token":          <str> -- Foreign key pointing to the object category.
   "nbr_annotations":         <int> -- Number of annotations of this instance.
   "first_annotation_token":  <str> -- Foreign key. Points to the first annotation of this instance.
   "last_annotation_token":   <str> -- Foreign key. Points to the last annotation of this instance.
}
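Since first_annotation_token and the next pointers of sample_annotation (described below) form a linked list, an instance's track can be recovered by walking the chain until next is empty. A minimal sketch follows, assuming sample_annotation has been indexed by token into a dict; the records and the function name are hypothetical.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def instance_track(instance: Record, annotations: Dict[str, Record]) -> List[List[float]]:
    """Return box centers of an instance in temporal order by following `next` pointers."""
    centers = []
    token = instance["first_annotation_token"]
    while token:  # an empty string terminates the chain
        ann = annotations[token]
        centers.append(ann["translation"])
        token = ann["next"]
    if len(centers) != instance["nbr_annotations"]:
        raise ValueError("chain length does not match nbr_annotations")
    return centers


# Hypothetical three-annotation instance.
annotations = {
    "a1": {"token": "a1", "translation": [0.0, 0.0, 1.0], "prev": "", "next": "a2"},
    "a2": {"token": "a2", "translation": [1.5, 0.0, 1.0], "prev": "a1", "next": "a3"},
    "a3": {"token": "a3", "translation": [3.0, 0.1, 1.0], "prev": "a2", "next": ""},
}
instance = {"token": "i1", "nbr_annotations": 3,
            "first_annotation_token": "a1", "last_annotation_token": "a3"}
print(instance_track(instance, annotations))
```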

lidarseg

Mapping between nuScenes-lidarseg annotations and the sample_data records of the lidar point clouds associated with keyframes.

lidarseg {
   "token":                   <str> -- Unique record identifier.
   "filename":                <str> -- The name of the .bin files containing the nuScenes-lidarseg labels. These are numpy arrays of uint8 stored in binary format using numpy.
   "sample_data_token":       <str> -- Foreign key. Sample_data corresponding to the annotated lidar pointcloud with is_key_frame=True.
}
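Each .bin file stores uint8 class indices, assumed here to be one per point in the same order as the corresponding lidar point cloud, and these indices correspond to the index field of the category table. The sketch below reads such a file with the standard library only (with NumPy, `numpy.fromfile(path, dtype=numpy.uint8)` yields the same values as an array) and counts points per class. The file contents and the index-to-name mapping are made up for illustration; the real mapping should be built from category.index.

```python
import os
import tempfile
from collections import Counter
from typing import Dict


def load_lidarseg_labels(path: str) -> bytes:
    """Read raw uint8 labels; each byte is assumed to be the class index of one lidar point."""
    with open(path, "rb") as f:
        return f.read()


def count_points_per_class(labels: bytes, index_to_name: Dict[int, str]) -> Dict[str, int]:
    """Count points per class; iterating over bytes yields ints in 0-255."""
    counts = Counter(labels)
    return {index_to_name.get(i, f"unknown_{i}"): n for i, n in sorted(counts.items())}


# Hypothetical label file with four points.
with tempfile.NamedTemporaryFile(suffix=".bin", delete=False) as tmp:
    tmp.write(bytes([0, 17, 17, 24]))
labels = load_lidarseg_labels(tmp.name)
os.remove(tmp.name)

# Illustrative mapping only; build the real one from the category table's index field.
names = {0: "noise", 17: "vehicle.car", 24: "flat.driveable_surface"}
print(count_points_per_class(labels, names))
# {'noise': 1, 'vehicle.car': 2, 'flat.driveable_surface': 1}
```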

log

Information about the log from which the data was extracted.

log {
   "token":                   <str> -- Unique record identifier.
   "logfile":                 <str> -- Log file name.
   "vehicle":                 <str> -- Vehicle name.
   "date_captured":           <str> -- Date (YYYY-MM-DD).
   "location":                <str> -- Area where log was captured, e.g. singapore-onenorth.
}

map

Map data that is stored as binary semantic masks from a top-down view.

map {
   "token":                   <str> -- Unique record identifier.
   "log_tokens":              <str> [n] -- Foreign keys.
   "category":                <str> -- Map category, currently only semantic_prior for drivable surface and sidewalk.
   "filename":                <str> -- Relative path to the file with the map mask.
}

sample

A sample is an annotated keyframe at 2 Hz. The data is collected at (approximately) the same timestamp as part of a single LIDAR sweep.

sample {
   "token":                   <str> -- Unique record identifier.
   "timestamp":               <int> -- Unix time stamp.
   "scene_token":             <str> -- Foreign key pointing to the scene.
   "next":                    <str> -- Foreign key. Sample that follows this in time. Empty if end of scene.
   "prev":                    <str> -- Foreign key. Sample that precedes this in time. Empty if start of scene.
}
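The 2 Hz keyframe rate can be checked from the timestamp field. A small sketch, assuming the Unix timestamps are expressed in microseconds (which we believe is the case for nuScenes; verify against your copy) and that the samples of a scene are already ordered via next; the timestamp values are hypothetical.

```python
from typing import List


def intervals_s(timestamps_us: List[int]) -> List[float]:
    """Seconds between consecutive samples, assuming microsecond Unix timestamps."""
    return [(b - a) / 1e6 for a, b in zip(timestamps_us, timestamps_us[1:])]


def mean_rate_hz(timestamps_us: List[int]) -> float:
    """Average sampling rate over the sequence."""
    gaps = intervals_s(timestamps_us)
    return len(gaps) / sum(gaps)


# Hypothetical timestamps spaced 0.5 s apart.
ts = [1_532_402_927_647_951 + k * 500_000 for k in range(4)]
print(intervals_s(ts))   # [0.5, 0.5, 0.5]
print(mean_rate_hz(ts))  # 2.0
```

At 2 Hz, a 20 s scene therefore contains roughly 20 × 2 = 40 samples.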

sample_annotation

A bounding box defining the position of an object seen in a sample. All location data is given with respect to the global coordinate system.

sample_annotation {
   "token":                   <str> -- Unique record identifier.
   "sample_token":            <str> -- Foreign key. NOTE: this points to a sample NOT a sample_data since annotations are done on the sample level taking all relevant sample_data into account.
   "instance_token":          <str> -- Foreign key. The object instance that this annotation belongs to. An instance can have multiple annotations over time.
   "attribute_tokens":        <str> [n] -- Foreign keys. List of attributes for this annotation. Attributes can change over time, so they belong here, not in the instance table.
   "visibility_token":        <str> -- Foreign key. Visibility may also change over time. If no visibility is annotated, the token is an empty string.
   "translation":             <float> [3] -- Bounding box location in meters as center_x, center_y, center_z.
   "size":                    <float> [3] -- Bounding box size in meters as width, length, height.
   "rotation":                <float> [4] -- Bounding box orientation as quaternion: w, x, y, z.
   "num_lidar_pts":           <int> -- Number of lidar points in this box. Points are counted during the lidar sweep identified with this sample.
   "num_radar_pts":           <int> -- Number of radar points in this box. Points are counted during the radar sweep identified with this sample. This number is summed across all radar sensors without any invalid point filtering.
   "next":                    <str> -- Foreign key. Sample annotation from the same object instance that follows this in time. Empty if this is the last annotation for this object.
   "prev":                    <str> -- Foreign key. Sample annotation from the same object instance that precedes this in time. Empty if this is the first annotation for this object.
}
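To turn translation, size and rotation into the eight corners of a box, one can scale a unit cube by half the size, rotate it and shift it to the center. The sketch below assumes the box's length lies along its local x axis, width along y and height along z (our reading of the devkit's convention; please verify), keeps the (width, length, height) order of size, and uses hypothetical values.

```python
import itertools
from typing import List, Sequence


def quat_to_matrix(q: Sequence[float]) -> List[List[float]]:
    """3x3 rotation matrix from a unit quaternion in (w, x, y, z) order."""
    w, x, y, z = q
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]


def box_corners(translation, size, rotation) -> List[List[float]]:
    """Eight corners in the global frame; size is (width, length, height)."""
    width, length, height = size
    r = quat_to_matrix(rotation)
    corners = []
    for sx, sy, sz in itertools.product((1, -1), repeat=3):
        # Local offset: length along x, width along y, height along z (assumed convention).
        local = (sx * length / 2, sy * width / 2, sz * height / 2)
        corners.append([
            sum(r[i][j] * local[j] for j in range(3)) + translation[i] for i in range(3)
        ])
    return corners


# Hypothetical axis-aligned box: center (10, 5, 1), 2 m wide, 4 m long, 1.5 m tall.
corners = box_corners([10.0, 5.0, 1.0], [2.0, 4.0, 1.5], [1.0, 0.0, 0.0, 0.0])
xs = [c[0] for c in corners]
print(min(xs), max(xs))  # 8.0 12.0
```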

sample_data

A sensor data record, e.g. an image, point cloud or radar return. For sample_data with is_key_frame=True, the timestamps should be very close to those of the sample it points to. For non-keyframes, the sample_data points to the sample that follows closest in time.

sample_data {
   "token":                   <str> -- Unique record identifier.
   "sample_token":            <str> -- Foreign key. Sample to which this sample_data is associated.
   "ego_pose_token":          <str> -- Foreign key.
   "calibrated_sensor_token": <str> -- Foreign key.
   "filename":                <str> -- Relative path to data-blob on disk.
   "fileformat":              <str> -- Data file format.
   "width":                   <int> -- If the sample data is an image, this is the image width in pixels.
   "height":                  <int> -- If the sample data is an image, this is the image height in pixels.
   "timestamp":               <int> -- Unix time stamp.
   "is_key_frame":            <bool> -- True if sample_data is part of key_frame, else False.
   "next":                    <str> -- Foreign key. Sample data from the same sensor that follows this in time. Empty if end of scene.
   "prev":                    <str> -- Foreign key. Sample data from the same sensor that precedes this in time. Empty if start of scene.
}
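The prev pointers make it straightforward to gather the non-keyframe sweeps that precede a keyframe, for example to accumulate several lidar sweeps into a denser point cloud (the lidar captures at 20 Hz while samples are at 2 Hz, so roughly ten sweeps fall between keyframes). A sketch under stated assumptions: records are indexed by token, timestamps are in microseconds, and the function name, limits and records are hypothetical.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def collect_sweeps(key_token: str, sample_data: Dict[str, Record],
                   max_sweeps: int = 10, max_dt_s: float = 0.5) -> List[Record]:
    """Return the keyframe plus earlier sweeps of the same sensor, newest first."""
    key = sample_data[key_token]
    sweeps = [key]
    token = key["prev"]
    while token and len(sweeps) < max_sweeps:
        sd = sample_data[token]
        if (key["timestamp"] - sd["timestamp"]) / 1e6 > max_dt_s:
            break  # too old relative to the keyframe
        sweeps.append(sd)
        token = sd["prev"]
    return sweeps


# Hypothetical 20 Hz lidar chain: sd0 ... sd10, with keyframes at sd0 and sd10.
sample_data = {
    f"sd{i}": {
        "token": f"sd{i}",
        "timestamp": i * 50_000,
        "is_key_frame": i in (0, 10),
        "prev": f"sd{i - 1}" if i > 0 else "",
    }
    for i in range(11)
}
print([sd["token"] for sd in collect_sweeps("sd10", sample_data, max_sweeps=4)])
# ['sd10', 'sd9', 'sd8', 'sd7']
```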

scene

A scene is a 20s long sequence of consecutive frames extracted from a log. Multiple scenes can come from the same log. Note that object identities (instance tokens) are not preserved across scenes.

scene {
   "token":                   <str> -- Unique record identifier.
   "name":                    <str> -- Short string identifier.
   "description":             <str> -- Longer description of the scene.
   "log_token":               <str> -- Foreign key. Points to log from where the data was extracted.
   "nbr_samples":             <int> -- Number of samples in this scene.
   "first_sample_token":      <str> -- Foreign key. Points to the first sample in scene.
   "last_sample_token":       <str> -- Foreign key. Points to the last sample in scene.
}
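Because every sample carries a scene_token, the samples of a scene can also be gathered without walking the next chain. This gives a cheap consistency check against nbr_samples, first_sample_token and last_sample_token, as well as the scene duration. The sketch below runs over hypothetical records and assumes microsecond timestamps.

```python
from collections import defaultdict
from typing import Any, Dict, List

Record = Dict[str, Any]


def summarize_scenes(scenes: List[Record], samples: List[Record]) -> Dict[str, float]:
    """Check each scene against its samples and return its duration in seconds."""
    by_scene: Dict[str, List[Record]] = defaultdict(list)
    for s in samples:
        by_scene[s["scene_token"]].append(s)
    durations = {}
    for scene in scenes:
        members = sorted(by_scene[scene["token"]], key=lambda s: s["timestamp"])
        if not members or len(members) != scene["nbr_samples"]:
            raise ValueError(f"{scene['name']}: nbr_samples mismatch")
        if (members[0]["token"] != scene["first_sample_token"]
                or members[-1]["token"] != scene["last_sample_token"]):
            raise ValueError(f"{scene['name']}: first/last sample mismatch")
        durations[scene["name"]] = (members[-1]["timestamp"] - members[0]["timestamp"]) / 1e6
    return durations


# Hypothetical scene with 40 samples spaced 0.5 s apart.
samples = [{"token": f"s{k}", "scene_token": "sc1", "timestamp": k * 500_000} for k in range(40)]
scenes = [{"token": "sc1", "name": "scene-0001", "nbr_samples": 40,
           "first_sample_token": "s0", "last_sample_token": "s39"}]
print(summarize_scenes(scenes, samples))  # {'scene-0001': 19.5}
```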

sensor

A specific sensor type.

sensor {
   "token":                   <str> -- Unique record identifier.
   "channel":                 <str> -- Sensor channel name.
   "modality":                <str> {camera, lidar, radar} -- Sensor modality. Supports category(ies) in brackets.
}

visibility

The visibility of an instance is the fraction of the annotation that is visible in all 6 camera images, binned into 4 bins: 0-40%, 40-60%, 60-80% and 80-100%.

visibility {
   "token":                   <str> -- Unique record identifier.
   "level":                   <str> -- Visibility level.
   "description":             <str> -- Description of visibility level.
}
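Mapping a visible fraction to one of the four bins is a simple search over the bin edges. The sketch below assumes level strings of the form v0-40 through v80-100 and assigns a fraction that falls exactly on an edge to the higher bin; both are assumptions for illustration rather than documented behavior.

```python
import bisect

# Bin edges in percent between the four visibility levels.
EDGES = [40, 60, 80]
# Assumed level names, one per bin, lowest visibility first.
LEVELS = ["v0-40", "v40-60", "v60-80", "v80-100"]


def visibility_level(fraction_visible: float) -> str:
    """Return the visibility bin for a visible fraction in [0, 1]."""
    if not 0.0 <= fraction_visible <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    return LEVELS[bisect.bisect_right(EDGES, fraction_visible * 100)]


print([visibility_level(f) for f in (0.1, 0.5, 0.7, 0.95)])
# ['v0-40', 'v40-60', 'v60-80', 'v80-100']
```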

Tutorials

We provide a number of tutorials for nuScenes as interactive Jupyter Notebooks in the devkit. The tutorials are also available as static pages for users who do not want to download the dataset. These tutorials cover the basic usage of nuScenes, nuScenes-lidarseg, the map and CAN bus expansions, as well as the prediction challenge. Alternatively, you can run the tutorials interactively on Colab.

Lidarseg

In the first nuScenes release, bounding boxes or cuboids were used to represent 3D objects. While useful in many cases, cuboids cannot capture the fine shape details of articulated objects. nuScenes-lidarseg, which stands for lidar semantic segmentation, provides a higher level of granularity: it annotates every single lidar point in the 40,000 keyframes of the nuScenes dataset with a semantic label, an astonishing 1,400,000,000 lidar points annotated with one of 32 labels. In addition to the 23 foreground classes (things) from nuScenes, we have included 9 background classes (stuff). For a detailed definition of every class and example images, please see the annotator instructions for nuScenes and nuScenes-lidarseg.

The taxonomy of nuScenes-lidarseg is compatible with the rest of nuScenes and nuImages, thus enabling a wide range of research across multiple sensor modalities. This is a major step forward for industry and academia alike, as it allows researchers to study and quantify novel problems such as lidar point cloud segmentation, foreground extraction, sensor calibration and mapping using point-level semantics. In the future, we plan to organize various public challenges around these tasks.

nuScenes-lidarseg stands on the shoulders of giants. The academic SemanticKITTI dataset annotates the famous KITTI dataset with lidar segmentation labels for 28 classes. KITTI primarily consists of suburban streets with low traffic density and less challenging traffic situations. Its annotations only cover the front camera, rather than the entire 360 degree view. Furthermore, it does not contain radar data and is strictly for non-commercial use. nuScenes set out to improve on these aspects, featuring dense data from urban and suburban scenes in Singapore and Boston. It is a multimodal dataset that covers the entire 360 degree view and can also be used by commercial entities under a commercial license. Following the initial announcement of nuScenes-lidarseg in October 2019, we have seen a number of other lidar segmentation datasets emerge, such as Hesai's PandaSet, and we look forward to more companies sharing their data with the community.

Just like nuScenes, the nuScenes-lidarseg annotations are free to use strictly for non-commercial purposes. Non-commercial means not primarily intended for or directed towards commercial advantage or monetary compensation. Examples of non-commercial use include, but are not limited to, personal use, educational use (such as in schools, academies and universities) and some research use. If you intend to use the nuScenes dataset for commercial purposes, we encourage you to contact us for commercial licensing options by sending an email to nuScenes@motional.com.

We hope that this dataset will allow researchers across the world to go even further in the quest to develop safe autonomous driving technology.


Citation

Please use the following citation when referencing the dataset:

@ARTICLE{nuscenes2019,
  title={nuScenes: A multimodal dataset for autonomous driving},
  author={Holger Caesar and Varun Bankiti and Alex H. Lang and Sourabh Vora and
          Venice Erin Liong and Qiang Xu and Anush Krishnan and Yu Pan and
          Giancarlo Baldan and Oscar Beijbom},
  journal={arXiv preprint arXiv:1903.11027},
  year={2019}
}

License

CC BY-NC-SA 4.0
