NEAR
Pose
AR/MR
License: Unknown

Overview

Existing datasets for evaluating Visual Inertial Odometry (VIO) have boosted research on autonomous agents, but they do not serve the growing research on Augmented Reality (AR) and Mixed Reality (MR), since they are not collected in real AR scenes and do not account for the factors affecting mobile devices. NEAR is an AR-oriented visual-inertial dataset collected with commodity handheld phones, released together with ground truth. The dataset contains a total of 113 sequences across 49 carefully designed collection cases in two typical indoor scenes, the living area and the table area. It also covers a variety of setting adjustments for comparison, including different texture levels, illuminations, motion patterns, camera settings, and the difference between rolling shutter and global shutter.

Data Annotation

To enable evaluation of VIO on the NEAR dataset, we also provide the calibration parameters along with the dataset. Here we briefly describe the calibration procedure for the intrinsics and extrinsics.

Intrinsic Calibration

For camera intrinsics, the calibration sequences are recorded with the phone camera viewing a 12×8 chessboard with a grid size of 50mm×50mm. Afterward, a sufficient number of high-quality images are chosen empirically and fed to the MATLAB calibration toolbox with the 4-parameter radial and tangential distortion model. The intrinsics of all phone cameras and the MYNT camera are calibrated once from their respective calibration sequences, since autofocus is turned off during the entire data collection except for the comparison cases. As for the IMU intrinsics, we calibrate the variances of noise, bias, and random walk of both the gyroscope and the accelerometer of each phone via the Allan Variance Tool.
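The Allan analysis itself is not distributed with the dataset; as a rough illustration of what such a tool computes, the following is a minimal NumPy sketch of the overlapping Allan deviation (function names and the sample rate below are our own assumptions, not part of the released tool):

```python
import numpy as np

def allan_deviation(omega, fs, taus):
    """Overlapping Allan deviation of a gyroscope (or accelerometer) signal.

    omega : 1-D array of raw samples (e.g. rad/s)
    fs    : sample rate in Hz
    taus  : iterable of averaging times in seconds
    """
    theta = np.cumsum(omega) / fs          # integrate the rate signal
    adev = []
    for tau in taus:
        m = int(tau * fs)                  # samples per cluster
        if m < 1 or 2 * m >= len(theta):
            adev.append(np.nan)
            continue
        # overlapping second differences of the integrated signal
        d = theta[2 * m:] - 2 * theta[m:-m] + theta[:-2 * m]
        avar = np.sum(d ** 2) / (2 * tau ** 2 * (len(theta) - 2 * m))
        adev.append(np.sqrt(avar))
    return np.array(adev)

# Example usage (hypothetical 200 Hz IMU log):
# fs = 200.0
# taus = np.logspace(-2, 3, 100)
# adev = allan_deviation(gyro_x, fs, taus)
# The white-noise density is read near tau = 1 s, the bias instability at the
# curve minimum, and the rate random walk from the +1/2 slope region.
```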

Extrinsic Calibration

There are three coordinate frames that move during data collection but stay fixed relative to each other: the phone camera frame C, the phone IMU frame B, and the rig frame R. The two related fixed frames are the ChArUco board frame W and the global frame of the motion capture system G. The whole frame system is shown in Figure 3. First, the IMU-camera extrinsics $T_{BC}$ and the time offset are calibrated with Kalibr. The camera-rig transformation is also needed, since the motion capture system provides ${}^{G}_{R}T$ while VIO can only provide ${}^{W}_{C}T$. Following the transformation identity

$${}^{G}_{R}T_{j} \cdot {}^{R}_{C}T = {}^{G}_{W}T \cdot {}^{W}_{C}T_{i}$$

we design a calibration sequence to obtain this transformation. First, the 6×8 ChArUco board (square length 48mm, marker length 36mm) is set up in the test field. Then we move the rig slowly with the camera facing the board while simultaneously recording images and the ground-truth poses ${}^{G}_{R}T_{j}$. Afterward, the chessboard corners in every camera image are detected and the camera poses ${}^{W}_{C}T_{i}$ are computed with a PnP solver. Finally, we formulate the transformation estimation as the least-squares problem

$$\left({}^{G}_{W}T^{*},\; {}^{R}_{C}T^{*}\right) = \arg\min_{{}^{G}_{W}T,\ {}^{R}_{C}T} \sum_{i} \left\| {}^{G}_{R}T_{j(i)} \cdot {}^{R}_{C}T - {}^{G}_{W}T \cdot {}^{W}_{C}T_{i} \right\|^{2}$$

where $j(i)$ is the index of the ground-truth pose aligned to image $i$ by timestamp. We solve this optimization problem with an alternating iteration method. Each iteration has two steps. First, we solve for the best ${}^{G}_{W}T$ with ${}^{R}_{C}T$ fixed to its value from the last iteration (or initialized to the identity in SE(3)) using the Umeyama method [16]. Second, we fix this 'best' ${}^{G}_{W}T$ and solve for ${}^{R}_{C}T$ in the same manner as the first step. The iteration terminates when the error converges.
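The alternating solve can be sketched as follows. This is not the released calibration code, just a minimal NumPy illustration: it assumes the Umeyama/Kabsch fit is applied to camera positions in the first step, and it replaces the second step with a closed-form average of per-frame estimates (a chordal-mean simplification):

```python
import numpy as np

def umeyama_rigid(src, dst):
    """Rigid fit dst ≈ R @ src + t (Kabsch/Umeyama without scale)."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)
    U, _, Vt = np.linalg.svd(H)
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ S @ U.T
    return R, mu_d - R @ mu_s

def project_so3(M):
    """Closest rotation matrix to M (orthogonal Procrustes projection)."""
    U, _, Vt = np.linalg.svd(M)
    R = U @ Vt
    if np.linalg.det(R) < 0:
        R = U @ np.diag([1.0, 1.0, -1.0]) @ Vt
    return R

def alternate_solve(T_GR, T_WC, n_iter=20):
    """Estimate T_GW, T_RC so that T_GR[i] @ T_RC ≈ T_GW @ T_WC[i]."""
    T_RC, T_GW = np.eye(4), np.eye(4)         # prior: identity in SE(3)
    for _ in range(n_iter):
        # Step 1: camera positions expressed in G and in W, rigid-fit T_GW.
        p_G = np.array([(Ti @ T_RC)[:3, 3] for Ti in T_GR])
        p_W = np.array([Ti[:3, 3] for Ti in T_WC])
        R_GW, t_GW = umeyama_rigid(p_W, p_G)
        T_GW = np.eye(4)
        T_GW[:3, :3], T_GW[:3, 3] = R_GW, t_GW
        # Step 2: with T_GW fixed, each frame implies
        # T_RC ≈ inv(T_GR[i]) @ T_GW @ T_WC[i]; average those estimates.
        C = [T_GW @ Ti for Ti in T_WC]
        R_sum = sum(Ti[:3, :3].T @ Ci[:3, :3] for Ti, Ci in zip(T_GR, C))
        t_sum = sum(Ti[:3, :3].T @ (Ci[:3, 3] - Ti[:3, 3])
                    for Ti, Ci in zip(T_GR, C))
        T_RC = np.eye(4)
        T_RC[:3, :3] = project_so3(R_sum / len(C))
        T_RC[:3, 3] = t_sum / len(C)
    return T_GW, T_RC
```

In practice one would also check the residual of the cost above at each iteration and stop once it no longer decreases, as described in the text.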

Data Format

The mean average precision (mAP) and F-score are used for evaluation. A detection is correct only if the intersection over union (IoU) between the detected bounding box and some ground truth box of the same class is larger than 0.33. The F-score is calculated as F-score = 2PR/(P+R), where P and R are the precision and recall rates. Note that the F-score is threshold-sensitive, so you could adjust your score threshold to obtain a better result. Although the F-score is not as fair as the mAP criterion, it is more practical, since a threshold must always be chosen when deploying a model and not all algorithms produce a confidence score for each target. Thus both F-score and mAP are used in the benchmarks.
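As a minimal sketch of the matching criterion and metric described above (function names are ours, not part of the released evaluation script):

```python
import numpy as np

def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def f_score(num_correct, num_detections, num_ground_truth):
    """F-score = 2PR / (P + R) at a fixed confidence threshold."""
    p = num_correct / max(num_detections, 1)
    r = num_correct / max(num_ground_truth, 1)
    return 2 * p * r / (p + r) if (p + r) > 0 else 0.0

# A detection of class c counts as correct when its IoU with some unmatched
# ground-truth box of the same class exceeds 0.33.
```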

The evaluation scripts for mAP and F-score are borrowed from the ICDAR2015 evaluation scripts with small modifications (you may first need to register an account). We provide the modified evaluation scripts and the ground truth gt.zip file of the test set in the evaluation/ directory. You can evaluate your own method by following these instructions:

  • Run your algorithm and save the detection result for each image as image_name.txt, where image_name is exactly the same as in gt.zip. Follow the format of evaluation/gt.zip, except that each defect output by your algorithm should be described as: x1,y1,x2,y2,confidence,type, where (x1,y1) and (x2,y2) are the top-left and bottom-right corners of the defect's bounding box, confidence is a float indicating how confident you are in the detection, and type is a string that must be one of: open, short, mousebite, spur, copper, pin-hole. There must be no spaces, only commas. A hypothetical example file is shown after this list.
  • Zip your .txt files into res.zip (do not include any sub-directory in the res.zip file).
  • run the evaluation script: python script.py -s=res.zip -g=gt.zip
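For illustration only, a result file such as image_name.txt (all coordinates, scores, and classes below are made up) would contain one detection per line:

```
153,402,219,458,0.91,open
77,130,121,168,0.55,mousebite
```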
Data Summary

Data Format: IMU, Image
Data Volume: 553.7K
File Size: 174.15GB
Publisher: Shanghai Beidou Research Institute