ETH3D Two-view Stereo
License: CC BY-NC-SA 4.0


The ETH3D two-view dataset contains 27 training and 20 test frames for low-resolution two-view stereo, taken from frames of the multi-camera rig.

Data Format

Format of two-view data

The two-view datasets provide stereo-rectified image pairs, i.e., for a given pixel in one image the corresponding epipolar line in the other image is the image row with the same y-coordinate as the pixel. These datasets come with the cameras.txt and images.txt files specifying the intrinsic and extrinsic camera parameters of the images. See above for a description of their format. In the two-view case all images are pre-undistorted, so their camera model is PINHOLE. This model is defined in the camera models section. We do not provide keypoint matches and triangulated keypoints for this type of data.
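As a minimal sketch, the PINHOLE entries in cameras.txt can be read as follows. This assumes the usual COLMAP-style line layout (CAMERA_ID MODEL WIDTH HEIGHT fx fy cx cy) with `#`-prefixed comment lines; it is an illustration, not part of the dataset tooling:

```python
def parse_cameras_txt(text):
    """Parse COLMAP-style cameras.txt text into {camera_id: parameter dict}.

    Assumes PINHOLE entries of the form:
        CAMERA_ID MODEL WIDTH HEIGHT fx fy cx cy
    """
    cameras = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip comments and blank lines
        parts = line.split()
        cam_id, model = int(parts[0]), parts[1]
        width, height = int(parts[2]), int(parts[3])
        fx, fy, cx, cy = map(float, parts[4:8])
        cameras[cam_id] = {"model": model, "width": width, "height": height,
                           "fx": fx, "fy": fy, "cx": cx, "cy": cy}
    return cameras
```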

Furthermore, the two-view datasets also come with a file calib.txt which is formatted according to the Middlebury data format, version 3. Note that those files do not provide any information about the disparity range: the corresponding field is set to the image width.
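A small sketch of reading such a calib.txt, assuming the Middlebury version 3 key=value layout (matrix-valued fields like cam0/cam1 in brackets, scalar fields such as width, height, and the disparity-range field as plain numbers); the exact set of keys present may vary:

```python
import re

def parse_calib(text):
    """Parse Middlebury v3 style calib.txt text into a dict.

    Matrix-valued fields (e.g. cam0, cam1) become row-major float lists;
    scalar fields become floats.
    """
    calib = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if value.startswith("["):
            # extract all numbers from the bracketed matrix, row-major
            nums = re.findall(r"[-+]?\d*\.?\d+(?:[eE][-+]?\d+)?", value)
            calib[key] = [float(n) for n in nums]
        else:
            calib[key] = float(value)
    return calib
```

Note that, per the dataset description above, the disparity-range field carries no real information here (it equals the image width), so it should not be used to bound a matcher's search range.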

Training data

The ground truth follows the same format as the Middlebury stereo benchmark, version 3. The ground truth disparity for the left image is provided as a file disp0GT.pfm in the PFM format using little endian data. Therefore, the ASCII header may look as follows, for example:

Pf
752 480
-1

The first line is always "Pf", indicating a grayscale PFM image. The second line specifies the width and height of the image. The third line is always "-1", indicating the use of little endian. After this header (where each line is followed by a newline character), the ground truth disparity image follows in row-major binary form as 4-byte floats. The rows are ordered from bottom to top. Positive infinity is used for invalid values.
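The header and payload described above can be read with a short routine like the following sketch (using NumPy for the float buffer; the flip restores top-to-bottom row order, and invalid pixels remain positive infinity):

```python
import numpy as np

def read_pfm(path):
    """Read a grayscale little-endian PFM disparity map (e.g. disp0GT.pfm).

    Returns a float32 array of shape (height, width) with rows reordered
    top-to-bottom; invalid pixels are positive infinity.
    """
    with open(path, "rb") as f:
        assert f.readline().strip() == b"Pf"            # grayscale PFM
        width, height = map(int, f.readline().split())  # "width height"
        scale = float(f.readline().decode().strip())
        assert scale < 0                                # negative => little endian
        data = np.fromfile(f, dtype="<f4", count=width * height)
    disp = data.reshape(height, width)
    return disp[::-1]  # PFM stores rows bottom-to-top; flip them
```

A validity mask can then be obtained with `np.isfinite(disp)`.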

The occlusion mask for the left image is given as a file "mask0nocc.png". Pixels without ground truth have the color (0, 0, 0). Pixels which are only observed by the left image have the color (128, 128, 128). Pixels which are observed by both images have the color (255, 255, 255). For the "non-occluded" evaluation, the evaluation is limited to the pixels observed by both images.
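Given the mask loaded as an H x W x 3 uint8 array (e.g. via PIL or OpenCV), the evaluation masks can be derived with a sketch like this:

```python
import numpy as np

def evaluation_masks(mask_rgb):
    """Derive boolean masks from a mask0nocc.png image (H x W x 3 uint8).

    (0, 0, 0)       -> no ground truth
    (128, 128, 128) -> observed only by the left image
    (255, 255, 255) -> observed by both images
    """
    gray = mask_rgb[..., 0]   # all three channels carry the same value
    has_gt = gray != 0        # pixels with any ground truth
    nonocc = gray == 255      # pixels used for the "non-occluded" evaluation
    return has_gt, nonocc
```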


Please use the following citation when referencing the dataset:

@inproceedings{schoeps2017cvpr,
  author = {Thomas Sch\"ops and Johannes L. Sch\"onberger and Silvano Galliani and Torsten Sattler and Konrad Schindler and Marc Pollefeys and Andreas Geiger},
  title = {A Multi-View Stereo Benchmark with High-Resolution Images and Multi-Camera Videos},
  booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2017}
}


