DeepSat (SAT-6) Airborne Dataset
2D Classification
License: CC-BY-SA 4.0

Overview

DeepSat SAT-6

Sample images


Originally, images were extracted from the National Agriculture Imagery Program (NAIP) dataset. The NAIP dataset consists of a total of 330,000 scenes spanning the whole of the Continental United States (CONUS). The authors used the uncompressed digital ortho quarter quad tiles (DOQQs), which are GeoTIFF images whose coverage corresponds to United States Geological Survey (USGS) topographic quadrangles. The image tiles average ~6,000 pixels in width and ~7,000 pixels in height, and measure around 200 megabytes each; the entire NAIP dataset for CONUS is ~65 terabytes. The imagery is acquired at a 1 m ground sample distance (GSD) with a horizontal accuracy that lies within six meters of photo-identifiable ground control points.

The images consist of 4 bands: red, green, blue and near infrared (NIR). To preserve the high variance inherent in the entire NAIP dataset, image patches were sampled from a multitude of scenes (a total of 1,500 image tiles) covering different landscapes such as rural areas, urban areas, densely forested areas, mountainous terrain, small to large water bodies and agricultural areas across the whole state of California. An image labeling tool developed as part of the study was used to manually label uniform image patches belonging to a particular land-cover class.

Once labeled, non-overlapping 28x28 blocks were extracted from each uniform image patch with a sliding window and saved to the dataset with the corresponding label. The 28x28 window size was chosen to be large enough to provide meaningful spatial context, yet not so large that the class-conditional statistics within the window stop being representative of the target class. Care was taken to avoid inter-class overlaps within a selected and labeled image patch.
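
As an illustration of the patch-extraction step (a minimal sketch, not the authors' original tooling), non-overlapping 28x28 blocks can be cut from a labeled image patch like this:

```python
import numpy as np

def extract_blocks(patch: np.ndarray, size: int = 28) -> np.ndarray:
    """Cut non-overlapping size x size blocks from an H x W x C image patch.

    Rows/columns that do not fill a complete block are discarded.
    """
    h, w, c = patch.shape
    rows, cols = h // size, w // size
    cropped = patch[: rows * size, : cols * size, :]
    # Split height and width into block grids, then flatten the grid.
    blocks = cropped.reshape(rows, size, cols, size, c).swapaxes(1, 2)
    return blocks.reshape(-1, size, size, c)

# Example: a synthetic 100x90 4-band patch yields 3x3 = 9 blocks of 28x28x4.
patch = np.zeros((100, 90, 4), dtype=np.uint8)
print(extract_blocks(patch).shape)  # (9, 28, 28, 4)
```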

Content

  • Each sample image is 28x28 pixels and consists of 4 bands - red, green, blue and near infrared.

  • The training and test labels are one-hot encoded 1x6 vectors (see the decoding sketch after this list).

  • The six classes represent six broad land covers: barren land, trees, grassland, roads, buildings and water bodies.

  • The training and test datasets come from disjoint sets of image tiles.

  • Each image patch is size normalized to 28x28 pixels.

  • Once generated, both the training and testing datasets were randomized using a pseudo-random number generator.
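
As a minimal illustration of the label format, the sketch below converts the 1x6 one-hot vectors into class names. The column order is assumed to match the class list above (barren land, trees, grassland, roads, buildings, water bodies); verify it against the original dataset documentation before relying on it.

```python
import numpy as np

# Assumed column order; confirm against the original SAT-6 documentation.
CLASS_NAMES = ["barren land", "trees", "grassland", "roads", "buildings", "water bodies"]

def decode_labels(one_hot: np.ndarray) -> list[str]:
    """Map an (N, 6) one-hot label array to a list of class names."""
    indices = one_hot.argmax(axis=1)  # column holding the 1 in each row
    return [CLASS_NAMES[i] for i in indices]

# Example: a single label vector whose fourth column is set.
example = np.array([[0, 0, 0, 1, 0, 0]], dtype=np.uint8)
print(decode_labels(example))  # ['roads'] under the assumed ordering
```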

CSV files

  • X_train_sat6.csv: 324,000 training images, 28x28 images each with 4 channels
  • y_train_sat6.csv: 324,000 training labels, 1x6 one-hot encoded vectors
  • X_test_sat6.csv: 81,000 test images, 28x28 images each with 4 channels
  • y_test_sat6.csv: 81,000 test labels, 1x6 one-hot encoded vectors
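
A minimal loading sketch for the CSV files, assuming each row of the X files stores a flattened 28x28x4 patch and each row of the y files a 1x6 one-hot label (the flattening order is an assumption; adjust the reshape if your copy differs):

```python
import numpy as np
import pandas as pd

def load_sat6_csv(x_path: str, y_path: str):
    """Load SAT-6 images and labels from the CSV exports."""
    x = pd.read_csv(x_path, header=None).values.astype(np.uint8)
    y = pd.read_csv(y_path, header=None).values.astype(np.uint8)
    # Assumes row-major flattening of 28x28x4 patches.
    images = x.reshape(-1, 28, 28, 4)  # (N, height, width, bands: R, G, B, NIR)
    return images, y

# Example usage (paths are placeholders):
# X_train, y_train = load_sat6_csv("X_train_sat6.csv", "y_train_sat6.csv")
# print(X_train.shape, y_train.shape)  # expected: (324000, 28, 28, 4) (324000, 6)
```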

The original MAT file

  • train_x: 28x28x4x324000 uint8 (324,000 training samples of 28x28 images, each with 4 channels)
  • train_y: 324000x6 uint8 (1x6 one-hot label vectors for the 324,000 training samples)
  • test_x: 28x28x4x81000 uint8 (81,000 test samples of 28x28 images, each with 4 channels)
  • test_y: 81000x6 uint8 (1x6 one-hot label vectors for the 81,000 test samples)
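
For the original MAT file, a sketch along these lines should work with scipy; the sample axis is stored last, so it is moved to the front. The file name below is an assumption — substitute the name of your local copy.

```python
import numpy as np
from scipy.io import loadmat

# File name is an assumption; use the name of your downloaded MAT file.
data = loadmat("sat-6-full.mat")

# Stored as 28x28x4xN; move the sample axis to the front -> (N, 28, 28, 4).
train_x = np.moveaxis(data["train_x"], -1, 0)
test_x = np.moveaxis(data["test_x"], -1, 0)

# Depending on the copy, labels may be stored as 6xN; transpose if needed so
# each row is a 1x6 one-hot vector.
train_y = data["train_y"].T if data["train_y"].shape[0] == 6 else data["train_y"]
test_y = data["test_y"].T if data["test_y"].shape[0] == 6 else data["test_y"]

print(train_x.shape, train_y.shape)  # expected: (324000, 28, 28, 4) (324000, 6)
```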

Acknowledgements

The original MATLAB file was converted to multiple CSV files.

The original SAT-4 and SAT-6 airborne datasets can be found here:

http://csc.lsu.edu/~saikat/deepsat/

Thanks to:

Saikat Basu, Robert DiBiano, Manohar Karki and Supratik Mukhopadhyay, Louisiana State University
Sangram Ganguly, Bay Area Environmental Research Institute/NASA Ames Research Center
Ramakrishna R. Nemani, NASA Advanced Supercomputing Division, NASA Ames Research Center

Data Summary

Data format: image
Data volume: 6
File size: 309.71 MB
Publisher: Chris Crawford
