Birdsongs TF Records Fold 4
Aesthetics
License: CC-BY-SA 4.0

Overview

Context

Audio files for the Cornell Birdsong competition in TFRecord format.

The data was converted to TFRecords from previously preprocessed audio: the original MP3 files had first been converted to WAV.

The full dataset is split into 5 stratified folds. The last digits of each filename give the number of audio files it contains. All parts of this dataset can be found here:

  • https://www.kaggle.com/benayas/birdsongs0
  • https://www.kaggle.com/benayas/birdsongs1
  • https://www.kaggle.com/benayas/birdsongs2
  • https://www.kaggle.com/benayas/birdsongs3
  • https://www.kaggle.com/benayas/birdsongs4
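
The filename convention described above (the last digits give the number of audio files) can be used to count examples without reading the records, for instance to compute steps per epoch. The following is a minimal sketch; the exact filename pattern is not documented here, so the example name `fold4-00-98.tfrec` and the helper names are hypothetical.

```python
import re
from pathlib import Path

# Assumption: the record count is the run of digits at the very end of the
# file stem, e.g. "fold4-00-98.tfrec" (hypothetical name, not from the dataset).
_COUNT_RE = re.compile(r"(\d+)$")


def count_from_filename(path):
    """Return the trailing integer of the file stem."""
    stem = Path(path).stem  # drops the directory and the ".tfrec" extension
    match = _COUNT_RE.search(stem)
    if match is None:
        raise ValueError(f"no trailing digits in {path!r}")
    return int(match.group(1))


def total_examples(paths):
    """Sum the per-file counts over a list of shard paths."""
    return sum(count_from_filename(p) for p in paths)
```

If the real filenames place the count elsewhere, only the regular expression needs to change.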

Content

Each audio file was opened with soundfile, and the resulting NumPy array, together with the sample rate and the numerically encoded label, was serialized into a TFRecord.

A .tfrec file can be read as follows:
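
To make the encoding step concrete, here is a rough sketch of how one clip could be packed into a record. It is not the author's conversion script: the function name `serialize_example` is hypothetical, and storing the waveform as raw float32 bytes is an assumption inferred from the reader in this section, which decodes the `wav` feature as `tf.float32`. The feature names `wav`, `sr` and `y` are the ones that reader uses.

```python
import numpy as np
import tensorflow as tf


def serialize_example(wav, sr, label):
    """Pack one clip into a serialized tf.train.Example.

    Assumption: the waveform is stored as raw float32 bytes so that
    tf.io.decode_raw(..., out_type=tf.float32) can recover it.
    """
    wav_bytes = np.asarray(wav, dtype=np.float32).tobytes()
    feature = {
        "wav": tf.train.Feature(bytes_list=tf.train.BytesList(value=[wav_bytes])),
        "sr": tf.train.Feature(int64_list=tf.train.Int64List(value=[int(sr)])),
        "y": tf.train.Feature(int64_list=tf.train.Int64List(value=[int(label)])),
    }
    example = tf.train.Example(features=tf.train.Features(feature=feature))
    return example.SerializeToString()
```

The returned bytes can be written to a file with `tf.io.TFRecordWriter`. Note that soundfile returns float64 arrays unless asked otherwise, so the cast to float32 matters if the data is to round-trip through the reader.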

import tensorflow as tf

def read_labeled_tfrecord(ex):
    # Feature spec: each record holds the waveform bytes, the sample rate and the label.
    LABELED_TFREC_FORMAT = {
        'wav': tf.io.FixedLenFeature([], tf.string),  # waveform as raw bytes
        'sr': tf.io.FixedLenFeature([], tf.int64),    # sample rate
        'y': tf.io.FixedLenFeature([], tf.int64),     # numeric label
    }
    example = tf.io.parse_single_example(ex, LABELED_TFREC_FORMAT)
    wav = tf.io.decode_raw(example['wav'], out_type=tf.float32)
    sr = tf.cast(example['sr'], tf.int32)
    y = tf.cast(example['y'], tf.int32)
    y = tf.one_hot(y, 264, on_value=1.0, off_value=0.0, dtype=tf.float32)  # one-hot over 264 classes
    return wav, sr, y  # one parsed example: waveform, sample rate, one-hot label
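
The parser above handles a single serialized example, so it is typically mapped over a `tf.data.TFRecordDataset`. The sketch below shows one possible input pipeline. It repeats a compact copy of the parser to stay self-contained. The names `parse_example` and `load_dataset` and the defaults `clip_len=160000` and `batch_size=8` are illustrative assumptions, as is the premise that clips vary in length and must be cropped or zero-padded before batching.

```python
import tensorflow as tf

NUM_CLASSES = 264  # depth of the one-hot encoding used by the parser


def parse_example(ex):
    # Compact copy of the parser from this section.
    features = {
        "wav": tf.io.FixedLenFeature([], tf.string),
        "sr": tf.io.FixedLenFeature([], tf.int64),
        "y": tf.io.FixedLenFeature([], tf.int64),
    }
    example = tf.io.parse_single_example(ex, features)
    wav = tf.io.decode_raw(example["wav"], out_type=tf.float32)
    sr = tf.cast(example["sr"], tf.int32)
    y = tf.one_hot(tf.cast(example["y"], tf.int32), NUM_CLASSES)
    return wav, sr, y


def load_dataset(filenames, clip_len=160000, batch_size=8):
    """Build a batched dataset of (waveform, one-hot label) pairs."""

    def fix_length(wav, sr, y):
        # Crop long clips, then zero-pad short ones, so every clip has clip_len samples.
        wav = wav[:clip_len]
        wav = tf.pad(wav, [[0, clip_len - tf.shape(wav)[0]]])
        wav = tf.ensure_shape(wav, [clip_len])
        return wav, y  # the sample rate is dropped here

    ds = tf.data.TFRecordDataset(filenames, num_parallel_reads=tf.data.AUTOTUNE)
    ds = ds.map(parse_example, num_parallel_calls=tf.data.AUTOTUNE)
    ds = ds.map(fix_length, num_parallel_calls=tf.data.AUTOTUNE)
    return ds.batch(batch_size).prefetch(tf.data.AUTOTUNE)
```

For training, a `shuffle` call before `batch` would usually be added; it is left out here to keep the sketch short.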

Acknowledgements

The source WAV data was extracted from @ttahara's datasets.

Data summary
Data format: image
Number of items: 98
File size: 1.71GB
Publisher: Alberto Benayas