Bird songs from Europe (xeno-canto)
2D Classification
License: CC-BY-SA 4.0

Overview

Context

While searching for datasets of bird vocalisations, I came across two Kaggle-hosted datasets, British Birdsong Dataset and Avian Vocalizations from CA & NV, USA, both based on the xeno-canto collection. However, I found that the first offered little in either quality or quantity, and the second, while partly addressing both, still left my CNN accuracy below 50% after numerous attempts. From this experience I hypothesised that factors such as bird sex, vocalisation type (e.g. calls versus songs) and recording quality could drastically affect training performance. The problem was that filtering by bird sex alone (quality is not listed in the CSV metadata) yielded a very limited amount of data for CNN training.

Therefore, I decided to build my own dataset of high-quality male bird songs from xeno-canto.

Content

This dataset was built from a xeno-canto API query using the warbleR R package. The API query was the following:

type:song type:male len_gt:30 q_gt:C area:europe

This query returns high-quality male bird songs longer than 30 seconds recorded in Europe (MP3 format). I then kept the 50 most frequent bird species and downsampled each of them to the recording count of the least frequent of the 50, which turned out to be \(n = 43\). This left me with \(43 \times 50 = 2150\) recordings.
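
As a rough illustration of this balancing step, here is a minimal Python sketch. It is not the original R code: the function name `balance_top_species`, the toy species names and the toy recording counts are all assumptions, chosen so that the least frequent of the top 50 species has 43 recordings.

```python
import random
from collections import Counter, defaultdict


def balance_top_species(records, n_species=50, seed=0):
    """Keep the n_species most frequent species and downsample each one
    to the recording count of the least frequent species among them."""
    counts = Counter(species for species, _ in records)
    top = [species for species, _ in counts.most_common(n_species)]
    top_set = set(top)
    n_min = min(counts[species] for species in top)

    # Group recording ids by species, ignoring species outside the top list.
    by_species = defaultdict(list)
    for species, rec_id in records:
        if species in top_set:
            by_species[species].append(rec_id)

    # Draw n_min recordings per species without replacement.
    rng = random.Random(seed)
    balanced = []
    for species in top:
        for rec_id in rng.sample(by_species[species], n_min):
            balanced.append((species, rec_id))
    return balanced, n_min


# Synthetic input: 60 made-up species with 33, 34, ..., 92 recordings.
records = [
    (f"species_{i:02d}", f"rec_{i:02d}_{j}")
    for i in range(60)
    for j in range(33 + i)
]

balanced, n_min = balance_top_species(records)
print(n_min, len(balanced))  # prints "43 2150"
```

Sampling without replacement down to the smallest class keeps every species equally represented, at the cost of discarding recordings from the more common species.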

The query, executed with the function warbleR::querxc, also returns metadata associated with the recordings. The only change made to this table was the addition of the columns Species and Path, which I use to retrieve the species labels more easily and to point to the recording MP3 files under mp3/, respectively. The result is the metadata.csv file listed below.
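
To show how the Species and Path columns can be paired up for training, here is a minimal Python sketch using only the standard library. The two sample rows are made up for illustration (the real metadata.csv has many more columns, and the exact form of its Path values is an assumption), and `load_labels` is a hypothetical helper name.

```python
import csv
import io

# Made-up excerpt standing in for metadata.csv; only the column names
# Species and Path come from the dataset description.
SAMPLE = """Species,Path
Species A,mp3/example-1.mp3
Species B,mp3/example-2.mp3
"""


def load_labels(handle):
    """Return (path, species) pairs read from a metadata CSV file handle."""
    reader = csv.DictReader(handle)
    return [(row["Path"], row["Species"]) for row in reader]


pairs = load_labels(io.StringIO(SAMPLE))
for path, species in pairs:
    print(path, "->", species)
```

For the real file, the same function should work with a handle such as `open("metadata.csv", newline="", encoding="utf-8")`, assuming the file is UTF-8 encoded.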

To learn more about the API query parameters check out the main xeno-canto API documentation and the accompanying search tips.
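
For readers who want to see the structure of the query string used above, here is a small Python sketch that splits it into tag and value pairs and percent-encodes it for use in a URL. The helper `parse_query` is hypothetical, and the sketch makes no request to xeno-canto; the meaning of each tag is described in the xeno-canto documentation.

```python
from urllib.parse import quote

QUERY = "type:song type:male len_gt:30 q_gt:C area:europe"


def parse_query(query):
    """Split a space-separated query into (tag, value) pairs."""
    pairs = []
    for term in query.split():
        tag, _, value = term.partition(":")
        pairs.append((tag, value))
    return pairs


pairs = parse_query(QUERY)
for tag, value in pairs:
    print(f"{tag:8s} {value}")

# Percent-encode spaces and colons so the query can sit in a URL query string.
encoded = quote(QUERY)
print(encoded)
```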

This dataset was designed specifically to train a CNN, as featured on my blog poissonisfish, where I describe the classification of all 50 bird species with up to 72% test accuracy.

Finally, please note:

  • Recordings may contain more than just male bird songs, i.e. mixtures of calls and songs from both male and female birds of the same or different species;
  • Quality is restricted to A and B, but unrated recordings are also included.

Acknowledgements

I want to thank the xeno-canto community for their work in collecting, documenting and sharing this wealth of information. To consult permissions to use and distribute these data please refer to the Terms of Use page.

Inspiration

My biology background and love for wildlife, Kaggle, and the datasets from Rachael Tatman and Sam Hiatt that spurred my interest.

Data summary
Data format: image
Data volume: 2.151K
File size: 955.27MB
Publisher: Francisco de Abreu e Lima