VoxCeleb2
Classification
Voice Print Recognition
License: CC BY 4.0

Overview

VoxCeleb2 contains over 1 million utterances for over 6,000 celebrities, extracted from videos uploaded to YouTube. The dataset is fairly gender balanced, with 61% of the speakers male. The speakers span a wide range of different ethnicities, accents, professions and ages.

Videos included in the dataset are shot in a large number of challenging visual and auditory environments. These include interviews from red carpets, outdoor stadiums and quiet indoor studios, speeches given to large audiences, excerpts from professionally shot multimedia, and even crude videos shot on hand-held devices. Audio segments present in the dataset are degraded with background chatter, laughter, overlapping speech and varying room acoustics.

We also provide face detections and face-tracks for the speakers in the dataset, and the face images are similarly ‘in the wild’, with variations in pose (including profiles), lighting, image quality and motion blur. Table 1 gives the general statistics, and Figure 1 shows examples of cropped faces as well as utterance length, gender and nationality distributions.

The dataset contains both development (train/val) and test sets. However, since we use the VoxCeleb1 dataset for testing, only the development set will be used for the speaker recognition task (Sections 4 and 5). The VoxCeleb2 test set should prove useful for other applications of audio-visual learning for which the dataset might be used. The split is given in Table 2. The development set of VoxCeleb2 has no overlap with the identities in the VoxCeleb1 or SITW datasets.

Citation

@InProceedings{Chung18b,
  author       = "Chung, J.~S. and Nagrani, A. and Zisserman, A.",
  title        = "VoxCeleb2: Deep Speaker Recognition",
  booktitle    = "INTERSPEECH",
  year         = "2018",
}

License

CC BY 4.0

Data Summary

Data format: Video, Audio
Data volume: --
File size: 256.09 GB
Publisher: Seebibyte
Seebibyte: Visual Search for the Era of Big Data is a large research project based in the Department of Engineering Science, University of Oxford. It is funded by the EPSRC (Engineering and Physical Sciences Research Council) and will run from 2015 to 2020.