YouTube Faces DB
2D Box
License: Unknown


Welcome to the YouTube Faces Database, a database of face videos designed for studying the problem of unconstrained face recognition in videos. The data set contains 3,425 videos of 1,595 different people. All videos were downloaded from YouTube. An average of 2.15 videos is available for each subject. The shortest clip is 48 frames, the longest is 6,070 frames, and the average clip length is 181.3 frames.

Number of videos per person:

#videos | 1   | 2   | 3   | 4   | 5  | 6
#people | 591 | 471 | 307 | 167 | 51 | 8

In designing our video data set and benchmarks we follow the example of the 'Labeled Faces in the Wild' (LFW) image collection. Specifically, our goal is to produce a large-scale collection of videos along with labels indicating the identity of the person appearing in each video. In addition, we publish benchmark tests intended to measure the performance of video pair-matching techniques on these videos. Finally, we provide descriptor encodings for the faces appearing in these videos, using well-established descriptor methods.

Data Collection

Collection setup: We begin by using the 5,749 names of subjects included in the LFW data set to search YouTube for videos of these same individuals. The top six results for each query were downloaded. We minimize the number of duplicate videos by treating two videos whose names have an edit distance of less than 3 as duplicates. Downloaded videos are then split into frames at 24fps. We detect faces in these videos using the Viola-Jones face detector. Automatic screening was performed to eliminate detections of less than 48 consecutive frames, where detections were considered consecutive if the Euclidean distance between their detected centers was less than 10 pixels. This process ensures that the videos contain stable detections and are long enough to provide useful information for the various recognition algorithms. Finally, the remaining videos were manually verified to ensure that (a) the videos are correctly labeled by subject, (b) they are not semi-static, still-image slide-shows, and (c) no identical videos are included in the database.
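The automatic screening step above can be sketched as follows. This is a minimal reconstruction, not the original pipeline code: the thresholds (48 frames, 10 pixels) come from the text, while the function name and detection representation are hypothetical choices.

```python
from math import dist

MIN_TRACK_LEN = 48      # minimum consecutive frames for a stable detection
MAX_CENTER_DIST = 10.0  # max pixel distance between consecutive face centers

def screen_tracks(detections):
    """Group per-frame detections (frame_index, center_x, center_y) into
    tracks of consecutive detections, keeping only stable tracks that are
    at least MIN_TRACK_LEN frames long."""
    tracks, current = [], []
    for det in detections:
        if current:
            prev = current[-1]
            consecutive = (det[0] == prev[0] + 1 and
                           dist(det[1:], prev[1:]) < MAX_CENTER_DIST)
            if not consecutive:
                tracks.append(current)
                current = []
        current.append(det)
    if current:
        tracks.append(current)
    return [t for t in tracks if len(t) >= MIN_TRACK_LEN]
```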

Data Annotation

For each person in the database there is a file called subject_name.labeled_faces.txt. The data in this file is in the following format:

filename,[ignore],x,y,width,height,[ignore],[ignore]

where x,y are the center of the face, and width and height are the dimensions of the rectangle containing the face. For example:

$ head -3 Richard_Gere.labeled_faces.txt
Richard_Gere\3\3.618.jpg,0,262,165,132,132,0.0,1
Richard_Gere\3\3.619.jpg,0,260,164,131,131,0.0,1
Richard_Gere\3\3.620.jpg,0,261,165,129,129,0.0,1
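Since most detection toolkits expect top-left-corner boxes rather than center coordinates, a small parser for these files could look like the sketch below. The field layout follows the format above; the function name and return structure are my own assumptions.

```python
import csv

def parse_labeled_faces(path):
    """Yield (filename, (left, top, width, height)) for each annotation
    line, converting the face-center (x, y) to a top-left corner box."""
    with open(path, newline="") as f:
        for row in csv.reader(f):
            filename = row[0]
            cx, cy, w, h = map(int, row[2:6])
            left = cx - w // 2  # center x -> left edge
            top = cy - h // 2   # center y -> top edge
            yield filename, (left, top, w, h)
```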


If you use this database, or refer to its results, please cite the following paper:

@INPROCEEDINGS{wolf2011face,
  author    = {L. {Wolf} and T. {Hassner} and I. {Maoz}},
  title     = {Face recognition in unconstrained videos with matched background similarity},
  booktitle = {CVPR 2011},
  year      = {2011},
  pages     = {529-534},
}
Lior Wolf
A faculty member at the School of Computer Science at Tel Aviv University and a research scientist at Facebook AI Research.