MS-Celeb-1M
2D Box
2D Classification
Face
License: Research Only

Overview

We select one million celebrities who are real people and who have, or have had, public attention. The selection steps are described in detail in the following paragraphs. First, we select a subset of entities from a knowledge base called Freebase [11], based on the information within Freebase. In Freebase, each entity is identified by a unique key (called a machine identifier, or mid, in [11]) and is associated with rich properties. More specifically, we select the entities whose properties satisfy all three of the following conditions.

The object type of the entity is defined as “people.person” in Freebase. This condition means that we select entities that Freebase claims are real people. We do not include movie characters, since their appearance is not strictly defined, especially when a classic movie is remade.

The entity must have at least one of the properties unique to human beings, such as “person’s name”, “place of birth”, “date of birth”, or “person’s professions”. This condition removes entities whose information is too sparse for us to collect and label images. It also helps remove some entities whose object type is mislabeled as “people.person” in Freebase.

If the date of birth is available for a given entity in Freebase, the entity cannot be selected if he or she was born before the mid-nineteenth century. The reason is as follows. The first camera specialized for roll film, the “Kodak”, was invented in 1888 [20] and became popular in the late nineteenth century. We cannot rely on drawings or sculptures to recognize people’s faces, since whether they are visually similar to the actual person can be subjective and arguable. An interesting example is the sculpture of John Harvard at Harvard University, which is claimed to have been inspired by a Harvard student, Sherman Hoar, rather than by Harvard himself, since no one knew what John Harvard had looked like [21].
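The three conditions above can be read as a simple filter over entities. The following is a minimal sketch, not the authors' implementation: the `Entity` record, the property names in `HUMAN_PROPERTIES`, and the cutoff year 1850 (one reading of "mid-nineteenth century") are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional

# Assumed cutoff for "mid-nineteenth century"; the text gives no exact year.
BIRTH_YEAR_CUTOFF = 1850

# Illustrative stand-ins for the human-specific properties quoted in the text.
HUMAN_PROPERTIES = ("name", "place_of_birth", "date_of_birth", "profession")


@dataclass
class Entity:
    """Hypothetical, simplified view of a knowledge-base entity."""
    mid: str
    types: set = field(default_factory=set)
    properties: dict = field(default_factory=dict)
    birth_year: Optional[int] = None  # None when no date of birth is recorded


def is_candidate(entity: Entity) -> bool:
    # Condition 1: the object type must be "people.person".
    if "people.person" not in entity.types:
        return False
    # Condition 2: at least one human-specific property must be present.
    if not any(entity.properties.get(p) for p in HUMAN_PROPERTIES):
        return False
    # Condition 3: only applies when a date of birth is available.
    if entity.birth_year is not None and entity.birth_year < BIRTH_YEAR_CUTOFF:
        return False
    return True
```

Note that an entity with no recorded date of birth passes the third condition, matching the wording "if the date of birth is available".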

In the second step, we rank all the entities in the above subset according to the frequency of their occurrence on the web. Then we select the top one million entities to form our one-million celebrity list and provide their entity keys (mid) in Freebase. The occurrence frequency for a given entity is obtained by counting how many documents contain this entity in a large corpus of billions of web documents.
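The ranking step amounts to counting document frequency and keeping the top k entities. The sketch below assumes each document has already been reduced to the list of entity keys it mentions. How entities are detected in web text is not described here, and the function names are hypothetical.

```python
import heapq
from collections import Counter


def document_frequency(documents):
    """Count, for each entity key, how many documents mention it.

    `documents` is an iterable of iterables of entity keys (mids).
    An entity mentioned several times in one document counts once.
    """
    counts = Counter()
    for mids in documents:
        counts.update(set(mids))
    return counts


def top_entities(candidates, counts, k):
    """Return the k candidates with the highest document frequency."""
    # Counter returns 0 for keys it has never seen, so unseen
    # candidates simply rank last.
    return heapq.nlargest(k, candidates, key=lambda mid: counts[mid])
```

With `k` set to one million and `candidates` set to the entities that passed the first step, `top_entities` would yield a list of the kind described above. A corpus of billions of documents would need a distributed count rather than an in-memory `Counter`.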

Data summary
Data format: image
Data volume: 10000K
File size: 229.47GB
Publisher: Microsoft
Microsoft Corporation is an American multinational technology company headquartered in Redmond, Washington, that develops, manufactures, licenses, supports and sells computer software, consumer electronics and personal computers and services.