Wikilinks: 40 Million Entities in Context
Text Detection
License: Unknown

Overview

The Wikipedia links (WikiLinks) dataset consists of web pages that satisfy both of the following constraints:

  1. The page contains at least one hyperlink that points to Wikipedia.
  2. The anchor text of that hyperlink closely matches the title of the target Wikipedia page.

We treat each page on Wikipedia as representing an entity (or concept or idea), and the anchor text as a mention of the entity. The WikiLinks data set was obtained by iterating over Google's web index.
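
The two constraints above can be illustrated with a small filter. The following is a minimal sketch, not the procedure used to build the dataset: the helper names (`wikipedia_title`, `find_mentions`), the use of `difflib.SequenceMatcher` as the "closely matches" test, and the 0.8 threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser
from urllib.parse import unquote, urlparse


class _AnchorCollector(HTMLParser):
    """Collects (href, anchor text) pairs from <a> elements."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only accumulate text while inside an <a> that has an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def wikipedia_title(href):
    """Return the page title encoded in a Wikipedia URL, or None (hypothetical helper)."""
    parsed = urlparse(href)
    host = parsed.netloc.lower()
    on_wikipedia = host == "wikipedia.org" or host.endswith(".wikipedia.org")
    if not on_wikipedia or not parsed.path.startswith("/wiki/"):
        return None
    return unquote(parsed.path[len("/wiki/"):]).replace("_", " ")


def find_mentions(html, threshold=0.8):
    """Return (anchor text, Wikipedia title) pairs that pass both constraints.

    The similarity measure and threshold are assumptions; the source only
    says the anchor text "closely matches" the title.
    """
    collector = _AnchorCollector()
    collector.feed(html)
    mentions = []
    for href, anchor in collector.links:
        title = wikipedia_title(href)          # constraint 1: link points to Wikipedia
        if title is None or not anchor:
            continue
        score = SequenceMatcher(None, anchor.lower(), title.lower()).ratio()
        if score >= threshold:                 # constraint 2: anchor text ~ page title
            mentions.append((anchor, title))
    return mentions
```

In this sketch, a page is kept if `find_mentions` returns a non-empty list; each returned pair is a mention (the anchor text) together with the entity it refers to (the Wikipedia page title).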

Citation

Please use the following citation when referencing the dataset:

@article{singh2012wikilinks,
  title={Wikilinks: A large-scale cross-document coreference corpus labeled via links to Wikipedia},
  author={Singh, Sameer and Subramanya, Amarnag and Pereira, Fernando and McCallum, Andrew},
  journal={University of Massachusetts, Amherst, Tech. Rep. UM-CS-2012},
  volume={15},
  year={2012}
}