WikiSplit
Text Detection
License: CC-BY-SA 4.0

Overview

One million English sentences, each split into two sentences that together preserve the original meaning, extracted from Wikipedia edits.

The WikiSplit dataset was constructed automatically from the publicly available Wikipedia revision history. Although the dataset contains some inherent noise, it can serve as valuable training data for models that split or merge sentences.
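The raw data is distributed as tab-separated files in the google-research-datasets/wiki-split GitHub repository, where each line pairs the original sentence with its two-sentence rewrite. Below is a minimal parsing sketch; it assumes the layout described in that repository (complex sentence, a tab, then the two split sentences joined by a "<::::>" delimiter), and the file name train.tsv is only an illustrative example, so verify both against the release you download.

```python
# Minimal sketch for reading a WikiSplit TSV file. Assumes the layout
# documented in google-research-datasets/wiki-split:
#   <complex sentence>\t<split_1> <::::> <split_2>
import csv
from typing import Iterator, Tuple

def read_wikisplit(path: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (complex_sentence, split_1, split_2) triples from one TSV file."""
    with open(path, encoding="utf-8") as f:
        for row in csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE):
            if len(row) != 2:
                continue  # skip malformed lines; the dataset is known to be noisy
            complex_sentence, splits = row
            parts = [s.strip() for s in splits.split("<::::>")]
            if len(parts) == 2:
                yield complex_sentence, parts[0], parts[1]

# Example usage (the path is hypothetical):
# for src, s1, s2 in read_wikisplit("train.tsv"):
#     print(src, "->", s1, "|", s2)
```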

Data Summary

Data volume: 1,000K
File size: --
