OSCAR Corpus Nepali
Text Detection
License: CC-BY-SA 4.0

Overview

The files are from the OSCAR Corpus. Please visit the OSCAR site for more information.

The dataset is currently shuffled at the line level, and no metadata is provided. It is therefore mainly intended for training unsupervised language models for NLP.

The files are:

  • ne.txt (uncompressed size 1.8GB)
  • ne_dedup.txt (uncompressed size 1.2GB)
    • In this version, duplicate lines have been removed.
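
As an illustration of what line-level deduplication means for these files, here is a minimal Python sketch that streams a text file and keeps only the first occurrence of each line. It is not the procedure the OSCAR curators used to produce ne_dedup.txt; the function names `dedup_lines` and `dedup_file` and the choice of a BLAKE2b digest are assumptions made for this example.

```python
import hashlib
from typing import Iterable, Iterator


def dedup_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield each distinct line once, in first-seen order (hypothetical helper)."""
    seen = set()
    for line in lines:
        text = line.rstrip("\n")
        # Keep a 16-byte digest instead of the full line so memory per
        # entry stays fixed even when lines are long.
        digest = hashlib.blake2b(text.encode("utf-8"), digest_size=16).digest()
        if digest not in seen:
            seen.add(digest)
            yield text


def dedup_file(src_path: str, dst_path: str) -> int:
    """Write the distinct lines of src_path to dst_path; return how many were kept."""
    kept = 0
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for text in dedup_lines(src):
            dst.write(text + "\n")
            kept += 1
    return kept
```

Calling `dedup_file("ne.txt", "ne_unique.txt")` would stream the input once without loading it into memory as a whole. The set of digests still grows with the number of distinct lines, so a file of this size may need a few hundred megabytes of RAM or more.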

I do not own the dataset. If you use it in your research, please cite the paper by the dataset's curators.

Data summary
Data format: text
Item count: 2
File size: 82.59MB
Publisher: Prabesh Dhakal