UD-Chinese-GSD
Text
NLP
License: CC BY-SA 4.0

Overview

Traditional Chinese Universal Dependencies Treebank annotated and converted by Google.

Tokenization and Word Segmentation

  • This corpus contains 4997 sentences and 123291 tokens.
  • This corpus contains 122962 tokens (100%) that are not followed by a space.
  • This corpus does not contain words with spaces.
  • This corpus contains 41 types of words that contain both letters and punctuation. Examples: #A, DC-10, km/h, #B, #C, #D, #E, #F, #G, -an, A-AVG, AK-47, Arzacq-Arraziguet, Beaune-Sud, Berne-Belp, CI-7957, CRH380B-002, F-15A, F-16A, Frito-Lay, It's, Kink.com, MD-11, Micro-USM, NX-01, Navy's, O., P-700, Pre-rendering, S-IVB, TVS-5, Tu-16, Uhler-Phillips, al-Banna, f(x), g(x), t.163.com, t.qq.com, t.sina.com.cn, t.sohu.com, t.xxxx.com
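
The counts above can be approximated from a CoNLL-U file with a short script. The following is a minimal sketch, not the tool that produced the published figures: it assumes the standard 10-column CoNLL-U layout, treats `SpaceAfter=No` in the MISC column as "not followed by a space", and uses Unicode general categories (L* for letters, P* for punctuation) as a stand-in for the original definition of "letters and punctuation". The function name `corpus_stats` and the two-sentence `SAMPLE` are hypothetical and are not taken from the treebank.

```python
import unicodedata


def corpus_stats(conllu_text):
    """Count sentences, tokens, SpaceAfter=No tokens, and mixed word types."""
    sentences = 0
    tokens = 0
    no_space_after = 0
    mixed_types = set()
    in_sentence = False

    for line in conllu_text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            in_sentence = False
            continue
        if line.startswith("#"):
            # Sentence-level comment (sent_id, text, ...).
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # Skip malformed lines in this sketch.
        tok_id, form, misc = cols[0], cols[1], cols[9]
        if "-" in tok_id or "." in tok_id:
            # Multiword-token ranges (1-2) and empty nodes (1.1) are not counted.
            continue
        if not in_sentence:
            sentences += 1
            in_sentence = True
        tokens += 1
        if "SpaceAfter=No" in misc.split("|"):
            no_space_after += 1
        # Assumption: "letters" = Unicode L*, "punctuation" = Unicode P*.
        has_letter = any(unicodedata.category(c).startswith("L") for c in form)
        has_punct = any(unicodedata.category(c).startswith("P") for c in form)
        if has_letter and has_punct:
            mixed_types.add(form)

    return {
        "sentences": sentences,
        "tokens": tokens,
        "no_space_after": no_space_after,
        "mixed_types": mixed_types,
    }


# Hypothetical two-sentence sample, written for illustration only.
SAMPLE = (
    "# sent_id = demo-1\n"
    "1\tDC-10\tDC-10\tPROPN\t_\t_\t2\tnsubj\t_\tSpaceAfter=No\n"
    "2\t飛行\t飛行\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No\n"
    "3\t。\t。\tPUNCT\t_\t_\t2\tpunct\t_\tSpaceAfter=No\n"
    "\n"
    "# sent_id = demo-2\n"
    "1\tkm/h\tkm/h\tNOUN\t_\t_\t0\troot\t_\t_\n"
)

if __name__ == "__main__":
    print(corpus_stats(SAMPLE))
```

To run it on the real data, read one of the treebank's `.conllu` files as UTF-8 text and pass the contents to `corpus_stats`. Because the category-based test for letters and punctuation is an assumption, the number of mixed types it reports may differ from the 41 listed above.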

License

CC BY-SA 4.0

Data Summary

Data format: Text
Data volume: --
File size: 400.69MB
Publisher: Universal Dependencies

Universal Dependencies (UD) is a framework for consistent annotation of grammar (parts of speech, morphological features, and syntactic dependencies) across different human languages. UD is an open community effort with over 300 contributors producing more than 150 treebanks in 90 languages.