UD-Chinese-GSD
License:
CC BY-SA 4.0
Overview
Traditional Chinese Universal Dependencies Treebank annotated and converted by Google.
Tokenization and Word Segmentation
- This corpus contains 4997 sentences and 123291 tokens.
- This corpus contains 122962 tokens (about 99.7%) that are not followed by a space.
- This corpus does not contain words with spaces.
- This corpus contains 41 types of words that contain both letters and punctuation. Examples: #A, DC-10, km/h, #B, #C, #D, #E, #F, #G, -an, A-AVG, AK-47, Arzacq-Arraziguet, Beaune-Sud, Berne-Belp, CI-7957, CRH380B-002, F-15A, F-16A, Frito-Lay, It's, Kink.com, MD-11, Micro-USM, NX-01, Navy's, O., P-700, Pre-rendering, S-IVB, TVS-5, Tu-16, Uhler-Phillips, al-Banna, f(x), g(x), t.163.com, t.qq.com, t.sina.com.cn, t.sohu.com, t.xxxx.com
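Counts like the ones above can be approximated directly from a CoNLL-U file. The following is a minimal sketch, not the script that produced the official statistics. It assumes the standard 10-column CoNLL-U layout, counts syntactic words (skipping multiword-token ranges and empty nodes), and treats Unicode categories `L*` as letters and `P*` as punctuation. The function name `corpus_stats` and the returned keys are hypothetical, and the exact definition of "letters and punctuation" used for the figures above may differ.

```python
import unicodedata


def corpus_stats(conllu_text):
    """Approximate tokenization statistics for CoNLL-U text (illustrative only)."""
    sentences = 0
    tokens = 0
    no_space_after = 0
    mixed_types = set()   # word types containing both a letter and punctuation
    in_sentence = False

    for line in conllu_text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            in_sentence = False
            continue
        if line.startswith("#"):
            # Sentence-level comment such as "# sent_id = ...".
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            # Not a well-formed token line; skip it in this sketch.
            continue
        tok_id, form, misc = cols[0], cols[1], cols[9]
        # Skip multiword-token ranges ("1-2") and empty nodes ("1.1").
        if "-" in tok_id or "." in tok_id:
            continue
        if not in_sentence:
            sentences += 1
            in_sentence = True
        tokens += 1
        if "SpaceAfter=No" in misc.split("|"):
            no_space_after += 1
        # Assumption: "letter" = Unicode category L*, "punctuation" = P*.
        has_letter = any(unicodedata.category(c).startswith("L") for c in form)
        has_punct = any(unicodedata.category(c).startswith("P") for c in form)
        if has_letter and has_punct:
            mixed_types.add(form)

    return {
        "sentences": sentences,
        "tokens": tokens,
        "no_space_after": no_space_after,
        "mixed_types": sorted(mixed_types),
    }
```

To use it, read a treebank file into a string (for example with `open(path, encoding="utf-8").read()`, where `path` points at a local CoNLL-U file) and pass the result to `corpus_stats`.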