ChID
License: Apache-2.0
Overview
ChID: A Large-scale Chinese IDiom Dataset for Cloze Test
Data Format
One example is shown below:
{
"content": "世锦赛的整体水平远高于亚洲杯,要如同亚洲杯那样“鱼与熊掌兼得”,就需要各方面密切配合、#idiom#。作为主帅的俞觉敏,除了得打破保守思想,敢于破格用人,还得巧于用兵、#idiom#、灵活排阵,指挥得当,力争通过比赛推新人、出佳绩、出新的战斗力。",
"realCount": 2,
"groundTruth": ["通力合作", "有的放矢"],
"candidates": [
["凭空捏造", "高头大马", "通力合作", "同舟共济", "和衷共济", "蓬头垢面", "紧锣密鼓"],
["叫苦连天", "量体裁衣", "金榜题名", "百战不殆", "知彼知己", "有的放矢", "风流才子"]
]
}
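The placeholders are filled in order, so substituting the groundTruth idioms back into content from left to right should restore the original passage. Below is a minimal Python sketch of that substitution. It is an illustration, not code shipped with the dataset. It assumes each record is a JSON object shaped like the example above. The names RECORD_JSON, PLACEHOLDER and fill_blanks are hypothetical.

```python
import json

# The example record shown above, stored as a JSON string (assumed record shape).
RECORD_JSON = """
{
  "content": "世锦赛的整体水平远高于亚洲杯,要如同亚洲杯那样“鱼与熊掌兼得”,就需要各方面密切配合、#idiom#。作为主帅的俞觉敏,除了得打破保守思想,敢于破格用人,还得巧于用兵、#idiom#、灵活排阵,指挥得当,力争通过比赛推新人、出佳绩、出新的战斗力。",
  "realCount": 2,
  "groundTruth": ["通力合作", "有的放矢"],
  "candidates": [
    ["凭空捏造", "高头大马", "通力合作", "同舟共济", "和衷共济", "蓬头垢面", "紧锣密鼓"],
    ["叫苦连天", "量体裁衣", "金榜题名", "百战不殆", "知彼知己", "有的放矢", "风流才子"]
  ]
}
"""

PLACEHOLDER = "#idiom#"  # hypothetical constant for the blank marker used in "content"


def fill_blanks(record: dict) -> str:
    """Replace each placeholder, left to right, with the golden idiom for that blank."""
    pieces = record["content"].split(PLACEHOLDER)
    answers = record["groundTruth"]
    # n placeholders split the passage into n + 1 pieces.
    if len(pieces) - 1 != len(answers):
        raise ValueError("number of placeholders does not match number of answers")
    out = [pieces[0]]
    for answer, rest in zip(answers, pieces[1:]):
        out.append(answer)
        out.append(rest)
    return "".join(out)


record = json.loads(RECORD_JSON)
print(fill_blanks(record))
```

Splitting on the placeholder and interleaving the answers keeps the substitution strictly positional. This matches the description of groundTruth as being given in the order of the blanks.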
content
: The given passage, in which the original idioms are replaced by the placeholder #idiom#
realCount
: The number of placeholders (blanks)
groundTruth
: The golden answers, in the order of the blanks
candidates
: The given candidate idioms, in the order of the blanks
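Following these field descriptions, each blank can be treated as a separate multiple-choice question. The answer to each question is the position of the golden idiom in that blank's candidate list. The Python sketch below shows one possible conversion. It is not an official loader, and BlankQuestion and to_questions are hypothetical names. The consistency checks assume that realCount equals the number of #idiom# placeholders and also equals the lengths of groundTruth and candidates.

```python
import json
from typing import List, NamedTuple

PLACEHOLDER = "#idiom#"

# The example record shown earlier, stored as a JSON string (assumed record shape).
RECORD_JSON = """
{
  "content": "世锦赛的整体水平远高于亚洲杯,要如同亚洲杯那样“鱼与熊掌兼得”,就需要各方面密切配合、#idiom#。作为主帅的俞觉敏,除了得打破保守思想,敢于破格用人,还得巧于用兵、#idiom#、灵活排阵,指挥得当,力争通过比赛推新人、出佳绩、出新的战斗力。",
  "realCount": 2,
  "groundTruth": ["通力合作", "有的放矢"],
  "candidates": [
    ["凭空捏造", "高头大马", "通力合作", "同舟共济", "和衷共济", "蓬头垢面", "紧锣密鼓"],
    ["叫苦连天", "量体裁衣", "金榜题名", "百战不殆", "知彼知己", "有的放矢", "风流才子"]
  ]
}
"""


class BlankQuestion(NamedTuple):
    """Hypothetical per-blank view of a record (not part of the dataset format)."""
    blank_index: int   # 0-based position of the blank in "content"
    options: List[str]  # candidate idioms for this blank
    label: int          # index of the golden idiom within options


def to_questions(record: dict) -> List[BlankQuestion]:
    n = record["realCount"]
    # Assumed invariants: one placeholder, one answer and one candidate list per blank.
    if record["content"].count(PLACEHOLDER) != n:
        raise ValueError("realCount does not match the number of placeholders")
    if len(record["groundTruth"]) != n or len(record["candidates"]) != n:
        raise ValueError("groundTruth/candidates length does not match realCount")
    questions = []
    for i, (answer, options) in enumerate(zip(record["groundTruth"], record["candidates"])):
        # list.index raises ValueError if the golden idiom is missing from its candidates.
        questions.append(BlankQuestion(i, list(options), options.index(answer)))
    return questions


record = json.loads(RECORD_JSON)
for q in to_questions(record):
    print(q.blank_index, q.label, q.options[q.label])
```

On the example record this prints `0 2 通力合作` and `1 5 有的放矢`, so the correct answers are the third candidate for the first blank and the sixth candidate for the second.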
Citation
This is the ChID dataset for the paper ChID: A Large-scale Chinese IDiom Dataset for Cloze Test.
If your research is related to or based on our ChID dataset (or the version adapted for the competition), please cite it:
@inproceedings{zheng-etal-2019-chid,
title = "{C}h{ID}: A Large-scale {C}hinese {ID}iom Dataset for Cloze Test",
author = "Zheng, Chujie and
Huang, Minlie and
Sun, Aixin",
booktitle = "Proceedings of the 57th Conference of
the Association for Computational Linguistics",
month = jul,
year = "2019",
address = "Florence, Italy",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/P19-1075",
pages = "778--787",
}
License