C3
License: Custom
Overview
C3 is the first free-form multiple-Choice Chinese machine reading Comprehension dataset. It contains 13,369 documents (dialogues or more formally written mixed-genre texts) and 19,577 associated free-form multiple-choice questions, all collected from Chinese-as-a-second-language examinations.
Data Format
data/c3-{m,d}-{train,dev,test}.json: the dataset files, where m and d denote the "mixed-genre" and "dialogue" subsets, respectively. The data format is as follows.
[
  [
    [
      document 1
    ],
    [
      {
        "question": document 1 / question 1,
        "choice": [
          document 1 / question 1 / answer option 1,
          document 1 / question 1 / answer option 2,
          ...
        ],
        "answer": document 1 / question 1 / correct answer option
      },
      {
        "question": document 1 / question 2,
        "choice": [
          document 1 / question 2 / answer option 1,
          document 1 / question 2 / answer option 2,
          ...
        ],
        "answer": document 1 / question 2 / correct answer option
      },
      ...
    ],
    document 1 / id
  ],
  [
    [
      document 2
    ],
    [
      {
        "question": document 2 / question 1,
        "choice": [
          document 2 / question 1 / answer option 1,
          document 2 / question 1 / answer option 2,
          ...
        ],
        "answer": document 2 / question 1 / correct answer option
      },
      {
        "question": document 2 / question 2,
        "choice": [
          document 2 / question 2 / answer option 1,
          document 2 / question 2 / answer option 2,
          ...
        ],
        "answer": document 2 / question 2 / correct answer option
      },
      ...
    ],
    document 2 / id
  ],
  ...
]
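As a sketch of how this nesting can be consumed, the Python below flattens each [document, questions, id] entry into one record per question. It assumes the document element is a list of strings (joined here with newlines), that "answer" holds the literal text of one entry in "choice", and that the files sit under data/ with the names given above. The record fields (context, label) and the {doc_id}-{index} question id are hypothetical names chosen for illustration, not part of the dataset.

```python
import itertools
import json
from pathlib import Path
from typing import Any, Dict, Iterator, List


def c3_paths(root: str = "data") -> List[Path]:
    """Expand data/c3-{m,d}-{train,dev,test}.json into the six file paths."""
    return [
        Path(root) / f"c3-{subset}-{split}.json"
        for subset, split in itertools.product(("m", "d"), ("train", "dev", "test"))
    ]


def iter_examples(raw: List[Any]) -> Iterator[Dict[str, Any]]:
    """Yield one flat record per question from the nested C3 layout."""
    for doc_lines, questions, doc_id in raw:
        # Assumption: the document is a list of strings (e.g. dialogue turns).
        context = "\n".join(doc_lines)
        for q_idx, q in enumerate(questions):
            choices = q["choice"]
            answer = q.get("answer")  # tolerate a missing answer field
            yield {
                "id": f"{doc_id}-{q_idx}",  # hypothetical per-question id scheme
                "context": context,
                "question": q["question"],
                "choices": choices,
                "answer": answer,
                # Index of the correct option, or -1 if the answer is absent
                # or does not match any option verbatim.
                "label": choices.index(answer) if answer in choices else -1,
            }


def load_split(path: Path) -> List[Dict[str, Any]]:
    """Read one C3 JSON file and return its flattened question records."""
    with open(path, encoding="utf-8") as f:
        return list(iter_examples(json.load(f)))


if __name__ == "__main__":
    # Print the number of questions in each file that exists locally.
    for path in c3_paths():
        if path.exists():
            print(path, len(load_split(path)))
```

Storing the correct option as an index makes the records straightforward to feed into a standard multiple-choice classifier, while keeping the original answer text alongside it for inspection.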
Citation
@article{sun2019investigating,
  title={Investigating Prior Knowledge for Challenging Chinese Machine Reading Comprehension},
  author={Sun, Kai and Yu, Dian and Yu, Dong and Cardie, Claire},
  journal={Transactions of the Association for Computational Linguistics},
  year={2020},
  url={https://arxiv.org/abs/1904.09679v3}
}