C3
Text
NLP
License: Custom

Overview

C3 is the first free-form multiple-Choice Chinese machine reading Comprehension dataset. It contains 13,369 documents (dialogues or more formally written mixed-genre texts) and 19,577 associated free-form multiple-choice questions, all collected from Chinese-as-a-second-language examinations.

Data Format

data/c3-{m,d}-{train,dev,test}.json: the dataset files, where m and d represent "mixed-genre" and "dialogue", respectively. The data format is as follows.

[
  [
    [
      document 1
    ],
    [
      {
        "question": document 1 / question 1,
        "choice": [
          document 1 / question 1 / answer option 1,
          document 1 / question 1 / answer option 2,
          ...
        ],
        "answer": document 1 / question 1 / correct answer option
      },
      {
        "question": document 1 / question 2,
        "choice": [
          document 1 / question 2 / answer option 1,
          document 1 / question 2 / answer option 2,
          ...
        ],
        "answer": document 1 / question 2 / correct answer option
      },
      ...
    ],
    document 1 / id
  ],
  [
    [
      document 2
    ],
    [
      {
        "question": document 2 / question 1,
        "choice": [
          document 2 / question 1 / answer option 1,
          document 2 / question 1 / answer option 2,
          ...
        ],
        "answer": document 2 / question 1 / correct answer option
      },
      {
        "question": document 2 / question 2,
        "choice": [
          document 2 / question 2 / answer option 1,
          document 2 / question 2 / answer option 2,
          ...
        ],
        "answer": document 2 / question 2 / correct answer option
      },
      ...
    ],
    document 2 / id
  ],
  ...
]
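
The nested layout above can be flattened into one record per question. The following is a minimal sketch rather than an official loader. It assumes that each document is stored as a list of one or more strings and that "answer" holds the exact text of one entry in "choice". The names `Example`, `iter_examples`, and `load_c3` are hypothetical and chosen for illustration.

```python
import json
from typing import Iterator, List, NamedTuple


class Example(NamedTuple):
    """One question paired with its source document (hypothetical record type)."""
    doc_id: str
    document: str
    question: str
    choices: List[str]
    answer: str
    label: int  # position of the correct answer within choices


def iter_examples(entries: list) -> Iterator[Example]:
    """Yield one Example per question from the parsed JSON structure."""
    # Each top-level entry is [document, questions, id], as in the layout above.
    for document_parts, questions, doc_id in entries:
        # Assumption: the document list holds strings (e.g. dialogue turns).
        document = "\n".join(document_parts)
        for q in questions:
            choices = q["choice"]
            # Assumption: the answer text matches one choice exactly;
            # list.index raises ValueError if it does not.
            label = choices.index(q["answer"])
            yield Example(doc_id, document, q["question"], choices, q["answer"], label)


def load_c3(path: str) -> List[Example]:
    """Read one dataset file and return its questions as a flat list."""
    with open(path, encoding="utf-8") as f:
        return list(iter_examples(json.load(f)))
```

For instance, `load_c3("data/c3-m-train.json")` would return the mixed-genre training questions, provided the file is present at that path. Keeping the integer `label` next to the answer text is a convenience for classifiers that score answer options by position.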

Citation

@article{sun2019investigating,
  title={Investigating Prior Knowledge for Challenging Chinese Machine Reading Comprehension},
  author={Sun, Kai and Yu, Dian and Yu, Dong and Cardie, Claire},
  journal={Transactions of the Association for Computational Linguistics},
  year={2020},
  url={https://arxiv.org/abs/1904.09679v3}
}

License

Custom

Data Summary

Data format: Text
Data volume: 19.577K
File size: 3.09MB
Publisher: dataset.org
Our basic research areas include computer vision, speech recognition, natural language processing, and machine learning. Our applied research draws on Tencent's scenarios and business strengths to build AI in four categories: content, games, social networking, and platform-based tools. Its Weiqi (Go) AI is named 'superb art', and its technology is used in hundreds of Tencent products, including Weixin, QQ, Daily Express, and QQ Music.