CROHME
Text
OCR/Text Detection
License: Custom

Overview

The dataset provides training and test data from the CROHME 2011, 2012 and 2013 competitions.
Furthermore, thanks to the participants' authorization, we are allowed to distribute the result
files from the majority of the systems submitted in 2012. Here is a description of all data
directories:

  • CROHME2011_data : all data from the CROHME 2011 competition:
    • CROHME_test : inkml test files without ground truth.
    • CROHME_testGT : inkml test files with ground truth.
    • CROHME_train : inkml train files with ground truth.
    • gram : xml grammars and symbol lists for parts I and II.
  • CROHME2012_data : all data from the CROHME 2012 competition:
    • testData : inkml test files without ground truth.
    • testDataGT : inkml test files with ground truth.
    • trainData : inkml train files with ground truth.
    • gram : xml grammars and symbol lists for parts I, II and III.
    • lists : lists of inkml files and latex expressions for parts I, II and III.
  • CROHME2013_data : all data from the CROHME 2013 competition:
    • TrainINKML : all training inkml files sorted by origin.
    • TestINKML : inkml test files without ground truth, used to run the participants' systems.
    • TestINKMLGT : inkml test files with ground truth, used to evaluate the participants' systems with the evalinkml tool.
    • Test_LG/Test2012LG, Test_LG/Test2013LG : label graph versions of the test files for the 2012 and 2013 data sets, using inherited edges (so the graphs are DAGs).
    • Test_LG/Test2012LG_TREE, Test_LG/Test2013LG_TREE : label graph versions of the test files for the 2012 and 2013 data sets, without inherited edges (so the graphs are trees).
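
To get an overview of an extracted copy of this layout, the INKML files can be grouped by the directory that holds them. The following is a minimal sketch, not part of the CROHME distribution: the function name list_inkml_files, the root path passed to it, and the assumption that the files use the lowercase .inkml extension are all hypothetical.

```python
from pathlib import Path


def list_inkml_files(root):
    """Group .inkml files found under `root` by their parent directory.

    Returns a dict mapping each directory (relative to `root`, POSIX style)
    to the sorted list of .inkml paths it directly contains.
    Assumption: files carry the lowercase ".inkml" extension.
    """
    root = Path(root)
    groups = {}
    for path in sorted(root.rglob("*.inkml")):
        key = path.parent.relative_to(root).as_posix()
        groups.setdefault(key, []).append(path)
    return groups
```

Called on the top-level folder of the archive, this would yield keys such as CROHME2013_data/TestINKML, each paired with its list of files, which makes it easy to check that train and test splits were extracted completely.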

Data Format

The ink corresponding to each mathematical expression is saved in an INKML file. An INKML file mainly contains three kinds of information:

  • the ink: a set of traces made of points.
  • the symbol level ground truth: the segmentation and label information of each symbol of the expression.
  • the mathematical ground truth: the MATHML structure of the expression.

The two kinds of ground truth (at the symbol level and at the mathematical level) are entered manually. Furthermore, some general information is added to the file:

  • the channels (here, X and Y).
  • the writer information (identification, handedness, age, gender, etc.), if available.
  • the LATEX ground truth (without any reference to the ink, to easily render it).

The INKML format makes it possible to create references between the digital ink of the expression, its segmentation into symbols and its MATHML representation. The listing below shows an example of an INKML file for the expression a < b/c, containing 5 symbols for a total of 6 strokes (two for the 'a', and one for each of the other symbols). The traceGroup with identifier xml:id="8" references the 2 strokes of the symbol 'a', as well as the MATHML part with identifier xml:id="A". Thus, the stroke segmentation of a symbol can be linked to its MATHML representation.

    Example of an INKML file for the expression a < b/c

    <ink xmlns="http://www.w3.org/2003/InkML">
      <traceFormat>
        <channel name="X" type="decimal"/>
        <channel name="Y" type="decimal"/>
      </traceFormat>
      <annotation type="writer">w123</annotation>
      <annotation type="truth">$a &lt; \frac{b}{c}$</annotation>
      <annotationXML type="truth" encoding="Content-MathML">
        <math xmlns="http://www.w3.org/1998/Math/MathML">
          <mrow>
            <mi xml:id="A">a</mi>
            <mrow>
              <mo xml:id="B">&lt;</mo>
              <mfrac xml:id="C">
                <mi xml:id="D">b</mi>
                <mi xml:id="E">c</mi>
              </mfrac>
            </mrow>
          </mrow>
        </math>
      </annotationXML>
      <trace id="1">985 3317, ..., 1019 3340</trace>
      ...
      <trace id="6">1123 3308, ..., 1127 3365</trace>
      <traceGroup xml:id="7">
        <annotation type="truth">Ground truth</annotation>
        <traceGroup xml:id="8">
          <annotation type="truth">a</annotation>
          <annotationXML href="A"/>
          <traceView traceDataRef="1"/>
          <traceView traceDataRef="2"/>
        </traceGroup>
        ...
      </traceGroup>
    </ink>
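
The links described above (strokes, symbol-level traceGroups, and MATHML identifiers) can be read with a standard XML parser. The following is a minimal sketch using Python's xml.etree.ElementTree, assuming the file follows the structure of the example: trace elements directly under the root, and one symbol per traceGroup nested inside a top-level traceGroup. The function name parse_inkml and the SAMPLE string are hypothetical; SAMPLE is a cut-down illustration whose point coordinates are partly made up, not a file from the dataset.

```python
import xml.etree.ElementTree as ET

# InkML namespace used by the root <ink> element and its children.
NS = {"ink": "http://www.w3.org/2003/InkML"}


def parse_inkml(text):
    """Return (traces, annotations, symbols) from an INKML document string."""
    root = ET.fromstring(text)

    # Each trace is a comma-separated list of "X Y" points.
    traces = {}
    for trace in root.findall("ink:trace", NS):
        points = []
        for chunk in (trace.text or "").split(","):
            coords = chunk.split()
            if len(coords) >= 2:
                points.append((float(coords[0]), float(coords[1])))
        traces[trace.get("id")] = points

    # File-level annotations, e.g. writer id and LATEX ground truth.
    annotations = {
        a.get("type"): (a.text or "").strip()
        for a in root.findall("ink:annotation", NS)
    }

    # Symbol-level ground truth: nested traceGroups inside the top-level one.
    symbols = []
    for group in root.findall("ink:traceGroup/ink:traceGroup", NS):
        label = group.find("ink:annotation", NS)
        link = group.find("ink:annotationXML", NS)
        symbols.append({
            "label": label.text.strip() if label is not None and label.text else None,
            "mathml_id": link.get("href") if link is not None else None,
            "trace_ids": [v.get("traceDataRef")
                          for v in group.findall("ink:traceView", NS)],
        })
    return traces, annotations, symbols


# Hypothetical cut-down sample (raw string so that \frac is kept literally).
SAMPLE = r"""<ink xmlns="http://www.w3.org/2003/InkML">
  <annotation type="writer">w123</annotation>
  <annotation type="truth">$a &lt; \frac{b}{c}$</annotation>
  <trace id="1">985 3317, 1019 3340</trace>
  <trace id="2">990 3320, 1001 3338</trace>
  <traceGroup xml:id="7">
    <annotation type="truth">Ground truth</annotation>
    <traceGroup xml:id="8">
      <annotation type="truth">a</annotation>
      <annotationXML href="A"/>
      <traceView traceDataRef="1"/>
      <traceView traceDataRef="2"/>
    </traceGroup>
  </traceGroup>
</ink>"""
```

Calling parse_inkml(SAMPLE) gives one symbol entry with label 'a', MATHML identifier 'A', and trace identifiers '1' and '2', mirroring the traceGroup with xml:id="8" in the listing. Files whose symbol groups are nested differently would need an adjusted search path.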

License

Custom

Data Summary

Data format: Image
Data volume: --
File size: 29.04MB
Publisher: University of Nantes, France
The University of Nantes (French: Université de Nantes) is a French university, located in the city of Nantes. In addition to the several campuses scattered in the city of Nantes, there are two satellite campuses located respectively in Saint-Nazaire and La Roche-sur-Yon. Currently, the university is attended by approximately 34,500 students. More than 10% of them are international students coming from 110 countries.