The ground-truthed datasets of PDF tables
Text
OCR/Text Detection
License: Unknown

Overview

On this page you will find two ground-truthed datasets of natively digital PDF documents containing tables. The documents were collected systematically from European Union and US Government websites, so we expect them to be in the public domain. Each PDF document is accompanied by three XML (or CSV) files containing its ground truth in the following models:

  • table regions (for evaluating table location)
  • cell structures (for evaluating table structure recognition)
  • functional representation (for evaluating table interpretation)
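
As a rough illustration of how the table-region ground truth might be consumed, here is a minimal Python sketch. The page does not describe the XML schema, so the element and attribute names (`document`, `table`, `region`, `bounding-box`, `x1`..`y2`), the sample values, and the helper `load_table_regions` are all assumptions made for illustration; check them against the actual files before relying on this.

```python
import xml.etree.ElementTree as ET

# Hypothetical region ground-truth file. Element and attribute names are
# assumptions for illustration, not the datasets' documented schema.
SAMPLE_REGION_XML = """\
<document filename="example.pdf">
  <table id="1">
    <region id="1" page="1">
      <bounding-box x1="72" y1="400" x2="540" y2="650"/>
    </region>
  </table>
</document>
"""


def load_table_regions(xml_text):
    """Return a list of (table_id, page, (x1, y1, x2, y2)) tuples."""
    root = ET.fromstring(xml_text)
    regions = []
    for table in root.iter("table"):
        for region in table.iter("region"):
            box = region.find("bounding-box")
            if box is None:
                # Skip regions without coordinates rather than guessing.
                continue
            coords = tuple(int(box.get(k)) for k in ("x1", "y1", "x2", "y2"))
            regions.append((table.get("id"), int(region.get("page")), coords))
    return regions


regions = load_table_regions(SAMPLE_REGION_XML)
```

A table-location evaluation could then compare each detected bounding box against these tuples, page by page.
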
Data summary
Data format: Text
Data volume: --
File size: --
Publisher
Dr. Tamir Hassan
A researcher, developer and consultant in the field of Document Engineering with over 15 years of experience working with PDF and HTML (+CSS+JS) documents on topics including table recognition, automatic tagging, accessibility, layout optimization and conversion between the two formats.