Two ground-truthed datasets of natively-digital PDF documents containing tables.
On this page you will find two ground-truthed datasets of natively-digital PDF documents containing tables. These documents have been collected systematically from European Union and US Government websites, and we therefore expect them to have public domain status. Each PDF document is accompanied by three XML (or CSV) files containing its ground truth in the following models:
- table regions (for evaluating table location)
- cell structures (for evaluating table structure recognition)
- functional representation (for evaluating table interpretation)
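
As a rough illustration of how the three ground-truth files might be paired with each PDF, here is a minimal Python sketch. The file-name suffixes (`-reg`, `-str`, `-fnc`) and the helper `find_ground_truth` are assumptions made for this example; this page does not state the actual naming scheme, so adjust them to match the downloaded archive.

```python
from pathlib import Path

# Hypothetical suffixes, one per ground-truth model; not confirmed by this page.
GROUND_TRUTH_SUFFIXES = {
    "regions": "-reg",      # table regions
    "structure": "-str",    # cell structures
    "functional": "-fnc",   # functional representation
}
# The ground truth is described as XML (or CSV), so try both extensions.
EXTENSIONS = (".xml", ".csv")


def find_ground_truth(pdf_path):
    """Return {model: Path or None} for ground-truth files stored next to a PDF."""
    pdf_path = Path(pdf_path)
    found = {}
    for model, suffix in GROUND_TRUTH_SUFFIXES.items():
        found[model] = None
        for ext in EXTENSIONS:
            candidate = pdf_path.with_name(pdf_path.stem + suffix + ext)
            if candidate.exists():
                found[model] = candidate
                break
    return found
```

Calling `find_ground_truth("some-document.pdf")` returns a dictionary with one entry per model, where a value of `None` means no matching file was found under the assumed naming.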