IEEHR 2017-The esposalles
Text
OCR/Text Detection
License: CC BY-NC-ND 4.0

Overview

Extracting relevant information from historical handwritten document collections is a key step in making these manuscripts available for access and search.
In this context, the objective is to move beyond pure transcription towards document understanding. Concretely, the aim is to detect the named entities and assign each of them a semantic category, such as family name, place or occupation.
A typical application scenario for named entity recognition is demographic documents, since they contain people's names, birthplaces, occupations, etc. In this scenario, extracting the key contents and storing them in databases gives access to those contents and makes it possible to envision innovative services based on genealogical, social or demographic searches.
Interest in document understanding, named entity recognition and semantic categorization has recently been growing in the document image analysis community, and techniques based on HMMs, BLSTMs and CNNs have been proposed. With this competition, we aim to foster research in this field and offer a benchmark for the research community.
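The entity detection and categorization task described above can be pictured with a small Python sketch. It represents a transcribed line as a list of (word, category) pairs and groups consecutive words that share a category into entities. The category names, the `group_entities` helper and the sample words are illustrative assumptions, not the competition's actual label set or annotation format.

```python
from itertools import groupby

# Hypothetical label for words that belong to no entity.
OTHER = "other"


def group_entities(tagged_words):
    """Merge consecutive words with the same category into one entity.

    tagged_words: list of (word, category) pairs in reading order.
    Returns a list of (entity_text, category) pairs, skipping OTHER.
    """
    entities = []
    for category, run in groupby(tagged_words, key=lambda pair: pair[1]):
        if category == OTHER:
            continue
        text = " ".join(word for word, _ in run)
        entities.append((text, category))
    return entities


# Illustrative sample, not taken from the dataset.
sample = [
    ("Joan", "name"),
    ("Pau", "name"),
    ("pages", "occupation"),
    ("de", "other"),
    ("Barcelona", "location"),
]

entities = group_entities(sample)
```

Grouping by runs of identical categories is the simplest possible choice. It would merge two distinct adjacent entities of the same category, so a real system would likely need explicit entity boundaries.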

Data Collection

This database consists of historical handwritten marriage records from the Archives of the Cathedral of Barcelona. The pages used correspond to volume 69, written in old Catalan by a single writer in the 17th century. Each marriage record contains information about the husband's occupation, place of origin, the husband's and wife's former marital status, the parents' occupations, place of residence, geographical origin, etc.

Data Summary

Data format: Image
Data volume: not specified
File size: 2.11 GB
Publisher: Computer Vision Center (CVC)
The CVC is a non-profit research center with an independent legal status, established in 1995 by the Generalitat de Catalunya and the Universitat Autònoma de Barcelona (UAB).