Extracting relevant information from historical handwritten document collections is
a key step in making these manuscripts accessible and searchable.
In this context, the objective is to move beyond pure transcription towards document understanding. Concretely, the aim is to detect the named entities and assign each of them a semantic category, such as family names, places, occupations, etc.
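As an illustration of this task, the expected output can be sketched as a sequence of transcribed words, each paired with a semantic category. This is only a minimal sketch: the `TaggedWord` class, the `group_by_category` helper, the category labels and the example words are all hypothetical and are not taken from the database or from any official annotation format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedWord:
    text: str      # transcription of one handwritten word
    category: str  # semantic category, or "other" for non-entity words


def group_by_category(words):
    """Collect the transcriptions of named entities under their category."""
    grouped = defaultdict(list)
    for word in words:
        if word.category != "other":
            grouped[word.category].append(word.text)
    return dict(grouped)


# Invented example words; not taken from the actual records.
record = [
    TaggedWord("Ferrer", "family_name"),
    TaggedWord("pages", "occupation"),
    TaggedWord("de", "other"),
    TaggedWord("Barcelona", "place"),
]
entities = group_by_category(record)
```

Grouping the tagged words by category gives the kind of structured entry that could then be stored in a database and searched.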
A typical application scenario of named entity recognition is demographic documents, since they contain people's names, birthplaces, occupations, etc. In this scenario, extracting the key contents and storing them in databases gives access to those contents and makes it possible to envision innovative services based on genealogical, social or demographic searches.
Lately, the interest of the document image analysis community in document understanding, named entity recognition and semantic categorization has been growing, and some techniques based on HMMs, BLSTMs and CNNs have been proposed. With this competition, we aim to foster research in this field and offer a benchmark for the research community.
This database consists of historical handwritten marriage records from the Archives of the Cathedral of Barcelona. The pages we used correspond to volume 69, written in old Catalan by a single writer in the 17th century. Each marriage record contains information about the husband's occupation, place of origin, the husband's and wife's former marital status, the parents' occupation, place of residence, geographical origin, etc.