In Codice Ratio is a research project that develops methods and tools to support content analysis and knowledge discovery in large collections of historical documents. The goal is to give humanities scholars new tools for conducting data-driven studies over large historical sources. The project concentrates on the collections of the Vatican Secret Archives, one of the largest and most important historical archives in the world. Across 85 kilometres of shelving, it holds more than 600 archival collections of historical documents on the Vatican's activities, such as all the acts promulgated by the Vatican, account books, and the correspondence of the popes, starting from the eighth century.
Handwritten Text Recognition
We are developing a full-fledged system to automatically transcribe the contents of the manuscripts. We follow a novel approach based on character segmentation. Our idea is to govern imprecise character segmentation by assuming that the correct segments are those that yield the sequence of characters most likely to form a Latin word. We have designed a principled solution that relies on convolutional neural networks and statistical language models.
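The idea of letting word likelihood arbitrate between competing segmentations can be illustrated with a minimal sketch. It is not the project's implementation: it assumes a small set of candidate cut points, replaces the convolutional network with a hypothetical lookup table of per-segment character probabilities, and uses a toy character bigram model as the statistical language model. The names `best_transcription`, `classify`, `bigram_logp`, and all numeric values are invented for illustration.

```python
import math

def best_transcription(n_cuts, classify, bigram_logp, max_span=3):
    """Viterbi search over candidate cut points (illustrative sketch).

    n_cuts: number of candidate cut points; cut 0 and cut n_cuts - 1
        are the word boundaries.
    classify(i, j): dict mapping character -> probability for the
        image segment between cuts i and j (stand-in for a classifier).
    bigram_logp(prev, ch): log-probability of ch following prev,
        where "^" marks the start of the word.
    """
    # best[j][ch] = (score, previous cut, previous char) of the best
    # path that ends at cut j with ch as its last character.
    best = [dict() for _ in range(n_cuts)]
    best[0]["^"] = (0.0, None, None)
    for j in range(1, n_cuts):
        for i in range(max(0, j - max_span), j):
            candidates = classify(i, j)
            for prev, (prev_score, _, _) in best[i].items():
                for ch, p in candidates.items():
                    if p <= 0.0:
                        continue
                    score = prev_score + math.log(p) + bigram_logp(prev, ch)
                    if ch not in best[j] or score > best[j][ch][0]:
                        best[j][ch] = (score, i, prev)
    if not best[-1]:
        return "", float("-inf")
    # Backtrack from the best final state to the start marker.
    ch = max(best[-1], key=lambda c: best[-1][c][0])
    total = best[-1][ch][0]
    chars, j = [], n_cuts - 1
    while ch != "^":
        chars.append(ch)
        _, i, prev = best[j][ch]
        j, ch = i, prev
    return "".join(reversed(chars)), total

# Toy data: three strokes that could be read as "m", "in", "ui" or "iii".
segment_scores = {
    (0, 1): {"i": 0.9},
    (1, 3): {"n": 0.8},
    (0, 3): {"m": 0.7},
    (1, 2): {"i": 0.5},
    (2, 3): {"i": 0.5},
    (0, 2): {"u": 0.4},
}
toy_bigrams = {
    ("^", "i"): 0.3, ("i", "n"): 0.4, ("^", "m"): 0.05,
    ("i", "i"): 0.01, ("^", "u"): 0.1, ("u", "i"): 0.05,
}

def classify(i, j):
    return segment_scores.get((i, j), {})

def bigram_logp(prev, ch):
    # Unseen bigrams get a small floor probability.
    return math.log(toy_bigrams.get((prev, ch), 1e-4))

word, score = best_transcription(4, classify, bigram_logp)
print(word)  # prints: in
```

In this toy case the single-segment reading "m" has a decent classifier score, but the combined classifier and language model score favours cutting after the first stroke and reading "in". A real system would also need to handle the bias of such sums toward shorter character sequences, which this sketch ignores.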
We have tested our approach on the Vatican Registers of the Vatican Secret Archives. These documents record the inbound and outbound correspondence of the popes: political letters that testify to the broad activities of the popes in the ecclesiastical and temporal spheres; authoritative opinions on legal issues; documents addressed to sovereigns and to religious and political institutions scattered throughout the globe; and correspondence relating to the collection of tithes and tributes due to the Church.
These documents have never been transcribed before and are of unprecedented historical relevance. Preliminary results are encouraging.