Kodikologie und Paläographie im Digitalen Zeitalter 4

Codicology and Palaeography in the Digital Age 4

This fourth volume of the series on codicology and palaeography in the digital age brings together contributions on research at the interdisciplinary intersection of the traditional humanities and computer science. It also serves as the proceedings of the event series "Maschinen und Manuskripte" (Machines and Manuscripts), held from 2014 to 2016. The 13 contributions from digital codicology and palaeography offer insights into current computer-aided research on historical written documents, including studies of images and musical notation. The thematic scope ranges from the exploration of digitised collections by means of automatic pattern recognition, through the capture and analysis of scripts and sign systems, to the information visualisation of research data.

eCodicology: The Computer and the Mediaeval Library

Hannah Busch, Swati Chandna


Digitisation has made a large number of mediaeval manuscript collections publicly available, but the time and human attention available for studying them have not grown in proportion to the digitised sources. This raises the question of whether the computer can help to evaluate such large amounts of material. The project eCodicology has focused its research on detecting and measuring various macro- and microstructural layout features of the mediaeval page by means of pattern recognition, as a basis for further analyses. This paper presents the software developed in the project, SWATI (Software Workflow for the Automatic Tagging of Images), and CodiVis, a visualisation framework for high-dimensional data sets, and explains how they can help the codicologist explore large, heterogeneous datasets. The paper also addresses the various challenges involved, such as uncertain data due to irregularities and missing information in the manuscript catalogues, and the accuracy of the image processing results.

Kodikologie und Paläographie im Digitalen Zeitalter 4 – Codicology and Palaeography in the Digital Age 4. Edited by Hannah Busch, Franz Fischer and Patrick Sahle, with the collaboration of Bernhard Assmann, Philipp Hegel and Celia Krause. Schriften des Instituts für Dokumentologie und Editorik 11. Norderstedt: Books on Demand, 2017. 3–23.

1 Page Layout and Mediaeval Manuscripts

A written page is more than text: it is not merely a carrier of textual information, and the distribution of layout elements on the page can tell us more about the history of our written cultural heritage.

The page layout is defined as the collocation of rectangles containing graphical signs on the page surface of a book (Agati 2009, 219), that is, the relationship between the page and its content. The page layout serves to structure the codex and is designed according to the function of the text or book in order to guarantee legibility, as anyone leafing through a codex can observe. Mediaeval books are so aesthetically pleasing that it is hard to believe their appearance was achieved by individual visual judgement alone, even though research suggests that mediaeval artisans were artists rather than mere technicians. This raises the question of whether they followed geometric rules, algorithms or a canon of proportions. The question has been the basis of many layout studies of Latin and Greek manuscripts, and it has been shown that, at least in the most important scriptoria, instructions had to be followed (see Maniaci 1995).

The existence of formulae of proportions also shows that the layout of the mediaeval manuscript page was not left to chance (see Agati 2009 and Maniaci 1995). A formula of proportions can be defined as a coherent set of standards which, by creating an organic bond between the different elements of the page, allows the construction of a ruling scheme to be derived.¹¹ The formula must be unambiguous and universal; it must specify not absolute values but proportions between the different features of the page; and the essential parameters it gives must suffice to obtain all layout features (Maniaci 1995, 17). A formula can only be validated by applying a flexible approach with a tolerance range, bearing in mind that a manuscript is still a handcrafted object.
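
The requirements just listed (proportions rather than values, a few essential parameters that determine everything else, validation within a tolerance) can be illustrated with a short Python sketch. It assumes that a formula can be encoded as ratios relative to the page width W and height H; the formula `EXAMPLE_FORMULA`, the helper names and the default 5 % tolerance are hypothetical and are not taken from Maniaci (1995) or Agati (2009).

```python
from dataclasses import dataclass

# Hypothetical formula of proportions: every layout feature is expressed as a
# ratio of the page width (W) or page height (H), never as an absolute value.
# These ratios are illustrative assumptions, not a historically attested formula.
EXAMPLE_FORMULA = {
    "text_width": ("W", 1 / 2),
    "text_height": ("H", 5 / 8),
    "inner_margin": ("W", 1 / 6),
    "upper_margin": ("H", 1 / 8),
}


@dataclass
class Page:
    width: float   # W, e.g. in millimetres
    height: float  # H, in the same unit as width


def derive_layout(page: Page, formula: dict) -> dict:
    """Obtain every layout feature from the two essential parameters W and H."""
    base = {"W": page.width, "H": page.height}
    return {name: base[ref] * ratio for name, (ref, ratio) in formula.items()}


def validate(measured: dict, page: Page, formula: dict, tolerance: float = 0.05) -> dict:
    """Compare measured features with the formula within a relative tolerance.

    A feature conforms if |measured - expected| <= tolerance * expected;
    the tolerance absorbs the irregularities of handmade pages.
    """
    expected = derive_layout(page, formula)
    return {
        name: abs(measured[name] - value) <= tolerance * value
        for name, value in expected.items()
        if name in measured
    }


page = Page(width=240, height=320)
print(derive_layout(page, EXAMPLE_FORMULA))
# upper_margin fails: 45 mm measured vs. 40 mm expected exceeds 5 %
print(validate({"text_width": 123, "text_height": 195, "inner_margin": 41, "upper_margin": 45},
               page, EXAMPLE_FORMULA))
```

Because the formula stores only ratios, the same definition applies to a small or a large codex; only W and H change.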

As for the connection between geometry and page layout, it is sufficient to observe the ratio between the two sides of a rectangle to determine whether a notable rectangle is involved. Notable rectangles are defined by proportions that embody the aesthetic ideals of antiquity, exhibiting particular geometric ratios between their long and short sides. Two of these ancient aesthetic norms are the Golden Ratio and the Pythagorean Theorem.
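
Checking a rectangle against the notable rectangles reduces to comparing the ratio of its long side to its short side with a set of ideal values. The Python sketch below does this with a relative tolerance. The golden ratio, (1 + √5)/2 ≈ 1.618, is standard; reading the Pythagorean norm as a 3:4 rectangle (from the 3-4-5 right triangle) and including a √2 rectangle are assumptions made for illustration, as are the function name and the 2 % tolerance.

```python
import math

# Ratio of long side to short side for some notable rectangles.
# The golden ratio is exact; the 4:3 "Pythagorean" rectangle (taken from the
# 3-4-5 right triangle) and the sqrt(2) rectangle are assumptions for illustration.
NOTABLE_RECTANGLES = {
    "golden": (1 + math.sqrt(5)) / 2,  # about 1.618
    "sqrt2": math.sqrt(2),             # about 1.414
    "pythagorean": 4 / 3,              # about 1.333
}


def classify_rectangle(side_a: float, side_b: float, tolerance: float = 0.02) -> list[str]:
    """Return the notable rectangles whose side ratio matches within a relative tolerance."""
    long_side, short_side = max(side_a, side_b), min(side_a, side_b)
    ratio = long_side / short_side
    return [
        name
        for name, ideal in NOTABLE_RECTANGLES.items()
        if abs(ratio - ideal) <= tolerance * ideal
    ]


print(classify_rectangle(200, 324))  # ['golden']
print(classify_rectangle(240, 320))  # ['pythagorean']
```

The function accepts the sides in either order, so it can be applied equally to a whole page or to a text block measured as width and height.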

The theory is supported by certain recurring relations, such as the height of the text block (h) being equal to the width of the page (L), h = L, or the width of the text block (l) being equal to half the page height (H), l = H/2 (Agati 2009, 227ff.).
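
The two relations can be tested page by page and then aggregated over a corpus. The following Python sketch uses the symbols of the text (H and L for page height and width; h and l for text block height and width); the 3 % tolerance, the sample measurements and the function names are hypothetical.

```python
from typing import NamedTuple


class Measurement(NamedTuple):
    page_height: float  # H
    page_width: float   # L
    text_height: float  # h, height of the text block
    text_width: float   # l, width of the text block


def _close(value: float, target: float, tolerance: float) -> bool:
    """True if value lies within a relative tolerance of target."""
    return abs(value - target) <= tolerance * target


def relations_hold(m: Measurement, tolerance: float = 0.03) -> dict[str, bool]:
    """Test the two recurring relations h = L and l = H/2 on one page."""
    return {
        "h = L": _close(m.text_height, m.page_width, tolerance),
        "l = H/2": _close(m.text_width, m.page_height / 2, tolerance),
    }


def share_satisfying(pages: list[Measurement], relation: str, tolerance: float = 0.03) -> float:
    """Fraction of pages in a corpus on which the given relation holds."""
    hits = sum(relations_hold(p, tolerance)[relation] for p in pages)
    return hits / len(pages)


# Hypothetical measurements in millimetres: (H, L, h, l)
corpus = [
    Measurement(320, 240, 238, 161),
    Measurement(300, 210, 220, 150),
    Measurement(330, 250, 250, 140),
]
print(relations_hold(corpus[0]))          # {'h = L': True, 'l = H/2': True}
print(share_satisfying(corpus, "h = L"))  # 0.6666666666666666
```

Reporting a share rather than a yes/no answer reflects the tolerance-based view of validation: a relation is a tendency across a corpus, not a rule every page obeys.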

To verify such theories, large corpora of mediaeval manuscripts must be analysed. Measuring hundreds or thousands of manuscript pages by hand is very time-consuming, and the risk of human error grows with every page measured. The availability of digitised manuscripts makes it possible to use computers to collect and process the data. The project eCodicology¹² is one attempt to analyse digital reproductions of mediaeval manuscripts with the help of computers, using methods of pattern recognition to take a closer look at the layout and to perform statistical analyses of the newly gained data.

2 Introducing eCodicology

The idea of eCodicology was born during the digitisation project Virtuelles Skriptorium St. Matthias, which digitised, reunited and published the manuscripts and fragments from the mediaeval library of the Benedictine Abbey of St. Matthias in Trier. It grew out of the wish to go beyond merely providing access to digitised manuscripts and catalogues.¹³ Mediaeval manuscripts and other historical written documents have been digitised for almost twenty years. Initially, digitisation focused on extremely important, famous or rare manuscripts, with the aim of making them accessible to the general public and of better protecting the originals. As high-resolution scanners and digital single-lens reflex cameras became increasingly affordable, entire collections made their way into digital libraries.

Since then, new technologies have further improved the quality of the image data. It was time to take the next step and keep the digital collections from gathering dust: digitised manuscripts can open new avenues of research beyond better accessibility for researchers. The specific research question of eCodicology concerns the generation of new descriptive metadata through automatic analysis of digital images: can the computer be used to add missing or more precise information on the page layout to the catalogues? And to what extent can these data support historical research interests? To answer these questions, the project eCodicology attempts to measure and analyse the page layout of mediaeval manuscripts by machine.

The aim of eCodicology has been to establish a workflow for the automatic tagging of layout features in mediaeval manuscripts, including an algorithm library for pre-processing and feature extraction as well as the transformation of the results into the common format of the virtual scriptorium’s database.¹⁴ Furthermore, the project experiments with exploring these data by performing statistical analyses and by providing an interactive visualisation framework.
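
The workflow's three stages (pre-processing, feature extraction, transformation into a database format) can be illustrated on a single page image. The Python sketch below is not SWATI's implementation: it assumes a greyscale image held in a NumPy array, binarises it with a fixed threshold and takes the bounding box of all dark pixels as the text block. On real scans such a naive bounding box would also capture decoration, stains or the scan background, so it serves only to show how pixel measurements become metadata. The record fields and function names are hypothetical.

```python
import numpy as np


def preprocess(gray: np.ndarray, threshold: int = 128) -> np.ndarray:
    """Pre-processing: binarise a greyscale image (0 = black, 255 = white); True marks ink."""
    return gray < threshold


def extract_text_block(ink: np.ndarray) -> tuple[int, int, int, int]:
    """Feature extraction: (top, bottom, left, right) pixel bounds of all ink pixels."""
    rows = np.flatnonzero(ink.any(axis=1))
    cols = np.flatnonzero(ink.any(axis=0))
    if rows.size == 0:
        raise ValueError("no ink found on this page")
    return int(rows[0]), int(rows[-1]), int(cols[0]), int(cols[-1])


def to_record(page_id: str, gray: np.ndarray, dpi: float) -> dict:
    """Transformation: turn one page image into a flat metadata record (hypothetical schema)."""
    mm_per_px = 25.4 / dpi
    top, bottom, left, right = extract_text_block(preprocess(gray))
    page_h_px, page_w_px = gray.shape
    return {
        "page_id": page_id,
        "page_height_mm": round(page_h_px * mm_per_px, 1),
        "page_width_mm": round(page_w_px * mm_per_px, 1),
        "text_height_mm": round((bottom - top + 1) * mm_per_px, 1),
        "text_width_mm": round((right - left + 1) * mm_per_px, 1),
    }


# Synthetic page: white background with a dark rectangle standing in for the text block.
page = np.full((100, 80), 255, dtype=np.uint8)
page[20:70, 10:50] = 0
print(to_record("example-001r", page, dpi=254))
```

Keeping each stage in its own function mirrors the idea of an algorithm library: a better binarisation or text-block detector can replace one stage without touching the others.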

eCodicology follows the quantitative approach to codicology first developed by a group of French and Italian researchers in the 1970s. Instead of focusing their research on the description of single manuscripts, the group Quanticod¹⁵ began to collect data for entire collections by building corpora and measuring comparable features of the page layout. By manually collecting measurements and counting layout features, and then evaluating these data statistically, the group was able to demonstrate trends in manuscript production and display them in charts. This made it possible to make statements about the character density on pages with a one- or two-column layout, about the significance of marginal space, and about temporal and regional tendencies in the mise-en-page of mediaeval manuscripts. Geometric calculations could show whether the aspect ratio was influenced by norms such as the Golden Ratio, familiar from painting, or the Pythagorean Theorem.
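
Once measurements exist in machine-readable form, the kind of statistics Quanticod compiled by hand can be sketched in a few lines. The following Python example, a minimal sketch, groups page records by number of columns and averages the share of the page covered by the text block and, as its complement, by the margins. The record fields and sample values are hypothetical; character density would additionally require character counts, which are not modelled here.

```python
from collections import defaultdict
from statistics import mean


def written_area_ratio(rec: dict) -> float:
    """Share of the page surface covered by the text block; the rest is margin."""
    text_area = rec["text_height"] * rec["text_width"]
    page_area = rec["page_height"] * rec["page_width"]
    return text_area / page_area


def summarise_by_columns(records: list[dict]) -> dict[int, dict]:
    """Group pages by number of columns and average their written-area ratio."""
    groups: dict[int, list[float]] = defaultdict(list)
    for rec in records:
        groups[rec["columns"]].append(written_area_ratio(rec))
    summary = {}
    for columns, ratios in sorted(groups.items()):
        avg = mean(ratios)
        summary[columns] = {
            "pages": len(ratios),
            "mean_written_ratio": avg,
            "mean_margin_ratio": 1 - avg,
        }
    return summary


# Hypothetical page records, dimensions in millimetres.
records = [
    {"columns": 1, "page_height": 300, "page_width": 200, "text_height": 200, "text_width": 120},
    {"columns": 1, "page_height": 300, "page_width": 200, "text_height": 210, "text_width": 140},
    {"columns": 2, "page_height": 350, "page_width": 250, "text_height": 260, "text_width": 180},
]
print(summarise_by_columns(records))
```

The same grouping step could be keyed on date or place of origin instead of column count to look for temporal or regional tendencies.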

For the codicologist, the objective of working with the “masses” is to learn more about the materiality of manuscripts and their manufacturing process, and to build a typology of manuscripts from a synchronic and diachronic perspective. For unknown reasons, the group stopped working on its projects just as computers were becoming more powerful and, most importantly, affordable for research institutions and scholars.

3 SWATI – Software Workflow for the Automatic Tagging of Images¹⁶

In order to analyse a large quantity of digitised manuscripts one has to figure out how to prepare and to handle the image data, which, in...

