Introduction into Handwritten Text Recognition

Winter School | October - December 2022

Part 1: What is HTR? | Transkribus 1 | Introduction into manuscripts and working in groups

4 Zoom online sessions | Oct 21, Nov 4 and 25, Dec 9
3-day in-person workshop in Vienna | December 19-21

Please send a short CV and cover letter to Jan Odstrčilík before 11th September 2022. As the subject, use "HTR School 2022".
Please don't forget to indicate the team/module you would like to take (i.e. Carolingian Latin, late medieval Latin, medieval German, or medieval Czech).
If you have any questions, don't hesitate to ask.

Michael Berger | University of Vienna
Tim Geelhaar | Bielefeld University
Tobias Hodel | University of Bern
Sarah Hutterer | University of Vienna
Anna Michalcová | Czech Academy of Sciences
Stephan Müller | University of Vienna
Jan Odstrčilík | Austrian Academy of Sciences
Steffen Patzold | University of Tübingen
Leon Pürstinger | Austrian Academy of Sciences
Helmut Reimitz | Princeton University
Dennis Wegener | University of Vienna
Vicent Bosch | Transkriptorium, Universitat Politècnica de València
Gerda Heydemann | Freie Universität Berlin
Daniela Mairhofer | Princeton University
Joan Andreu Sanchez | Transkriptorium, Universitat Politècnica de València
Alejandro Toselli | Transkriptorium, Universitat Politècnica de València
Enrique Vidal | Transkriptorium, Universitat Politècnica de València

Program & information | PDF

SCHEDULE

4 Zoom online sessions
OCT 21	What is HTR?
	Transkribus 1 (first hands-on)
	Introduction into manuscripts and working in groups
NOV 4	Transkribus 2 (uploading documents, layout recognition, simple transcription)
NOV 4	Working in groups (first transcriptions)
NOV 25	Transkribus 3 (training of custom models, learning curves, tagging)
DEC 9	Presentation of resulting models
DEC 9	Transkribus 4 (exporting documents and further processing)
3-day in-person workshop in Vienna
DEC 19	Publication of the ground truth
	Creating a simple website 1
	Alternatives to Transkribus
DEC 20	Creating a simple website 2
	VCEditor and other tools
	Library visit
DEC 21	Time to finish the work

Introduction into HTR | Technologies of Medieval Manuscripts – Latin|German|Czech

A revolution has slowly begun in the study of historical documents: machine learning tools now allow documents to be transcribed automatically. Over the last decade, these tools have improved to the point where they can assist in producing texts from medieval manuscripts at previously unobtainable levels of accuracy. Libraries use them to make their collections searchable, while researchers use them to speed up the creation of text editions and to study medieval documents.
The course will offer an introduction to some of these ongoing projects and, more importantly, to the practice of studying medieval documents with Handwritten Text Recognition (HTR) technologies. The course has two main parts: four online sessions and a three-day in-person workshop in Vienna. During the first phase, participants will be introduced to both the theory of handwritten text recognition and its practical application using the Transkribus (transkribus.eu) tool. We will then work in four groups, focusing on four different periods and languages: Carolingian Latin, late medieval Latin, late medieval German, and late medieval Czech. Each group will have its own supervisor, and its goal will be to train an HTR model for its type of writing.
During the in-person workshop in Vienna, we will finalize the four projects and publish our results online: both the transcriptions and the Handwritten Text Recognition models. We will also visit libraries in Vienna to see selected manuscripts in person. Finally, we will compare the automatic transcriptions produced by other machine learning tools and try out further digital tools. The course will be taught by a team of experts in HTR, medieval manuscript studies, and Latin, German, and Czech philology. At the end of the course, you will receive a certificate.

Requirements

The course is primarily designed for Master's and PhD students, but we will consider other applications as well. You are expected to be familiar with the language of the group you want to join (Medieval Latin, Medieval German, or Medieval Czech) and to have at least a basic knowledge of medieval palaeography and manuscript studies. We would, however, be delighted to offer resources for self-study in these languages and in manuscript studies to help you prepare for the course.

Costs

There is no participation fee, but you will be expected to cover the costs of your trip to Vienna, including accommodation (c. 300 Euros). We do hope, however, to be able to offer bursaries for students who do not have support from their institutions. If you would like to apply for a bursary, please let us know as early as possible.

Information

Contact:

Jan Odstrčilík
Austrian Academy of Sciences

Organisation:

Institute for Medieval Research of the Austrian Academy of Sciences

Manuscript, Rare Books and Archival Studies Initiative (MARBAS), Princeton University

CRC 1288 Practices of Comparing, Bielefeld University

Department of German Studies, University of Vienna

Digital Humanities, Walter Benjamin Kolleg, University of Bern

History Department, University of Tübingen

tranSkriptorium, Valencia
tranSkriptorium.com
