Ruse of Reuse

Detecting Text-similarity with AI in Historical Sources | International Workshop

Orvieto, Libreria di Antonio Albèri, © Sailko, CC BY-SA 3.0 (https://creativecommons.org/licenses/by-sa/3.0/deed.en), via Wikimedia Commons, modified by Martin Roček

The volume of text available in digital corpora has grown immensely thanks to numerous ongoing digitization projects, and the continued success of handwritten text recognition (HTR) will expand these collections further. The need to take full advantage of these vast resources is therefore more pressing than ever. Identifying text reuse and measuring text similarity are central to that task, offering the potential to reveal connections that have not been seen before. AI is the key to making this possible.

This two-day workshop continues our 2023 event, Finding Connections: Using AI and DNA Sequencing to Find Similarities and Parallels in Medieval Texts, now with a broader focus on historical sources in general.

March 5th will be devoted to research presentations across two sessions. In the afternoon, William Mattingly will deliver an associated KIMAFO Lecture: From Physical Object to Structured Data: Building AI Pipelines in Cultural Heritage.

March 6th will take a more informal, hands-on workshop approach. Two morning sessions will be dedicated to work-in-progress reports, plans for implementing AI in various projects, experimental approaches, and open discussions. The afternoon will conclude with a closed practical programming workshop focusing on the application of various AI methods, led by Martin Roček and Gleb Schmidt.

Registration

Preliminary Program

Thursday, 5th March

9:45 – Registration

10:00 – Welcome and Introduction

10:30-12:00 – Session 1
Chair: Sven Meeder

• Martin Roček and Jan Odstrčilík: Visualising Semantic Similarity
• Jeffrey Witt: Tackling the Granularity Matching Problem in Hierarchical Texts with Multi-Resolution Embeddings
• Jan Maliszewski: What is Semantic Search Good for in Scholastic Corpora?

12:00-13:00 – Invited Lunch for Speakers and Moderators

13:00-14:30 – Session 2
Chair: Gerda Heydemann

• Svetlana Yatsyk: Automatic Cataloguing of the Books of Hours. Textual Unit Identification and Structural Annotation
• Tim Geelhaar: Does AI Break the Chains of Prometheus? Or How AI Can Advance the Analytical Possibilities of the Latin Text Archive (LTA)
• Gabriel Viehhauser: How to Align Medieval Prose Texts and Other Impossibilities

14:30-15:00 – Coffee Break

15:00-17:00 – KImafo Lecture
Moderator: Jan Odstrčilík

William Mattingly: From Physical Object to Structured Data: Building AI Pipelines in Cultural Heritage

Invited Dinner for Speakers and Moderators

Friday, 6th March

9:00-10:30 – Session 3
Chair: Martin Roček

• Marin Le Bris: Operationalising Classical Antiquity as a Culture of Reference(s)
• Anna Dolganov: A Multimodal LLM for Ancient Greek: Initial Results and Future Perspectives
• William Mattingly: Data Augmentation for Capturing Variance in Manuscript Traditions

10:30-11:00 – Coffee Break

11:00-12:30 – Session 4
Chair: Gleb Schmidt

• Andrea Scalia: Viral Bible. Biblical Trends Across the Middle Ages
• Sven Meeder and Gleb Schmidt: Text Reuse and the Social Life of Early Medieval Canon Law
• Alexander Marx and Peter Andorfer: Tracing the Tradition of the Roman Conquest of Jerusalem in Latin Texts (c.400-c.1300): A Database with c.2500 Entries

Conclusion of the Public Part

12:30-13:30 – Invited Lunch for Speakers and Moderators

Afternoon: Closed Practical Workshop for Speakers and Moderators

Information

Date

March 5-6, 2026

Venue

Seminar rooms 7 and 8 | 5th Floor
Austrian Academy of Sciences
PSK, Georg-Coch-Platz 2
1010 Vienna

Organisers

Digital Lab, Institute for Medieval Research, Austrian Academy of Sciences

SOLEMNE 'The Social Life of Early Medieval Normative Texts' (canones.org) (ERC CoG: 101087979), Radboud University

Cooperation

Austrian Center for Digital Humanities, Austrian Academy of Sciences
Machine Learning Topical Platform, Austrian Academy of Sciences

Contact

Jan Odstrčilík
