The term culturomics first appeared in the famous Science article (Michel et al. 2011) that accompanied the launch of the Google Ngram Viewer; it was intended as the name for a new field of broad studies in the digital humanities. Although no academic departments or institutions for culturomics have been established since then, and the approach itself has been widely criticised, the idea of quantitatively analysing large textual data for cultural studies remains alive and is expanding, overcoming its most controversial problems.
The most commonly cited problems with Google Books data are poor or dubious metadata and a variety of OCR errors. These are mitigated by the use of alternative corpora (Morin & Acerbi 2017) or vector models (Charlesworth, Caliskan & Banaji 2022), which can greatly increase the number of words to be examined. But the basic idea itself, that language can be viewed as a sequence of tokens sufficient to capture cultural traces in text, is taken for granted.
In my talk, I will try to bring in a slightly more complex understanding of language, drawing on the theoretical linguistic foundations of construction grammar to show how the study of constructional distribution in textual data can enrich culturomic studies.
Anastasia Bonch-Osmolovskaya is a linguist and digital humanist, founder of DH CLOUD Community, Fellow of ACDH-CH.
Her interests lie in corpus linguistics, digital preservation, computational literary studies and computational lexicography. Anastasia has worked on the Russian National Corpus and the digital edition of Tolstoy’s legacy.