Living languages and their varieties are highly dynamic systems that are subject to constant change. Authentic access to earlier stages of spoken linguistic varieties is ideally provided by sound recordings, which are only available to a limited extent for roughly the last 130 years and capture snapshots of older language states. Such recordings must be preserved and made accessible.
The project to process and make accessible the corpus Austrian Dialect Recordings from the 20th Century is carried out in close cooperation between the Phonogrammarchiv (PhA), the research department “Linguistics” of ACDH-CH (formerly Variation and Change of German in Austria, VaWaDiÖ) and the FWF Special Research Programme German in Austria. Variation – Contact – Perception (DiÖ). The project focuses on the digitization, metadata enrichment and systematization, and corpus linguistic processing of a collection of dialect recordings from the years 1951–1995, in order to preserve them sustainably and to make them usable and available to the scientific community as well as a broader audience.
The cooperation project will create a digital, searchable corpus in the form of an interdisciplinary online platform. This ensures the optimal exploitability of these linguistically as well as socio-culturally and socio-historically unique holdings for a wide range of research purposes. The digitization of the recordings as well as enriching and systemizing the metadata and making them searchable in the electronic documentation together with the corpus linguistic annotation will make it possible to utilize the recordings in a great variety of research contexts.
The PhA’s dialect recordings from the first half of the 20th century will be published in the edition The Complete Historical Collections (1899–1950).
In the first half of the 20th century, beginning in the early 1900s, the Phonogrammarchiv’s field recordings were made using the Archiv-Phonograph. With the advent of magnetic tape recording technology at the Phonogrammarchiv in 1951, however, entirely new possibilities became available for phonography-aided field research. From then on, a range of linguists and folklorists made dialect recordings throughout Austria (and also in neighboring German-speaking areas in Italy, Slovenia, Hungary, Czechia and Romania) in cooperation with the Phonogrammarchiv, where the tapes are archived. These recordings constitute a unique historical source for studying the Austrian dialect landscape after the middle of the 20th century.
The corpus Austrian Dialect Recordings from the 20th Century is a selection of about 2450 recordings (or about 540 hours of audio material) on magnetic tape that was compiled during 2016–2018 and finalized in 2018. The corpus thus comprises about half of the German-language dialect recordings in the Phonogrammarchiv’s holdings.
A substantial part of the corpus consists of the subcorpus Tonaufnahmen österreichischer Dialekte 1951–1983 [Sound Recordings of Austrian Dialects 1951–1983], which was also included in UNESCO’s Memory of Austria register in the fall of 2018. This collection of about 1750 sound recordings on magnetic tape (about 263 hrs.) was created within the framework of a cooperation between the Phonogrammarchiv and the so-called “Wörterbuchkanzlei” [dictionary office] at the ÖAW (i.e. “Kommission zur Schaffung des Österreichisch-Bayerischen Wörterbuches”, after 1969 “Kommission für Mundartkunde und Namenforschung”). In the 1950s and 1960s, under the direction of Maria Hornung and Eberhard Kranzmayer, a large-scale Austria-wide (and occasionally cross-border) recording enterprise was carried out for the phonographic documentation of base dialects through spontaneous language recordings in hundreds of places. The field researchers involved also include Eugen Gabriel, Werner Bauer, Herbert Tatzreiter, and Franz Roitinger. The recording activities were not limited to German varieties, but also covered the Austrian minority languages Croatian, Romany and Hungarian in Burgenland, Slovenian in Carinthia, and Ladin and local Italian varieties in northern Italy. Other recordings from this period were made in university contexts, for example by Hans Pusch, Alois Brandstetter, Oskar Pausch and Helga Ebner-Hiermanseder, by Walter Graf of the Phonogrammarchiv, the folklorists Károly Gaál, Leopold Kretzenbacher, Hans Haid, Hans Hönigschmied & Josef Kirchner, Elfriede Lies and Franz Lipp, or by Erwin Koch-Emmery, who had fled to Australia in 1938.
The 1970s and 1980s are characterized in the corpus primarily by the field research activities of Werner Bauer, Herbert Tatzreiter, Wilfried Schabus, Franz Patocka, Hermann Scheuringer, Friedrich Bouterwek, Ingeborg Schönhuber (Geyer), Heinz-Karl Stark, Ingrid Bigler-Marschall, Barbara Jocher, Roland Girtler and Günter Lipold, who conducted fieldwork in all Austrian provinces except Vorarlberg. The corpus thus also contains comparative material at a time interval of about 20 years. Maria Hornung is represented in the corpus during this period by recordings from northern Italy, Slovenia, Burgenland and Upper Austria/Bavaria. In order to increase the small number of Viennese recordings in the corpus, Wilfried Schabus’s interviews with older Viennese citizens (1992–1995) were also included.
Preparing the corpus
The preparation of the corpus is mainly concerned with the digitization of analogue materials, the enrichment, granularization and potential correction of metadata, as well as the transcription and corpus linguistic annotation of the recordings. Digitizing the recordings, enriching and systemizing the metadata and making them searchable in the electronic documentation will make it possible to utilize the recordings in a great number of scholarly contexts.
Traditional sound carriers such as wax cylinders or audio tapes are subject to natural decay: once a sound carrier can no longer be played, the recordings stored on it are lost forever. Moreover, such sound carriers require special playback devices which are nowadays available in functioning condition only in specialized institutions. It is therefore essential to digitize perishable audio documents in order to preserve the linguistic data and contents in the long term and make them available for future generations.
Of the approx. 2450 recordings in the corpus, only about a third were already available in digitized form, so that several hundred tapes had to be digitized and segmented. In addition, many of the existing digital copies of tapes turned out not to have been segmented, so their segmentation also had to be included in the workflow, alongside the creation of technical documentation of the digitization procedures. This work has since been completed.
For optimal searchability of the corpus, it is necessary to create a carefully designed and well-structured metadata description. Due to personnel-related factors, many data sets in the PhA’s database were only in a rudimentary state, so that incomplete entries had to be amended manually by falling back on the original handwritten or typewritten documentation. Moreover, about half of the recordings in the corpus were not documented separately, but only represented in metadata bundles in which the metadata of several recordings were lumped together, so that the metadata in such a bundle entry could no longer be associated with the respective individual recordings. For granularizing bundle entries, we developed a matrix tool.
Since the timelines in the original protocols of recordings (indicating what happens when in a recording) often do not start at the beginning of the respective recording but at the beginning of the tape reel containing it (which usually contains several other recordings), we had to correct the time markers in about 900 protocols and align them with the sound files.
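The correction described above amounts to subtracting, from every time marker, the position at which the recording starts on its tape reel. The following is only a minimal sketch of that arithmetic and not the project's actual tooling: the function names, the `MM:SS` / `HH:MM:SS` marker format and the sample protocol entries are all assumptions made for illustration.

```python
from datetime import timedelta


def parse_marker(text: str) -> timedelta:
    """Parse a marker written as 'MM:SS' or 'HH:MM:SS' (assumed format)."""
    parts = [int(p) for p in text.split(":")]
    while len(parts) < 3:
        parts.insert(0, 0)  # pad missing hours (and minutes) with zero
    hours, minutes, seconds = parts
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def format_marker(delta: timedelta) -> str:
    """Format a non-negative timedelta as 'HH:MM:SS'."""
    total = int(delta.total_seconds())
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def rebase_protocol(entries, recording_start_on_reel):
    """Shift reel-relative markers so they count from the recording's start.

    entries: list of (marker, description) pairs (hypothetical structure).
    recording_start_on_reel: position on the reel where the recording begins.
    """
    offset = parse_marker(recording_start_on_reel)
    rebased = []
    for marker, description in entries:
        shifted = parse_marker(marker) - offset
        if shifted < timedelta(0):
            # A marker before the recording's start signals a wrong offset.
            raise ValueError(f"marker {marker} precedes recording start")
        rebased.append((format_marker(shifted), description))
    return rebased
```

For a recording that begins 12:30 into its reel, a protocol entry marked 15:05 is thus moved to 00:02:35 in the aligned sound file. Raising an error on negative results is a deliberate choice in this sketch, since such a value would indicate that the assumed start offset does not match the protocol.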
Due to the large mass of data to be added and technical metadata yet to be determined, various strategies were developed to simplify the respective tasks, such as batch processing to determine and tabulate audio metadata, and importing large amounts of data into the database from spreadsheets.
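As an illustration of what batch determination and tabulation of audio metadata can look like, the sketch below reads the technical parameters of every WAV file in a folder and writes them to a CSV table that could then be imported from a spreadsheet. It is not the project's own script: it assumes plain PCM WAV files readable by Python's standard `wave` module, and the function names and column names are hypothetical.

```python
import csv
import wave
from pathlib import Path

# Hypothetical column names for the resulting table.
FIELDS = ["file", "channels", "sample_rate_hz", "bits_per_sample", "duration_s"]


def read_wav_metadata(path: Path) -> dict:
    """Read basic technical metadata from one PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "file": path.name,
            "channels": wav.getnchannels(),
            "sample_rate_hz": rate,
            "bits_per_sample": wav.getsampwidth() * 8,
            "duration_s": round(frames / rate, 3),
        }


def tabulate_folder(folder: Path, out_csv: Path) -> list[dict]:
    """Collect metadata for all *.wav files in a folder and write a CSV table."""
    rows = [read_wav_metadata(p) for p in sorted(folder.glob("*.wav"))]
    with out_csv.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Writing the result as a flat table keeps the technical metadata in a form that can be checked by eye before it is imported in bulk.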
The areal-linguistic aspect of the corpus required a systematic georeferenced representation of the dialect points and other localities. Thanks to a dataset provided by Statistik Austria containing the geodata and official designations of nearly 17300 localities in Austria, it was possible to implement a hierarchical structure of the toponyms in the electronic documentation according to the administrative subdivisions, which can also serve as the basis of a cartographic representation. The extrahierarchical assignment of places to cultural regions or dialect areas has not yet been implemented for budgetary reasons.
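One way to picture such a hierarchical arrangement of toponyms is a nested mapping built from flat locality records. The sketch below is purely illustrative: the field names (`state`, `district`, `municipality`, `locality`, `lat`, `lon`), the four-level hierarchy and the placeholder records are assumptions and do not reflect the actual layout of the Statistik Austria dataset or of the PhA database.

```python
def build_hierarchy(rows):
    """Nest flat locality records by administrative level.

    Result: state -> district -> municipality -> locality -> (lat, lon)
    """
    tree = {}
    for row in rows:
        state = tree.setdefault(row["state"], {})
        district = state.setdefault(row["district"], {})
        municipality = district.setdefault(row["municipality"], {})
        municipality[row["locality"]] = (row["lat"], row["lon"])
    return tree


def count_localities(node) -> int:
    """Count the localities below any node of the hierarchy."""
    if isinstance(node, tuple):  # a (lat, lon) leaf stands for one locality
        return 1
    return sum(count_localities(child) for child in node.values())


# Placeholder records with invented names and coordinates, for illustration only.
sample_rows = [
    {"state": "State A", "district": "District 1", "municipality": "Municipality X",
     "locality": "Locality 1", "lat": 47.0, "lon": 11.0},
    {"state": "State A", "district": "District 1", "municipality": "Municipality X",
     "locality": "Locality 2", "lat": 47.1, "lon": 11.1},
    {"state": "State A", "district": "District 2", "municipality": "Municipality Y",
     "locality": "Locality 3", "lat": 47.2, "lon": 11.2},
]

tree = build_hierarchy(sample_rows)
```

Because every locality keeps its coordinates at the leaf level, the same structure can be traversed both for searching by administrative unit and for plotting points on a map.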
Since several required procedures were not feasible in the PhA’s original database, an excerpt was created from it containing only the recordings of the corpus, which was then modified accordingly. However, care was taken to preserve structural integrity where possible in order to facilitate a planned later retransfer of the data to the PhA database.
In the extant documentary materials, transcripts of recordings are only available to a very limited extent. Since the competent creation of a transcript is a time-consuming procedure, the Phonogrammarchiv started in 2021 to collect transcripts from researchers who use recordings from the corpus in their research, so as to gradually increase the number of transcripts. ACDH-CH plans to obtain further transcripts from third-party funded projects dealing with recordings from the corpus. Transcripts from the mid-1970s by various transcribers, found in Maria Hornung’s estate, were made machine-readable using the Transkribus software. The transcripts will be annotated using DiÖ’s corpus tools.
The next steps
Now that a granularized and enriched systematic electronic documentation is available in the form of PhA’s project database, the next task is the common platform combining the PhA’s metadata structures with the corpus linguistic structures of SFB DiÖ. Work on the common platform has so far been delayed by staff shortages at ACDH-CH.
After a delay, the online catalogue of the recordings in the corpus is now in preparation and is planned to be available before the end of 2023.
- Christian Huber (project coordination at Phonogrammarchiv)
- Alexandra N. Lenz, (since 2022) Philipp Stöckle (project coordination at Research dept. “Linguistics” ÖAW and SFB DiÖ) (until 2021: Ludwig Maximilian Breuer)
- Benjamin Fischer (metadata enrichment and systemizing; until 2022)
- Bernhard Graf (audio digitizing)
- Michael Hagleitner (database programming at PhA)
- Stefan Kiesling (metadata enrichment; 2021 and 2022)
- Zoe Kaldor Fox (metadata enrichment; 2021)