Contact person PhA: Christian Huber
Contact person Dept. of Linguistics ÖAW and SFB DiÖ: (since 2022) Philipp Stöckle (until 2021: Ludwig Maximilian Breuer)

Living languages and their varieties are highly dynamic systems that are subject to constant change. Authentic access to earlier stages of spoken linguistic varieties is ideally provided by sound recordings, which are only available to a limited extent for roughly the last 130 years and capture snapshots of older language states. Such recordings must be preserved and made accessible.

The project to process and make accessible the corpus Austrian Dialect Recordings from the 20th Century is carried out in close cooperation between the Phonogrammarchiv (PhA), the research department “Linguistics” of ACDH-CH (formerly Variation and Change of German in Austria, VaWaDiÖ) and the FWF Special Research Programme German in Austria. Variation – Contact – Perception (DiÖ). The project focuses on the digitization, metadata enrichment and systematization as well as the corpus linguistic processing of a collection of dialect recordings from the years 1951–1995, in order to preserve them sustainably and to make them usable by and available to the scientific community as well as a broader audience.

The cooperation project will create a digital, searchable corpus in the form of an interdisciplinary online platform. This will make these linguistically, socio-culturally and socio-historically unique holdings usable for a wide range of research purposes. Digitizing the recordings, enriching and systematizing the metadata, making them searchable in the electronic documentation, and adding corpus linguistic annotation will allow the recordings to be used in a great variety of research contexts.

The PhA’s dialect recordings from the first half of the 20th century will be published in the edition The Complete Historical Collections (1899–1950).

The corpus

In the first half of the 20th century, beginning in the early 1900s, the Phonogrammarchiv’s field recordings were made using the Archiv-Phonograph. With the advent of magnetic tape recording technology at the Phonogrammarchiv in 1951, entirely new possibilities became available for phonography-aided field research. From then on, a range of linguists and folklorists made dialect recordings throughout Austria (and also in neighboring German-speaking areas in Italy, Slovenia, Hungary, Czechia and Romania) in cooperation with the Phonogrammarchiv, where the tapes are archived. These recordings constitute a unique historical source for studying the Austrian dialect landscape after the middle of the 20th century.

The corpus Austrian Dialect Recordings from the 20th Century is a selection of about 2450 recordings (or about 540 hours of audio material) on magnetic tape that was compiled during 2016–2018 and finalized in 2018. The corpus thus comprises about half of the German-language dialect recordings in the Phonogrammarchiv’s holdings.

A substantial part of the corpus consists of the subcorpus Tonaufnahmen österreichischer Dialekte 1951–1983 [Sound Recordings of Austrian Dialects 1951–1983], which was also included in UNESCO’s Memory of Austria register in the fall of 2018. This collection of about 1750 sound recordings on magnetic tape (about 263 hrs.) was created within the framework of a cooperation between the Phonogrammarchiv and the so-called “Wörterbuchkanzlei [dictionary office]” at the ÖAW (i.e. “Kommission zur Schaffung des Österreichisch-Bayerischen Wörterbuches”, after 1969 “Kommission für Mundartkunde und Namenforschung”). In the 1950s and 1960s, under the direction of Maria Hornung and Eberhard Kranzmayer, a large-scale Austria-wide (and occasionally cross-border) recording enterprise was carried out for the phonographic documentation of base dialects through spontaneous language recordings in hundreds of places. Other field researchers involved include Eugen Gabriel, Werner Bauer, Herbert Tatzreiter, and Franz Roitinger. The recording activities were not limited to German varieties, but also covered the Austrian minority languages Croatian, Romany and Hungarian in Burgenland, Slovenian in Carinthia, and Ladin and local Italian varieties in northern Italy. Other recordings from this period were made in university contexts, for example by Hans Pusch, Alois Brandstetter, Oskar Pausch and Helga Ebner-Hiermanseder, by Walter Graf of the Phonogrammarchiv, the folklorists Károly Gaál, Leopold Kretzenbacher, Hans Haid, Hans Hönigschmied & Josef Kirchner, Elfriede Lies and Franz Lipp, or by Erwin Koch-Emmery, who had fled to Australia in 1938.

The 1970s and 1980s are characterized in the corpus primarily by the field research activities of Werner Bauer, Herbert Tatzreiter, Wilfried Schabus, Franz Patocka, Hermann Scheuringer, Friedrich Bouterwek, Ingeborg Schönhuber (Geyer), Heinz-Karl Stark, Ingrid Bigler-Marschall, Barbara Jocher, Roland Girtler and Günter Lipold, who conducted fieldwork in all Austrian provinces except Vorarlberg. The corpus thus also contains comparative material at a time interval of about 20 years. Maria Hornung is represented in the corpus during this period by recordings from northern Italy, Slovenia, Burgenland and Upper Austria/Bavaria. In order to increase the small number of Viennese recordings in the corpus, Wilfried Schabus’s interviews with older Viennese citizens (1992–1995) were also included.


Preparing the corpus

The preparation of the corpus is mainly concerned with the digitization of analogue materials, the enrichment, granularization and, where necessary, correction of metadata, as well as the transcription and corpus linguistic annotation of the recordings. Digitizing the recordings, enriching and systematizing the metadata and making them searchable in the electronic documentation will make it possible to use the recordings in a great number of scholarly contexts.


Traditional sound carriers such as wax cylinders or audio tapes are subject to natural decay: once a sound carrier can no longer be played, the recordings stored on it are lost forever. Moreover, such sound carriers require special playback devices which are nowadays available in functioning condition only in specialized institutions. It is therefore essential to digitize perishable audio documents in order to preserve the linguistic data and contents in the long term and make them available for future generations.

Of the approx. 2450 recordings in the corpus, only about a third were already available in digitized form, so that several hundred tapes had to be digitized and segmented. It also turned out that many of the existing digital copies of tapes had not been segmented, so their segmentation had to be added to the workflow, along with technical documentation of the digitization procedures. This work has since been completed.


For optimal searchability of the corpus, it is necessary to create a carefully designed and well-structured metadata description. Due to personnel-related factors, many data sets in the PhA’s database were only in a rudimentary state, so that incomplete entries had to be amended manually by falling back on the original handwritten or typewritten documentation. Moreover, about half of the recordings in the corpus were not documented separately, but only represented in metadata bundles in which the metadata of several recordings were lumped together, so that the metadata in such a bundle entry could no longer be associated with the respective individual recordings. For granularizing bundle entries, we developed a matrix tool.

Since the timelines in the original protocols of recordings (indicating what happens when in a recording) often do not start at the beginning of the respective recording but at the beginning of the tape reel containing it (which usually contains several other recordings), we had to correct the time markers in about 900 protocols and align them with the sound files.
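The correction described above amounts to subtracting the position at which a recording starts on its reel from every time marker in its protocol. The following Python lines are a minimal sketch of that idea, not the project's actual tooling; the names `Marker`, `parse_timestamp`, `format_timestamp` and `realign`, and the `mm:ss` / `hh:mm:ss` notation, are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Marker:
    seconds: float  # position as noted in the protocol, counted from the start of the reel
    note: str       # what happens at this point in the recording


def parse_timestamp(text: str) -> float:
    """Convert 'mm:ss' or 'hh:mm:ss' into seconds (assumed notation)."""
    seconds = 0
    for part in text.split(":"):
        seconds = seconds * 60 + int(part)
    return float(seconds)


def format_timestamp(seconds: float) -> str:
    """Render seconds as 'hh:mm:ss'."""
    total = int(round(seconds))
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def realign(markers, recording_start_on_reel: float):
    """Shift reel-relative markers so that 0 is the start of the recording itself."""
    shifted = []
    for marker in markers:
        new_time = marker.seconds - recording_start_on_reel
        if new_time < 0:
            # A marker before the recording's start points to a wrong offset
            # or to a marker that belongs to another recording on the reel.
            raise ValueError(f"marker {marker.note!r} lies before the recording start")
        shifted.append(Marker(new_time, marker.note))
    return shifted


# Hypothetical protocol: the recording starts 12:30 into the reel.
protocol = [Marker(parse_timestamp("12:30"), "introduction"),
            Marker(parse_timestamp("14:05"), "narrative")]
for m in realign(protocol, parse_timestamp("12:30")):
    print(format_timestamp(m.seconds), m.note)
```

Raising an error for negative results, rather than clamping them to zero, keeps doubtful protocols visible for manual checking.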

Due to the large mass of data to be added and technical metadata yet to be determined, various strategies were developed to simplify the respective tasks, such as batch processing to determine and tabulate audio metadata, and importing large amounts of data into the database from spreadsheets.
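As an illustration of such batch processing, the sketch below reads basic technical metadata from all WAV files in a directory and writes them to a CSV table that could then be imported into a database. It is not the project's own script: the function names, the column names and the restriction to PCM WAV files readable by Python's standard `wave` module are assumptions.

```python
import csv
import wave
from pathlib import Path

# Column names are illustrative, not the PhA database schema.
FIELDS = ["file", "channels", "sample_rate_hz", "bit_depth", "duration_s"]


def read_wav_metadata(path: Path) -> dict:
    """Read technical metadata from the header of one PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "file": path.name,
            "channels": wav.getnchannels(),
            "sample_rate_hz": rate,
            "bit_depth": wav.getsampwidth() * 8,  # bytes per sample -> bits
            "duration_s": round(frames / rate, 3),
        }


def tabulate_directory(audio_dir: Path, out_csv: Path) -> int:
    """Write one CSV row per WAV file in audio_dir; return the number of files."""
    rows = [read_wav_metadata(p) for p in sorted(audio_dir.glob("*.wav"))]
    with out_csv.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

A call such as `tabulate_directory(Path("audio"), Path("audio_metadata.csv"))` (both paths hypothetical) would produce a spreadsheet-compatible table. Other container formats would need a different reader.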

The areal-linguistic aspect of the corpus required a systematic georeferenced representation of the dialect points and other localities. Thanks to a dataset provided by Statistik Austria containing the geodata and official designations of nearly 17300 localities in Austria, it was possible to implement a hierarchical structure of the toponyms in the electronic documentation according to the administrative subdivisions, which can also serve as the basis of a cartographic representation. The extrahierarchical assignment of places to cultural regions or dialect areas has not yet been implemented for budgetary reasons.
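How a flat list of georeferenced localities can be turned into a hierarchy along administrative subdivisions may be sketched as follows. The field names (`province`, `district`, `municipality`, `locality`, `lat`, `lon`) and the three-level nesting are assumptions for illustration and do not reflect the actual column layout of the Statistik Austria dataset or the PhA documentation; the sample records are invented placeholders.

```python
LEVELS = ("province", "district", "municipality")  # assumed administrative levels


def build_hierarchy(records):
    """Nest flat locality records as province -> district -> municipality -> locality."""
    tree = {}
    for rec in records:
        node = tree
        for level in LEVELS:
            node = node.setdefault(rec[level], {})
        node[rec["locality"]] = (rec["lat"], rec["lon"])  # leaf: coordinates
    return tree


def iter_localities(node):
    """Yield (name, coordinates) for every locality below a node of the tree."""
    for name, child in node.items():
        if isinstance(child, dict):
            yield from iter_localities(child)
        else:
            yield name, child


# Invented placeholder records, not real geodata.
records = [
    {"province": "Province A", "district": "District A1", "municipality": "Municipality X",
     "locality": "Locality 1", "lat": 1.0, "lon": 2.0},
    {"province": "Province A", "district": "District A1", "municipality": "Municipality X",
     "locality": "Locality 2", "lat": 1.1, "lon": 2.1},
    {"province": "Province B", "district": "District B1", "municipality": "Municipality Y",
     "locality": "Locality 3", "lat": 3.0, "lon": 4.0},
]
tree = build_hierarchy(records)
print(sorted(name for name, _ in iter_localities(tree["Province A"])))
```

Because every leaf carries its coordinates, any subtree (a province, a district) can be passed directly to a mapping routine. An assignment of places to dialect areas would cut across this tree and would need a separate mapping.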

Since several required procedures were not feasible in the PhA’s original database, an excerpt was created from it containing only the recordings of the corpus, which was then modified accordingly. However, care was taken to preserve structural integrity where possible in order to facilitate a planned later retransfer of the data to the PhA database.

Transcripts, annotation

In the extant documentary materials, transcripts of recordings are only available to a very limited extent. Since creating a competent transcript is time-consuming, the Phonogrammarchiv started in 2021 to collect transcripts from researchers who use recordings from the corpus in their research, so as to gradually increase the number of transcripts. ACDH-CH plans to obtain further transcripts from third-party funded projects dealing with recordings from the corpus. Transcripts from the mid-1970s by various transcribers, found in Maria Hornung’s estate, were made machine-readable using the Transkribus software. The transcripts will be annotated using DiÖ’s corpus tools.


The next steps

Now that a granularized and enriched systematic electronic documentation is available in the form of PhA’s project database, the next task is the common platform combining the PhA’s metadata structures with the corpus linguistic structures of SFB DiÖ. Work on the common platform has so far been delayed by staff shortages at ACDH-CH.

After a delay, the online catalogue of the recordings in the corpus is now in preparation and is planned to be available before the end of 2023.





Huber, Christian. 2023. “Why it Can be Difficult to Make Historic Language Recordings Accessible: A View from a Corpus of Historic Dialect Recordings.” Proceedings of the 2nd International Workshop on Digital Language Archives (LangArc-2023), ACM/IEEE Joint Conference on Digital Libraries, ed. by Oksana L. Zavalina & Shobhana L. Chelliah. Denton, Texas: University of North Texas.

Huber, Christian & Benjamin Fischer. 2021. Digitising a corpus of Austrian dialect recordings from the 20th century. Digital Lexis and Beyond, ed. by Christina Katsikadeli, Manfred Sellner & Michael Gassner, 38–65. Vienna: Austrian Academy of Sciences Press.

Lenz, Alexandra N., Ludwig Maximilian Breuer, Christian Huber, Benjamin Fischer & Bernhard Graf. 2020. Österreichische Dialektaufnahmen im 20. Jahrhundert. Zur Genese, Aufbereitung und wissenschaftlichen Nutzung eines einmaligen Sprachkorpus. International Forum on Audio-Visual Research – Jahrbuch des Phonogrammarchivs 10, 128–140.

Huber, Christian, Benjamin Fischer & Bernhard Graf. 2019. Corpus of Austrian Dialect Recordings from the 20th Century – A Cooperation Project. HSCR 2019. Proceedings of the Third International Workshop on the History of Speech Communication Research, Vienna, September 13–14, 2019 (= Studientexte zur Sprachkommunikation 94), ed. by Michael Pucher, Jürgen Trouvain & Carina Lozo, 31–40. Dresden: TUDpress.



“Tonaufnahmen österreichischer Dialekte 1951–1983” included in the Austrian national “Memory of the World” register of the UNESCO (German)

Memory of Austria-Register: Tonaufnahmen österreichischer Dialekte 1951–1983 (German)

ORF Science: Geistberger / Wieselberg: Dialektarchiv ist UNESCO-Welterbe (German)

ÖAW: Dialect recordings of the ÖAW in the “Memory of the World” (German)

Audio samples (German)


Project history and prehistory

2012: Start of activities at PhA for the preprocessing of dialect recordings on magnetic tape by Christian Huber.

2014–2015: Christian Huber member of the SAPID/LISTEN project team to establish an international repository for dialect recordings of European languages. Creation of a corpus of already digitized recordings, development of strategies for metadata granularization and enrichment, development of strategies for ethical and legal handling of the recordings (especially clarification of rights regarding online presentation, potential registration of recordings as orphan works, etc.; together with lawyers of the ÖAW).

2016 (March): Start of collaboration with the SRP German in Austria (DiÖ) and the former research department Variation and Change of German in Austria, VaWaDiÖ (Alexandra N. Lenz, Ludwig Maximilian Breuer) for the joint preparation of a corpus of historical dialect recordings. Creation of different corpus versions, first digitization work.

2018 (Sept.): Inclusion of a collection of dialect recordings on magnetic tape from 1951–1983 in UNESCO’s Memory of Austria register. Finalization of the selection of recordings for the corpus to be processed.

2018 (Nov.): First cooperation agreement with SRP DiÖ and VaWaDiÖ. Increased digitization activities, preparation of metadata enrichment and granularization.

2019 (Jan.): Start of work on corpus preparation at the Phonogrammarchiv.


Additional literature (selection)

On dialect recordings (and language recordings in general) in the Phonogrammarchiv

Seemüller, Joseph. 1908. Deutsche Mundarten. I. Mitteilungen der Phonogrammarchivs-Kommission 11.

Pollak, Hans Wolfgang. 1913. Die Aufnahme deutscher Mundarten durch das Phonogramm-Archiv der kaiserlichen Akademie der Wissenschaften in Wien. Zeitschrift für Deutsche Mundarten 8, 83–88.

Seemüller, Joseph. 1909. Deutsche Mundarten. 2. Mitteilungen der Phonogrammarchivs-Kommission 15.

Seemüller, Joseph. 1911. Deutsche Mundarten. 3. Mitteilungen der Phonogrammarchivs-Kommission 20.

Pfalz, Anton. 1913. Deutsche Mundarten. 4. Mitteilungen der Phonogramm-Archivs-Kommission 27.

Seemüller, Joseph. 1918. Deutsche Mundarten. 5. Mitteilungen der Phonogrammarchivs-Kommission 48.

Hajek, Leo. 1928. Das Phonogrammarchiv der Akademie der Wissenschaften in Wien von seiner Gründung bis zur Neueinrichtung im Jahre 1927. Mitteilungen der Phonogrammarchivs-Kommission 58.

Hornung, Maria. 1961. Tonaufnahmen im Dienste der Mundartforschung. Zeitschrift für Mundartforschung 28(2), 183–191.

Graf, Walter. 1964. Aus der Geschichte des Phonogrammarchivs der Österreichischen Akademie der Wissenschaften. Bulletin phonographique 6, 9–39.

Schabus, Wilfried. 1999. Die Bestände des Phonogrammarchivs an Sprachaufnahmen. Das audiovisuelle Archiv 45, 23–32.

Schabus, Wilfried. 2003. „Dazähl’n“ – 100 Jahre Dialektaufnahme in Österreich. Zusammengestellt und bearbeitet von Wilfried Schabus, unter Mitarbeit von Werner Bauer et al. OEAW PHA CD 20, 2003.


Phonogrammarchiv, ÖAW and German Studies in the 1st half of the 20th century

Wahlmüller, Marlene. 2010. Die Akademie der Wissenschaften in Wien. Kontinuitäten und Diskontinuitäten 1938-1945. Diplomarbeit, Universität Wien.

Braun, Jan David. 2015. Das ‚Lautdenkmal reichsdeutscher Mundarten zur Zeit Adolf Hitlers‘ in der ‚Ostmark‘. Geisteswissenschaftliche Gemeinschaftsforschung am Beispiel der Germanistik von 1938 bis 1945. Masterarbeit, Universität Wien.

Kowar, Helmut. 2017. „Die Anlage einer Art phonographischen Archives“ – mehr als ein Archiv. Ein Überblick über die Geschichte des Phonogrammarchivs der Österreichischen Akademie der Wissenschaften. Geistes-, sozial- und kulturwissenschaftlicher Anzeiger 152(1), 5–45.

Feichtinger, Johannes, Katja Geiger & Stefan Sienell. 2022. Die Akademie der Wissenschaften in Wien im Nationalsozialismus und im Kontext der Akademien im „Altreich“. In: Feichtinger, Johannes & Brigitte Mazohl (eds.), Die Österreichische Akademie der Wissenschaften 1847–2022. Eine neue Akademiegeschichte, Band 2, 11–141. Wien: Verlag der Österreichischen Akademie der Wissenschaften.


Dialect recordings and the PhA’s self-image under National Socialism

Ruth, Walter. 1940. Das Phonogrammarchiv der Akademie der Wissenschaften in Wien und seine Aufgaben. Mitteilungen der Phonogrammarchivs-Kommission 72.

Ruth, Walter. 1940. Bericht über die Südtiroler Mundartaufnahmen im Sommer 1940. Mitteilungen der Phonogrammarchivs-Kommission 73.