amc – austrian media corpus

CC-BY 4.0, Sandra Lehecka

The Austrian Media Corpus (amc), created as part of a public-private cooperation between the Austrian Academy of Sciences and the Austrian Press Agency (APA), is a unique and significant language resource. The collection covers the entire Austrian media landscape of the past two decades, comprising a wide range of text types that can be classified as journalistic prose (Austrian newspapers, magazines, press releases, transcribed television interviews, news stories from television, etc.). Altogether, the corpus contains 40 million texts, amounting to more than 10 billion tokens. Compared with other contemporary German-language corpora, the amc ranks among the largest collections of its kind. Since this vast amount of data was integrated into the Academy’s corpus infrastructure in 2012, it has been furnished with elaborate metadata and basic linguistic markup. Currently, the resource is used primarily in linguistically and lexicographically oriented projects.