The main aims of the project DeBiASED (Detecting Biases in the Austrian Everyday Discourse) are to detect, map and analyse biases in the Austrian media landscape, using the 20-year collection of the Austrian Media Corpus (AMC), to date the largest German corpus of its kind, as a case study. The objective is to explore this large, diachronic corpus of Austrian German media in order to detect cognitive and systemic biases, while also shedding light on linguistic, cultural, political, sociological and geographical aspects. The methodological approach comprises the development and application of Machine Learning (ML) and Natural Language Processing (NLP) tools to build language models based on word embeddings and on the neural network architectures known as “Transformers”, alongside more traditional linguistic, lexical and lexicographical analyses. It also involves crafting topical sets of word analogies to detect biases with these language models.
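The analogy-based probing mentioned above can be illustrated with a minimal sketch. It is not the project's implementation: the three-dimensional vectors, the word list, the analogy set and the function names `solve_analogy` and `flag_mismatches` are all hypothetical and chosen only for illustration. An analogy "a is to b as c is to ?" is answered by the word whose vector lies closest (by cosine similarity) to b - a + c, and an analogy is flagged when the model's answer differs from the expected one.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def solve_analogy(vectors, a, b, c):
    """Return the word that best completes 'a is to b as c is to ?'."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # The three query words are excluded from the candidates.
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


def flag_mismatches(vectors, analogy_set):
    """List the analogies whose predicted answer differs from the expected one."""
    flagged = []
    for a, b, c, expected in analogy_set:
        predicted = solve_analogy(vectors, a, b, c)
        if predicted != expected:
            flagged.append((a, b, c, expected, predicted))
    return flagged


# Hypothetical toy vectors, deliberately skewed so that "Ärztin" sits
# closer to the "male" direction than "Krankenschwester" does.
TOY_VECTORS = {
    "Mann": [1.0, 0.0, 0.2],
    "Frau": [0.0, 1.0, 0.2],
    "Arzt": [1.0, 0.0, 1.0],
    "Ärztin": [0.6, 0.4, 1.0],
    "Krankenschwester": [0.0, 1.0, 0.6],
}

# Hypothetical analogy set: (a, b, c, expected answer).
ANALOGIES = [("Mann", "Arzt", "Frau", "Ärztin")]

if __name__ == "__main__":
    for item in flag_mismatches(TOY_VECTORS, ANALOGIES):
        print(item)
```

With real embeddings trained on a corpus, the same comparison between expected and predicted answers could be run over a curated, topical analogy set; a high share of mismatches in one topic would then be a signal worth closer linguistic inspection rather than proof of bias on its own.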
Modern linguistics, as well as the most complex topics in machine learning today, relies on language models created for natural language processing (NLP) tasks. The models and sets of analogies created in the scope of the project can be used in a wide variety of tasks, ranging from measuring the quality of corpora and evaluating language models to syntactic, semantic and pragmatic research on contemporary German. Not only academic researchers but also citizens can benefit from the availability of German language models, dictionaries, mappings of regional language variation, diachronic studies, word statistics, collocations and so on. The strength of embedding models lies in summarizing and making available a large amount of information that would otherwise not be accessible.