What 300-dimensional Fridges can Tell Us about Language

Sociolinguistics has always been an empirical field. The availability of large amounts of data has opened up new possibilities, but also new (methodological) challenges. Recent advances in machine learning have produced promising approaches to gain new insights and corroborate received wisdom.

In this talk, I will give a brief introduction to a method called embeddings and show several applications of it. Embeddings are a new way of representing words (a direct implementation of the distributional hypothesis by Firth) as points in a multi-dimensional vector space. This is not unlike arranging word magnets on a fridge. Each word's position is determined by its contextual similarity to all other words, so that words used in similar contexts end up close together and form semantic and syntactic groupings.
The resulting vector representations of words have turned out to capture a variety of latent factors, from lexical semantics to syntax to socio-demographic aspects to societal attitudes.
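
As a rough illustration of the idea (not the method used in the talk), the sketch below builds simple co-occurrence count vectors from a tiny made-up corpus and compares words by cosine similarity. The corpus, the window size, and the helper names `cooccurrence_vectors` and `cosine` are all assumptions made for this example; real embeddings are learned from large corpora and are typically dense vectors of a few hundred dimensions.

```python
import numpy as np

# Toy corpus, invented purely for illustration.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the dog",
    "the senate passes the law",
    "the court upholds the law",
]


def cooccurrence_vectors(sentences, window=2):
    """Count, for every word, how often each other word occurs
    within `window` positions of it. Each row is one word's vector."""
    tokens = [s.split() for s in sentences]
    vocab = sorted({w for sent in tokens for w in sent})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for sent in tokens:
        for i, word in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[index[word], index[sent[j]]] += 1
    return vocab, index, counts


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0


vocab, index, vectors = cooccurrence_vectors(corpus)
sim_cat_dog = cosine(vectors[index["cat"]], vectors[index["dog"]])
sim_cat_law = cosine(vectors[index["cat"]], vectors[index["law"]])

# Words that share contexts ("cat", "dog") sit closer together
# than words that do not ("cat", "law").
print(f"cat~dog: {sim_cat_dog:.3f}  cat~law: {sim_cat_law:.3f}")
```

Even on this tiny example, "cat" comes out closer to "dog" than to "law", because the first two words share more of their neighbouring words.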

The ease of use and the range of applications make embeddings a valuable tool for further research in (computational) sociolinguistics. I will show how they capture regional variation at an intra- and interlingual level, how they distinguish varieties and linguistic resources, and how they allow for the assessment of changing societal norms and associations.

Dirk Hovy

Dirk Hovy is associate professor of computer science at Bocconi University in Milan, Italy. Before that, he was faculty and a postdoc in Copenhagen, received a PhD in NLP from USC, and earned a master's degree in linguistics from Marburg, Germany. He is interested in the interaction between language, society, and machine learning, or what language can tell us about society, and what computers can tell us about language. He has authored over 70 articles on these topics, three of which won best paper awards. He is also the author of a recent book on using text analysis in Python for social science research. Dirk has organized one conference and several workshops (on abusive language, ethics in NLP, and computational social science and sociolinguistics).
He recently received an ERC Starting Grant for a project on demographic factors and bias in NLP models.
Outside of work, Dirk enjoys cooking, running, and leather-crafting.

