Spoken Corpus Linguistics and Open Access: Usability and Technology of VOICE 3.0 Online

Open access (OA) is a gold standard for any language corpus. Yet once a project is completed, it is often challenging to keep open-access web applications ‘alive’ in the long term, even though compiling (spoken) corpora is time- and cost-intensive. Tool Gallery 8.1 addresses this challenge by sharing insights into the development and usability of the new web application for the Vienna-Oxford International Corpus of English (VOICE, first released in 2009), which was developed in the VOICE CLARIAH project (2020-2021).

The first day of the Tool Gallery is targeted at researchers, PhD candidates and advanced students interested in working with and analyzing transcribed spoken data. We will introduce VOICE, a one-million-word corpus of spoken English as a lingua franca (ELF) interactions, and then engage with its usability as an open-access tool for linguistic research. We will discuss specific properties of spoken corpora (such as fieldwork, data collection, detailed transcription, conversational mark-up, and metadata) and provide an in-depth introduction to the new VOICE 3.0 Online OA web interface and its functionalities through numerous hands-on activities.

The second day of this ACDH-CH Tool Gallery takes a look behind the scenes: it focuses on the OA technologies used and developed for the new VOICE 3.0 Online web interface. We introduce key properties of VOICE 3.0 XML, outline the process of setting up a local NoSketch Engine to run queries, and provide details on technology stacks and OA software packages. The second day of this Tool Gallery is targeted primarily at researchers, PhD candidates, advanced students and programmers with an interest in building OA web applications for language corpora and related resources. Some technological expertise in corpus linguistics, web design, XML technologies or software development may be advantageous, but is not a prerequisite.

The ACDH-CH Tool Gallery will end with a closing panel on Day 2, in which core members of the VOICE CLARIAH project team will answer questions on project management and implementation, interdisciplinary collaboration, and the challenges of planning for long-term OA availability.


Marie-Luise Pitzl, Ruth Osimk-Teasdale, Stefanie Riegler, Omar Siam, Hannes Pirker, Susanne Zhanial


Prospective participants can register for either Day 1 or Day 2, or participate in both days (two registrations required). 

Register for Day 1

Register for Day 2


Day 1 (Thu, 28 April 2022) 

14.00-14.10 Welcoming words

14.10-14.40 Spoken corpora and the challenge of long-term open access: The case of VOICE (Marie-Luise Pitzl, ACDH-CH)

14.40-15.00 Introducing VOICE: Corpus structure and text properties (Ruth Osimk-Teasdale, JKU; Stefanie Riegler, Uni Wien)

15.00-15.10 The VOICE CLARIAH project: Developing VOICE 3.0 Online (Marie-Luise Pitzl, ACDH-CH; Omar Siam, ACDH-CH)

15.10-15.30 Coffee break

15.30-16.00 Introducing VOICE 3.0 Online

16.00-17.00 Hands-on activities for using VOICE 3.0: Queries, sub-corpora, etc.

17.00-17.15 Closing discussion


Day 2 (Fri, 29 April 2022)

10.00-10.15 Welcome and summary of Day 1

10.15-10.45 VOICE 3.0 XML and NoSketch Engine (Hannes Pirker, ACDH-CH)

10.45-11.15 The technological infrastructure behind VOICE 3.0 Online (Omar Siam, ACDH-CH)

11.15-11.25 Demonstration: Applying VOICE technologies to other data (Omar Siam, ACDH-CH)

11.25-11.45 Discussion: OA technologies of VOICE and re-usability/further applications

11.45-12.15 Coffee break

12.15-13.00 Q&A and closing panel: Challenges of long-term OA for corpora, project management & interdisciplinarity

13.00 Farewell and closing


28 April 2022 – 14:00-17:15

29 April 2022 – 10:00-13:00 


Online via Zoom


Susanne Zhanial