ACDH-CH Tool Gallery 8.1

Spoken Corpus Linguistics and Open Access: Usability and Technology of VOICE 3.0 Online

Open access (OA) is a gold standard for any language corpus. Yet, after a project’s completion, it is often challenging to keep open-access web applications ‘alive’ long-term, even though compiling (spoken) corpora is time- and cost-intensive. Tool Gallery 8.1 addresses this challenge by sharing insights into the development and usability of the new web application for the Vienna-Oxford International Corpus of English (VOICE, first released in 2009), developed recently in the VOICE CLARIAH project (2020-2021).

The first day of the Tool Gallery is targeted at researchers, PhD candidates and advanced students interested in working with and analyzing transcribed spoken data. We will introduce VOICE, a one-million-word corpus of spoken interactions in English as a lingua franca (ELF), and then engage with its usability as an open-access tool for linguistic research. We will discuss specific properties of spoken corpora (such as field work, data collection, detailed transcription, conversational mark-up, and metadata) and provide an in-depth introduction to the new VOICE 3.0 Online OA web interface and its functionalities through numerous hands-on activities.

The second day of this ACDH-CH Tool Gallery takes a look behind the scenes: it focuses on the OA technologies used and developed for the new VOICE 3.0 Online web interface. We introduce key properties of VOICE 3.0 XML, outline the process of setting up a local NoSketch Engine to run queries, and provide details on technology stacks and OA software packages. The second day of this Tool Gallery is targeted primarily at researchers, PhD candidates, advanced students and programmers with an interest in building OA web applications for language corpora and related resources. Some technological expertise in corpus linguistics, web design, XML technologies or software development may be advantageous, but is not a prerequisite.

The ACDH-CH Tool Gallery will end with a closing panel on Day 2 where core members of the VOICE CLARIAH project team will answer questions related to project management and implementation, interdisciplinary collaboration and the challenges of planning for long-term OA availability.

Concept

Marie-Luise Pitzl, Ruth Osimk-Teasdale, Stefanie Riegler, Omar Siam, Hannes Pirker, Susanne Zhanial

Programme

Day 1 (Thu, 28 April 2022)

14.00-14.10 Welcoming words

14.10-14.40 Spoken corpora and the challenge of long-term open access: The case of VOICE (Marie-Luise Pitzl, ACDH-CH)

14.40-15.00 Introducing VOICE: Corpus structure and text properties (Ruth Osimk-Teasdale, JKU; Stefanie Riegler, Uni Wien)

15.00-15.10 The VOICE CLARIAH project: Developing VOICE 3.0 Online (Marie-Luise Pitzl, ACDH-CH; Omar Siam, ACDH-CH)

15.10-15.30 Coffee break

15.30-16.00 Introducing VOICE 3.0 Online

16.00-17.00 Hands-on activities for using VOICE 3.0: Queries, sub-corpora, etc.

17.00-17.15 Closing discussion

Day 2 (Fri, 29 April 2022)

10.00-10.15 Welcome and summary of Day 1

10.15-10.45 VOICE 3.0 XML and NoSketch Engine (Hannes Pirker, ACDH-CH)

10.45-11.15 The technological infrastructure behind VOICE 3.0 Online (Omar Siam, ACDH-CH)

11.15-11.25 Demonstration: Applying VOICE technologies to other data (Omar Siam, ACDH-CH)

11.25-11.45 Discussion: OA technologies of VOICE and re-usability/further applications

11.45-12.15 Coffee break

12.15-13.00 Q&A and closing panel: Challenges of long-term OA for corpora, project management & interdisciplinarity

13.00 Farewell and closing

Date

28 April 2022 – 14:00-17:15

29 April 2022 – 10:00-13:00

Location

Online via Zoom

Contact

Susanne Zhanial

Twitter

#TOOLGALLERY
