VOICE CLARIAH

Integration of VOICE into CLARIAH-AT and update of the VOICE Online system infrastructure

This project will ensure the long-term availability of VOICE (the Vienna-Oxford International Corpus of English) as an online resource for scholars and university students worldwide. To this end, the project has two main aims: integrating VOICE XML into the CLARIAH-AT infrastructure and updating the open-access VOICE Online user interface (first released to the academic public in 2009).

The first aim, integrating VOICE into CLARIAH-AT, involves two steps: minor updates to bring the XML corpus in line with the current TEI (Text Encoding Initiative) release, and the conversion of the TEI-conformant metadata in VOICE XML (in particular the corpus header and the text headers) to CLARIN's CMDI (Component MetaData Infrastructure) format. VOICE will then be ingested into ARCHE (A Resource Center for the HumanitiEs) for secure long-term archiving.
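
The metadata conversion step can be illustrated with a minimal sketch. It reads two fields from a TEI header and writes them into a simplified CMDI-style record. The sample header, the function name tei_header_to_cmdi and the component names CorpusInfo, Title and Availability are assumptions made for illustration only; an actual CMDI record has to follow a profile registered in the CLARIN Component Registry, and the mapping used for VOICE may look quite different.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
CMD_NS = "http://www.clarin.eu/cmd/1"  # CMDI 1.2 envelope namespace

# Illustrative sample only; this is not an actual VOICE header.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample corpus text</title></titleStmt>
      <publicationStmt>
        <availability><p>Open access</p></availability>
      </publicationStmt>
      <sourceDesc><p>Illustrative sample</p></sourceDesc>
    </fileDesc>
  </teiHeader>
</TEI>"""


def tei_header_to_cmdi(tei_xml: str) -> str:
    """Map a few TEI header fields onto a simplified CMDI-style record."""
    ns = {"tei": TEI_NS}
    root = ET.fromstring(tei_xml)
    file_desc = ".//tei:teiHeader/tei:fileDesc"

    # Pull the fields we want to carry over; missing fields become "".
    title = root.findtext(
        f"{file_desc}/tei:titleStmt/tei:title", default="", namespaces=ns
    ).strip()
    availability = root.findtext(
        f"{file_desc}/tei:publicationStmt/tei:availability/tei:p",
        default="",
        namespaces=ns,
    ).strip()

    # Build the record. The component names below are hypothetical
    # placeholders, not taken from any registered CMDI profile.
    ET.register_namespace("cmd", CMD_NS)
    cmd = ET.Element(f"{{{CMD_NS}}}CMD")
    components = ET.SubElement(cmd, f"{{{CMD_NS}}}Components")
    corpus_info = ET.SubElement(components, f"{{{CMD_NS}}}CorpusInfo")
    ET.SubElement(corpus_info, f"{{{CMD_NS}}}Title").text = title
    ET.SubElement(corpus_info, f"{{{CMD_NS}}}Availability").text = availability

    return ET.tostring(cmd, encoding="unicode")


if __name__ == "__main__":
    print(tei_header_to_cmdi(SAMPLE_TEI))
```

In practice such a mapping would more likely be written as an XSLT stylesheet or driven by a full profile definition; the sketch only shows the general idea of reading header fields in one schema and re-emitting them in another.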

The second aim is the update of the VOICE Online system infrastructure. The project team and co-opted researchers at both institutions will collaborate to mirror the existing VOICE Online user interface. A newly designed backend based on NoSketch Engine will be built for both VOICE XML 2.0 and the part-of-speech tagged version, VOICE POS XML 2.0 (first released in 2013). In addition, the project will implement an updated frontend for VOICE Online and VOICE POS Online that relies on current web technologies.

All infrastructure components and VOICE Online resources will be open-access tools that can serve as models to be used and adapted for other language resources and newly built (lingua franca) corpora.