Constructing SABeD: A Spoken Academic Belgian Dutch Corpus
DOI: https://doi.org/10.3384/ecp210001
Abstract
We present the Spoken Academic Belgian Dutch (SABeD) corpus and describe its construction. The corpus was compiled from selected first-year bachelor's lectures at higher education institutions in Flanders, because students report that the language used in such lectures is one of the hurdles to comprehension and academic success. We first applied automatic speech recognition to these lectures, then manually segmented the utterances and manually corrected the automatic transcriptions. A filtered version of the resulting transcriptions was automatically punctuated and linguistically annotated with CLARIN tools, and is currently searchable in the Autosearch online corpus query environment. The manual transcriptions and the ELAN files with the final annotation will soon be made available to the research community for download through the CLARIN infrastructure at http://hdl.handle.net/10032/tm-a2-w4.
License
Copyright (c) 2024 Jolien Mathysen, Vincent Vandeghinste, Elke Peters, Patrick Wambacq
This work is licensed under a Creative Commons Attribution 4.0 International License.