Towards Comprehensive Definitions of Data Quality for Audiovisual Annotated Language Resources

Authors

  • Hanna Hedeland

DOI:

https://doi.org/10.3384/ecp18011

Keywords:

data quality, audiovisual data, spoken corpora, TEI

Abstract

Though digital infrastructures such as CLARIN have been successfully established and now provide large collections of digital resources, the lack of widely accepted standards for data quality and documentation still makes the re-use of research data a difficult endeavour, especially for more complex resource types. The article gives a detailed overview of the relevant characteristics of audiovisual annotated language resources and reviews possible approaches to data quality in terms of their suitability for this context. Finally, various strategies are suggested for arriving at comprehensive and adequate definitions of data quality for this specific resource type and possibly for digital language resources in general.

Published

2021-06-22