Towards Comprehensive Definitions of Data Quality for Audiovisual Annotated Language Resources
Keywords: data quality, audiovisual data, spoken corpora, TEI
Although digital infrastructures such as CLARIN have been successfully established and now provide large collections of digital resources, the lack of widely accepted standards for data quality and documentation still makes the re-use of research data a difficult endeavour, especially for more complex resource types. The article gives a detailed overview of the relevant characteristics of audiovisual annotated language resources and reviews possible approaches to data quality in terms of their suitability for this context. Finally, various strategies are suggested for arriving at comprehensive and adequate definitions of data quality for this specific resource type and possibly for digital language resources in general.
Copyright (c) 2021 Hanna Hedeland
This work is licensed under a Creative Commons Attribution 4.0 International License.