Flexible Metadata Schemes for Research Data Repositories. The Common Framework in Dataverse and the CMDI Use Case


  • Jerry de Vries
  • Vyacheslav Tykhonov
  • Andrea Scharnhorst
  • Eko Indarto
  • Femmy Admiraal
  • Mike Priddy




Metadata Schemes, CMDI, Concepts, Data Repository, Semantic Registry, 5-star Open Data, Linked Open Data, Semantic Web, Data Linkage, Linked Data


In this paper we present an approach called the Common Framework, which addresses the interoperability and flexibility of metadata schemes as developed by specific scientific communities and later supported by domain and cross-domain data repositories. The approach was triggered by a concrete use case, namely the question of how to expose Component Metadata Infrastructure (CMDI) metadata, stored in computational linguistics datasets in the DANS EASY archive, to discovery services. The work in CLARIN to develop CMDI further into a standard (ISO 24622-1:2015, ISO 24622-2:2019) forms part of the background of the use case. We used the Dataverse platform to deliver proofs of concept for various elements of the Common Framework, including the recommendation of standardised elements for Dataverse instances in CLARIN. At the core of the Common Framework is a design that envisions an interaction between different microservices, possibly hosted by various service providers. Semantic mapping mechanisms are used throughout a pipeline that starts by extracting and analysing a set of existing metadata standards and values at a digital research data repository (Extraction). The pipeline then aligns these metadata standards with other standards (Transformation) and proposes enrichments that can be used by other service providers or imported back into the original source (Load). Some modules applied along this pipeline are discussed in detail, together with the challenges this specific use case entails. At the same time, we stress generic aspects, as we are convinced that this approach can also be applied in other settings, to other archival platforms and to other domain-specific metadata schemes.
The high-level goal of this work is to explore ways to make research data collections FAIR (Findable, Accessible, Interoperable and Re-usable), in particular interoperable and re-usable, while preserving the rigour of domain-specific indexing practices.
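To make the Extraction, Transformation and Load steps of the pipeline more tangible, the following is a minimal Python sketch under stated assumptions. It is not the implementation of the Common Framework: the record layout, the function names, the `cmdi:` element names and the crosswalk entries are all hypothetical and chosen only for illustration. The sketch shows one way a field-level semantic mapping could align source metadata with terms from another vocabulary and write the proposed enrichments back next to the original values.

```python
from dataclasses import dataclass

# Hypothetical crosswalk: source (CMDI-like) element -> target vocabulary term.
# These pairs are illustrative assumptions, not mappings taken from the paper.
CROSSWALK = {
    "cmdi:ResourceName": "dcterms:title",
    "cmdi:Creator": "dcterms:creator",
    "cmdi:LanguageName": "dcterms:language",
}


@dataclass
class Enrichment:
    record_id: str
    target_field: str
    value: str


def extract(repository):
    """Extraction: yield (record_id, field, value) triples from source records."""
    for record in repository:
        for field, value in record["metadata"].items():
            yield record["id"], field, value


def transform(triples, crosswalk):
    """Transformation: align source fields with target terms.

    Fields without a mapping are collected so they can be reviewed later.
    """
    aligned, unmapped = [], set()
    for record_id, field, value in triples:
        target = crosswalk.get(field)
        if target is None:
            unmapped.add(field)
        else:
            aligned.append(Enrichment(record_id, target, value))
    return aligned, unmapped


def load(repository, enrichments):
    """Load: store proposed enrichments alongside the untouched original metadata."""
    by_id = {record["id"]: record for record in repository}
    for e in enrichments:
        by_id[e.record_id].setdefault("enriched", {})[e.target_field] = e.value
    return repository


def run_pipeline(repository, crosswalk=CROSSWALK):
    """Run the three steps in order and report fields that could not be aligned."""
    aligned, unmapped = transform(extract(repository), crosswalk)
    return load(repository, aligned), unmapped
```

In this sketch the original metadata is never overwritten; enrichments are kept in a separate `enriched` entry, so domain-specific values survive, and unmapped fields are returned rather than silently dropped.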