Managing Access to Language Resources in a Corpus Analysis Platform

Authors

  • Eliza Margaretha Illig, Department of Digital Linguistics, IDS Mannheim, Germany
  • Nils Diewald, Department of Digital Linguistics, IDS Mannheim, Germany
  • Paweł Kamocki, Department of Digital Linguistics, IDS Mannheim, Germany
  • Marc Kupietz, Department of Digital Linguistics, IDS Mannheim, Germany

DOI:

https://doi.org/10.3384/ecp216.09

Keywords:

corpus access, license management, user rights management, query rewriting, authorization

Abstract

Corpus query tools are crucial to CLARIN’s mission of facilitating the sharing and use of language data for research. Managing user access rights is a major challenge for online corpus platforms that host large corpora with complex licenses and heterogeneous restrictions on access methods and purposes. This paper presents an approach that maximizes user access to corpus data while protecting rights holders’ legitimate interests. Query rewriting techniques and authorization procedures allow license terms to be modelled in detail, enabling broader applications. This offers an alternative to methods that model only a greatest common denominator of licenses and thereby limit the possibilities for using the data. Our approach constitutes a flexible and extensible corpus license and user rights management component that is applicable to other language research environments.

Author Biography

Nils Diewald, Department of Digital Linguistics, IDS Mannheim, Germany

Published

2025-08-25