The Interaction of Personal Data, Intellectual Property and Freedom of Expression in the Context of Language Research


  • Aleksei Kelli
  • Krister Lindén
  • Pawel Kamocki
  • Kadri Vider
  • Penny Labropoulou
  • Ramūnas Birštonas
  • Vadim Mantrov
  • Vanessa Hannesschläger
  • Riccardo Del Gratta
  • Age Värv
  • Gaabriel Tavits
  • Andres Vutt
  • Esther Hoorn
  • Jan Hajič
  • Arvi Tavast



Personal Data, Copyright, Language Model, Access Rights, Freedom of Expression


Language researchers are usually aware of intellectual property and personal data (PD) requirements. The problem arises when these two legal regimes impose conflicting requirements. For instance, copyright law may require acknowledgement of the author, while personal data law enshrines the data minimisation principle. For a language researcher this is a practical question: should the author of a text used for, e.g., building a language model be named, or should the data minimisation principle be followed and the author left unnamed? The data subject's right of access introduces similar conflicts. The question is what the scope of the access right is: does it cover only the personal data processed, or does it extend to data derived from PD? The interaction of freedom of expression with PD protection entails several further problems, notably whether researchers can publish research results containing personal data. The General Data Protection Regulation establishes a general framework but leaves certain aspects to be specified in the national law of EU member states. We analyse different national implementations based on examples from several EU countries.