CLARIN Annual Conference

https://ecp.ep.liu.se/index.php/clarin/issue/feed CLARIN Annual Conference 2026-06-29T14:01:28+02:00 CLARIN clarin@clarin.eu Open Journal Systems

CLARIN, the Common Language Resources and Technology Infrastructure, is a virtual platform for everyone interested in language. CLARIN offers access to language resources, technology, and knowledge, and enables cross-country collaboration among academia, industry, policy-makers, cultural institutions, and the general public. Researchers, students, and citizens are offered access to digital language resources and technology services to deploy, connect, analyse and sustain such resources. In line with the Open Science agenda, CLARIN enables scholars from the Social Sciences and Humanities (SSH) and beyond to engage in and contribute to cutting-edge, data-driven research based on language data.

https://ecp.ep.liu.se/index.php/clarin/article/view/1595 Results of Crowdsourcing Human Evaluation of Synthetic Simplification 2026-06-29T14:01:07+02:00 Vincent Vandeghinste Bram Vanroy Job van Doeselaar

This paper describes the creation of a set of crowd-sourced human evaluations of automated simplifications. We created a synthetic simplification dataset which was assessed by the crowd on dimensions of fluency, clarity, accuracy and complexity. We briefly describe the crowdsourcing environment and some techniques to keep the crowd engaged. This resulted in a dataset of nearly 25,000 responses from 384 respondents on synthetically simplified Dutch data. In order to mitigate the low inter-annotator agreement, we investigate the effect of outlier removal techniques on Krippendorff's α. The main conclusions stay the same: the synthetic texts are not simpler according to the linguistic proxies investigated, but the crowd judged the synthetic text to be clearer than the original in about 80% of the cases and less complex in about 70% of the cases. The fluency was the same for original sentences versus synthetic sentences, and in the case of accuracy the numbers changed most after outlier removal. Medium accuracy was achieved in about 80% of the cases, and high accuracy in 50% before outlier removal and 60% after outlier removal. Both the synthetic parallel data and the crowd assessments are made available to the CLARIN infrastructure.
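To make the agreement measure mentioned in this abstract concrete, the following is a minimal sketch of Krippendorff's α with the interval metric, followed by a recomputation after dropping one annotator. It is an illustration only: the abstract does not state which distance metric or which outlier criteria the authors used, so the function names, the toy ratings, and the choice to exclude annotator "c" are all assumptions. The pairwise loop is quadratic and suits small examples, not a dataset of 25,000 responses.

```python
from itertools import permutations


def krippendorff_alpha_interval(units):
    """Krippendorff's alpha with the interval metric (squared difference).

    `units` is a list of rating lists, one per judged item. Items may have
    different numbers of ratings; items with fewer than two ratings are
    not pairable and are ignored.
    """
    pairable = [u for u in units if len(u) >= 2]
    values = [v for u in pairable for v in u]
    n = len(values)
    if n < 2:
        raise ValueError("need at least two pairable ratings")

    # Observed disagreement: ordered pairs of ratings within each item.
    d_o = sum(
        sum((a - b) ** 2 for a, b in permutations(u, 2)) / (len(u) - 1)
        for u in pairable
    ) / n

    # Expected disagreement: ordered pairs across all pairable ratings.
    d_e = sum((a - b) ** 2 for a, b in permutations(values, 2)) / (n * (n - 1))
    if d_e == 0:
        raise ValueError("alpha is undefined when all ratings are identical")
    return 1.0 - d_o / d_e


def ratings_without(ratings, excluded):
    """Return per-item rating lists with the given annotators removed."""
    return [
        [score for annotator, score in by_annotator.items()
         if annotator not in excluded]
        for by_annotator in ratings.values()
    ]


# Hypothetical toy data: item id -> {annotator id: score on a 1-5 scale}.
ratings = {
    "s1": {"a": 4, "b": 4, "c": 1},
    "s2": {"a": 2, "b": 2, "c": 5},
    "s3": {"a": 5, "b": 5, "c": 2},
}

alpha_all = krippendorff_alpha_interval(ratings_without(ratings, set()))
alpha_trimmed = krippendorff_alpha_interval(ratings_without(ratings, {"c"}))
alpha_toy = krippendorff_alpha_interval([[1, 1], [2, 2], [1, 2]])
print(alpha_all, alpha_trimmed, alpha_toy)
```

In this toy case, annotator "c" disagrees with the other two on every item, so removing "c" raises α to perfect agreement. Real outlier removal would need a principled criterion rather than a hand-picked annotator.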

2026-06-29T00:00:00+02:00 Copyright (c) 2026 Vincent Vandeghinste, Bram Vanroy, Job van Doeselaar https://ecp.ep.liu.se/index.php/clarin/article/view/1596 Methodology for Converting and Publish Tabular Data into SKOS Resources via Python Notebooks 2026-06-29T14:01:09+02:00 Michele Mallia Fahad Khan Silvia Calvi Klara Dankova

This paper presents a methodology for creating and converting tabular data into SKOS linguistic resources using Python notebooks. Designed to support users with limited technical skills, the approach offers a structured and reproducible process through interactive notebooks. The methodology covers metadata preparation, data scraping from repositories, information mapping, and data normalization for standardized vocabulary publication. Various specialized multilingual vocabularies, including those related to textile description and smart city terminology, were analyzed to evaluate the approach. Guidelines were also developed to optimize LOD environment deployment, including resource uploading and web application configuration. The methodology supports the management of linguistic data as Linked Open Data, offering a platform for hosting SKOS resources and training users to create structured data efficiently. The system was tested through external user engagement, demonstrating its scientific relevance and practical utility.
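The mapping step described in this abstract can be pictured with a small sketch that turns table rows into SKOS concepts serialised as Turtle. This is not the authors' notebook code: the column convention (`id`, `label_<lang>`), the base URI, and the function names are assumptions made for illustration, and only the Python standard library is used.

```python
import csv
import io

SKOS_PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"


def escape_literal(text):
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def rows_to_skos_turtle(csv_text, base_uri):
    """Map each CSV row to one skos:Concept with language-tagged labels."""
    reader = csv.DictReader(io.StringIO(csv_text))
    blocks = [SKOS_PREFIX]
    for row in reader:
        subject = f"<{base_uri}{row['id'].strip()}>"
        lines = [f"{subject} a skos:Concept"]
        for column, value in row.items():
            # Skip overflow cells and missing values.
            if not column or not isinstance(value, str):
                continue
            # Assumed convention: columns named label_<lang> hold
            # the preferred label in that language.
            if column.startswith("label_") and value.strip():
                lang = column[len("label_"):]
                label = escape_literal(value.strip())
                lines.append(f'    skos:prefLabel "{label}"@{lang}')
        blocks.append(" ;\n".join(lines) + " .\n")
    return "\n".join(blocks)


# Hypothetical two-row vocabulary table.
sample = "id,label_en,label_it\nsilk,silk,seta\nwool,wool,lana\n"
turtle = rows_to_skos_turtle(sample, "https://example.org/vocab/")
print(turtle)
```

A production workflow would more likely build the graph with an RDF library and validate it before publication. String templating is used here only to keep the sketch dependency-free.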

2026-06-29T00:00:00+02:00 Copyright (c) 2026 Michele Mallia, Fahad Khan, Silvia Calvi, Klara Dankova https://ecp.ep.liu.se/index.php/clarin/article/view/1597 Govorjena slovenščina: A Platform for the Structured Collection of Conversational Speech 2026-06-29T14:01:10+02:00 Darinka Verdonik Andreja Bizjak Gregor Donaj

Conversational spoken language resources remain underrepresented despite their importance for linguistic research and speech technology development. Their collection is constrained by methodological, technical, and legal challenges, particularly in the case of private, everyday interaction. This paper presents Govorjena slovenščina, a CLARIN.SI service developed to support the systematic collection, transcription, and archiving of spoken Slovene. The platform integrates standardized workflows for consent management, metadata collection, data submission, and transcription, and is designed as a reusable research infrastructure rather than a project-specific collection tool. Drawing on insights from citizen-science and crowdsourcing initiatives, the paper discusses key design choices related to participant motivation, data quality, and legal governance, and reports on initial corpora contributions enabled by the platform. The approach demonstrates how a standards-driven, legally grounded infrastructure can support the sustainable development of conversational speech resources, particularly for smaller or under-resourced languages.

2026-06-29T00:00:00+02:00 Copyright (c) 2026 Darinka Verdonik, Andreja Bizjak, Gregor Donaj https://ecp.ep.liu.se/index.php/clarin/article/view/1598 Analysing Speech Acts in Organisational Communication 2026-06-29T14:01:11+02:00 Marcus Grattan Andrea Fried Arne Jönsson

The research presented in this paper addresses the challenges of analysing organisational stakeholder communication using speech act theory, supported by a novel analytical model. We present a large, automatically annotated dataset focused on present- and future-oriented speech acts in Swedish organisational communication, specifically with regard to the information security standard ISO/IEC 27001. To construct and evaluate the dataset, we employ a hybrid annotation pipeline combining quantised large language models, retrieval-augmented prompting, and a finetuned multilingual XLM-RoBERTa classifier, which is evaluated against a manually annotated gold standard. The classifier is subsequently applied to a large corpus of corporate texts from Swedish companies’ websites, enabling comparative analysis between general corporate communication and ISO-focused discourse. The results show that ISO/IEC 27001 communication is dominated by assertive speech acts, with commissives playing a secondary role, reflecting an emphasis on factual compliance rather than aspirational or promotional claims. Beyond its empirical findings, the study contributes a reusable dataset and classifier to the CLARIN infrastructure, demonstrates how CLARIN K-centres can support social science research, and provides insights into the use of quantised large language models for speech act classification in mid-resource language settings.

2026-06-29T00:00:00+02:00 Copyright (c) 2026 Marcus Grattan, Andrea Fried, Arne Jönsson https://ecp.ep.liu.se/index.php/clarin/article/view/1599 CLARIAH-AT: Back (and) to the Future 2026-06-29T14:01:12+02:00 Tanja Wissik Walter Scholger Kerstin Klenke Vesna Lusicky Matej Durco Martina Trognitz Seta Štuhec Elisabeth Steiner Alexandra N. Lenz Markus Pluschkovits Anna Woldrich

This contribution presents the CLARIAH-AT consortium, which coordinates and drives Austrian activities in the European research infrastructures CLARIN and DARIAH and supports the development of digital humanities in Austria. The paper describes the evolution from the early CLARIN-AT activities to the formal establishment of the CLARIAH-AT consortium in 2019. The paper outlines the current organisational structure and infrastructure, including CLARIN B-centres and K-centres that provide repositories, services, and expertise for researchers in the humanities and beyond. It further highlights community-building measures such as funding schemes, training initiatives, and dissemination activities. Furthermore, some CLARIAH-AT funded projects are described in more detail. The paper concludes with an outlook on future developments.

2026-06-29T00:00:00+02:00 Copyright (c) 2026 Tanja Wissik, Walter Scholger, Kerstin Klenke, Vesna Lusicky, Matej Durco, Martina Trognitz, Seta Štuhec, Elisabeth Steiner, Alexandra N. Lenz, Markus Pluschkovits, Anna Woldrich https://ecp.ep.liu.se/index.php/clarin/article/view/1600 Reading LEMI: A Corpus-Based Tool to Support Literacy in an Under-Resourced Language 2026-06-29T14:01:13+02:00 Karla Csuros Madalina Chitez Aura Cristina Udrea Mihai Dascalu Roxana Rogobete Andreea Dinca

LEMI (Lectură pentru mine, ’Reading for Me’) is a corpus-based literacy support platform for Romanian, an under-resourced language in which nearly 42% of students aged 6–15 are classified as functionally illiterate despite a general literacy rate of 99%. The platform combines a curated repository of 250 children’s literary texts with an automatic readability analysis tool based on four interpretable linguistic parameters: Average Sentence Length (ASL), Percentage of Complex Words (PCW), Unique Content Word Density (UCWD), and Word Diversity (WD), combined through a weighted scoring formula designed to avoid the syllable-based biases of English readability metrics and better suited to the morphological complexity of Romanian. This paper presents LEMI’s design and documents its application across five empirical contexts: classroom-based validation with primary school students, linguistic complexity assessment of school textbooks, contrastive analysis of readability in translated versus original children’s literature, lexical modernization of canonical literary texts, and in-service teacher training for corpus-informed text adaptation. Across these contexts, LEMI has functioned as a diagnostic, benchmarking, and decision-support tool, producing interpretable outputs that complement rather than replace professional judgment. The platform adheres to FAIR data principles and implements a metadata schema supporting federated content search and reuse across European research infrastructures. By making both tools and data openly available, LEMI contributes a transparent, reusable resource for readability research and literacy support in an under-resourced language.

2026-06-29T00:00:00+02:00 Copyright (c) 2026 Karla Csuros, Madalina Chitez, Aura Cristina Udrea, Mihai Dascalu, Roxana Rogobete, Andreea Dinca https://ecp.ep.liu.se/index.php/clarin/article/view/1601 Putting things on top of other things: The ZuMult platform for multimodal corpora and its ecosystem 2026-06-29T14:01:14+02:00 Thomas Schmidt Anne Ferger Elena Frick

We present ZuMult, an open-source platform for multimodal corpora, and its ecosystem. ZuMult has a flexible three-layer architecture that enables integration with different types of backend, and development of frontend applications tailored to the needs of highly diverse user groups. The paper puts a special emphasis on the way that ZuMult is built upon existing best practices, standards and technologies, arguing that such an “ecosystem” makes it easier to integrate the technology with different workflows and technical environments.

2026-06-29T00:00:00+02:00 Copyright (c) 2026 Thomas Schmidt, Anne Ferger, Elena Frick https://ecp.ep.liu.se/index.php/clarin/article/view/1602 The CLARIN Resource Families Workflow 2026-06-29T14:01:15+02:00 Jakob Lenardič Alexander König Kristina Pahor de Maiti Tekavčič Dieter Van Uytvanck

This paper presents the new workflow for the CLARIN Resource Families (CRF) initiative. After introducing the CRF initiative and laying out its primary aim, we present the functioning of the new CRF backend, which uses a JSON-based workflow publicly accessible on GitHub. We then present the guidelines for working with the JSON files and describing the metadata. We also discuss how the new workflow enables a collaborative approach to maintaining the families. Finally, we discuss how the CRF complements and is complemented by CLARIN ERIC’s Virtual Language Observatory.

2026-06-29T00:00:00+02:00 Copyright (c) 2026 Jakob Lenardič, Alexander König, Kristina Pahor de Maiti Tekavčič, Dieter Van Uytvanck https://ecp.ep.liu.se/index.php/clarin/article/view/1603 Multilingual Word Rains: Text Visualisation Built on Cross-Language Word Similarities 2026-06-29T14:01:16+02:00 Magnus Ahltorp Maria Skeppstedt

In this study, the text visualisation technique Word Rain is extended with functionality for generating multilingual word rains. Using multilingual pre-trained word embeddings provided by the MUSE library, we show that cross-language studies on lexical frequency with Word Rain are possible in the same way as has previously been possible for monolingual corpora. We make the programming code for generating multilingual word rains available, and we also provide a web page where visualisations can be produced by simply uploading text files that the user wants to visualise. To illustrate the technique, a tentative study of common word groups and cross-language frequency similarities and differences is performed on debates from three parliaments included in the CLARIN ParlaMint corpus.

2026-06-29T00:00:00+02:00 Copyright (c) 2026 Magnus Ahltorp, Maria Skeppstedt https://ecp.ep.liu.se/index.php/clarin/article/view/1604 A Trilingual Parallel Corpus of Yiddish, French, and Russian Literary Texts 2026-06-29T14:01:16+02:00 Valentina Fedchenko Assaf Urieli Arnaud Bikard

This article presents a trilingual Yiddish–French–Russian parallel corpus of literary translations, which will be made available through the CLARIN B-centre in France, ORTOLANG. Parallel corpora are fundamental resources for translation studies, contrastive linguistics, and natural language processing. Despite advances in neural machine translation, such corpora remain essential for improving automatic translation tools, particularly for less-resourced languages such as Yiddish. This article describes the phenomenon of translation in Yiddish and the construction of the corpus, including text alignment with the integration of a custom bilingual dictionary. The corpus is integrated into a multilingual concordancer, facilitating data exploration and analysis.

2026-06-29T00:00:00+02:00 Copyright (c) 2026 Valentina Fedchenko, Assaf Urieli, Arnaud Bikard https://ecp.ep.liu.se/index.php/clarin/article/view/1605 A Digital Humanism perspective on providing language resources to CLARIN in an age of AI commodification: The case of UniTermGPT 2026-06-29T14:01:17+02:00 Barbara Heinisch

The accelerated uptake of large language models (LLMs) has intensified the demand for high-quality language resources, while simultaneously reshaping the political economy of language data. Research infrastructures such as CLARIN play a central role in advancing open science by providing FAIR-compliant access to language resources, yet they also operate within a context increasingly characterized by AI-driven commodification and extractive data practices. This paper adopts a Digital Humanism perspective to examine the ethical implications of providing language resources to CLARIN in this context, using the UniTermGPT project as an example. UniTermGPT compiles and annotates German-language university corpora and terminology across Austrian, German, Swiss and South Tyrolean varieties to address limitations of large language models in handling language-variety-specific terminology. By examining the types of data produced by UniTermGPT and their integration into CLARIN, this paper discusses the challenges associated with openness, attribution and potential downstream reuse in opaque AI training pipelines. By situating FAIR and CARE principles within a broader Digital Humanism framework, the paper argues for ethical-by-design approaches to language resource provision that make human labor visible, preserve linguistic diversity and address power asymmetries between public research infrastructures and commercial AI actors.

2026-06-29T00:00:00+02:00 Copyright (c) 2026 Barbara Heinisch https://ecp.ep.liu.se/index.php/clarin/article/view/1606 An infrastructure for Historical Dutch Corpus Development 2026-06-29T14:01:18+02:00 Katrien Depuydt Jesse De Does Vincent Prins Mathieu Fannee Roland de Bonth Thomas Haga

We describe an infrastructure for linguistic annotation of historical Dutch texts, consisting of tagging and lemmatisation guidelines, gold standard data, trained tagging models, the GaLAHaD platform for automatic linguistic annotation and evaluation and the LAnCeLoT manual annotation tool. Users can upload unannotated materials to the GaLAHaD platform, where trained PoS taggers and lemmatizers (including modern deep learning models) annotate the data. Results can be evaluated against newly expanded gold standard corpora (13th to 19th centuries) with various metrics, then exported for further analysis. The LAnCeLoT tool supports manual annotation correction at both type and token level, thus supporting the development of in-domain gold standard material. The infrastructure was developed at the Instituut voor de Nederlandse Taal (INT, Dutch Language Institute) and first released in 2025.

2026-06-29T00:00:00+02:00 Copyright (c) 2026 Katrien Depuydt, Jesse De Does, Vincent Prins, Mathieu Fannee, Roland de Bonth, Thomas Haga https://ecp.ep.liu.se/index.php/clarin/article/view/1607 Users’ Experience and Development Priorities for CLARIN.SI 2026-06-29T14:01:19+02:00 Špela Arhar Holdt Katja Meden Taja Kuzman Pungeršek Nikola Ljubešić Tomaž Erjavec

This paper reports the results of a user survey evaluating the services, tools, and broader functioning of the CLARIN.SI research infrastructure. The survey was designed to identify areas for improvement and to support development planning and strategic decision-making for the coming years. The questionnaire included 50 questions across eight thematic sections, covering the full range of user engagement with the infrastructure, from depositing and downloading resources to using tools and attending events. It was distributed to existing and potential users in Slovenia and abroad. Based on 228 responses, the results show that CLARIN.SI is generally highly valued by its community, especially for the reliability of its data, its support for open science, and the usefulness of its resources. The findings also point to a highly interdisciplinary user base and to a substantial group of potential users who have not yet directly engaged with the infrastructure. Although repository services and language tools are widely used and positively evaluated, respondents consistently identified challenges related to resource discovery, metadata clarity, and search functionality. More broadly, the results suggest that the future impact of CLARIN.SI will depend not only on technical development, but also on effective outreach, onboarding, and training, including the provision of concrete use cases and reusable educational materials.

2026-06-29T00:00:00+02:00 Copyright (c) 2026 Špela Arhar Holdt, Katja Meden, Taja Kuzman Pungeršek, Nikola Ljubešić, Tomaž Erjavec https://ecp.ep.liu.se/index.php/clarin/article/view/1608 The AI Act and its impact on Large Language Models and the CLARIN Infrastructure 2026-06-29T14:01:20+02:00 Pawel Kamocki Joanna Blochowiak Anna Gosławska Henk van den Heuvel Erik Ketzan Krister Lindén Costanza Navarretta Andrius Puksas German Rigau

The EU Artificial Intelligence Act (Regulation 2024/1689), which entered into force in August 2024, represents the world’s first comprehensive regulatory framework for AI. While it explicitly excludes AI systems developed solely for scientific research and development, its implications for the CLARIN community are far-reaching, especially given the growing reliance on large language models (LLMs) and general-purpose AI (GPAI). This paper discusses the scope of research exemptions in the AI Act (Section 2), and provides an overview of the obligations related to transparency of AI-generated text outputs (Section 3), as well as those imposed on providers of General-Purpose AI models, including those with systemic risk (Section 4). Given CLARIN’s role as a data provider, special attention is paid to the requirement for dataset documentation.

2026-06-29T00:00:00+02:00 Copyright (c) 2026 Pawel Kamocki, Joanna Blochowiak, Anna Gosławska, Henk van den Heuvel, Erik Ketzan, Krister Lindén, Costanza Navarretta, Andrius Puksas, German Rigau https://ecp.ep.liu.se/index.php/clarin/article/view/1609 Advancing FAIR language data through training and capacity building: the Swiss contribution to CLARIN 2026-06-29T14:01:21+02:00 Joanna Blochowiak Cristina Grisot

The CLARIN-CH ecosystem is unique in its approach to language resources, as it brings together a diverse range of research infrastructures distributed across the various language regions of Switzerland. In a larger European context, we present in this paper the CLARIN-CH training program, which plays a central role in Switzerland’s efforts to advance sustainable Open Research Data practices within the evolving landscape of Open Science. The program is structured around five principal axes of action: (i) assessing the needs of researchers across disciplines who work with language data, both within linguistics and beyond; (ii) collaborating with stakeholders, including researchers and training providers, to develop offerings tailored to these specific needs; (iii) organizing diverse training events using a variety of educational formats; (iv) co-developing Open Educational Resources aligned with the FAIR-by-design methodology; and (v) ensuring the publication and long-term accessibility of these resources via Zenodo for the CLARIN-CH community. In this contribution, we outline these core components of the training program and showcase three training initiatives delivered through multiple educational formats. The long-term goal of the CLARIN-CH training program is to contribute to capacity building among researchers working with language data, by promoting a spirit of lifelong learning and interdisciplinary collaboration. As such, Switzerland contributes significantly to CLARIN’s efforts to disseminate knowledge and build capacity around language resources, and to promote best practices aligned with FAIR and Open Research Data principles.

2026-06-29T00:00:00+02:00 Copyright (c) 2026 Joanna Blochowiak, Cristina Grisot https://ecp.ep.liu.se/index.php/clarin/article/view/1610 A resource for lexical variation in a pluri-areal context: Introducing LexAT21, the atlas on lexical variation in Austria in the 21st century 2026-06-29T14:01:22+02:00 Markus Pluschkovits Daniel Schopper Anja Wittibschlager Jakob Bal Kilian Kukelka Olivia Reichl

The following contribution introduces LexAT21, the atlas on lexical variation in Austria in the 21st century. The linguistic context of this project is briefly covered in section one, suggesting that areal and social/vertical variation is characteristic of the German language in Austria. This is then followed by a brief description of the dataset of the first two survey rounds of LexAT21 and a short sample analysis, showing the potential of the dataset for linguistic research. We then turn to integration into the CLARIN context. The contribution closes by considering the issue of representing different varieties of languages in standards.

2026-06-29T00:00:00+02:00 Copyright (c) 2026 Markus Pluschkovits, Daniel Schopper, Anja Wittibschlager, Jakob Bal, Kilian Kukelka, Olivia Reichl https://ecp.ep.liu.se/index.php/clarin/article/view/1611 Towards FAIR Metadata for Specialised Corpora: A Community-Informed Empirical Study of Schema Development in Two Communities 2026-06-29T14:01:23+02:00 Egon W. Stemle Alexander König Nannan Liu Jennifer-Carmen Frey Hubert Naets Magali Paquot Mariachiara Russo

This paper investigates the development of domain-specific metadata schemata to support FAIR data management within specialised research communities. We report on two case studies, the Core Metadata Schema for Learner Corpora (LC-meta) and the metadata schema developed for interpreting corpora within the Unified Interpreting Corpus (UNIC) project. We address how metadata are described and standardised in these projects, the principles that informed the schema design, and the infrastructural challenges that arise when semantically rich, community-specific metadata are to be represented within generic discovery frameworks. Our findings underline the importance of domain-oriented approaches to metadata standardisation, iterative design and stakeholder involvement, and suggest a more active role for CLARIN in supporting long-term sustainability, technical mediation, and shared stewardship.

2026-06-29T00:00:00+02:00 Copyright (c) 2026 Egon W. Stemle, Alexander König, Nannan Liu, Jennifer-Carmen Frey, Hubert Naets, Magali Paquot, Mariachiara Russo https://ecp.ep.liu.se/index.php/clarin/article/view/1612 Interfacing CLARIN with H2IOSC: Metadata Interoperability through Ontology-based Mediation 2026-06-29T14:01:25+02:00 Daniele Melaccio Federico Boschetti Monica Monachini Pietro Sichera

We present an ontology-based mediation approach for integrating CLARIN language resources into the H2IOSC semantic framework, with the aim of enabling semantic interoperability across the Social Sciences and Humanities. Building on CMDI and CLARIN CCR, our work develops a layered semantic alignment workflow in which CLARIN metadata are progressively reinterpreted through multiple mediation steps. In a first layer, CMDI metadata are aligned with CIDOC CRM and SSHOCro, which are extended through a CLARIN domain ontology that formalises distinctions between datasets, tools, and services as reflected in discovery environments such as the Virtual Language Observatory, Resource Families, and the Language Resource Switchboard. In a second layer, these domain-level representations are projected into H2IOSCro, a Marketplace-oriented ontology that introduces entities, attributes, and typed relations required for operational federation within H2IOSC. The resulting ontology-based representation is finally projected onto the Marketplace operational data model, supporting metadata harvesting, normalisation, validation, and exposure through standard mechanisms such as OAI-PMH. By explicitly connecting semantic modelling, ontology mediation, and platform-level integration, this layered approach enhances discoverability, preserves compatibility with existing CLARIN metadata practices, and contributes to FAIR- and EOSC-aligned interoperability.
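As a rough illustration of the harvesting mechanism named in this abstract, the sketch below builds an OAI-PMH `ListRecords` request URL and extracts record identifiers from a response. The `verb`, `metadataPrefix`, and `resumptionToken` arguments come from the OAI-PMH 2.0 protocol; the endpoint URL, the `cmdi` prefix value, the sample response, and the function names are assumptions, and nothing here reflects the actual H2IOSC implementation.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix, resumption_token=None):
    """Build a ListRecords request URL.

    In OAI-PMH, resumptionToken is an exclusive argument: when it is
    present, metadataPrefix must not be sent alongside it.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def record_identifiers(response_xml):
    """Return the identifier of every record header in a response."""
    root = ET.fromstring(response_xml)
    return [
        element.text
        for element in root.findall(".//oai:header/oai:identifier", OAI_NS)
    ]


# Hypothetical endpoint and a hand-written, heavily trimmed response.
url = list_records_url("https://example.org/oai", "cmdi")
sample_response = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
  </ListRecords>
</OAI-PMH>"""
identifiers = record_identifiers(sample_response)
print(url)
print(identifiers)
```

The sketch stops at identifier extraction. Normalisation and validation of the harvested metadata, which the abstract also mentions, would follow as separate steps.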

2026-06-29T00:00:00+02:00 Copyright (c) 2026 Daniele Melaccio, Federico Boschetti, Monica Monachini, Pietro Sichera https://ecp.ep.liu.se/index.php/clarin/article/view/1613 Implementing and Promoting Data Citation for CLARIN Resources at FIN-CLARIN 2026-06-29T14:01:26+02:00 Mietta Lennes Ute Dieckmann Martin Matthiesen Tommi Jauhiainen Jussi Piitulainen Krister Lindén

Citation instructions are provided for all corpora in the Language Bank of Finland to promote good reference practices in the use of research data. By utilizing persistent identifiers (PIDs) and curated metadata, uniform citations can be constructed automatically, even for forthcoming resources. This work outlines how citation components such as authors, publication year, versions, and publisher are currently managed in the Language Bank of Finland, and discusses challenges related to metadata accuracy and interoperability. Data citation practices could be harmonized across CLARIN repositories by supporting data citation more directly via CLARIN-compliant metadata.
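The idea of assembling a uniform citation from curated metadata and a PID can be sketched as follows. The field names, the citation layout, the "forthcoming" fallback, and the example record are all assumptions made for illustration; the abstract does not specify the format used by the Language Bank of Finland.

```python
def format_citation(meta):
    """Assemble a citation string from curated metadata fields."""
    authors = "; ".join(meta["authors"])
    # A forthcoming resource may not have a publication year yet.
    year = meta.get("year", "forthcoming")
    title = meta["title"]
    if meta.get("version"):
        title += f", version {meta['version']}"
    return (
        f"{authors} ({year}). {title} [data set]. "
        f"{meta['publisher']}. {meta['pid']}"
    )


# Hypothetical metadata record with placeholder values.
example = {
    "authors": ["Doe, J.", "Roe, R."],
    "year": 2024,
    "title": "Example Corpus",
    "version": "1.0",
    "publisher": "Example Repository",
    "pid": "https://example.org/pid/1234",
}
citation = format_citation(example)
print(citation)
```

Because every component comes from a metadata field, any inaccuracy in the metadata propagates directly into the citation, which is why the abstract stresses metadata accuracy.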

2026-06-29T00:00:00+02:00 Copyright (c) 2026 Mietta Lennes, Ute Dieckmann, Martin Matthiesen, Tommi Jauhiainen, Jussi Piitulainen, Krister Lindén https://ecp.ep.liu.se/index.php/clarin/article/view/1614 Synergies between CLARIN-IT and OPERAS-IT within H2IOSC: Monitoring Communities and Orchestrating Digital Services 2026-06-29T14:01:26+02:00 Pietro Sichera Monica Monachini Valeria Quochi Nicola Giampietro Vittoria Fabiani Roberta Bianca Luzietti Roberta Ottaviani Daniele Melaccio Laura Baggiani

This paper presents two components developed within the Humanities and Cultural Heritage Italian Open Science Cloud, H2IOSC, the Observatory and the Marketplace. Together, they support the SSH federated national cluster of Research Infrastructures by combining evidence-based community monitoring with an environment for discovering and combining digital resources and services. The Observatory adopts a mixed methods approach, integrating quantitative and qualitative instruments to document researchers’ practices, needs, and expectations across partially overlapping domains. The Marketplace provides a catalogue of services and a workflow environment based on controlled ingestion, semantic normalisation, and interoperability with external sources. We also describe how the two components interact, linking community analysis with service provision. This interaction shows how CLARIN-IT and OPERAS-IT work together within H2IOSC to monitor uptake and inform service development.

2026-06-29T00:00:00+02:00 Copyright (c) 2026 Pietro Sichera, Monica Monachini, Valeria Quochi, Nicola Giampietro, Vittoria Fabiani, Roberta Bianca Luzietti, Roberta Ottaviani, Daniele Melaccio, Laura Baggiani https://ecp.ep.liu.se/index.php/clarin/article/view/1615 UPSKILLS Two Years on: Teaching about Language Resources and FAIR Data Principles with CLARIN 2026-06-29T14:01:28+02:00 Iulianna van der Lek Maja Miličević Petrović Silvia Bernardini Adriano Ferraresi Olga Arsic

The ongoing technological developments have a major impact on language-related research and professional activities. This highlights an increasing demand for new skills and profiles, as well as a greater focus on making language data and resources open and FAIR. In this paper, we present long-term efforts that address these issues, starting from the Erasmus+ project UPSKILLS (2020–2023). We specifically focus on follow-up activities at the University of Bologna in collaboration with CLARIN ERIC, proposing a conceptual framework that combines progressive competence development across all study levels with infrastructure-based pedagogy to support discipline-specific Open Science education. Using CLARIN as a learning environment, lecturers can modernize curricula, strengthen students’ FAIR data literacy, and improve graduate employability.