Building of Parallel and Comparable Cybersecurity Corpora for Bilingual Terminology Extraction
DOI:
https://doi.org/10.3384/ecp18912
Keywords:
Bilingual Terminology Extraction, Parallel Corpus, Comparable Corpus
Abstract
This paper presents English-Lithuanian corpora for bilingual term extraction (BiTE) in the cybersecurity domain, built within the framework of the DVITAS project. It argues that a system of parallel, comparable, and training corpora for BiTE is particularly useful for less-resourced languages, as it makes it possible to combine the strengths and avoid the weaknesses of comparable and parallel resources efficiently. Special attention is given to the availability of sources in the cybersecurity domain and to issues related to copyright-protected publications, as well as to the data curation performed for building the corpora and depositing them in the CLARIN-LT repository.
Published
2022-07-08
Section
Contents
License
Copyright (c) 2022 Andrius Utka, Sigita Rackevičienė, Aivaras Rokas, Liudmila Mockienė, Marius Laurinaitis, Agnė Bielinskienė
This work is licensed under a Creative Commons Attribution 4.0 International License.