Reliability of Automatic Linguistic Annotation: Native vs Non-native Texts
DOI: https://doi.org/10.3384/ecp18914
Keywords: Automatic Pipelines, Learner Language, Reliability of Annotations
Abstract
We present the results of a manual evaluation of the performance of automatic linguistic annotation on three different datasets: (1) texts written by native speakers, (2) essays written by second language (L2) learners of Swedish in their original form, and (3) normalized versions of the learner-written essays. The evaluation focuses on lemmatization, POS-tagging, word sense disambiguation, multi-word detection, and dependency annotation. Two annotators manually went through the automatic annotation on a subset of the datasets and marked all deviations based on their expert judgments and the guidelines provided. We report Inter-Annotator Agreement between the two annotators, as well as the accuracy of the linguistic annotation for the three datasets, broken down by level and linguistic feature.
Published
2022-07-08
License
Copyright (c) 2022 Elena Volodina, David Alfter, Therese Lindström Tiedemann, Maisa Lauriala, Daniela Piipponen
This work is licensed under a Creative Commons Attribution 4.0 International License.