Reliability of Automatic Linguistic Annotation: Native vs Non-native Texts
Keywords: Automatic Pipelines, Learner Language, Reliability of Annotations
Abstract

We present the results of a manual evaluation of the performance of automatic linguistic annotation on three different datasets: (1) texts written by native speakers, (2) essays written by second language (L2) learners of Swedish in their original form, and (3) normalized versions of the learner-written essays. The evaluation focuses on lemmatization, POS-tagging, word sense disambiguation, multi-word detection, and dependency annotation. Two annotators manually went through the automatic annotation on a subset of the datasets and marked all deviations based on their expert judgments and the guidelines provided. We report inter-annotator agreement between the two annotators and the accuracy of the linguistic annotation for the three datasets, broken down by proficiency level and linguistic feature.
Copyright (c) 2022 Elena Volodina, David Alfter, Therese Lindström Tiedemann, Maisa Lauriala, Daniela Piipponen
This work is licensed under a Creative Commons Attribution 4.0 International License.