Semantic Error Prediction: Estimating Word Production Complexity

Authors

  • David Strohmaier ALTA Institute, University of Cambridge
  • Paula Buttery ALTA Institute, University of Cambridge

DOI:

https://doi.org/10.3384/ecp211016

Keywords:

complexity, lexical semantics, transformers, Bayesian models

Abstract

Estimating word complexity is a well-established task in computer-assisted language learning. So far, however, complexity estimation has been largely limited to comprehension. This neglects words that are easy to comprehend but hard to produce. We introduce semantic error prediction (SEP) as a novel task that assesses the production complexity of content words. Given the corrected version of a learner-produced text, a system has to predict which content words replace tokens from the original text. We present and analyse one such SEP dataset, which we generate from an error correction dataset. As neural baselines for SEP, we use BERT, RoBERTa, and LLAMA2 embeddings. We show that our models can already improve downstream applications, such as predicting essay vocabulary scores.
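
The task definition above can be illustrated with a minimal sketch of how SEP labels might be derived from one original/corrected sentence pair. This is not the authors' dataset construction, which the abstract does not describe. It assumes whitespace tokenisation, alignment with Python's difflib, and a tiny hypothetical function-word list (FUNCTION_WORDS) as a stand-in for a real content-word filter such as a part-of-speech tagger. The name sep_labels is likewise illustrative.

```python
from difflib import SequenceMatcher

# Hypothetical stand-in for a real content-word filter (e.g. a POS tagger).
FUNCTION_WORDS = {"a", "an", "the", "to", "of", "in", "on", "at", "is", "was", "i", "and"}


def sep_labels(original, corrected):
    """Return (token, label) pairs for the tokens of the corrected text.

    label is 1 when the token is a content word that replaces one or more
    tokens of the original text, and 0 otherwise.
    """
    # Assumption: a plain token-level diff is a good enough alignment.
    matcher = SequenceMatcher(a=original, b=corrected, autojunk=False)
    labels = [0] * len(corrected)
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag != "replace":
            continue  # pure insertions, deletions and matches are not replacements
        for j in range(j1, j2):
            if corrected[j].lower() not in FUNCTION_WORDS:
                labels[j] = 1
    return list(zip(corrected, labels))


original = "I did a big mistake yesterday".split()
corrected = "I made a big mistake yesterday".split()
print(sep_labels(original, corrected))
```

In this example only "made" is labelled 1, because it is a content word that replaces "did". A model for SEP would see only the corrected tokens and would have to predict these labels.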

Published

2024-10-15