Investigating strategies for lexical complexity prediction in a multilingual setting using generative language models and supervised approaches

Authors

  • Abdelhak Kelious, University of Lorraine and CNRS/ATILF
  • Mathieu Constant, University of Lorraine and CNRS/ATILF
  • Christophe Coeur, Consultant

DOI:

https://doi.org/10.3384/ecp211008

Keywords:

lexical complexity, ChatGPT, LLM, lexical complexity prediction, complex word identification

Abstract

This paper explores methods for automatically predicting lexical complexity in a multilingual setting using advanced natural language processing models. More precisely, it investigates transfer learning and data augmentation techniques in a supervised learning context and shows the clear benefit of multilingual approaches. We also assess the potential of generative large language models for predicting lexical complexity. Using different prompting strategies (zero-shot, one-shot, and chain-of-thought prompts), we analyze model performance across diverse languages. Our findings reveal that generative models achieve high correlation scores, but their predictive quality varies. The comparative study shows that, although generative large language models have potential, optimized task-specific models still outperform them in accuracy and reliability.

Published

2024-10-15