Investigating strategies for lexical complexity prediction in a multilingual setting using generative language models and supervised approaches
DOI: https://doi.org/10.3384/ecp211008
Keywords: lexical complexity, ChatGPT, LLM, lexical complexity prediction, Complex Word Identification
Abstract
This paper explores methods for automatically predicting lexical complexity in a multilingual setting using advanced natural language processing models. More precisely, it investigates transfer learning and data augmentation techniques for supervised learning and shows the value of multilingual approaches. We also assess the potential of generative large language models for predicting lexical complexity. Using different prompting strategies (zero-shot, one-shot, and chain-of-thought prompts), we analyze model performance across languages. Our findings reveal that although generative models achieve high correlation scores, their predictive quality varies. The comparative study shows that generative large language models have potential, but optimized task-specific models still outperform them in accuracy and reliability.
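The distinction between correlation and predictive quality can be illustrated with a small sketch. It assumes Pearson correlation and mean absolute error (MAE) as the two measures; the abstract does not name the metrics used, and the complexity scores below are hypothetical values invented for illustration, not data from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


def mae(xs, ys):
    """Mean absolute error between two equal-length lists of scores."""
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)


# Hypothetical gold complexity scores in [0, 1] (illustrative only).
gold = [0.10, 0.25, 0.40, 0.55, 0.70]

# Hypothetical system A: ranks words perfectly but is compressed and offset,
# so its absolute scores are far from the gold values.
pred_offset = [0.5 * g + 0.45 for g in gold]

# Hypothetical system B: slightly noisy but well calibrated.
pred_close = [0.12, 0.22, 0.43, 0.52, 0.71]

for name, pred in [("offset", pred_offset), ("close", pred_close)]:
    print(f"{name}: r={pearson(gold, pred):.3f}  MAE={mae(gold, pred):.3f}")
```

In this toy setup the offset system has the higher correlation yet the larger absolute error, which is one way high correlation can coexist with uneven predictive quality.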