Investigating Acoustic Correlates of Whisper Scoring for L2 Speech Using Forced alignment with the Italian Component of the ISLE corpus

Authors

  • Nicolas Ballier LLF & CLILLAC-ARP, Université Paris Cité, Paris
  • Adrien Méli CLILLAC-ARP, Université Paris Cité, Paris

DOI:

https://doi.org/10.3384/ecp211002

Keywords:

L2 speech, audio LLM, Whisper, acoustic correlates, vowels

Abstract

This paper examines whether global phonetic analyses of learner data can corroborate the probability scores that Whisper assigns to learner speech. We explore the Italian component of the ISLE corpus, analysing the speech of 23 learners of English. Using a C++ wrapper of the Whisper models, we investigate the probability scores assigned by Whisper's tiny model, and we use P2FA forced alignment to discuss the phonetic features that may account for these predictions. We try to correlate the quality of the phonetic realisation (measured as the Levenshtein distance to the read text) with global vocalic measurements such as the convex hull or the Euclidean distances between monophthongs. We show that the Levenshtein distance between the Whisper tiny model's output and the reference transcription correlates with the grades assigned by the annotators and, partially, with the accuracy of a k-NN classification of monophthongs.
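The measurements named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the word-level tokenisation, the choice of (F1, F2) in Hz as the vowel space, the function names, and all sample values are assumptions made for illustration only. It shows a Levenshtein distance between a reference text and a transcription, the area of the convex hull around per-vowel formant means, and a Euclidean distance between two monophthongs.

```python
from math import dist


def levenshtein(ref, hyp):
    """Edit distance between two sequences (e.g. lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    for i in range(len(vertices)):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % len(vertices)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Hypothetical per-vowel mean formants (F1, F2) in Hz; invented values,
# not measurements from the ISLE corpus.
vowel_means = {"i": (300, 2300), "e": (500, 1900), "a": (750, 1300), "u": (320, 800)}

reference = "i said white not bait".split()
transcribed = "i said wide not bait".split()  # invented recogniser output

word_edits = levenshtein(reference, transcribed)
vowel_space_area = polygon_area(convex_hull(list(vowel_means.values())))
i_to_e = dist(vowel_means["i"], vowel_means["e"])  # Euclidean distance in Hz
```

In a setup like this, a larger hull area or larger distances between monophthongs would indicate a more dispersed vowel space, which could then be compared across speakers with the edit distance of their transcriptions.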

Published

2024-10-15