Evaluating Automatic Pronunciation Scoring with Crowd-sourced Speech Corpus Annotations

Authors

  • Nils Hjortnaes Indiana University
  • Daniel Dakota Indiana University
  • Sandra Kübler Indiana University
  • Francis Tyers Indiana University

DOI:

https://doi.org/10.3384/ecp211006

Keywords:

Feedback, Speech Recognition, ASR, Pronunciation

Abstract

Pronunciation is an important and difficult aspect of learning a language. Providing automatic feedback to learners can help train pronunciation, but training a model to do so requires corpora annotated for mispronunciation, and such corpora are rare. We investigate the potential of using the crowdsourced annotations included in Common Voice to indicate mispronunciation. We evaluate the quality of ASR-generated goodness of pronunciation (GOP) scores on the Common Voice corpus against a simple baseline. These scores allow us to see how the Common Voice annotations behave in a realistic use scenario. We also take a qualitative approach to analyzing the corpus and show that the crowdsourced annotations are a poor substitute for mispronunciation annotations, as they typically reflect audio-quality issues or misreadings rather than mispronunciation.
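For context, the formula below is the standard goodness of pronunciation (GOP) formulation commonly attributed to Witt and Young; the abstract does not specify which variant the authors compute, so this is an illustrative sketch rather than the paper's definition. For an acoustic segment O force-aligned to an expected phone p, GOP normalizes the (approximated) log posterior of p by the number of frames NF(p):

\mathrm{GOP}(p) \;=\; \frac{1}{NF(p)} \left| \log \frac{p(O \mid p)\, P(p)}{\max_{q \in Q} p(O \mid q)\, P(q)} \right|

where Q is the full phone inventory of the ASR model; lower scores indicate a likelier mispronunciation.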

Published

2024-10-15