Evaluating Automatic Pronunciation Scoring with Crowd-sourced Speech Corpus Annotations
DOI: https://doi.org/10.3384/ecp211006
Keywords: Feedback, Speech Recognition, ASR, Pronunciation
Abstract
Pronunciation is an important and difficult aspect of learning a language. Providing feedback to learners automatically can help train pronunciation, but training a model to do so requires corpora annotated for mispronunciation, and such corpora are rare. We investigate the potential of using the crowdsourced annotations included in Common Voice to indicate mispronunciation. We evaluate the quality of ASR-generated goodness of pronunciation scores on the Common Voice corpus against a simple baseline. These scores allow us to see how the Common Voice annotations behave in a realistic use scenario. We also take a qualitative approach to analyzing the corpus and show that the crowdsourced annotations are a poor substitute for mispronunciation annotations, as they typically reflect issues in audio quality or misreadings rather than mispronunciation.