Evaluating Automatic Pronunciation Scoring with Crowd-sourced Speech Corpus Annotations
DOI: https://doi.org/10.3384/ecp211006
Keywords: Feedback, Speech Recognition, ASR, Pronunciation
Abstract
Pronunciation is an important and difficult aspect of learning a language. Providing feedback to learners automatically can help train pronunciation, but training a model to do so requires corpora annotated for mispronunciation, and such corpora are rare. We investigate the potential of using the crowdsourced annotations included in Common Voice to indicate mispronunciation. We evaluate the quality of ASR-generated goodness of pronunciation scores on the Common Voice corpus against a simple baseline. These scores allow us to see how the Common Voice annotations behave in a realistic use scenario. We also take a qualitative approach to analyzing the corpus and show that the crowdsourced annotations are a poor substitute for mispronunciation annotations, as they typically reflect issues in audio quality or misreadings rather than mispronunciation.