Developing Resources for Measuring Text Readability in Sesotho


Johannes Sibeko



Classical readability metrics, Sesotho, Text readability, Low-resourced language, Basic language resources


This article presents a work-in-progress doctoral project on measuring text readability in Sesotho, a Bantu language spoken by more than 10 million people across Southern Africa. The project takes a classical readability formula approach to text readability analysis. We aim to adapt nine existing readability metrics to Sesotho, using English as a higher-resourced helper language. So far, five resources have been developed as part of the study. Four of them, namely the rule-based syllabification system, the TeX-based syllabification system, the syllable-annotated word list, and the grade 12 exam reading comprehension and summary writing corpus, have been published on the online repository of the South African Centre for Digital Language Resources (SADiLaR). The fifth, a machine-translated corpus, is still under development. This article describes the progress of the PhD project by giving an overview of the basic digital language resources developed for it. The metrics under consideration for adaptation to Sesotho are also briefly discussed.