Curating a historical source corpus of 20th century patient organization periodicals
Keywords: Corpus curation, historical text digitization, OCR processing, patient organizations
Abstract: Acting out Disease: How Patient Organizations Shaped Modern Medicine (ActDisease) explores the history of patient organizations in 20th-century Europe. By combining traditional historiographic methods with text mining techniques, the project aims to shed light on how patient organizations co-constructed the concepts and management of disease. Part of the project is to digitize print sources and build a digital corpus for historical text mining. The corpus consists of periodicals published by selected British, French, German and Swedish patient organizations, a type of material that poses challenges of scan quality, layout, and inconsistency. This paper discusses the technical process of building the ActDisease corpus, from digitizing patient organization periodicals to OCR post-processing. It also addresses the methodological questions and challenges of curating a corpus of fragmented and heterogeneous historical source material tailored to a specific project.
Copyright (c) 2024 Gijs Aangenendt, Maria Skeppstedt, Ylva Söderfeldt
This work is licensed under a Creative Commons Attribution 4.0 International License.