Curating a historical source corpus of 20th century patient organization periodicals

Authors

  • Gijs Aangenendt
  • Maria Skeppstedt
  • Ylva Söderfeldt

DOI:

https://doi.org/10.3384/ecp205011

Keywords:

Corpus curation, historical text digitization, OCR processing, patient organizations

Abstract

Acting out Disease: How Patient Organizations Shaped Modern Medicine (ActDisease) explores the history of patient organizations in 20th-century Europe. By combining traditional historiographic methods with text mining techniques, the project aims to shed light on how patient organizations co-constructed concepts of disease and practices of disease management. Part of the project is to digitize print sources and build a digital corpus for historical text mining. The corpus consists of periodicals published by selected British, French, German, and Swedish patient organizations. This type of material poses a number of challenges in scan quality, layout, and consistency. This paper discusses the technical process of building the ActDisease corpus, from digitizing the patient organization periodicals to OCR post-processing. It also touches upon the methodological questions and challenges of curating a corpus of fragmented and heterogeneous historical source material tailored to a specific project.

Published

2024-01-04