AI, Data Curation and the Data Readiness of Heritage Collections: Exploring the Swedish Newspaper Archive at KBLab

Authors

  • Justyna Sikora
  • Chris Haffenden

DOI:

https://doi.org/10.3384/ecp205009

Keywords:

Data curation, data readiness, digitized newspaper archives, document AI, digital research infrastructure

Abstract

The increasing availability of digital material and tools for large-scale computational analysis has produced a growing interest in big data approaches in the humanities and social sciences. However, the vital role of data curation as a precondition for such projects remains underappreciated. This paper details the work of KBLab at the National Library of Sweden in testing AI tools to help curate the digitized newspaper archive and make it more amenable to quantitative, machine learning-based research. It first describes the library’s newspaper data to orient researchers interested in the material, then recounts the results of our exploration of automated data curation. It concludes by sketching possible next steps for these exploratory efforts and by situating the project within a broader recent turn to conceptualize and prioritize the notion of data readiness. Its principal argument is that data curation is an essential part of any digital research project, not something prior or external to the research process.

Published

2024-01-04