Huminfra Conference
https://ecp.ep.liu.se/index.php/hic
Huminfra is a Swedish national research infrastructure that supports digital and experimental research in the humanities. It provides users with a single entry point for finding existing Swedish materials and research tools, and it develops national methods courses.
https://www.huminfra.se/
Publisher: Linköping University Electronic Press. ISSN 1650-3686

Open Brain AI: An AI Research Platform
https://ecp.ep.liu.se/index.php/hic/article/view/885
Language assessment is pivotal for identifying therapeutic interventions for speech, language, and communication disorders of neurogenic origin, whether developmental or acquired, and for evaluating student performance in the classroom. Traditional assessment techniques, however, are predominantly manual and require extensive time and effort to administer and score. Such procedures can also increase the stress experienced by patients. In response to these challenges, we introduced Open Brain AI (https://openbrainai.com), a computational platform that combines advanced AI methods, including machine learning, natural language processing, large language models, and automatic speech-to-text transcription. These capabilities allow Open Brain AI to automatically analyze spoken and written language production in multiple languages. This paper presents the development and evolution of Open Brain AI. It describes the platform's AI-driven language processing components and the detailed linguistic metrics it uses to evaluate both global and fine-grained discourse structure. By enabling rapid, automated language analysis, Open Brain AI reduces the workload of researchers, clinicians, and teachers. It allows healthcare and education professionals to streamline their workflows and reallocate time and resources to more personalized interactions with the people they serve.
Moreover, Open Brain AI allows clinicians, researchers, and educators to carry out essential data analyses themselves, freeing time to focus on other vital aspects of therapeutic intervention and care.
Author: Charalambos Themistocleous
Copyright (c) 2024 Charalambos Themistocleous. License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/). Published 2024-01-04. Pages 1–9. DOI: 10.3384/ecp205001

Profiles for Swedish as a Second Language: Lexis, Grammar, Morphology
https://ecp.ep.liu.se/index.php/hic/article/view/886
This article gives a short introduction to the Swedish Second Language Profile, a tool that visualizes language in Swedish learner corpora from different angles, such as vocabulary, grammar and morphology. The tool is aimed at research on second language acquisition, the development of NLP models, the teaching of Swedish as a second language, automated approaches to second language teaching and learning, and a number of other fields.
Authors: Elena Volodina, David Alfter, Therese Lindström Tiedemann
Copyright (c) 2024 Elena Volodina, David Alfter, Therese Lindström Tiedemann. License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/). Published 2024-01-04. Pages 10–19. DOI: 10.3384/ecp205002

Digital History and Immaterial Infrastructure: A Bottom-Up Approach
https://ecp.ep.liu.se/index.php/hic/article/view/887
This paper argues for an expanded view of research infrastructure. Drawing on our experience leading the research platform DigitalHistory@Lund, it shows how research capacity can be unlocked “bottom-up” by providing scholars with comparatively cheap, yet often inaccessible, technological support. By engaging researchers in digitally enabled scholarly practices, the platform had a multiplier effect: participants have produced highly competitive grant applications and secured external funding currently worth eight times the platform’s original cost.
The platform thus demonstrates the importance of “immaterial” infrastructure, that is, basic organisational structures that facilitate collaboration and communication.
Authors: Sune Bechmann Pedersen, Marie Cronqvist, Kajsa Weber
Copyright (c) 2024 Sune Bechmann Pedersen, Marie Cronqvist, Kajsa Weber. License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/). Published 2024-01-04. Pages 20–25. DOI: 10.3384/ecp205003

Documentation of data making, processing and use facilitates future reuse of research data: the CAPTURE project
https://ecp.ep.liu.se/index.php/hic/article/view/888
Reusing research data requires knowing not only what the data is about but also how it was created and previously processed, interpreted and used. The major challenges in capturing enough, but not too much, of such process information, termed paradata, are knowing what to document and how to document it in adequate detail and form. This paper showcases research and findings from the ERC-funded research project CAPTURE. The project is developing an in-depth understanding of how paradata is created and used today, and it is identifying and exploring methods for capturing paradata. From a research infrastructure perspective, the most challenging question for managing paradata is how to enable and support the creation of paradata that is sufficient, relevant to its future reusers, and not too labour-intensive to produce and maintain. Much paradata is incidental, existing only because of a lack of data cleaning and management. Another major challenge is therefore how to strike a balance between too much and too little standardisation.
Authors: Isto Huvila, Stefan Ekman
Copyright (c) 2024 Isto Huvila, Stefan Ekman. License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/). Published 2024-01-04. Pages 26–30. DOI: 10.3384/ecp205004

Queerlit – a bibliography of Swedish fiction with LGBTQI topics
https://ecp.ep.liu.se/index.php/hic/article/view/889
This paper summarizes the project Queerlit: Metadata and Searchability for LGBTQ+ Literary Heritage (2020–2023) and discusses some challenges in developing this resource. The Queerlit project consists of four parts:
1. Creating a bibliography of Swedish fiction with LGBTQI themes.
2. Creating a Swedish thesaurus (QLIT), adapted from the linked open data thesaurus Homosaurus.
3. Assigning subject headings from QLIT to all material in the bibliography.
4. Developing a web user interface for searching the material.
All four parts are integrated with the Swedish union catalog, Libris, making the results of the project available to all under a CC0 license. QLIT is the first external thesaurus integrated into the linked open data framework of XL, the technical platform of Libris. The bibliography spans from 7th-century rune stones to recently published fiction. When subject headings are assigned, both general aspects of each work and specific LGBTQI topics are described, making this the most comprehensive retrospective indexing project of Swedish literature to date. The underlying knowledge organization is a prominent means of interacting with the search interface, whose design is empirically grounded in the needs of various user groups.
Authors: Siska Humlesjö, Jenny Bergenmar, Arild Matsson
Copyright (c) 2024 Siska Humlesjö, Jenny Bergenmar, Arild Matsson. License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/). Published 2024-01-04. Pages 31–35. DOI: 10.3384/ecp205005

From Zipf distribution to Universal Dependencies – Interactive Notebooks for Swedish Text Analysis
https://ecp.ep.liu.se/index.php/hic/article/view/890
Notebook-based environments are powerful, often web-based, interactive development environments for exploratory data analysis (EDA), including the analysis of text. They allow code snippets to be embedded in ‘code cells’ that can be executed easily, with the results displayed immediately in the user’s window. This paper introduces some basic exploratory tools and techniques in JupyterLab notebooks. They are applied to a Swedish subcorpus of texts published between January and December 2021 that address various topics related to the COVID-19 pandemic.
Author: Dimitrios Kokkinakis
Copyright (c) 2024 Dimitrios Kokkinakis. License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/). Published 2024-01-04. Pages 36–40. DOI: 10.3384/ecp205006

Tradita innovare, innovata tradere
https://ecp.ep.liu.se/index.php/hic/article/view/891
Swedish computational lexicography has a long history at the University of Gothenburg. It is both a central part of the scientific study of vocabulary and an infrastructural component for research based on language data. Starting in the 1960s, the Språkdata research group pioneered corpus-supported lexicography for Swedish, laying the foundation for successive editions of the two main descriptive dictionaries of contemporary Swedish, SAOL and SO. Since the turn of the millennium, lexical resources for Swedish language technology have been developed by Språkbanken Text, which is both a research unit and a research infrastructure. Most recently, this work has taken place within the Swedish FrameNet++ initiative.
After two decades of separation, these two largely independently developed strands of computational lexicography have now joined forces under the umbrella of Språkbanken’s lexical research infrastructure to advance the field technically, methodologically, and scientifically.
Authors: Lars Borin, Louise Holmer
Copyright (c) 2024 Lars Borin, Louise Holmer. License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/). Published 2024-01-04. Pages 41–50. DOI: 10.3384/ecp205007

Collectio: a software especially designed for creating dynamic libraries for fluid and multilingual text traditions
https://ecp.ep.liu.se/index.php/hic/article/view/892
This contribution presents Collectio, a new software application for creating highly complex relational MySQL databases or, more precisely, dynamic libraries. These libraries are particularly well suited to two kinds of material: texts whose material has been organized in different ways and thus represents a ‘fluid’ textual tradition, and traditions transmitted in many languages. So far, two libraries have been created with Collectio: APDB (the Apophthegmata Patrum Database) and HIPPO, which contains pre-modern hippiatric material. The sources in the libraries are mainly manuscripts, editions and modern translations. Collectio uses a distinctive input model based on .txt and .csv files stored in an archive in the library’s folder. The contents of the tables in the master database are generated from these files. The database records not only the texts but also their detailed structure and the parallel text segments in other sources. As a result, both texts and structures can be systematically compared and analysed within and across language boundaries. In addition to the advanced research tools for comparing texts and structures, the application offers search options, indexes of names, places and concepts, metadata on the sources, pre-written SQL commands and more.
The paper also introduces a new way of encoding text that can be converted into TEI/XML.
Author: Britt Dahlman
Copyright (c) 2024 Britt Dahlman. License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/). Published 2024-01-04. Pages 51–59. DOI: 10.3384/ecp205008

AI, Data Curation and the Data Readiness of Heritage Collections: Exploring the Swedish Newspaper Archive at KBLab
https://ecp.ep.liu.se/index.php/hic/article/view/893
The increasing availability of digital material and tools for large-scale computational analysis has produced a growing interest in big data approaches in the humanities and social sciences. However, the vital role of data curation as a precondition for such projects remains underappreciated. This paper details the work of KBLab at the National Library of Sweden in testing AI tools to help curate the digitized newspaper archive and make it more amenable to quantitative, machine learning-based research. It first describes the library’s newspaper data to orient researchers interested in the material, and then reports the results of our experiments with automated data curation. It concludes by sketching possible next steps for these exploratory efforts. It also situates the project within a broader recent turn toward conceptualizing and prioritizing data readiness. Its principal argument is that data curation is an essential part of any digital research project, not something prior or external to the research process.
Authors: Justyna Sikora, Chris Haffenden
Copyright (c) 2024 Justyna Sikora, Chris Haffenden. License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/). Published 2024-01-04. Pages 60–67. DOI: 10.3384/ecp205009

SAOL och svensk språkvetenskaplig infrastruktur – nu och i framtiden [SAOL and Swedish linguistic infrastructure: now and in the future]
https://ecp.ep.liu.se/index.php/hic/article/view/894
As this article shows, Svenska Akademiens ordlista (SAOL 14, 2015 [1]) plays an important role in Swedish linguistic infrastructure.
The article also presents preliminary results from a study of how frequent SAOL’s headwords actually are in various subcorpora of modern general-language Swedish. The wordlist should remain useful for Swedish lexical research, language studies and similar purposes, and it should also become more central in language technology contexts. For this, it is crucial that SAOL’s headwords rest on a scientific foundation, modern language technology methods and up-to-date corpus material. The article focuses on headwords that are not attested in the corpus material and that might therefore be removed from the forthcoming fifteenth edition.
Authors: Louise Holmer, Ann Lillieström, Emma Sköldberg, Jonatan Uppström
Copyright (c) 2024 Louise Holmer, Ann Lillieström, Emma Sköldberg, Jonatan Uppström. License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/). Published 2024-01-04. Pages 68–75. DOI: 10.3384/ecp205010

Curating a historical source corpus of 20th century patient organization periodicals
https://ecp.ep.liu.se/index.php/hic/article/view/895
Acting out Disease: How Patient Organizations Shaped Modern Medicine (ActDisease) explores the history of patient organizations in 20th-century Europe. By combining traditional historiographic methods with text mining techniques, the project aims to shed light on how patient organizations co-constructed concepts of disease and its management. Part of the project is to digitize print sources and build a digital corpus for historical text mining. The corpus consists of periodicals published by selected British, French, German and Swedish patient organizations. This type of material poses a number of challenges related to scan quality, layout, and inconsistency. This paper discusses the technical process of building the ActDisease corpus, from digitizing the periodicals to OCR post-processing. It also addresses the methodological questions and challenges of curating a corpus of fragmented and heterogeneous historical source material tailored to a specific project.
Authors: Gijs Aangenendt, Maria Skeppstedt, Ylva Söderfeldt
Copyright (c) 2024 Gijs Aangenendt, Maria Skeppstedt, Ylva Söderfeldt. License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/). Published 2024-01-04. Pages 76–82. DOI: 10.3384/ecp205011

On two SweLL learner corpora – SweLL-pilot and SweLL-gold
https://ecp.ep.liu.se/index.php/hic/article/view/896
SweLL (Swedish Learner Language) is an umbrella term for the infrastructure module for research on Swedish as a second language (L2). The module is deployed and maintained as part of the larger Språkbanken Text infrastructure at the University of Gothenburg, Sweden. The SweLL infrastructure module consists of a number of learner data collections and tools for annotating and managing learner data. Many of its components therefore carry the prefix SweLL in their names, which has caused some confusion, especially with regard to the two corpora. In this article we briefly introduce the various SweLL components, focusing on the differences between the two SweLL corpora.
Author: Elena Volodina
Copyright (c) 2024 Elena Volodina. License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/). Published 2024-01-04. Pages 83–94. DOI: 10.3384/ecp205012

STUnD: ett Sökverktyg för Tvåspråkiga Universal Dependencies-trädbanker [STUnD: a search tool for bilingual Universal Dependencies treebanks]
https://ecp.ep.liu.se/index.php/hic/article/view/897
This article introduces STUnD, a search tool for bilingual Universal Dependencies treebanks that enables parallel syntactic searches. We demonstrate its practical application in a case study of the present perfect tense in Swedish and English. The results show that the present perfect is used to roughly the same extent in both languages. There is, however, some variation that appears to be due to language-specific conventions and translation strategies.
Authors: Arianna Masciolini, Márton A. Tóth
Copyright (c) 2024 Arianna Masciolini, Márton A. Tóth. License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/). Published 2024-01-04. Pages 95–109. DOI: 10.3384/ecp205013

DASH Swedish National Doctoral School in Digital Humanities: From Local Expertise to National Research Infrastructure
https://ecp.ep.liu.se/index.php/hic/article/view/898
This paper presents the Swedish National Doctoral School in Digital Humanities: Data, Culture, and Society – Critical Perspectives (DASH). The school is run from 2023 to 2027 by Uppsala University, Umeå University, Linnaeus University, and the University of Gothenburg. Swedish universities have previously established PhD courses, MA programmes and other training in digital humanities. DASH, however, is the first comprehensive educational programme in digital humanities at the doctoral level. The paper discusses the rationale behind the DASH doctoral school and its role in the landscape of Swedish humanities infrastructures, and it shares insights from the first PhD courses and seminars. The focus of DASH is to equip PhD candidates in the humanities and social sciences with the knowledge and skills necessary to pursue high-quality, innovative and critical research in digital humanities. DASH aims to provide knowledge of digital research, its methods, tools and critical perspectives, and to build and strengthen networks among early-career scholars. DASH facilitates access to and use of the resources of the national humanities infrastructures. It also becomes part of that infrastructure by providing new resources and competences.
Authors: Matti La Mela, Daniel Brodén, Coppélie Cocq, Anna Foka, Koraljka Golub, Clelia LaMonica, Jonathan Westin
Copyright (c) 2024 Matti La Mela, Daniel Brodén, Coppélie Cocq, Anna Foka, Koraljka Golub, Clelia LaMonica, Jonathan Westin. License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/). Published 2024-01-04. Pages 110–114. DOI: 10.3384/ecp205014

Research stories on Twitter
https://ecp.ep.liu.se/index.php/hic/article/view/899
This paper examines what types of research seem to interest the users of a social network platform. It complements these data with data from an open research catalogue, using Twitter and OpenAlex as examples. The basic idea is to obtain an overview of the stories told by the platform’s content over a three-month period with regard to topics, disciplines, and open access status. The findings suggest that the picture looks very different depending on the approach used to map the topics, especially when comparing the most mentioned articles with the most retweeted ones. The study mainly highlights the methodological opportunities of combining text analysis and link relationships to explore the content of, and public interest in, academic research.
Authors: David G. Lorentzen, Gustaf Nelhans
Copyright (c) 2024 David G. Lorentzen, Gustaf Nelhans. License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/). Published 2024-01-04. Pages 115–121. DOI: 10.3384/ecp205015

Humanistic AI: Towards a new field of interdisciplinary expertise and research
https://ecp.ep.liu.se/index.php/hic/article/view/900
The Gothenburg Research Infrastructure in Digital Humanities (GRIDH) has participated in projects across various humanities fields. These projects both use and develop research tools and infrastructural resources that incorporate applications of ‘artificial intelligence’ (AI). These applications can include natural language processing, machine learning, computer vision, large language models, image recognition algorithms, classification, clustering, and deep learning.
This paper advances the term ‘humanistic AI’ to describe an emergent form of interdisciplinary practice that uses and develops AI-based research applications to answer humanities research questions, together with the humanistic reflection entangled with it. We coin this term to make explicit and visible the epistemological and material particularities of this practice and the new forms of knowledge its affordances make possible. The paper presents GRIDH projects within ‘humanistic AI’ together with the AI resources and applications developed in them.
Authors: Mats Fridlund, David Alfter, Daniel Brodén, Ashely Green, Aram Karimi, Cecilia Lindhé
Copyright (c) 2024 Mats Fridlund, David Alfter, Daniel Brodén, Ashely Green, Aram Karimi, Cecilia Lindhé. License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/). Published 2024-01-04. Pages 122–127. DOI: 10.3384/ecp205016

Designing digitally-driven integrative interdisciplinarity: Professionalism between protocol and judgement
https://ecp.ep.liu.se/index.php/hic/article/view/901
The importance of developing collaborative workflows for interdisciplinary research within DH is increasingly discussed. Yet there is a lack of blueprints and of attention to the specific expertise involved. This paper conceptualizes what we tentatively call digitally-driven integrative interdisciplinary project design. Our aim is to highlight a particular professional practice for integrating technical expertise and traditional humanities and social sciences (HSS) researchers in collaboration when developing research project applications, digital resources, etc. We begin by highlighting the need for protocols in workflow-oriented approaches to integrative interdisciplinary collaboration. We also highlight the need for an embodied expertise that should be brought into focus in discussions of integrative workflows within digital humanities. We then argue that judgement is also a crucial but often overlooked part of the professionalism involved.
We conclude by discussing how to further develop the conceptualization of interdisciplinary digital project design and the expertise involved.
Authors: Daniel Brodén, Mats Fridlund, Cecilia Lindhé
Copyright (c) 2024 Daniel Brodén, Mats Fridlund, Cecilia Lindhé. License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/). Published 2024-01-04. Pages 128–134. DOI: 10.3384/ecp205017

From the Arctics to Antarctica - A multimodular visualisation of data
https://ecp.ep.liu.se/index.php/hic/article/view/902
This paper outlines the structure of Multimodal Map, a tool developed at GRIDH for accessing and visualising place-based datasets. The Multimodal Map frontend is built with the Vue 3 framework and fetches data from a Django backend. It is arranged as a series of distinct, interconnected views that let the user interact with the material at different scales and levels of abstraction. The different projects need to handle a wide variety of formats. To support them, Multimodal Map uses both custom solutions and several open frameworks and libraries. These include OpenLayers for geographic visualisations, OpenSeadragon for IIIF images, potree.js for point clouds, 3D Heritage Online Presenter (3DHOP) for meshes, and relight-viewer.js for RTI photography.
Authors: Jonathan Westin, Tristan Bridge, Matteo Tomasini
Copyright (c) 2024 Jonathan Westin, Tristan Bridge, Matteo Tomasini. License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/). Published 2024-01-04. Pages 135–140. DOI: 10.3384/ecp205018

The DIGARV Platform: A collaborative platform for working with cultural heritage data and research data
https://ecp.ep.liu.se/index.php/hic/article/view/903
This article describes an easy-to-use research tool for collaborative work. The tool has been adapted for structured data and high-resolution images within four research projects at GRIDH. The platform is especially designed for working with temporal and spatial data.
Furthermore, the platform gives researchers access to a relational database system through input forms. It also gives them access to external cultural heritage data, including high-resolution images. In this way, the platform aims both to use external data published as Linked Open Data (LOD) and to prepare its own research data for publication as LOD. Because the data is spatial and temporal, it is visualized in time and space on maps and timelines to provide overview and context during the data management phase.
Authors: Johan Åhlfeldt, Arild Matsson
Copyright (c) 2024 Johan Åhlfeldt, Arild Matsson. License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/). Published 2024-01-04. Pages 141–147. DOI: 10.3384/ecp205019

Samförfattande som datadriven tvärvetenskap: Pragmatiska lärdomar från SweTerror-projektet [Co-authorship as data-driven interdisciplinarity: Pragmatic lessons from the SweTerror project]
https://ecp.ep.liu.se/index.php/hic/article/view/904
Terrorism i svensk politik (Terrorism in Swedish Politics, SweTerror) is a large-scale interdisciplinary research project. Its researchers come from the humanities and social sciences as well as computer science. SweTerror also uses and develops national research infrastructure for Swedish parliamentary (Riksdag) data. This paper describes the use of co-authorship as a data-driven interdisciplinary practice for integrating different scientific perspectives and building a shared understanding in the project's research. We highlight the significance of the decision to center the collaboration on conference papers, specifically in digital humanities. We also discuss our experience that co-writing weakens disciplinary territorialism, as well as our iterative approach to research data linked to research infrastructures under construction. Finally, we emphasize data-driven co-authorship as a pragmatic practice for strengthening collaboration and building knowledge bridges within an interdisciplinary research group.
Authors: Daniel Brodén, Mats Fridlund, Leif-Jöran Olsson, Magnus P. Ängsal, Patrik Öhberg
Copyright (c) 2024 Daniel Brodén, Mats Fridlund, Leif-Jöran Olsson, Magnus P. Ängsal, Patrik Öhberg. License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/). Published 2024-01-04. Pages 148–153. DOI: 10.3384/ecp205020

Accessing centuries of documentation - Resources to improve access to Swedish rock art documentation and metadata
https://ecp.ep.liu.se/index.php/hic/article/view/905
The archive of rock art documentation maintained by SHFA is a valuable resource for archaeologists and others who study rock art. The archive includes images of rock art documentation, sites, and the documentation process. These range from the 17th century to recent high-resolution 3D recordings and visualizations. In the last few years, GRIDH, in collaboration with SHFA, has begun to improve access to the archive through a Django-based solution and new digital resources. In this paper, we introduce the images in the archive, describe the new digital resources, and reflect on how the new website will affect data availability and rock art research.
Authors: Ashely Green, Tristan Bridge, Christian Horn, Siska Humlesjö, Aram Karimi, Johan Ling, Jonathan Westin
Copyright (c) 2024 Ashely Green, Tristan Bridge, Christian Horn, Siska Humlesjö, Aram Karimi, Johan Ling, Jonathan Westin. License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/). Published 2024-01-04. Pages 154–160. DOI: 10.3384/ecp205021

Konsten att bedriva svensk ordforskning utan att kränka upphovsrätten [The art of conducting Swedish lexical research without infringing copyright]
https://ecp.ep.liu.se/index.php/hic/article/view/906
We describe the collaboration between KBLab and Språkbanken Text to facilitate lexical research on the copyright-protected corpora in the collections of the National Library of Sweden (Kungliga biblioteket). The initiative has so far resulted in two open datasets, Kubord 1 and 2, which provide access to word statistics and word co-occurrence statistics. We also describe Kubord-fastText, a collection of vector models based on the same corpora, which is currently under development.
Authors: Gerlof Bouma, Markus Forsberg, Justyna Sikora, Emma Sköldberg
Copyright (c) 2024 Gerlof Bouma, Markus Forsberg, Justyna Sikora, Emma Sköldberg. License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/). Published 2024-01-04. Pages 161–167. DOI: 10.3384/ecp205022