Huminfra Conference

https://ecp.ep.liu.se/index.php/hic/issue/feed Huminfra Conference 2024-01-04T11:07:42+01:00 Huminfra info@huminfra.se Open Journal Systems

Huminfra is a Swedish national research infrastructure supporting digital and experimental research in the Humanities by providing users with a single entry point for finding existing Swedish materials and research tools, as well as developing national methods courses. https://www.huminfra.se/

https://ecp.ep.liu.se/index.php/hic/article/view/885 Open Brain AI: An AI Research Platform 2024-01-04T08:28:14+01:00 Charalambos Themistocleous

Language assessment is pivotal in identifying therapeutic interventions for speech, language, and communication disorders of neurogenic origin, whether developmental or acquired, and in evaluating student performance in the classroom. Traditional assessment techniques, however, are predominantly manual and require extensive time and effort for administration and scoring. Such procedures can exacerbate the stress experienced by patients. In response to these challenges, we introduced Open Brain AI (https://openbrainai.com). This computational platform combines machine learning, natural language processing, large language models, and automated speech-to-text transcription. These capabilities enable Open Brain AI to analyze multilingual spoken and written language productions automatically. This work presents the development and evolution of Open Brain AI, describing its AI-driven language processing components and the linguistic metrics it employs to evaluate both overall and fine-grained discourse structures. Open Brain AI significantly reduces the workload on researchers, clinicians, and teachers by facilitating rapid and automated language analysis. It allows healthcare and education professionals to streamline their work and reallocate time and resources to more personalized user interactions. Moreover, Open Brain AI gives clinicians, researchers, and educators the autonomy to undertake essential data analytics, freeing them to focus on other vital facets of therapeutic intervention and care.

2024-01-04T00:00:00+01:00 Copyright (c) 2024 Charalambos Themistocleous https://ecp.ep.liu.se/index.php/hic/article/view/886 Profiles for Swedish as a Second Language: Lexis, Grammar, Morphology 2024-01-04T08:28:15+01:00 Elena Volodina David Alfter Therese Lindström Tiedemann

This article gives a short introduction to the Swedish Second Language Profile, a tool that visualizes language in Swedish learner corpora from different angles, such as vocabulary, grammar and morphology. The tool is aimed at research on Second Language Acquisition, development of NLP models, teaching of Swedish as a second language, automatic approaches for second language teaching and learning, and at a number of other fields.

2024-01-04T00:00:00+01:00 Copyright (c) 2024 Elena Volodina, David Alfter, Therese Lindström Tiedemann https://ecp.ep.liu.se/index.php/hic/article/view/887 Digital History and Immaterial Infrastructure: A Bottom-Up Approach 2024-01-04T08:28:15+01:00 Sune Bechmann Pedersen Marie Cronqvist Kajsa Weber

This paper argues for an expanded view of research infrastructure. Drawing on our experiences leading the research platform DigitalHistory@Lund, it shows how research capacity can be unlocked “bottom-up”, by providing scholars with comparatively cheap, yet often inaccessible, technological support. By engaging researchers in digitally enabled scholarly practices, the platform yielded a multiplying effect that has seen participants produce highly competitive grant applications and eventually bring home external funding currently worth eight times the platform’s original costs. The platform thus demonstrates the importance of “immaterial” infrastructure in the sense of basic organisational structures that facilitate collaboration and communication.

2024-01-04T00:00:00+01:00 Copyright (c) 2024 Sune Bechmann Pedersen, Marie Cronqvist, Kajsa Weber https://ecp.ep.liu.se/index.php/hic/article/view/888 Documentation of data making, processing and use facilitates future reuse of research data: the CAPTURE project 2024-01-04T08:28:15+01:00 Isto Huvila Stefan Ekman

Reuse of research data requires knowing not only what the data is about but also how it was created and previously processed, interpreted and used. The major challenges in capturing enough – but not too much – of such process information, termed paradata, are knowing what to document and how to document it in adequate detail and form. This paper showcases research and findings from the ERC-funded research project CAPTURE, which develops an in-depth understanding of how paradata is being created and used today and which elicits and explores methods for capturing paradata. From a research infrastructure perspective, the most challenging question for managing paradata is how to enable and support the creation of paradata that is sufficient, relevant for its future reusers, and not too labour-intensive to produce and maintain. Considering the significant extent to which paradata is coincidental and exists because of the lack of data cleaning and management, a major challenge is also how to strike a balance between too much and too little standardisation.

2024-01-04T00:00:00+01:00 Copyright (c) 2024 Isto Huvila, Stefan Ekman https://ecp.ep.liu.se/index.php/hic/article/view/889 Queerlit – a bibliography of Swedish fiction with LGBTQI topics 2024-01-04T08:28:16+01:00 Siska Humlesjö Jenny Bergenmar Arild Matsson

This paper summarizes the project Queerlit: Metadata and Searchability for LGBTQ+ Literary Heritage 2020-2023 and discusses some challenges in the development of this resource. The Queerlit project consists of four parts:

1. Creating a bibliography of Swedish fiction with LGBTQI themes.
2. Creating a Swedish thesaurus (QLIT), adapted from the linked open data thesaurus Homosaurus.
3. Assigning subject headings from QLIT to all material in the bibliography.
4. Building a web user interface for searching the material.

All four parts are integrated with the Swedish union catalog, Libris, making the results of the project available to all under a CC0 license. QLIT is the first external thesaurus integrated in the linked open data framework used in the technical platform of Libris, XL. The bibliography spans from rune stones from the 7th century to recently published fiction. When subject headings are applied to the material, both general aspects of the work and specific LGBTQI topics are described, making this the most comprehensive retrospective indexing project of Swedish literature to date. The underlying knowledge organization is made a prominent means of interacting with the search interface, which is empirically designed around the needs of various user groups.

2024-01-04T00:00:00+01:00 Copyright (c) 2024 Siska Humlesjö, Jenny Bergenmar, Arild Matsson https://ecp.ep.liu.se/index.php/hic/article/view/890 From Zipf distribution to Universal Dependencies – Interactive Notebooks for Swedish Text Analysis 2024-01-04T08:28:17+01:00 Dimitrios Kokkinakis

Notebook-based environments are powerful (web-based) interactive development resources for conducting exploratory (textual) data analysis (EDA). These environments allow the embedding of code (code snippets in ‘code cells’) which can be easily executed, with the results immediately presented in the user’s window. This paper introduces some basic exploratory tools and techniques using JupyterLab notebooks, applied to Swedish using a subcorpus of texts that address various topics related to the COVID-19 pandemic and were published during January-December 2021.

2024-01-04T00:00:00+01:00 Copyright (c) 2024 Dimitrios Kokkinakis https://ecp.ep.liu.se/index.php/hic/article/view/891 Tradita innovare, innovata tradere 2024-01-04T08:28:17+01:00 Lars Borin Louise Holmer

Swedish computational lexicography has a long history at the University of Gothenburg, both in its primary role as a central aspect of the scientific study of vocabulary and also as an infrastructural component for conducting research based on language data. Starting in the 1960s, the Språkdata research group pioneered corpus-supported lexicography for Swedish, forming the basis for successive editions of the two main descriptive dictionaries of contemporary Swedish, SAOL and SO. Language technological lexical resources for Swedish have been developed by the research unit/research infrastructure Språkbanken Text since the turn of the millennium, most recently in the framework of the Swedish FrameNet++ initiative. After two decades of separation, these two largely mutually independently developed strands of computational lexicography have now joined forces under the umbrella of Språkbanken’s lexical research infrastructure to advance the field technically, methodologically, and scientifically.

2024-01-04T00:00:00+01:00 Copyright (c) 2024 Lars Borin, Louise Holmer https://ecp.ep.liu.se/index.php/hic/article/view/892 Collectio: a software especially designed for creating dynamic libraries for fluid and multilingual text traditions 2024-01-04T08:28:18+01:00 Britt Dahlman

This contribution presents a new software application, Collectio, which can be used for creating highly complex relational MySQL databases, or more accurately, dynamic libraries. These libraries are particularly well suited for texts where the material has been organized in different ways and thus represents a ‘fluid’ textual tradition, or for traditions transmitted in many languages. So far, two libraries have been created using Collectio: APDB (the Apophthegmata Patrum Database) and HIPPO, which contains pre-modern hippiatric material. The sources included in the libraries are mainly in the form of manuscripts, editions and modern translations. Collectio employs a unique input model, built upon .txt and .csv files stored in an archive in the folder of the library. The contents of the database tables in the master database are generated from these documents. Since not only texts are registered but also the detailed structure and parallel text segments in other sources, both texts and structures can be systematically compared and analysed within and across language boundaries. In addition to the advanced research tools for comparing texts and structures, the application contains search options, indexes of names, places and concepts, metadata on the sources, pre-written SQL commands and more. A new way of encoding text, which can be converted into TEI/XML, is also introduced.

2024-01-04T00:00:00+01:00 Copyright (c) 2024 Britt Dahlman https://ecp.ep.liu.se/index.php/hic/article/view/893 AI, Data Curation and the Data Readiness of Heritage Collections: Exploring the Swedish Newspaper Archive at KBLab 2024-01-04T08:28:19+01:00 Justyna Sikora Chris Haffenden

The increasing availability of digital material and tools for large-scale computational analysis has produced a growing interest in big data approaches in the humanities and social sciences. However, the vital role of data curation as a precondition for such projects remains underappreciated. This paper details the work of KBLab at the National Library of Sweden in testing AI tools to help curate the digitized newspaper archive and make it more amenable to quantitative, machine learning-based research. It provides a description of the library’s newspaper data to offer orientation to researchers interested in the material, before recounting the results of our exploration of automated data curation. It concludes by sketching possible next steps for these exploratory efforts, and by situating this project within a broader recent turn to conceptualize and prioritize the notion of data readiness. Its principal argument is that data curation is an essential part of any digital research project, not something prior to or external to the research process.

2024-01-04T00:00:00+01:00 Copyright (c) 2024 Justyna Sikora, Chris Haffenden https://ecp.ep.liu.se/index.php/hic/article/view/894 SAOL och svensk språkvetenskaplig infrastruktur – nu och i framtiden [SAOL and Swedish linguistic infrastructure: now and in the future] 2024-01-04T08:28:19+01:00 Louise Holmer Ann Lillieström Emma Sköldberg Jonatan Uppström

Svenska Akademiens ordlista (SAOL 14, 2015 [1]) plays an important role in Swedish linguistic infrastructure, as this article shows. The article also presents preliminary results of a study of how frequent the SAOL headwords actually are in various subcorpora of modern general-language Swedish. For the word list to remain usable in Swedish lexical research, in language studies and elsewhere, and to become more central in language technology contexts, it is crucial that SAOL's headwords rest on a scientific foundation, modern language technology methods, and up-to-date corpus material. The article focuses on the headwords that are not attested in the corpus material and that may therefore be candidates for removal ahead of the forthcoming fifteenth edition.
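The core check described here, finding headwords with no attestation in a corpus, can be sketched as a frequency lookup. This is only an illustrative sketch and not the authors' method: the function name `unattested_headwords` and the toy headword and token lists are hypothetical, and it matches surface forms directly, whereas a real study would presumably compare lemmas.

```python
from collections import Counter


def unattested_headwords(headwords, corpus_tokens):
    """Return the headwords that never occur among the corpus tokens."""
    counts = Counter(token.lower() for token in corpus_tokens)
    # Counter returns 0 for missing keys, so no membership test is needed.
    return sorted(word for word in headwords if counts[word.lower()] == 0)


# Hypothetical toy data, not drawn from SAOL or any real corpus.
headwords = ["bok", "hus", "ordlista", "trädbank"]
corpus_tokens = ["Hus", "och", "bok", "och", "hus"]

print(unattested_headwords(headwords, corpus_tokens))
```

With the toy data above, the two headwords that never appear among the tokens are returned as candidates for closer inspection.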

2024-01-04T00:00:00+01:00 Copyright (c) 2024 Louise Holmer, Ann Lillieström, Emma Sköldberg, Jonatan Uppström https://ecp.ep.liu.se/index.php/hic/article/view/895 Curating a historical source corpus of 20th century patient organization periodicals 2024-01-04T08:28:20+01:00 Gijs Aangenendt Maria Skeppstedt Ylva Söderfeldt

Acting out Disease: How Patient Organizations Shaped Modern Medicine (ActDisease) explores the history of patient organizations in 20th-century Europe. By combining traditional historiographic methods with text mining techniques, the project aims to shed light on how patient organizations co-constructed concepts of and management of disease. Part of the project is to digitize print sources and build a digital corpus for historical text mining. The corpus consists of periodical publications from selected British, French, German and Swedish patient organizations, a type of material that poses a number of challenges in terms of scan quality, layout, and lack of consistency. This paper discusses the technical process of building the ActDisease corpus, from digitizing patient organization periodicals to OCR post-processing. It touches upon the methodological questions and challenges of curating a corpus of fragmented and heterogeneous historical source material tailored to a specific project.

2024-01-04T00:00:00+01:00 Copyright (c) 2024 Gijs Aangenendt, Maria Skeppstedt, Ylva Söderfeldt https://ecp.ep.liu.se/index.php/hic/article/view/896 On two SweLL learner corpora – SweLL-pilot and SweLL-gold 2024-01-04T08:28:21+01:00 Elena Volodina

SweLL – Swedish Learner Language – is a unifying term for the infrastructure module for research on Swedish as a Second Language (L2), deployed and maintained as part of the larger infrastructure of Språkbanken Text at the University of Gothenburg, Sweden. The SweLL infrastructure module consists of a number of learner data collections, and tools for annotation and management of learner data. As a result, many of its components contain the prefix SweLL in their names, which has created some confusion, especially with regard to the two corpora. In this article we briefly introduce the various SweLL components, with a special focus on the differences between the two SweLL corpora.

2024-01-04T00:00:00+01:00 Copyright (c) 2024 Elena Volodina https://ecp.ep.liu.se/index.php/hic/article/view/897 STUnD: ett Sökverktyg för Tvåspråkiga Universal Dependencies-trädbanker [STUnD: a search tool for bilingual Universal Dependencies treebanks] 2024-01-04T08:28:21+01:00 Arianna Masciolini Márton A. Tóth

This article introduces STUnD, a search tool for bilingual Universal Dependencies treebanks (Sökverktyg för Tvåspråkiga Universal Dependencies-trädbanker) that enables parallel syntactic searches. We demonstrate its practical application in a case study of the present perfect tense in Swedish and English. The results show that the present perfect is used to roughly the same extent in both languages, but that there is some variation that appears to depend on language-specific conventions and translation strategies.
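To make the idea of a syntactic search for the present perfect concrete, the sketch below scans CoNLL-U sentences for a present-tense auxiliary attached to a supine or participle head. It does not reproduce STUnD's query language or implementation: the helper names `parse_conllu` and `present_perfects`, the two hand-written sentences, and the assumed annotation (`VerbForm=Sup` for Swedish, `VerbForm=Part` for English) are illustrative assumptions.

```python
def parse_conllu(block):
    """Parse one CoNLL-U sentence into a list of token dicts."""
    tokens = []
    for line in block.strip().splitlines():
        if line.startswith("#"):
            continue  # sentence-level comment
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword-token ranges and empty nodes
        feats = {}
        if cols[5] != "_":
            feats = dict(pair.split("=", 1) for pair in cols[5].split("|"))
        tokens.append({
            "id": int(cols[0]), "form": cols[1], "lemma": cols[2],
            "feats": feats, "head": int(cols[6]), "deprel": cols[7],
        })
    return tokens


def present_perfects(tokens, aux_lemma, head_verb_forms):
    """Return (auxiliary, main verb) form pairs that look like a present perfect."""
    by_id = {tok["id"]: tok for tok in tokens}
    hits = []
    for tok in tokens:
        head = by_id.get(tok["head"])
        if (tok["deprel"] == "aux" and tok["lemma"] == aux_lemma
                and tok["feats"].get("Tense") == "Pres"
                and head is not None
                and head["feats"].get("VerbForm") in head_verb_forms):
            hits.append((tok["form"], head["form"]))
    return hits


# Hand-written toy sentences (hypothetical), one per language.
SV = "\n".join([
    "1\tHon\thon\tPRON\t_\t_\t3\tnsubj\t_\t_",
    "2\thar\tha\tAUX\t_\tMood=Ind|Tense=Pres|VerbForm=Fin\t3\taux\t_\t_",
    "3\tgått\tgå\tVERB\t_\tVerbForm=Sup\t0\troot\t_\t_",
])
EN = "\n".join([
    "1\tShe\tshe\tPRON\t_\t_\t3\tnsubj\t_\t_",
    "2\thas\thave\tAUX\t_\tMood=Ind|Tense=Pres|VerbForm=Fin\t3\taux\t_\t_",
    "3\tgone\tgo\tVERB\t_\tTense=Past|VerbForm=Part\t0\troot\t_\t_",
])

sv_hits = present_perfects(parse_conllu(SV), "ha", {"Sup"})
en_hits = present_perfects(parse_conllu(EN), "have", {"Part"})
print(sv_hits, en_hits)
```

Running the same pattern over aligned sentence pairs and comparing hit counts per language is one simple way such a comparison could be approached; a dedicated tool would handle alignment and richer query patterns.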

2024-01-04T00:00:00+01:00 Copyright (c) 2024 Arianna Masciolini, Márton A. Tóth https://ecp.ep.liu.se/index.php/hic/article/view/898 DASH Swedish National Doctoral School in Digital Humanities: From Local Expertise to National Research Infrastructure 2024-01-04T08:28:22+01:00 Matti La Mela Daniel Brodén Coppélie Cocq Anna Foka Koraljka Golub Clelia LaMonica Jonathan Westin

This paper presents the Swedish National Doctoral School in Digital Humanities: Data, Culture, and Society – Critical Perspectives (DASH), which runs from 2023 to 2027 and is operated by Uppsala University, Umeå University, Linnaeus University, and Gothenburg University. Although Swedish universities have previously established PhD courses, MA programmes and training in digital humanities, DASH is the first comprehensive educational programme in digital humanities at the doctoral level. The present paper discusses the rationale behind the DASH doctoral school and its role in the landscape of Swedish humanities infrastructures, and provides insights from the first PhD courses and seminars. The focus of DASH is to equip PhD candidates in humanities and social sciences with the knowledge and skills necessary to pursue high-quality, innovative and critical research in digital humanities. DASH aims to provide knowledge of digital research, its methods, tools, and critical perspectives, and to build and strengthen the networks among early career scholars. DASH facilitates access to and use of the resources in the national infrastructures in the humanities, but also emerges as an element in the infrastructure by providing new resources and competences.

2024-01-04T00:00:00+01:00 Copyright (c) 2024 Matti La Mela, Daniel Brodén, Coppélie Cocq, Anna Foka, Koraljka Golub, Clelia LaMonica, Jonathan Westin https://ecp.ep.liu.se/index.php/hic/article/view/899 Research stories on Twitter 2024-01-04T08:28:23+01:00 David G. Lorentzen Gustaf Nelhans

This paper studies what type of research seems to interest the users of a social network platform and complements the platform data with data from an open catalogue for research, using Twitter and OpenAlex as examples. The basic idea is to get an overview of the stories the platform content tells over three months in terms of topics, disciplines, and open access status. The findings suggest that the picture looks very different depending on the approach used to map the topics, especially when comparing the most mentioned articles with the most retweeted ones. The study mainly highlights the methodological opportunities of combining text analysis and link relationships to explore the content of, and public interest in, academic research.

2024-01-04T00:00:00+01:00 Copyright (c) 2024 David G. Lorentzen, Gustaf Nelhans https://ecp.ep.liu.se/index.php/hic/article/view/900 Humanistic AI: Towards a new field of interdisciplinary expertise and research 2024-01-04T08:28:23+01:00 Mats Fridlund David Alfter Daniel Brodén Ashely Green Aram Karimi Cecilia Lindhé

The Gothenburg Research Infrastructure in Digital Humanities (GRIDH) has participated in projects within various humanities fields that utilise as well as develop research tools and infrastructural resources that incorporate applications of ‘artificial intelligence’ (AI). These applications can include natural language processing, machine learning, computer vision, large language models, image recognition algorithms, classification, clustering, and deep learning. This paper advances the term ‘humanistic AI’ to describe an emergent form of interdisciplinary practice that uses and develops AI-based research applications to answer humanities research questions together with its entangled humanistic reflection. We coin this term to make explicit and visible the epistemological and material particularities of its practice and the new forms of knowledge its affordances make possible. The paper presents GRIDH projects within ‘humanistic AI’ together with the AI resources and applications developed in them.

2024-01-04T00:00:00+01:00 Copyright (c) 2024 Mats Fridlund, David Alfter, Daniel Brodén, Ashely Green, Aram Karimi, Cecilia Lindhé https://ecp.ep.liu.se/index.php/hic/article/view/901 Designing digitally-driven integrative interdisciplinarity: Professionalism between protocol and judgement 2024-01-04T08:28:24+01:00 Daniel Brodén Mats Fridlund Cecilia Lindhé

While there is a growing discussion of the importance of developing collaborative workflows for interdisciplinary research within DH, there is a lack of blueprints and consideration of specific expertise. This paper conceptualizes the practice of what we tentatively call digitally-driven integrative interdisciplinary project design in order to highlight a certain professional practice for integrating collaboration between technical expertise and traditional HSS researchers when developing research project applications, digital resources, etc. We begin by highlighting the need for protocols for workflow-oriented approaches to integrative interdisciplinary collaboration, but also for an embodied expertise that needs to be brought into focus in discussions of integrative workflows within digital humanities. Then, we argue that judgement is also a crucial but often overlooked part of the professionalism involved. We conclude by discussing how to further develop the conceptualization of interdisciplinary digital project design and the expertise involved.

2024-01-04T00:00:00+01:00 Copyright (c) 2024 Daniel Brodén, Mats Fridlund, Cecilia Lindhé https://ecp.ep.liu.se/index.php/hic/article/view/902 From the Arctics to Antarctica - A multimodular visualisation of data 2024-01-04T08:28:25+01:00 Jonathan Westin Tristan Bridge Matteo Tomasini

This paper outlines the structure of Multimodal Map, a tool developed at GRIDH to access and visualise place-based datasets. The Multimodal Map frontend, which is developed with Vue3 and fetches data from a backend built in Django, is arranged as a series of distinct and interconnected views that let the user interact with the material at different scales and levels of abstraction. To support the wide variety of formats the different projects need to handle, Multimodal Map makes use of both custom solutions and several open frameworks and libraries. These include OpenLayers for the geographical visualisations, OpenSeadragon for IIIF images, potree.js for point clouds, 3D Heritage Online Presenter (3DHOP) for meshes, and relight-viewer.js for RTI photography.

2024-01-04T00:00:00+01:00 Copyright (c) 2024 Jonathan Westin, Tristan Bridge, Matteo Tomasini https://ecp.ep.liu.se/index.php/hic/article/view/903 The DIGARV Platform: A collaborative platform for working with cultural heritage data and research data 2024-01-04T08:28:26+01:00 Johan Åhlfeldt Arild Matsson

This article covers an easy-to-use research tool for collaborative work. The tool has been adapted for structured data and high-resolution images within four research projects at GRIDH. The platform is especially designed for working with temporal and spatial data. Furthermore, the platform gives researchers access to a relational database system through input forms and access to external cultural heritage data including high-resolution images. This way the platform also aims to utilize external data published as Linked Open Data (LOD) and, at the same time, prepare its own research data for publishing as LOD. Because of the spatial and temporal nature of the data, it is visualized in time and space through maps and timelines to give overview and context during the data management phase.

2024-01-04T00:00:00+01:00 Copyright (c) 2024 Johan Åhlfeldt, Arild Matsson https://ecp.ep.liu.se/index.php/hic/article/view/904 Samförfattande som datadriven tvärvetenskap: Pragmatiska lärdomar från SweTerror-projektet [Co-authorship as data-driven interdisciplinarity: Pragmatic lessons from the SweTerror project] 2024-01-04T08:28:26+01:00 Daniel Brodén Mats Fridlund Leif-Jöran Olsson Magnus P. Ängsal Patrik Öhberg

Terrorism i svensk politik (SweTerror, "Terrorism in Swedish Politics") is a large-scale interdisciplinary research project involving researchers from the humanities and social sciences as well as the computer sciences. At the same time, SweTerror uses and develops national research infrastructure for parliamentary data. This paper describes the use of co-authorship as a data-driven interdisciplinary practice for integrating different scientific perspectives and building consensus in the project's research. We focus on the significance of the choice to centre this form of collaboration on conference papers specifically within digital humanities, and discuss the experience that co-writing weakens disciplinary territorial thinking, as well as an iterative approach to research data linked to research infrastructures that are still under construction. Finally, we emphasise data-driven co-authorship as a pragmatic practice for strengthening collaboration and knowledge bridges within an interdisciplinary research group.

2024-01-04T00:00:00+01:00 Copyright (c) 2024 Daniel Brodén, Mats Fridlund, Leif-Jöran Olsson, Magnus P. Ängsal, Patrik Öhberg https://ecp.ep.liu.se/index.php/hic/article/view/905 Accessing centuries of documentation - Resources to improve access to Swedish rock art documentation and metadata 2024-01-04T08:28:27+01:00 Ashely Green Tristan Bridge Christian Horn Siska Humlesjö Aram Karimi Johan Ling Jonathan Westin

The archive of rock art documentation maintained by SHFA provides a valuable resource to archaeologists and others who study rock art. The archive includes images of rock art documentation, sites, and the documentation process, from the 17th century to the more recent high-resolution 3D recordings and visualizations. In the last few years, GRIDH, in collaboration with SHFA, has begun to improve access to the archive through a Django-based solution and new digital resources. In this paper, we introduce the images in the archive, provide details on the new digital resources, and reflect on how the new website will impact data availability and rock art research.

2024-01-04T00:00:00+01:00 Copyright (c) 2024 Ashely Green, Tristan Bridge, Christian Horn, Siska Humlesjö, Aram Karimi, Johan Ling, Jonathan Westin https://ecp.ep.liu.se/index.php/hic/article/view/906 Konsten att bedriva svensk ordforskning utan att kränka upphovsrätten [The art of conducting Swedish lexical research without infringing copyright] 2024-01-04T08:28:28+01:00 Gerlof Bouma Markus Forsberg Justyna Sikora Emma Sköldberg

We describe the collaboration between KBLab and Språkbanken Text to facilitate lexical research on the copyright-protected corpora held in the collections of the National Library of Sweden (Kungliga biblioteket). The initiative has so far resulted in two open datasets, Kubord 1 and 2, which provide access to word statistics and word co-occurrence statistics. We also describe Kubord-fastText, a collection of vector models based on the same corpora, which is under development.
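The general idea of publishing aggregate statistics instead of protected running text can be sketched as follows. This is an illustrative sketch only and does not describe how the Kubord datasets were actually built: the function name `word_and_pair_counts`, whitespace tokenisation, the sentence-level co-occurrence window, and the toy sentences are all assumptions.

```python
from collections import Counter
from itertools import combinations


def word_and_pair_counts(sentences):
    """Count word frequencies and sentence-level co-occurrences."""
    word_counts = Counter()
    pair_counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        word_counts.update(tokens)
        # Count each unordered pair of distinct words once per sentence;
        # sorting makes the pair key independent of word order.
        for pair in combinations(sorted(set(tokens)), 2):
            pair_counts[pair] += 1
    return word_counts, pair_counts


# Hypothetical toy sentences standing in for protected corpus text.
sentences = ["boken ligger på bordet", "boken är ny", "bordet är nytt"]
word_counts, pair_counts = word_and_pair_counts(sentences)

print(word_counts.most_common(3))
print(pair_counts[("boken", "bordet")])
```

Only the two counters would be shared in such a setup; the original sentences cannot be read back from them directly, which is what makes aggregate statistics of this kind attractive for copyright-protected material.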