The LiRI Corpus Platform
DOI:
https://doi.org/10.3384/ecp210010
Abstract
We present the LiRI Corpus Platform (LCP), a software system and infrastructure for querying a wide range of corpora of different kinds. It relies heavily on the PostgreSQL relational database management system and employs state-of-the-art data representation and indexing techniques, which yield significant performance gains when querying, even for structurally complex queries involving nested logical operations and quantifiers. In this work, we describe the requirements that led to the development of this novel system, discuss methods from corpus linguistics and beyond that we consider key for such a system, and provide details on a number of technological features that we take advantage of. Our platform also comes with its own query language, tailored both to users' information needs and to our philosophy of defining corpora in an abstract way.
License
Copyright (c) 2024 Johannes Graën, Jonathan Schaber, Daniel McDonald, Igor Mustač, Nikolina Rajović, Gerold Schneider, Teodora Vuković, Jeremy Zehr, Noah Bubenhofer
This work is licensed under a Creative Commons Attribution 4.0 International License.