The CLARIN-DK Text Tonsorium

Authors

  • Bart Jongejan

DOI:

https://doi.org/10.3384/ecp18013

Keywords:

workflows, usability, metadata

Abstract

The Text Tonsorium (TT) is a workflow management system (WMS) for Natural Language Processing (NLP). The software implements a design goal that sets it apart from other WMSes: it operates without manually composed workflow designs. The TT invites tool providers to register and integrate their tools without having to consider which workflow designs those new tools can become part of. Both the input and the output of a new tool are specified by expressing language, file format, type of content, etc. in terms of an ontology. Likewise, users of the TT define their goal in terms of this ontology and let the TT compute the workflow designs that fulfil that goal. When the user has chosen one of the proposed workflow designs, the TT enacts it with the user's input. This unconventional approach to workflows requires some familiarization. In principle, the TT cannot predict which of the proposed workflow designs is most appropriate, because the text may have peculiarities that are as yet uncharted; the user has to make the choice. In this paper, we reflect on our experiences with providing, testing and using workflows aimed at annotating transcripts of parliamentary debates. We propose possible improvements of the TT that can facilitate its use by the wider CLARIN community.
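The abstract does not describe the TT's actual ontology or search algorithm. The core idea it does describe can still be illustrated: tools declare inputs and outputs as feature values, the user states a goal in the same terms, and the system enumerates candidate workflows instead of relying on hand-written ones. The following is a minimal Python sketch under loudly stated assumptions. Each tool declares the features its input must have and the features it sets on its output. Features it does not set (such as language) pass through. Workflows are restricted to linear chains. The feature names (`language`, `format`, `annotation`), the tool names, and the functions `spec`, `satisfies`, `apply` and `workflows` are all hypothetical and not part of the TT.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical data description: sorted (feature, value) pairs.
Spec = Tuple[Tuple[str, str], ...]


def spec(**features: str) -> Spec:
    """Build a hashable, order-independent data description."""
    return tuple(sorted(features.items()))


@dataclass(frozen=True)
class Tool:
    name: str
    accepts: Spec   # features the input must have (hypothetical)
    produces: Spec  # features the tool sets on its output (hypothetical)


def satisfies(data: Spec, required: Spec) -> bool:
    """True if every required feature has the required value in `data`."""
    have: Dict[str, str] = dict(data)
    return all(have.get(key) == value for key, value in required)


def apply(data: Spec, tool: Tool) -> Spec:
    """Features the tool does not set (e.g. language) pass through unchanged."""
    merged = dict(data)
    merged.update(tool.produces)
    return tuple(sorted(merged.items()))


def workflows(data: Spec, goal: Spec, tools: List[Tool], max_len: int = 4) -> List[List[str]]:
    """Enumerate every tool chain (no tool repeated, at most max_len steps)
    that turns `data` into output satisfying `goal`."""
    found: List[List[str]] = []

    def search(current: Spec, chain: List[Tool]) -> None:
        if satisfies(current, goal):
            found.append([t.name for t in chain])
            return
        if len(chain) == max_len:
            return
        for tool in tools:
            if tool not in chain and satisfies(current, tool.accepts):
                search(apply(current, tool), chain + [tool])

    search(data, [])
    return found


# Hypothetical tool registrations: providers describe only inputs and outputs.
TOOLS = [
    Tool("html2text", spec(format="html"), spec(format="plain")),
    Tool("tokeniser", spec(format="plain", annotation="text"), spec(annotation="tokens")),
    Tool("tagger-da", spec(language="da", annotation="tokens"), spec(annotation="pos")),
    Tool("tagger-multi", spec(annotation="tokens"), spec(annotation="pos")),
]

if __name__ == "__main__":
    source = spec(language="da", format="html", annotation="text")
    for design in workflows(source, spec(annotation="pos"), TOOLS):
        print(" -> ".join(design))
```

For Danish HTML input, the sketch proposes two designs, `html2text -> tokeniser -> tagger-da` and `html2text -> tokeniser -> tagger-multi`. For English plain text, only `tokeniser -> tagger-multi` qualifies. Returning every candidate rather than a single "best" one reflects the point made in the abstract: the system cannot know which design suits a particular text, so the choice is left to the user. The sketch does not model tools with several inputs, ontology hierarchies, or the enactment of the chosen design.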


Published

2021-06-22