Term Spotting: A Quick-and-dirty Method for Extracting Typological Features of Language from Grammatical Descriptions
DOI:
https://doi.org/10.3384/ecp184172
Keywords:
linguistic typology, keyword spotting, information extraction
Abstract
Starting from a large collection of digitized raw-text descriptions of languages of the world, we address the problem of extracting information of interest to linguists from these. We describe a general technique to extract properties of the described languages associated with a specific term. The technique is simple to implement, simple to explain, requires no training data or annotation, and requires no manual tuning of thresholds. The results are evaluated on a large gold standard database on classifiers with accuracy results that match or supersede human inter-coder agreement on similar tasks. Although accuracy is competitive, the method may still be enhanced by a more rigorous probabilistic background theory and usage of extant NLP tools for morphological variants, collocations and vector-space semantics.
Published
2021-08-12
Section
Contents
License
Copyright (c) 2021 Harald Hammarström, One-Soon Her, Marc Tang
This work is licensed under a Creative Commons Attribution 4.0 International License.