Term Spotting: A Quick-and-dirty Method for Extracting Typological Features of Language from Grammatical Descriptions
Keywords: linguistic typology, keyword spotting, information extraction
Abstract: Starting from a large collection of digitized raw-text descriptions of the world's languages, we address the problem of extracting information of interest to linguists from them. We describe a general technique for extracting properties of the described languages that are associated with a specific term. The technique is simple to implement, simple to explain, requires no training data or annotation, and requires no manual tuning of thresholds. The results are evaluated against a large gold-standard database on classifiers, with accuracy that matches or exceeds human inter-coder agreement on similar tasks. Although the accuracy is competitive, the method could still be enhanced by a more rigorous probabilistic background theory and by the use of existing NLP tools for morphological variants, collocations and vector-space semantics.
Copyright (c) 2021 Harald Hammarström, One-Soon Her, Marc Tang
This work is licensed under a Creative Commons Attribution 4.0 International License.