Term Spotting: A Quick-and-dirty Method for Extracting Typological Features of Language from Grammatical Descriptions

Authors

  • Harald Hammarström
  • One-Soon Her
  • Marc Tang

DOI:

https://doi.org/10.3384/ecp184172

Keywords:

linguistic typology, keyword spotting, information extraction

Abstract

Starting from a large collection of digitized raw-text descriptions of the world's languages, we address the problem of extracting from them information of interest to linguists. We describe a general technique for extracting properties of the described languages that are associated with a specific term. The technique is simple to implement, simple to explain, requires no training data or annotation, and requires no manual tuning of thresholds. We evaluate the results against a large gold-standard database on classifiers; accuracy matches or exceeds human inter-coder agreement on similar tasks. Although accuracy is competitive, the method could still be improved by a more rigorous probabilistic background theory and by using existing NLP tools for morphological variants, collocations, and vector-space semantics.
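
As a rough illustration of the counting step that term spotting builds on, the sketch below measures how often a single term occurs in a raw-text description. It is a minimal sketch under stated assumptions, not the authors' procedure: the function name `term_rate`, the per-1,000-token normalization, and the two toy documents are all hypothetical, and the rule the paper uses to turn such counts into a feature value is not given in the abstract and is not reproduced here.

```python
import re


def term_rate(text: str, term: str) -> float:
    """Occurrences of a single-word `term` per 1,000 tokens of `text`.

    Hypothetical helper for illustration; matching is case-insensitive
    and exact, so morphological variants are not counted.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for tok in tokens if tok == term.lower())
    return 1000.0 * hits / len(tokens)


# Toy stand-ins for digitized grammars (invented text, not real data).
docs = {
    "grammar_a": "Each numeral requires a classifier. "
                 "The classifier follows the numeral.",
    "grammar_b": "Nouns take plural suffixes. "
                 "Numerals precede the noun directly.",
}

# Rate of the term "classifier" in each toy description.
rates = {name: term_rate(text, "classifier") for name, text in docs.items()}
print(rates)
```

Because matching is exact, a form such as "classifiers" would be missed, which is one reason the abstract points to tools for morphological variants as a possible enhancement.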

Published

2021-08-12