A semi-automatic part-of-speech tagging system for Middle English corpora: overcoming the challenges.

How to Cite

Sánchez Reed, M., & Miranda García, A. (2019). A semi-automatic part-of-speech tagging system for Middle English corpora: overcoming the challenges. SELIM. Journal of the Spanish Society for Medieval English Language and Literature, 16, 121–147. https://doi.org/10.17811/selim.16.2009.121-147

Abstract

Historical corpus annotation is largely a manual, time-consuming task. The last few years have witnessed advances in the use of computational tools for the annotation of Middle English (ME) corpora. In 2007, an attempt was made at the University of Texas to create a semi-automatic system for part-of-speech (POS) tagging based on the use of parallel texts. Although this work still showed manual annotation to be more accurate, it demonstrated the potential of computational tools for the creation of tagging systems. We propose the development of a semi-intelligent and semi-automatic POS tagging program for ME corpora capable of tagging any given ME text with a high rate of success; no such computational system is currently available. This task entails challenges of a twofold nature: a) linguistic difficulties; and b) computational limitations. This paper discusses these difficulties and proposes possible solutions to them, with the aim of creating a tool that will facilitate POS tagging and help in searching for linguistic information.

Keywords: POS tagging, Middle English, historical corpora, computational linguistics.

