A semi-automatic part-of-speech tagging system for Middle English corpora: overcoming the challenges.

Melania Sánchez Reed; Antonio Miranda García

doi:10.17811/selim.16.2009.121-147

Vol. 16 (2009), Articles

Published 2019-02-20

How to Cite

Sánchez Reed, M., & Miranda García, A. (2019). A semi-automatic part-of-speech tagging system for Middle English corpora: overcoming the challenges. SELIM. Journal of the Spanish Society for Medieval English Language and Literature, 16, 121–147. https://doi.org/10.17811/selim.16.2009.121-147

Abstract

Historical corpus annotation is very much a manual, time-consuming task. The last few years have witnessed advances in the use of computational tools for the annotation of Middle English (ME) corpora. In 2007, a semi-automatic system for part-of-speech (POS) tagging, based on the use of parallel texts, was developed at the University of Texas. Although this work still showed manual annotation to be more accurate, it demonstrated the potential of computational tools for the creation of tagging systems. We propose the development of a semi-intelligent and semi-automatic POS tagging program for ME corpora capable of tagging any given ME text with a high rate of success; no such computational system is currently available. This task entails challenges of a two-fold nature: a) linguistic difficulties; and b) computational limitations. This paper discusses these difficulties and proposes possible solutions to them in order to create a tool that will facilitate POS tagging and support searches for linguistic information.

Keywords: POS tagging, Middle English, historical corpora, computational linguistics.
