Library of the Faculty of Humanities and Social Sciences, University of Zagreb
Faculty of Humanities and Social Sciences Institutional Repository

Error Analysis in Croatian Morphosyntactic Tagging


Agić, Željko, Tadić, Marko and Dovedan, Zdravko (2009). Error Analysis in Croatian Morphosyntactic Tagging. In: ITI 2009, 31st International Conference on Information Technology Interfaces, June 22-25, 2009, Cavtat.

Full text: PDF (English), 137 kB

Abstract

In this paper, we provide a detailed insight into the properties of errors generated by a stochastic morphosyntactic tagger that assigns Multext-East morphosyntactic descriptions (MSDs) to Croatian texts. Tagging the Croatia Weekly newspaper corpus with the CroTag tagger in stochastic mode revealed that approximately 85 percent of all tagging errors occur on nouns, adjectives, pronouns and verbs, and that approximately 50 percent of these errors are incorrect assignments of case values. We also describe other distributional properties of errors in assigning morphosyntactic descriptions to these and other parts of speech. On the basis of these properties, we propose rule-based and stochastic strategies that could be integrated into the tagging module, creating a hybrid procedure aimed at raising overall tagging accuracy for Croatian.
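The two headline figures above (the share of errors per part of speech, and the share of errors that are wrong case values) can be derived from a gold-standard corpus aligned token by token with the tagger output. The Python sketch below is not the authors' evaluation code. It assumes that the first character of a Multext-East MSD encodes the part of speech and that the Case attribute sits at the positions given in the hypothetical `CASE_INDEX` table (nouns and adjectives only). The function name `error_profile` and the toy tags are invented for illustration.

```python
from collections import Counter

# Assumption for illustration: 0-based index of the Case attribute in
# Multext-East MSD strings for nouns (e.g. Ncmsn) and adjectives (e.g. Afpmsn).
# Other parts of speech are left out; extend only after checking the specification.
CASE_INDEX = {"N": 4, "A": 5}


def error_profile(gold, predicted):
    """Count tagging errors per part of speech and case-value errors.

    gold and predicted are token-aligned lists of non-empty MSD strings.
    Returns (total_errors, errors_by_pos, case_errors).
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must be aligned token by token")
    errors_by_pos = Counter()
    case_errors = 0
    for g, p in zip(gold, predicted):
        if g == p:
            continue  # the full MSD matches, so this token is not an error
        pos = g[0]  # the first MSD character encodes the gold part of speech
        errors_by_pos[pos] += 1
        idx = CASE_INDEX.get(pos)
        # A case error: both tags share the POS but differ in the case slot.
        if (idx is not None and p[:1] == pos
                and len(g) > idx and len(p) > idx and g[idx] != p[idx]):
            case_errors += 1
    return sum(errors_by_pos.values()), errors_by_pos, case_errors


# Toy, invented data: four tokens, three of them mistagged.
gold = ["Ncmsn", "Afpmsn", "Vmr3s", "Ncfsg"]
pred = ["Ncmsa", "Afpmsn", "Vmr3p", "Ncfsd"]
total, by_pos, case = error_profile(gold, pred)
print(total, dict(by_pos), case)  # prints: 3 {'N': 2, 'V': 1} 2
```

Dividing `by_pos` counts and `case` by `total` gives distributions of the kind reported in the abstract; on a real corpus the case table would need entries for pronouns and other inflected categories.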

Item Type: Published conference work (Paper)
Uncontrolled Keywords: Morphosyntactic tagging, part-of-speech tagging, error analysis, error distribution, Croatian language, hybrid tagging
Subjects: Information sciences > Social-humanistic informatics; Linguistics
Departments: Department of Information Science; Department of Linguistics
Date Deposited: 03 Feb 2016 12:14
Last Modified: 03 Feb 2016 12:14
URI: http://darhiv.ffzg.unizg.hr/id/eprint/5953
