Knjižnica Filozofskog fakulteta
Sveučilišta u Zagrebu
Faculty of Humanities and Social Sciences Institutional Repository

Recognizing Verb-based Croatian Idiomatic MWUs



Kocijan, Kristina and Librenjak, Sara. (2016). Recognizing Verb-based Croatian Idiomatic MWUs. In: Automatic Processing of Natural-Language Electronic Texts with NooJ. Communications in Computer and Information Science, 608. Springer, pp. 96-106.

PDF (English) - Published Version


This paper tackles the computational problems posed by Croatian verbal idioms. The Croatian language has a very rich phraseme structure, as described in Matešić (1982), Menac (2003; 2007) and Menac-Mihalić (2004), among many others. This work is one of the few attempts at computational analysis of idioms in Croatian as multi-word units (MWUs). We used a rule-based approach and NooJ syntactic grammars to recognize any verb-based idiom (of the ~1500 analyzed) in any syntactic position. The Croatian Dictionary of Idioms (Menac et al., 2003) supplied the initial list, which was extended with new additions during the training phase. The grammars were tested on corpora constructed specifically for this work, and the results were used to calculate recall, precision and F-measure. With final results of recall <98%, precision <96% and F-measure <97%, we consider this a successful attempt at recognizing verb-based idioms in Croatian.
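For readers unfamiliar with the evaluation measures named in the abstract, the following is a minimal sketch of how precision, recall and F-measure are conventionally computed from match counts. The function name and the counts in the usage lines are hypothetical illustrations; they are not the paper's data, and the paper's own evaluation was carried out on NooJ output, not with this code.

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Return (precision, recall, F1) from raw match counts.

    true_positives:  idiom occurrences correctly recognized
    false_positives: spans wrongly marked as idioms
    false_negatives: idiom occurrences the grammar missed
    """
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives

    # Guard against empty denominators by falling back to 0.0.
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0

    # F1 is the harmonic mean of precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical counts, chosen only to illustrate the arithmetic.
p, r, f = precision_recall_f1(true_positives=8, false_positives=2, false_negatives=2)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

The harmonic mean used here is the balanced F-measure (F1); the abstract does not state which weighting the authors applied, so that choice is an assumption.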

Item Type: Book Section
Uncontrolled Keywords: Croatian, idioms, verbal phrases, NooJ, MWU, frozen expressions, semi-frozen expressions
Subjects: Information sciences > Social-humanistic informatics
Departments: Department of Information Science
Date Deposited: 21 Feb 2017 11:39
Last Modified: 01 Nov 2017 00:15
