Bekavac, Božo and Kocijan, Kristina and Tadić, Marko. (2015). Near Language Identification Using NooJ. In: Formalising Natural Languages with NooJ 2014: Selected Papers from the NooJ 2014 International Conference. Cambridge Scholars Publishing, Newcastle upon Tyne, pp. 152-166. ISBN 1-4438-7558-9
Abstract
In this work we take a linguistic-knowledge-aware approach tailored to a specific pair of languages. We use NooJ as the core of a system designed for the automatic identification of near languages, Croatian and Serbian in particular. We draw on several levels of NooJ's processing capabilities. First, we apply specially designed lexical transducers to detect the typical morphological issues in the languages. Then we apply syntactic grammars to detect syntagmas characteristic of Serbian and Croatian. Finally, we measure discrepancies between the properties of the texts obtained from text processing. The output is generated according to a predefined voting principle, implemented with an AutoHotkey program. Our results show a high precision of 99.82% for language identification of Croatian and Serbian texts.
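The abstract does not spell out the voting principle, so the following is only an illustrative sketch in Python (not the authors' AutoHotkey program) of how votes from the three processing levels could be combined. The level names, the hit counts, the `hr`/`sr` labels, and the simple majority rule are all assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple

# Hypothetical evidence per processing level:
# (number of Croatian-typical hits, number of Serbian-typical hits).
Evidence = Dict[str, Tuple[int, int]]


def level_vote(hr_hits: int, sr_hits: int) -> Optional[str]:
    """Cast one vote for a level, or abstain (None) when the hits are tied."""
    if hr_hits > sr_hits:
        return "hr"
    if sr_hits > hr_hits:
        return "sr"
    return None


def identify(evidence: Evidence) -> str:
    """Combine the level votes by simple majority (an assumed rule)."""
    votes = [level_vote(hr, sr) for hr, sr in evidence.values()]
    hr_votes = votes.count("hr")
    sr_votes = votes.count("sr")
    if hr_votes > sr_votes:
        return "hr"
    if sr_votes > hr_votes:
        return "sr"
    return "undecided"


# Made-up counts for a single text, one entry per level named in the abstract.
sample = {
    "lexical": (12, 3),      # lexical transducers
    "syntactic": (4, 1),     # syntactic grammars
    "statistical": (2, 5),   # text-property discrepancies
}
result = identify(sample)
```

Here two of the three hypothetical levels favour Croatian, so `result` is `"hr"`; a text with no distinguishing hits at any level would come back as `"undecided"`.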
Item Type: | Book Section |
---|---|
Uncontrolled Keywords: | near language identification, Croatian language, Serbian language, local grammars, NooJ |
Subjects: | Information sciences > Social-humanistic informatics; Linguistics |
Departments: | Department of Information Science; Department of Linguistics |
Date Deposited: | 09 Jun 2015 09:58 |
Last Modified: | 14 Jan 2016 10:32 |
URI: | http://darhiv.ffzg.unizg.hr/id/eprint/5261 |