Knjižnica Filozofskog fakulteta
Sveučilišta u Zagrebu
Faculty of Humanities and Social Sciences Institutional Repository

Near Language Identification Using NooJ

Bekavac, Božo and Kocijan, Kristina and Tadić, Marko. (2015). Near Language Identification Using NooJ. In: Formalising Natural Languages with NooJ 2014: Selected Papers from the NooJ 2014 International Conference. Cambridge Scholars Publishing, Newcastle upon Tyne, pp. 152-166. ISBN 1-4438-7558-9

Abstract

In this work we take a linguistic-knowledge-aware approach tailored to a specific pair of languages. We use NooJ as the core of a system designed for the automatic identification of near languages, Croatian and Serbian in particular, and we draw on several levels of NooJ's processing capabilities. First, we apply specially designed lexical transducers to detect morphological features typical of each language. Then we apply syntactic grammars to detect syntagmas characteristic of Serbian and of Croatian. Finally, we measure discrepancies between the text properties reported by this processing. The output is generated according to a predefined voting principle, implemented as an AutoHotkey program. Our results show a high precision of 99.82% for language identification of Croatian and Serbian texts.
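The voting step mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' NooJ or AutoHotkey implementation: the marker words, the regular expressions, the function names, and the simple majority rule are all assumptions made for illustration. It only shows how independent lexical and syntactic detectors might each cast a vote that is then combined; the third, statistical level is omitted.

```python
import re
from collections import Counter

# Hypothetical marker lists (illustrative only, not taken from the paper).
CROATIAN_WORDS = {"mlijeko", "lijepo", "tko", "kruh"}
SERBIAN_WORDS = {"mleko", "lepo", "ko", "hleb"}

# Hypothetical syntactic cues: modal verb + "da" + finite verb is treated
# as Serbian-leaning, modal verb + infinitive as Croatian-leaning.
SERBIAN_PATTERN = re.compile(r"\b(želim|mogu|hoću)\s+da\s+\w+")
CROATIAN_PATTERN = re.compile(r"\b(želim|mogu|hoću)\s+\w+(ti|ći)\b")


def lexical_vote(text):
    """Return 'hr', 'sr', or None (abstain) based on marker-word counts."""
    tokens = re.findall(r"\w+", text.lower())
    hr = sum(t in CROATIAN_WORDS for t in tokens)
    sr = sum(t in SERBIAN_WORDS for t in tokens)
    if hr == sr:
        return None
    return "hr" if hr > sr else "sr"


def syntactic_vote(text):
    """Return 'hr', 'sr', or None (abstain) based on pattern counts."""
    lowered = text.lower()
    hr = len(CROATIAN_PATTERN.findall(lowered))
    sr = len(SERBIAN_PATTERN.findall(lowered))
    if hr == sr:
        return None
    return "hr" if hr > sr else "sr"


def vote(votes):
    """Simple majority over non-abstaining votes; ties give 'unknown'."""
    counts = Counter(v for v in votes if v is not None)
    if not counts:
        return "unknown"
    ranked = counts.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return "unknown"
    return ranked[0][0]


def identify(text):
    """Combine the detector votes into a single language label."""
    return vote([lexical_vote(text), syntactic_vote(text)])


print(identify("Želim da kupim hleb i mleko."))  # sr
print(identify("Želim kupiti kruh i mlijeko."))  # hr
```

Letting a detector abstain with `None` keeps a level that found no evidence from diluting the vote. A real system would weight the levels according to the predefined voting principle the abstract refers to, whose details are not given here.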

Item Type: Book Section
Uncontrolled Keywords: near language identification, Croatian language, Serbian language, local grammars, NooJ
Subjects: Information sciences > Social-humanistic informatics; Linguistics
Departments: Department of Information Science; Department of Linguistics
Date Deposited: 09 Jun 2015 09:58
Last Modified: 14 Jan 2016 10:32
URI: http://darhiv.ffzg.unizg.hr/id/eprint/5261
