Vučković, Kristina. (2012). Corpus Analysis with NooJ. In: LREC 2012, 21-27 May 2012, Istanbul, Turkey. (Submitted)
Microsoft PowerPoint presentation (Croatian), 8MB
Abstract
NooJ is a freeware language-engineering development environment used to formalize and integrate nine levels of linguistic phenomena: orthography and typography; lexical, inflectional and derivational morphology; local, structural and transformational syntax; and semantics. For each of these levels, NooJ provides linguists with one or more formal frameworks specifically designed to facilitate the description of the phenomenon, as well as parsing, development and debugging tools designed to be as computationally efficient as possible, ranging from finite-state to Turing machines. This approach distinguishes NooJ from other computational linguistic frameworks, which provide a single formalism that is supposed to cover all linguistic phenomena. As an engineering development environment, NooJ contains tools to help construct, test, debug, maintain and accumulate large sets of linguistic resources, as well as tools to process large texts and corpora. The system has been under development since 2002 and has been used to build over 20 language modules. As a corpus processing tool, NooJ allows researchers in various social sciences to extract information from any text or corpus (i.e. not tagged) by applying sophisticated queries based on concepts rather than word forms, building indices and concordances, automatically annotating texts, performing statistical analyses on concepts, etc. NooJ is freely available and runs on Windows, Linux, Solaris and Mac OS X; linguistic modules can already be freely downloaded for over a dozen languages. See www.nooj4nlp.net for more information on NooJ; the page “doc & help” provides references to NooJ-related publications. This workshop intends to help participants master three basic NooJ functionalities: corpus processing, the formalization of linguistic units, and syntactic parsing and the automatic annotation of texts.
Item Type: | Conference presentation
---|---
Uncontrolled Keywords: | corpus processing, linguistic units, queries, annotations, morphology, syntax
Subjects: | Information sciences > Social-humanistic informatics; Linguistics
Departments: | Department of Information Science
Date Deposited: | 18 Dec 2014 10:29
Last Modified: | 09 Feb 2015 09:56
URI: | http://darhiv.ffzg.unizg.hr/id/eprint/5024