Knjižnica Filozofskog fakulteta
Sveučilišta u Zagrebu
Faculty of Humanities and Social Sciences Institutional Repository

Comparative Analysis of Automatic Term and Collocation Extraction



Seljan, Sanja and Dalbelo Bašić, Bojana and Šnajder, Jan and Delač, Davor and Šamec-Gjurin, Matija and Crnec, Dina. (2009). Comparative Analysis of Automatic Term and Collocation Extraction. In: 2nd International Conference “The Future of Information Sciences: INFuture2009 – Digital Resources and Knowledge Sharing”, 4-6 November 2009, Zagreb, Croatia.



Monolingual and multilingual terminology and collocation bases that cover a specific domain, whether used independently or integrated with other resources, have become a valuable electronic resource. Building such resources can be assisted by automatic term extraction tools that combine statistical and linguistic approaches. This paper presents research on term extraction from a monolingual corpus of publicly accessible English legislative documents. It compares the results of two hybrid approaches: extraction using the TermeX tool, and an automatic statistical extraction procedure followed by linguistic filtering with an open source linguistic engineering tool. The results are evaluated using the statistical measures of precision, recall, and F-measure.
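
As an illustration of the evaluation measures named in the abstract, the following minimal sketch computes precision, recall, and F-measure for a list of extracted term candidates against a reference list. The function name `evaluate_extraction`, the case-insensitive exact matching, and the sample terms are assumptions made for this example; the abstract does not describe the paper's actual evaluation procedure or data, and the balanced F-measure (F1) is assumed here.

```python
def evaluate_extraction(extracted, reference):
    """Return (precision, recall, f_measure) for extracted terms.

    Hypothetical helper: terms are compared as lowercased exact strings,
    which is an assumption, not the matching rule used in the paper.
    """
    extracted_set = {term.lower() for term in extracted}
    reference_set = {term.lower() for term in reference}

    # Candidates that also appear in the reference list
    true_positives = len(extracted_set & reference_set)

    # Precision: share of extracted candidates that are correct
    precision = true_positives / len(extracted_set) if extracted_set else 0.0
    # Recall: share of reference terms that were found
    recall = true_positives / len(reference_set) if reference_set else 0.0

    # Balanced F-measure: harmonic mean of precision and recall
    if precision + recall == 0:
        f_measure = 0.0
    else:
        f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Made-up sample data, for illustration only
extracted_terms = ["member state", "European Union", "legal act", "of the"]
reference_terms = ["member state", "European Union", "legal act",
                   "internal market", "council regulation"]

precision, recall, f_measure = evaluate_extraction(extracted_terms, reference_terms)
print(f"precision={precision:.2f} recall={recall:.2f} F={f_measure:.2f}")
```

With this made-up data, three of the four candidates are in the reference list, and three of the five reference terms are found, so precision is higher than recall and the F-measure falls between the two.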

Item Type: Published conference work (Lecture)
Uncontrolled Keywords: automatic extraction, term and collocation base, English language, evaluation metrics
Subjects: Information sciences > Social-humanistic informatics
Information sciences > Natural language processing, lexicography and encyclopedic science
Departments: Department of Information Science
Date Deposited: 24 Feb 2017 09:45
Last Modified: 24 Feb 2017 09:45
