Načinović, Lucia and Martinčić-Ipšić, Sanda and Ipšić, Ivo. (2009). Statistical Language Models for Croatian Weather-domain Corpus. In: 2nd International Conference “The Future of Information Sciences: INFuture2009 – Digital Resources and Knowledge Sharing”, 4-6 November 2009, Zagreb, Croatia.
Abstract
Statistical language modelling estimates the regularities in natural languages. Language models are used in speech recognition, machine translation and other speech and language technology applications. This paper presents a procedure for building language models for a Croatian weather-domain corpus. Different types of statistical n-gram language models and smoothing methods for language modelling are presented. The models are compared in terms of their estimated perplexity.
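To make the terms in the abstract concrete, here is a minimal Python sketch of a bigram model with add-one (Laplace) smoothing, evaluated by perplexity. It is an illustration only, not the procedure used in the paper: the toy English sentences, the function names, and the choice of add-one smoothing are all assumptions made for this example, and the abstract does not say which smoothing methods or tools were used.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count unigrams and bigrams over sentences padded with <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    # Words that can be predicted: everything except the start marker.
    vocab = set(unigrams) - {"<s>"}
    return unigrams, bigrams, vocab

def bigram_prob(prev, word, unigrams, bigrams, vocab):
    """Add-one smoothed estimate of P(word | prev)."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + len(vocab))

def perplexity(sentences, unigrams, bigrams, vocab):
    """Perplexity = 2 ** (negative average log2 probability per predicted token).

    Assumes every test word occurs in the training vocabulary; handling
    out-of-vocabulary words is left out of this sketch.
    """
    log_prob, n_tokens = 0.0, 0
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            log_prob += math.log2(bigram_prob(prev, word, unigrams, bigrams, vocab))
            n_tokens += 1
    return 2 ** (-log_prob / n_tokens)

# Hypothetical toy corpus, invented for this example (not from the paper).
train = ["sunny in the morning", "cloudy in the evening", "rain in the evening"]
unigrams, bigrams, vocab = train_bigram(train)

plausible = perplexity(["rain in the morning"], unigrams, bigrams, vocab)
scrambled = perplexity(["morning the in rain"], unigrams, bigrams, vocab)
print(f"plausible word order: {plausible:.2f}")
print(f"scrambled word order: {scrambled:.2f}")
```

A lower perplexity means the model finds the test text less surprising, which is why perplexity can serve as the comparison criterion between models trained with different n-gram orders or smoothing methods.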
Item Type: | Published conference work (Lecture) |
---|---|
Uncontrolled Keywords: | statistical language modelling, n-gram, smoothing methods, Croatian weather-domain corpus |
Subjects: | Information sciences > Social-humanistic informatics; Information sciences > Natural language processing, lexicography and encyclopedic science; Linguistics |
Departments: | Department of Information Science |
Date Deposited: | 19 May 2017 09:29 |
Last Modified: | 19 May 2017 09:29 |
URI: | http://darhiv.ffzg.unizg.hr/id/eprint/8392 |