Knjižnica Filozofskog fakulteta
Sveučilišta u Zagrebu
Faculty of Humanities and Social Sciences Institutional Repository

Statistical Machine Translation of Croatian Weather Forecast: How Much Data Do We Need?


Ljubešić, Nikola and Bago, Petra and Boras, Damir. (2010). Statistical Machine Translation of Croatian Weather Forecast: How Much Data Do We Need?. In: ITI 2010 32nd International Conference on INFORMATION TECHNOLOGY INTERFACES, June 21-24, 2010, Cavtat.

This research is a first step towards a system for translating Croatian weather forecasts into multiple languages. This step deals with the Croatian-English language pair. The parallel corpus is a one-year sample of weather forecasts for the Adriatic, comprising 7,893 sentence pairs. Evaluation is performed with the best-known automatic evaluation measures, BLEU, NIST and METEOR, as well as by manually evaluating a sample of 200 translations. In this research we have shown that with a small training set and the state-of-the-art Moses system, decoding can be done with 96% accuracy with respect to adequacy and fluency. Further improvement is expected from increasing the training set size.

Item Type: Published conference work (Paper)
Uncontrolled Keywords: statistical machine translation, Croatian language, English language, automatic evaluation, manual evaluation
Subjects: Information sciences > Social-humanistic informatics
Departments: Department of Information Science
Date Deposited: 03 Feb 2016 11:28
Last Modified: 03 Feb 2016 11:28
