Knjižnica Filozofskog fakulteta
Sveučilišta u Zagrebu
Faculty of Humanities and Social Sciences Institutional Repository

Statistical Machine Translation of Croatian Weather Forecasts: How Much Data Do We Need?

Ljubešić, Nikola and Bago, Petra and Boras, Damir. (2010). Statistical Machine Translation of Croatian Weather Forecasts: How Much Data Do We Need? CIT - Journal of Computing and Information Technology, 18(4). pp. 303-308. ISSN 1330-1136

Abstract

This research is the first step towards developing a system for translating Croatian weather forecasts into multiple languages. This step deals with the Croatian-English language pair. The parallel corpus is a one-year sample of weather forecasts for the Adriatic, comprising 7,893 sentence pairs. Evaluation is performed with the automatic evaluation measures BLEU, NIST and METEOR, as well as by manually evaluating a sample of 200 translations. We have shown that with a small training set and the state-of-the-art Moses system, decoding can be done with 96% accuracy concerning adequacy and fluency. Additional improvement is expected from increasing the training set size. Finally, the correlation of the recorded evaluation measures is explored.

Item Type: Article
Uncontrolled Keywords: statistical machine translation; automatic evaluation; manual evaluation; correlation between evaluation measures
Subjects: Information sciences > Social-humanistic informatics
Departments: Department of Information Science
Date Deposited: 18 Oct 2012 15:59
Last Modified: 22 Feb 2016 10:04
URI: http://darhiv.ffzg.unizg.hr/id/eprint/1862
