Knjižnica Filozofskog fakulteta
Sveučilišta u Zagrebu
Faculty of Humanities and Social Sciences Institutional Repository

Automatic prediction and modelling of Croatian prosodic features based on text

Načinović Prskalo, Lucia. (2016). Automatic prediction and modelling of Croatian prosodic features based on text. PhD Thesis. Filozofski fakultet u Zagrebu, Department of Information Science.
(Postgraduate doctoral study programme in Information and Communication Sciences) [supervisor: Mikelić Preradović, Nives].

PDF (Croatian)

Abstract

Human speech conveys a wide range of information through pitch accent, intonation, duration, rhythm, pauses, and speech rate; these characteristics are collectively referred to as prosody. Because prosody plays many roles in human communication, predicting and modelling it is important and applicable in many areas of natural language processing, such as automatic speech recognition, speech synthesis, automatic speaker and language identification, and the detection of emotional states. Prior to this research, no extensive study of the prediction and modelling of prosodic features had been conducted for Croatian. This doctoral thesis tests the applicability of methods for predicting and modelling prosodic features to Croatian, and explores whether their performance can be improved by including linguistic features and properties specific to Croatian (for example, lexical stress). Croatian is a pitch accent language in which the tone contour realized on prominent words carries lexical information. A prerequisite for modelling Croatian prosody is therefore a lexicon in which lexical stress is marked for both the basic and the derived forms of words. Such a lexicon was created by implementing rules that construct derived word forms by adding the appropriate ending and, where necessary, shifting the position of the stress. Each entry in the lexicon comprises a derived word form written both without and with its stress mark, together with its morphosyntactic description (MSD) or part-of-speech (POS) tag. Because Croatian is an under-resourced language, the lexicon is expected to be of significant value and widely applicable in various fields of natural language processing.
The lexicon comprises 72,366 words in their basic form and over 1.000,00 derived word forms. Besides the lexicon, implementing the rules for constructing derived word forms also yielded a system for automatic stress assignment for Croatian. The accuracy of this rule-based system was tested by applying it to a text and comparing its output with the same text in which stress had been assigned by an expert. The accuracy is 78% when MSD tags are assigned to the words automatically and 87.7% when the MSD tags are corrected by hand. Some Croatian words are written separately but carry no stress of their own; prosodically they lean on the following or the preceding word. Such words are called clitics (proclitics and enclitics). In some cases the stress moves from a word that usually bears it onto the proclitic. Rules for these cases were also implemented, which increased the accuracy of the system to 92.8%. Words from the text sometimes cannot be found in the lexicon. For such cases, a system for automatic lexical stress assignment was developed, consisting of two models trained on data from the lexicon described above. One model predicts the place of the stress and the other predicts the stress category (Croatian has four possible stress categories). Measured by tenfold cross-validation, the accuracy of the model for the place of stress is 90.56% and the accuracy of the model for the stress category is 86.02%. The models were also tested on the text used to evaluate the rule-based system. The accuracy achieved there is 97.4% for the place of stress, 82.4% for the stress category, and 80.1% for place and category together.
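The lookup-then-predict idea and the accuracy measurement described above can be sketched as follows. This is only an illustration under stated assumptions, not the thesis implementation: the three-word lexicon, the `fallback_model` stub, and the function names are all hypothetical, and stress is represented here as a pair of (stressed syllable index, accent category).

```python
from typing import Callable, Dict, List, Tuple

# Assumed representation: (index of the stressed syllable, accent category).
Stress = Tuple[int, str]

# Hypothetical toy lexicon; the real lexicon is far larger and keyed by
# word form together with its MSD or POS tag.
TOY_LEXICON: Dict[str, Stress] = {
    "voda": (0, "short-rising"),
    "kuća": (0, "short-falling"),
    "ruka": (0, "long-rising"),
}


def fallback_model(word: str) -> Stress:
    """Stand-in for the two trained models (place and category).

    It always guesses a short falling accent on the first syllable,
    which is an arbitrary choice made only for this sketch.
    """
    return (0, "short-falling")


def assign_stress(
    word: str,
    lexicon: Dict[str, Stress],
    model: Callable[[str], Stress],
) -> Stress:
    """Use the lexicon entry when one exists, otherwise ask the model."""
    entry = lexicon.get(word.lower())
    return entry if entry is not None else model(word)


def accuracy(predicted: List[Stress], gold: List[Stress]) -> float:
    """Share of words whose place and category both match the gold labels."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("predicted and gold must be non-empty and equally long")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# "more" is missing from the toy lexicon, so the stub model handles it.
words = ["voda", "more", "ruka"]
gold = [(0, "short-rising"), (0, "long-falling"), (0, "long-rising")]
predicted = [assign_stress(w, TOY_LEXICON, fallback_model) for w in words]
score = accuracy(predicted, gold)
```

Scoring place and category as one pair corresponds to the strictest of the three reported figures (place and category together); comparing only the first or only the second element of each pair would give the separate place and category accuracies.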
The rule-based system achieved better accuracy than the model-based system for automatic stress assignment. However, because some words remained without stress after the rule-based system was applied, the model-based system was used to supplement it in those cases. This hybrid approach achieved an accuracy of 95.3%.
The thesis also presents an analysis of syllable duration for Croatian and a duration model. The position of a syllable within the word and the sentence was found to affect its duration. On average, syllable duration increased by 41.4% relative to the reference value when the syllable was at the beginning of a word and by 37.0% when it was at the end of a word. At the beginning of a sentence, duration increased by 71.8% relative to the reference value, and at the end of a sentence by 104.75%. The analysis also showed that contextual features affect syllable duration: the duration increased by different percentages depending on the category of the consonants following the observed syllable. The duration model developed for Croatian takes three categories of features into account: positional, contextual, and stress-related. The accuracy of the model was first tested with all three categories included. It was then tested with one category left out at a time, to determine how much each category contributes to accuracy. All three categories were found to affect the accuracy of the model, with positional features having the greatest impact.
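To make the positional percentages concrete, the sketch below applies each reported increase to a reference duration. It assumes that an increase of p% means multiplying the reference value by (1 + p/100); the 100 ms reference used in the example and the names `POSITION_INCREASE_PCT` and `lengthened_duration` are hypothetical. The abstract does not say how effects combine when several conditions hold at once, so only one position is applied at a time.

```python
# Average increases reported in the abstract, in percent of the reference value.
POSITION_INCREASE_PCT = {
    "word_initial": 41.4,
    "word_final": 37.0,
    "sentence_initial": 71.8,
    "sentence_final": 104.75,
}


def lengthened_duration(reference_ms: float, position: str) -> float:
    """Scale a reference syllable duration by the increase for one position.

    Assumes a simple multiplicative reading of the percentages; the actual
    duration model in the thesis also uses contextual and stress features.
    """
    return reference_ms * (1.0 + POSITION_INCREASE_PCT[position] / 100.0)


# With a hypothetical 100 ms reference syllable:
word_initial_ms = lengthened_duration(100.0, "word_initial")
sentence_final_ms = lengthened_duration(100.0, "sentence_final")
```

Under this reading, a sentence-final syllable is on average slightly more than twice as long as the reference syllable.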
For intonation modelling of Croatian, the Tilt intonation model was applied. For this purpose, a database of 500 sentences was labelled with the corresponding Tilt labels. The best RMSE obtained by comparing the generated F0 contour with the original is 22.2.
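The RMSE between two F0 contours can be computed as in the sketch below. It assumes that both contours are sampled at the same time points and have equal length; the function name `rmse` and the sample values are hypothetical, and the abstract does not state the unit of the reported figure.

```python
import math
from typing import Sequence


def rmse(generated: Sequence[float], original: Sequence[float]) -> float:
    """Root-mean-square error between two equally sampled F0 contours."""
    if len(generated) != len(original) or len(original) == 0:
        raise ValueError("contours must be non-empty and equally long")
    squared_error = sum((g - o) ** 2 for g, o in zip(generated, original))
    return math.sqrt(squared_error / len(original))


# Hypothetical contours, for illustration only.
generated_f0 = [118.0, 131.0, 142.0, 127.0]
original_f0 = [120.0, 135.0, 140.0, 125.0]
error = rmse(generated_f0, original_f0)
```

Because the error is squared before averaging, a few large deviations in the contour raise the RMSE more than many small ones.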

Item Type: PhD Thesis
Uncontrolled Keywords: Croatian prosody, Croatian accent lexicon, automatic lexical stress assignment, duration analysis, duration model, Tilt intonation model
Subjects: Information sciences > Social-humanistic informatics
Departments: Department of Information Science
Supervisor: Mikelić Preradović, Nives
Additional Information: Postgraduate doctoral study programme in Information and Communication Sciences
Date Deposited: 12 Sep 2016 11:20
Last Modified: 12 Sep 2016 11:21
URI: http://darhiv.ffzg.unizg.hr/id/eprint/6912
