Knjižnica Filozofskog fakulteta
Sveučilišta u Zagrebu
Faculty of Humanities and Social Sciences Institutional Repository

Provjera modificiranog Heapsovog zakona na korpusu zakona Europske unije na hrvatskom jeziku [Testing the modified Heaps' law on a corpus of European Union laws in Croatian]

Turković, Jasna. (2012). Provjera modificiranog Heapsovog zakona na korpusu zakona Europske unije na hrvatskom jeziku. Diploma Thesis. Filozofski fakultet u Zagrebu, Department of Information Science. [mentor Tuđman, Miroslav].

Full text: PDF (Croatian, 391 kB) - Registered users only

Abstract

This paper examines how well the modified Heaps' law, VRt = (Kn)^β, estimates the vocabulary size of legal texts, using a corpus of 18 European Union laws in Croatian totalling 1,177,735 tokens. A statistical analysis of the corpus yielded the constants K = 23 and β = 0.64. These values are close to those calculated for a corpus of literary texts in Croatian, which the paper takes as proof that the constants do not depend on the register within a language. The constants were then used to calculate the values of the other parameters. The results show that the modified Heaps' law equation cannot estimate the vocabulary size of a text with adequate correlation to the real vocabulary size. The reason is the structure of legal texts, specifically their number of types, which invalidates the founding presupposition of Heaps' law: that the numbers of tokens and types grow exponentially in the same ratio. As a result, neither the equation for the number of hapax legomena nor the equation for the maximal frequency is valid for this corpus. The paper concludes that the number of hapax legomena does not depend on the register, whereas the maximal frequency does depend on the register of the texts in a corpus.
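As a rough illustration of the quantities named in the abstract, the following Python sketch evaluates the modified Heaps' law with the reported constants and counts tokens, types, hapax legomena, and maximal frequency for a token list. It assumes the formula is to be read as V = (K·n)^β, with β as an exponent (the formula appears flattened in this record), and the function names `predicted_vocabulary` and `observed_counts` are hypothetical, not taken from the thesis. The thesis's own tokenization and fitting procedure are not described here, so this is not a reproduction of its method.

```python
from collections import Counter

# Values reported in the abstract.
K = 23                 # constant K
BETA = 0.64            # constant beta
N_TOKENS = 1_177_735   # total tokens in the corpus


def predicted_vocabulary(n_tokens, k=K, beta=BETA):
    """Predicted number of types, assuming the reading V = (k * n) ** beta."""
    if n_tokens < 0:
        raise ValueError("n_tokens must be non-negative")
    return (k * n_tokens) ** beta


def observed_counts(tokens):
    """Count tokens, types, hapax legomena and maximal frequency.

    `tokens` is any iterable of already-tokenized words; how the thesis
    tokenized its corpus is not stated, so that step is left to the caller.
    """
    freq = Counter(tokens)
    return {
        "tokens": sum(freq.values()),
        "types": len(freq),
        "hapax": sum(1 for c in freq.values() if c == 1),
        "max_freq": max(freq.values(), default=0),
    }


if __name__ == "__main__":
    print(round(predicted_vocabulary(N_TOKENS)))
    print(observed_counts("the law and the court".split()))
```

Comparing `predicted_vocabulary(n)` against the `types` value observed for the same `n` is the kind of check the abstract describes; the abstract reports that the two do not correlate adequately for this legal corpus.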

Item Type: Diploma Thesis
Uncontrolled Keywords: modified Heaps' law, bibliometry, legal register, vocabulary size, Croatian language
Subjects: Information sciences > Social-humanistic informatics
Departments: Department of Information Science
Supervisor: Tuđman, Miroslav
Date Deposited: 12 Jun 2014 10:40
Last Modified: 09 Jul 2014 23:20
URI: http://darhiv.ffzg.unizg.hr/id/eprint/4281