Matytcina M.S., Prokhorova O.N., Chekulai I.V. Combinability and Stability Analysis of Lexical Units by Statistical Methods (Exemplified by the Verb Take)
DOI: https://doi.org/10.15688/jvolsu2.2024.4.9
Marina S. Matytcina
Doctor of Sciences (Philology), Professor, Department of Foreign Languages, Lipetsk State Technical University
Moskovskaya St, 30, 398055 Lipetsk, Russia
https://orcid.org/0000-0001-6102-4397
Olga N. Prokhorova
Doctor of Sciences (Philology), Professor, Director, Institute of Intercultural Communication and International Relations, Belgorod State National Research University
Pobedy St, 85, Bld. 10, 308015 Belgorod, Russia
https://orcid.org/0000-0001-9441-819X
Igor V. Chekulai
Doctor of Sciences (Philology), Professor, Department of English Philology and Intercultural Communication, Belgorod State National Research University
Pobedy St, 85, Bld. 10, 308015 Belgorod, Russia
https://orcid.org/0000-0001-8599-1699
Abstract. This article addresses the problem of identifying stable word combinability in speech. The research is relevant because of the need for deeper linguistic knowledge about the factors that determine how stable relationships form between the elements of a word combination. The English Web Corpus (enTenTen) and its subcorpora serve as the source material. The authors examine bigrams consisting of the verb take and an adjacent word. In addition to critically examining the measures used to determine word cohesion, the study analyses the nature of the relationships between collocation elements. Particular attention is paid to comparing collocations across subcorpora that contain texts of different genres and topics. More than 100 bigrams obtained through the association measures t-score, MI-score and Log Dice are analysed. The t-score values differ across the investigated subcorpora, which shows that this measure depends on subcorpus size. The authors conclude that the degree of stability of the associative relationship in bigrams with the verb take cannot be determined from this measure alone. The data obtained with the MI-score and Log Dice measures differ little between subcorpora, which demonstrates their independence of corpus size. The relationships between collocation elements prove to be variable: the degree of cohesion between the words in a combination depends on how frequently they occur in texts of different genres, registers and modalities. Special attention is given to assessing how effective the measures are at extracting verb collocations and to their application in specific professional tasks.
Key words: linguistic corpus, subcorpus, collocation, measures of association, English Web Corpus (enTenTen), t-score, MI-score, Log Dice.
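The contrast the abstract draws between t-score on the one hand and MI-score and Log Dice on the other can be illustrated with a minimal sketch. It assumes the formulas commonly documented for Sketch Engine, the platform that hosts enTenTen; the abstract itself does not state them, and the frequency counts below are invented for illustration rather than taken from the study. The names f_xy (bigram frequency), f_x and f_y (frequencies of the node and the collocate) and n (corpus size in tokens) are likewise assumptions.

```python
import math


def t_score(f_xy, f_x, f_y, n):
    """t-score: (observed - expected) / sqrt(observed). Assumed formula."""
    expected = f_x * f_y / n
    return (f_xy - expected) / math.sqrt(f_xy)


def mi_score(f_xy, f_x, f_y, n):
    """MI-score: log2(observed / expected). Assumed formula."""
    return math.log2(f_xy * n / (f_x * f_y))


def log_dice(f_xy, f_x, f_y):
    """Log Dice: 14 + log2 of the Dice coefficient. Assumed formula."""
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


# Hypothetical counts for one bigram in a small subcorpus (not real data).
small = {"f_xy": 1_000, "f_x": 50_000, "f_y": 8_000, "n": 10_000_000}

# The same relative frequencies in a subcorpus 100 times larger.
scale = 100
large = {key: value * scale for key, value in small.items()}

t_small, t_large = t_score(**small), t_score(**large)
mi_small, mi_large = mi_score(**small), mi_score(**large)
ld_small = log_dice(small["f_xy"], small["f_x"], small["f_y"])
ld_large = log_dice(large["f_xy"], large["f_x"], large["f_y"])

print(f"t-score:  {t_small:.2f} -> {t_large:.2f}")
print(f"MI-score: {mi_small:.2f} -> {mi_large:.2f}")
print(f"Log Dice: {ld_small:.2f} -> {ld_large:.2f}")
```

Under these assumed formulas, multiplying every count by the same factor k multiplies the t-score by the square root of k, while MI-score and Log Dice stay the same. This is consistent with the size dependence the abstract reports for t-score, although real subcorpora also differ in relative frequencies, which this toy example deliberately holds constant.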
Citation. Matytcina M.S., Prokhorova O.N., Chekulai I.V. Combinability and Stability Analysis of Lexical Units by Statistical Methods (Exemplified by the Verb Take). Vestnik Volgogradskogo gosudarstvennogo universiteta. Seriya 2. Yazykoznanie [Science Journal of Volgograd State University. Linguistics], 2024, vol. 23, no. 4, pp. 106-118. (in Russian). DOI: https://doi.org/10.15688/jvolsu2.2024.4.9
Combinability and Stability Analysis of Lexical Units by Statistical Methods (Exemplified by the Verb Take) by Matytcina M.S., Prokhorova O.N., Chekulai I.V. is licensed under CC BY 4.0