Baranov V.A., Zuga O.V. Quantitative Investigation of the Panteleymon Gospel Dating from the Late 12th to the Early 13th Centuries (Three Statistical Experiments)
Victor A. Baranov
Doctor of Sciences (Philology), Professor, Head of the Department of Linguistics, Kalashnikov Izhevsk State Technical University
Studencheskaya St, 7, 426069 Izhevsk, Russia
Oksana V. Zuga
Candidate of Sciences (Philology), Associate Professor, Department of the Russian Language, Theoretical and Applied Linguistics, Udmurt State University
Universitetskaya St, 1, 426034 Izhevsk, Russia
Abstract. The paper presents the results of a quantitative and statistical comparative analysis of the most frequent word forms and word combinations in the Old Russian Panteleymon Gospel (RNB, Sof. 1). The aim is to determine how close the Panteleymon Gospel is to other gospels and to medieval Slavonic texts of other genres represented in sub-corpora of the historical corpus "Manuscript: Slavic Written Heritage". The analysis was carried out using special statistics and n-gram modules. Comparing the lists of one-, two- and three-component linguistic units automatically extracted from the manuscripts with the corresponding lists of several sub-corpora reveals quantitative and statistical characteristics of the manuscripts' linguistic components that can be regarded as significant. The data of the three experiments are summarized. The first experiment showed that the frequency lists of the Panteleymon Gospel differ least from those of the sub-corpus of complete aprakoses and most from those of the sub-corpus of short aprakoses. This suggests that the composition of a frequency list, the order of the forms in it and their relative frequencies are important characteristics of a manuscript or sub-corpus. Applying the Weirdness measure made it possible to extract from the Panteleymon Gospel the word forms presumed to be significant, namely those with the highest weight when contrasted with sub-corpora of different genres (вамъ, имъ, азъ, емоу, рече, аще). It was established that the size and composition of the contrast sub-corpus do not affect the result, while using the collections of complete and short aprakoses as contrast sub-corpora made it possible to refine the list of such forms (яко, къ, бо, о(т), имъ, есть, аще).
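The Weirdness measure mentioned above compares how often a word form occurs in the target text with how often it occurs in a contrast corpus, each normalized by corpus size. The sketch below uses the standard textbook definition W(w) = (f_t(w)/N_t) / (f_c(w)/N_c); it is an illustration under that assumption, not the implementation used by the corpus modules. The function names `weirdness` and `top_weird` and the add-`smoothing` handling of forms absent from the contrast corpus are hypothetical choices made here.

```python
from collections import Counter


def weirdness(target_tokens, contrast_tokens, smoothing=1):
    """Weirdness of every word form in the target text (hypothetical helper).

    W(w) = (f_target(w) / N_target) / (f_contrast(w) / N_contrast).
    `smoothing` is added to the contrast count so that forms missing from the
    contrast corpus do not cause division by zero (an assumption; N_contrast
    itself is left unadjusted for simplicity).
    """
    f_target = Counter(target_tokens)
    f_contrast = Counter(contrast_tokens)
    n_target = len(target_tokens)
    n_contrast = len(contrast_tokens)
    scores = {}
    for form, freq in f_target.items():
        relative_target = freq / n_target
        relative_contrast = (f_contrast[form] + smoothing) / n_contrast
        scores[form] = relative_target / relative_contrast
    return scores


def top_weird(target_tokens, contrast_tokens, k=10, smoothing=1):
    """Return the k forms with the highest Weirdness, highest first."""
    scores = weirdness(target_tokens, contrast_tokens, smoothing)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]
```

For example, `top_weird(gospel_tokens, contrast_tokens, k=6)` ranks the forms of a tokenized manuscript against a tokenized contrast sub-corpus; high scores mark forms that are relatively much more frequent in the target text.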
The study of two- and three-component combinations extracted with the T-score statistical measure yielded the following results: a list of fixed combinations, i.e. formulas of invariable composition shared by all gospels (ев[ан](г)[елие] ѡ(т) ма[т](ѳ)[ея], etc.), was compiled; entire grammatical structures (ѧже далъ ѥси, etc.) were listed, as well as stable semantic complexes and their parts ([да] любите дроугъ дроуга, etc.). Statistically significant sequences whose statistical weight in the Panteleymon Gospel is considerably higher than in the contrast sub-corpora (нѣсте ли чьли, имать животъ вѣчьныи, etc.) were identified.
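T-score, as commonly defined for collocation extraction, compares the observed frequency O of an n-gram with the frequency E expected if its words occurred independently: t = (O − E) / √O. The sketch below assumes E = N · ∏ f(wᵢ)/N for a k-word sequence in a text of N tokens, which reduces to the usual f(w₁)f(w₂)/N for pairs; the corpus's n-gram module may normalize three-word sequences differently. The function name `ngram_t_scores` is hypothetical.

```python
import math
from collections import Counter


def ngram_t_scores(tokens, k=2):
    """T-score of every contiguous k-word sequence (hypothetical helper).

    t = (O - E) / sqrt(O), where O is the observed count of the sequence and
    E = N * prod(f(w_i) / N) is its expected count under independence.
    """
    n = len(tokens)
    unigrams = Counter(tokens)
    # zip(tokens[0:], tokens[1:], ..., tokens[k-1:]) yields the k-grams
    ngrams = Counter(zip(*(tokens[i:] for i in range(k))))
    scores = {}
    for gram, observed in ngrams.items():
        expected = n
        for word in gram:
            expected *= unigrams[word] / n
        scores[gram] = (observed - expected) / math.sqrt(observed)
    return scores
```

Candidates can then be ranked by descending score and compared with the same statistic computed on a contrast sub-corpus, to find sequences that carry much more weight in the manuscript than elsewhere.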
Key words: Old Russian manuscripts, Panteleymon Gospel, statistical methods, key words, n-grams.
Citation. Baranov V.A., Zuga O.V. Quantitative Investigation of the Panteleymon Gospel Dating from the Late 12th to the Early 13th Centuries (Three Statistical Experiments). Vestnik Volgogradskogo gosudarstvennogo universiteta. Seriya 2. Yazykoznanie [Science Journal of Volgograd State University. Linguistics], 2020, vol. 19, no. 6, pp. 43-57. (in Russian). DOI:

