Gorban O.A., Kosova M.V., Sheptukhina E.M., Svetlov A.V. Corpus of the Archival Documents of the Don Cossack Army: Problems of Morphological Analysis

DOI: https://doi.org/10.15688/jvolsu2.2022.6.4

Oksana A. Gorban

Doctor of Sciences (Philology), Professor, Department of Russian Philology and Journalism, Volgograd State University

Prosp. Universitetsky, 100, 400062 Volgograd, Russia



Marina V. Kosova

Doctor of Sciences (Philology), Professor, Department of Russian Philology and Journalism, Volgograd State University

Prosp. Universitetsky, 100, 400062 Volgograd, Russia



Elena M. Sheptukhina

Doctor of Sciences (Philology), Professor, Department of Russian Philology and Journalism, Volgograd State University

Prosp. Universitetsky, 100, 400062 Volgograd, Russia



Andrey V. Svetlov

Candidate of Sciences (Physics and Mathematics), Associate Professor, Department of Mathematical Analysis and Function Theory, Volgograd State University

Prosp. Universitetsky, 100, 400062 Volgograd, Russia



Abstract. The article presents the results of a collective project aimed at compiling a special annotated diachronic corpus of 18th–19th-century documents from the "Mikhailovsky Stanitsa Ataman" archive fund (State Archive of Volgograd Region, Russia). In the course of the work, linguistic, technical and software tasks related to metadata markup, morphological tagging and the representation of tagged texts in an electronic search environment were solved. The texts are written in 18th-century cursive script using old Cyrillic letters and show specific spelling features. To process them correctly, an add-on to I. Segalovich's MyStem morphological analyser was created. The add-on extends MyStem with support for old Cyrillic characters and a convenient graphical interface; it also allows homonymy to be resolved manually and tagged texts to be exported to an external data storage and processing system. Morphological analysis of some of the texts revealed variant case forms of nominals that are recorded neither in M.V. Lomonosov's "Russian Grammar" nor in modern studies of 18th-century literary texts. These findings demonstrate the effectiveness of automatic tagging that allows word forms to be corrected. The results justified adjusting the tagging software to extend the options for grammatical analysis of homonymous forms, so that homonymy can be identified and resolved manually. A quantitative analysis of these variants will allow the authors to evaluate their significance for the regional administrative language. The data obtained confirm the importance of creating such a corpus for studying the history of the Russian language.

Key words: history of the Russian language, regional business writing, linguistic corpus, morphological markup, variants of case forms, grammatical homonymy.

Citation. Gorban O.A., Kosova M.V., Sheptukhina E.M., Svetlov A.V. Corpus of the Archival Documents of the Don Cossack Army: Problems of Morphological Analysis. Vestnik Volgogradskogo gosudarstvennogo universiteta. Seriya 2. Yazykoznanie [Science Journal of Volgograd State University. Linguistics], 2022, vol. 21, no. 6, pp. 47-56. (in Russian). DOI: https://doi.org/10.15688/jvolsu2.2022.6.4

Corpus of the Archival Documents of the Don Cossack Army: Problems of Morphological Analysis by Gorban O.A., Kosova M.V., Sheptukhina E.M., Svetlov A.V. is licensed under a Creative Commons Attribution 4.0 International License.
Full text (PDF): https://l.jvolsu.com/index.php/en/component/attachments/download/2686