Gorban O.A., Kosova M.V., Sheptukhina E.M. Structural Markup of Official Documents in Diachronic Linguistic Corpus: Problems and Solutions
DOI: https://doi.org/10.15688/jvolsu2.2021.4.1
Oksana A. Gorban
Doctor of Sciences (Philology), Professor, Department of Russian Philology and Journalism, Volgograd State University
Prosp. Universitetsky, 100, 400062 Volgograd, Russia
https://orcid.org/0000-0002-2345-3673
Marina V. Kosova
Doctor of Sciences (Philology), Professor, Department of Russian Philology and Journalism, Volgograd State University
Prosp. Universitetsky, 100, 400062 Volgograd, Russia
https://orcid.org/0000-0003-2854-8759
Elena M. Sheptukhina
Doctor of Sciences (Philology), Professor, Department of Russian Philology and Journalism, Volgograd State University
Prosp. Universitetsky, 100, 400062 Volgograd, Russia
https://orcid.org/0000-0002-8007-6042
Abstract. The relevance of the research stems from the need to annotate official documents of the Don Cossack Host, written in the mid-18th century and kept in the "Mikhailovsky Stanitsa Ataman" archive fund of the State Archive of the Volgograd Region (SAVR, fund 332, inventory 1), in order to compile a linguistic corpus. The authors characterize the problems of dividing the texts of these archival documents into structural units. The difficulties arise from the specifics of the documents' form, the dynamics of genres, and the syntactic peculiarities of business communication in the mid-18th century. It is revealed that the complexity of dividing a documentary text depends on its degree of narrativity. The authors motivate the choice of the markup unit: a structural-semantic segment that coincides with a sentence or with several closely connected sentences. A complex method of document segmentation for structural markup is justified; it is based on genre parameterization of documents and their syntactic segmentation. It has been established that segment boundaries can be indicated by a complex of graphic symbols, by speech formulas that function as document requisites (formal elements of the document), and by lexical and grammatical means. The study shows that the sequence of text segmentation procedures, aimed at identifying the genre and speech organization of the document, makes it possible to present in the diachronic corpus the information that is necessary and sufficient for the user to draw conclusions about the properties of the document text and its units.
Key words: history of the Russian language, document, Don Cossack Host, linguistic corpus, structural markup, genre, text segmentation.
Citation. Gorban O.A., Kosova M.V., Sheptukhina E.M. Structural Markup of Official Documents in Diachronic Linguistic Corpus: Problems and Solutions. Vestnik Volgogradskogo gosudarstvennogo universiteta. Seriya 2. Yazykoznanie [Science Journal of Volgograd State University. Linguistics], 2021, vol. 20, no. 4, pp. 5-18. (in Russian). DOI: https://doi.org/10.15688/jvolsu2.2021.4.1
Structural Markup of Official Documents in Diachronic Linguistic Corpus: Problems and Solutions by Gorban O.A., Kosova M.V., Sheptukhina E.M. is licensed under a Creative Commons Attribution 4.0 International License.