Kupriyanov R.V., Solnyshkina M.I., Lekhnitskaya P.A. Parametric Taxonomy of Educational Texts

DOI: https://doi.org/10.15688/jvolsu2.2023.6.6

Roman V. Kupriyanov

Candidate of Sciences (Psychology), Senior Researcher, Text Analytics Laboratory, Kazan Federal University

Kremlevskaya St, 18, 420008 Kazan, Russia

Associate Professor, Department of Social Work, Pedagogy and Psychology, Kazan National Research Technological University

Karla Marxa St, 68, 420015 Kazan, Russia

https://orcid.org/0000-0001-9794-9607

Marina I. Solnyshkina

Doctor of Sciences (Philology), Head and Chief Researcher, Text Analytics Laboratory, Professor, Department of Theory and Practice of Teaching Foreign Languages, Kazan Federal University

Kremlevskaya St, 18, 420008 Kazan, Russia

https://orcid.org/0000-0003-1885-3039

Polina A. Lekhnitskaya

Research Laboratory Assistant, Neurocognitive Research Laboratory, Kazan Federal University

Kremlevskaya St, 18, 420008 Kazan, Russia

https://orcid.org/0000-0002-3689-3213


Abstract. The article addresses the discursive typology of texts and develops a parametric model of elementary school texts by subject domain, using a corpus-based approach and methods of linguistic statistics. The research corpus of over 90,000 tokens comprises the texts of 13 textbooks approved for use in the 2nd grade of Russian schools. Multifactor discriminant analysis made it possible to identify and validate the typological characteristics of the texts under study and yielded a formula for assigning an educational text to one of three subject domains: Philology, Mathematics, or Natural Sciences. The results of the discriminant analysis confirmed the hypothesis that each type of text corresponds to a parametric model comprising six constants: the average number of words per sentence, the average numbers of nouns, verbs, and adjectives per sentence, local noun overlap, and global argument overlap. The linguistic parameters were assessed with RuLingva, an automatic analyzer of Russian texts. The classification accuracy of the parametric model was 80%, which supports its reliability and allows the data obtained to be used in linguistic expertise as well as in automated linguistic profiling of texts. Future work includes integrating the model into RuLingva and developing similar models for texts of other subject domains.
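As an illustration of the kind of parameters listed in the abstract, the following minimal Python sketch computes five of the six for a small pre-tagged text. It is not RuLingva's implementation. The input format (sentences as lists of token and part-of-speech pairs), the tag names, the function names, and the definition of local noun overlap as the share of adjacent sentence pairs with at least one noun in common are all assumptions made for this example. Global argument overlap is left out because the abstract does not define it.

```python
from statistics import mean

# Hypothetical input format: each sentence is a list of (token, pos) pairs.
# The POS tags ("NOUN", "VERB", "ADJ") are assumed to come from an external tagger.

def count_pos(sentence, tag):
    """Count the tokens in one sentence that carry the given POS tag."""
    return sum(1 for _, pos in sentence if pos == tag)

def noun_set(sentence):
    """Return the lowercased nouns of one sentence as a set."""
    return {token.lower() for token, pos in sentence if pos == "NOUN"}

def text_parameters(sentences):
    """Compute per-sentence averages and an assumed local noun overlap."""
    pairs = list(zip(sentences, sentences[1:]))
    # Assumed definition: share of adjacent sentence pairs sharing a noun.
    overlap = (
        mean(1.0 if noun_set(a) & noun_set(b) else 0.0 for a, b in pairs)
        if pairs else 0.0
    )
    return {
        "words_per_sentence": mean(len(s) for s in sentences),
        "nouns_per_sentence": mean(count_pos(s, "NOUN") for s in sentences),
        "verbs_per_sentence": mean(count_pos(s, "VERB") for s in sentences),
        "adjectives_per_sentence": mean(count_pos(s, "ADJ") for s in sentences),
        "local_noun_overlap": overlap,
    }

# Toy example text (invented for illustration, not taken from the corpus).
example = [
    [("cat", "NOUN"), ("sleeps", "VERB")],
    [("small", "ADJ"), ("cat", "NOUN"), ("eats", "VERB"), ("fish", "NOUN")],
    [("birds", "NOUN"), ("sing", "VERB")],
]
params = text_parameters(example)
```

A vector of such values per text is what a discriminant analysis would take as input when assigning the text to a subject domain.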

Key words: discourse, subject domain, lexical parameters, syntactic parameters, mathematical model, discriminant analysis.

Citation. Kupriyanov R.V., Solnyshkina M.I., Lekhnitskaya P.A. Parametric Taxonomy of Educational Texts. Vestnik Volgogradskogo gosudarstvennogo universiteta. Seriya 2. Yazykoznanie [Science Journal of Volgograd State University. Linguistics], 2023, vol. 22, no. 6, pp. 80-94. (in Russian). DOI: https://doi.org/10.15688/jvolsu2.2023.6.6

Parametric Taxonomy of Educational Texts by Kupriyanov R.V., Solnyshkina M.I., Lekhnitskaya P.A. is licensed under CC BY 4.0

Full text (PDF): https://l.jvolsu.com/index.php/en/component/attachments/download/2867