Radbil T.B., Markina M.V. Russian Text Author’s Gender Identification in Forensic Examination: Probability-and-Statistics Method
https://doi.org/10.15688/jvolsu2.2021.5.4
Timur B. Radbil
Doctor of Sciences (Philology), Professor, Head of the Department of Theoretical and Applied Linguistics, Lobachevsky State University of Nizhny Novgorod
Prosp. Gagarina, 23, 603950 Nizhny Novgorod, Russia
ResearcherID: AAO-6983-2020
ScopusID: 57210390493
https://orcid.org/0000-0002-7516-6705
Marina V. Markina
Candidate of Sciences (Physics and Mathematics), Associate Professor, Department of Theoretical, Computer and Experimental Mechanics, Lobachevsky State University of Nizhny Novgorod
Prosp. Gagarina, 23, 603950 Nizhny Novgorod, Russia
https://orcid.org/0000-0002-1042-8006
Abstract. The article discusses intermediate results in developing and improving a computerized model of Russian text authorization based on the combined application of probabilistic and statistical methods. The study aims to describe the new capabilities of the system as applied to diagnostic examinations in text authorization, specifically to detecting the gender of the alleged author of a text. The work presents the next stage of fine-tuning and testing an improved version of the computer program "CTA" (computerized text authorization), which at this stage was adapted to determine and compare stable relative frequencies of correlation coefficients (ratios of specified linguistic phenomena at different levels of the language system) in texts written by men and by women. The research material consists of continuously updated primary bases of literary texts of the 19th and 21st centuries (4 bases, respectively). The work shows that texts written by men and by women differ significantly in such correlation coefficients as average word length, average sentence length, objectivity coefficient, quality coefficient, activity coefficient, dynamism coefficient, connectivity coefficient, etc. Experimental verification of the results has shown that the accuracy of gender determination at this stage of the study is approximately 65%. This figure can be significantly exceeded by increasing the volume and refining the quality specification of the databases and/or by using new models for calculating the correlation coefficients (Spearman's model, etc.).
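As an illustration only, the two simplest coefficients named in the abstract, average word length and average sentence length, might be computed along the following lines. This is a minimal sketch and not the CTA program: the function name `surface_coefficients`, the naive sentence splitter, and the definition of a word as a run of letters are all assumptions made for the example. The remaining coefficients (objectivity, quality, activity, dynamism, connectivity) are not defined in the abstract and are therefore not sketched.

```python
import re


def surface_coefficients(text: str) -> dict:
    """Return average word length and average sentence length for a text.

    Hypothetical helper for illustration; the tokenization rules below are
    assumptions, not the rules used by the CTA program.
    """
    # Naive sentence split: break on whitespace that follows ., !, ? or an ellipsis.
    sentences = [s for s in re.split(r"(?<=[.!?…])\s+", text.strip()) if s]
    # A "word" here is any run of letters (Cyrillic or Latin); digits,
    # underscores and punctuation are excluded, so hyphenated forms split in two.
    words = re.findall(r"[^\W\d_]+", text)
    if not sentences or not words:
        raise ValueError("text contains no words")
    return {
        # Mean number of letters per word.
        "avg_word_length": sum(len(w) for w in words) / len(words),
        # Mean number of words per sentence.
        "avg_sentence_length": len(words) / len(sentences),
    }


if __name__ == "__main__":
    sample = "Мама мыла раму. Папа читал."
    print(surface_coefficients(sample))
```

Computing such values separately for texts by men and by women would give the kind of per-group figures that the abstract describes comparing. How the CTA program actually tokenizes text and aggregates the coefficients is not stated here.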
Key words: text authorization, computer text authorization, gender, forensic studies in text authorization, automatic text processing, probability-and-statistics method, applied linguistics.
Citation. Radbil T.B., Markina M.V. Russian Text Author's Gender Identification in Forensic Examination: Probability-and-Statistics Method. Vestnik Volgogradskogo gosudarstvennogo universiteta. Seriya 2. Yazykoznanie [Science Journal of Volgograd State University. Linguistics], 2021, vol. 20, no. 5, pp. 43-55. (in Russian). DOI: https://doi.org/10.15688/jvolsu2.2021.5.4
Russian Text Author’s Gender Identification in Forensic Examination: Probability-and-Statistics Method by Radbil T.B., Markina M.V. is licensed under a Creative Commons Attribution 4.0 International License.