<?xml version="1.0" encoding="UTF-8"?>
<article xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="1.4" article-type="research-article" xml:lang="en"><front><journal-meta><journal-title-group><journal-title xml:lang="ru">Вестник Волгоградского государственного университета. Серия 2. Языкознание</journal-title></journal-title-group><journal-id journal-id-type="issn">1998-9911</journal-id><journal-id journal-id-type="eissn">2409-1979</journal-id></journal-meta><article-meta><article-id pub-id-type="doi">10.15688/jvolsu2.2024.5.1</article-id><title-group><article-title xml:lang="ru">Лексикографические проблемы систем машинного перевода: на пути от буквального до нейронного</article-title><trans-title-group xml:lang="en"><trans-title>Lexicographic Problems of Machine Translation Systems: On the Way from Literal to Neural</trans-title></trans-title-group></title-group><contrib-group><contrib contrib-type="author"><name><surname>Беляева</surname><given-names>Лариса Николаевна</given-names></name><name-alternatives><name xml:lang="ru"><surname>Беляева</surname><given-names>Лариса Николаевна</given-names></name><name xml:lang="en"><surname>Beliaeva</surname><given-names>Larisa</given-names></name></name-alternatives><xref ref-type="aff" rid="aff1"/><email>lauranbel@gmail.com</email><contrib-id contrib-id-type="orcid">0000-0002-8622-4595</contrib-id></contrib><contrib contrib-type="author"><name><surname>Камшилова</surname><given-names>Ольга Николаевна</given-names></name><name-alternatives><name xml:lang="ru"><surname>Камшилова</surname><given-names>Ольга Николаевна</given-names></name><name xml:lang="en"><surname>Kamshilova</surname><given-names>Olga</given-names></name></name-alternatives><xref ref-type="aff" rid="aff1"/><xref ref-type="aff" rid="aff2"/><email>onkamshilova@gmail.com</email><contrib-id contrib-id-type="orcid">0000-0002-1488-2206</contrib-id></contrib><aff-alternatives id="aff1"><aff><institution xml:lang="en">Herzen State Pedagogical University of Russia (Saint Petersburg, Russian 
Federation)</institution></aff><aff><institution xml:lang="ru">Российский государственный педагогический университет им. А.И. Герцена (Санкт-Петербург, Российская Федерация)</institution></aff></aff-alternatives><aff-alternatives id="aff2"><aff><institution xml:lang="en">Saint Petersburg University of Management Technologies and Economics (Saint Petersburg, Russian Federation)</institution></aff><aff><institution xml:lang="ru">Санкт-Петербургский университет технологий управления и экономики (Санкт-Петербург, Российская Федерация)</institution></aff></aff-alternatives></contrib-group><pub-date pub-type="epub" iso-8601-date="2024-12-27"><day>27</day><month>12</month><year>2024</year></pub-date><volume>23</volume><issue>5</issue><fpage>6</fpage><lpage>19</lpage><history><date date-type="received" iso-8601-date="2024-05-13"><day>13</day><month>05</month><year>2024</year></date><date date-type="accepted" iso-8601-date="2024-08-20"><day>20</day><month>08</month><year>2024</year></date></history><permissions><license><license-p xml:lang="ru">CC BY 4.0</license-p></license></permissions><abstract xml:lang="ru"><p>В статье рассматриваются актуальные вопросы интерпретации современными системами машинного перевода (МП) лексики, неизвестной этим системам (out-of-vocabulary words), в контексте изменений форм и ведения автоматического словаря. Дан критический очерк типологии систем МП и стратегий их развития. Описаны особенности этих стратегий и влияние на них развивающихся программных средств и технологий. Проанализированы формы ведения словарной поддержки, меняющиеся под воздействием технологических условий. Показано, что при любой системе МП ее лингвистическое обеспечение и структура автоматических словарей становятся принципиально важными для поддержания качества перевода. 
При всем успехе развития нейронных систем МП (НМП) их автоматически пополняемые словарные базы не фиксируют слова, характеризующиеся терминологической спецификой и низкой частотой в массивах и корпусах текстов, на которых обучается система. На примере анализа результатов двух востребованных НМП – Google Translate и Yandex Translate – доказано, что обработка и унификация перевода слов, не вошедших в словари системы, прежде легко решавшаяся пользователями всех типов систем МП на основе пополнения и ведения автоматического словаря, остается по-прежнему актуальной проблемой и требует особого подхода при редактировании результатов НМП.</p></abstract><trans-abstract xml:lang="en"><p>The article discusses current issues in how modern machine translation (MT) systems interpret out-of-vocabulary words, in the context of changing forms and ways of maintaining an automatic dictionary. It provides a critical outline of the typology of MT systems and strategies for their development. It describes the impact of fast-developing software and technologies on these strategies and analyzes the changes they bring to the forms of dictionary support. The research shows that, whatever the type of MT system, its linguistic support and the structure of its automatic dictionaries are fundamentally important for ensuring translation quality. Despite all the success of neural MT (NMT) systems, their automatically updated vocabulary databases do not record words characterized by terminological specificity and low frequency in the special texts and text corpora on which the system is trained. Analysis of translations performed by two popular NMT systems – Google Translate and Yandex Translate – has proven that they fail to process and unify the translation of words that are not entered in the system dictionaries, a task that users of all types of MT systems used to solve easily with the help of automatic dictionaries. 
With statistics-based automatic dictionaries, it remains a pressing problem and requires a special approach when editing MT results.</p></trans-abstract><kwd-group xml:lang="en"><kwd>machine translation strategy</kwd><kwd>machine translation</kwd><kwd>typology of machine translation systems</kwd><kwd>automatic dictionary</kwd><kwd>out-of-vocabulary words</kwd><kwd>linguistic support</kwd></kwd-group><kwd-group xml:lang="ru"><kwd>машинный перевод</kwd><kwd>стратегия машинного перевода</kwd><kwd>типология систем машинного перевода</kwd><kwd>автоматический словарь</kwd><kwd>неизвестное слово</kwd><kwd>лингвистическая поддержка</kwd></kwd-group></article-meta></front><back><ref-list><ref id="ref1"><mixed-citation xml:lang="ru">Беляева Л. Н., 2016. Лингвистические технологии в современном сетевом пространстве: language worker в индустрии локализации. СПб. : Кн. дом. 134 с.</mixed-citation></ref><ref id="ref2"><mixed-citation xml:lang="ru">Беляева Л. Н., 2022. Машинный перевод в современной технологии процесса перевода // Известия РГПУ им. А.И. Герцена. № 203. С. 22–30.</mixed-citation></ref><ref id="ref3"><mixed-citation xml:lang="ru">Беляева Л. Н., Камшилова О. Н., Шубина Н. Л., 2023. Научная статья в технологическом пространстве машинного перевода: правила и процедуры редактирования : учеб. пособие. СПб. : Кн. дом. 90 с.</mixed-citation></ref><ref id="ref4"><mixed-citation xml:lang="ru">Нуриев В. А., 2019. Архитектура системы нейронного машинного перевода // Информатика и ее применения. Т. 13, № 3. С. 90–96. DOI: https://doi.org/10.14357/19922264190313</mixed-citation></ref><ref id="ref5"><mixed-citation xml:lang="ru">Раренко М. Б., 2021. Машинный перевод: от перевода «по правилам» к нейронному переводу (Обзор) // Социальные и гуманитарные науки. Отечественная и зарубежная литература. Серия 6, Языкознание : РЖ. № 3. С. 70–79. 
DOI: https://doi.org/10.31249/ling/2021.03.05</mixed-citation></ref><ref id="ref6"><mixed-citation xml:lang="ru">Almansoori A., Al Mansoori S., Alshamsi M., Salloum S. A., Shaalan K., 2020. Development of Machine Translation Models: A Systematic Review // International Journal of Control and Automation. Vol. 13, № 2. P. 1462–1483.</mixed-citation></ref><ref id="ref7"><mixed-citation xml:lang="ru">Araabi A., Monz C., Niculae V., 2022. How Effective is Byte Pair Encoding for Out-Of-Vocabulary Words in Neural Machine Translation? URL: https://arxiv.org/abs/2208.05225v1</mixed-citation></ref><ref id="ref8"><mixed-citation xml:lang="ru">Brottrager J., Stahl A., Arslan A., Brandes U., Weitin T., 2022. Modeling and Predicting Literary Reception // Journal of Computational Literary Studies. Vol. 1, iss. 1. P. 1–27. DOI: 10.26083/tuprints-00023250</mixed-citation></ref><ref id="ref9"><mixed-citation xml:lang="ru">Dankers V., Bruni E., Hupkes D., 2022. The Paradox of the Compositionality of Natural Language: A Neural Machine Translation Case Study // Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics. Vol. 1. Long Papers. P. 4154–4175. DOI: https://doi.org/10.48550/arXiv.2108.05885</mixed-citation></ref><ref id="ref10"><mixed-citation xml:lang="ru">Devlin J., Chang M.-W., Lee K., Toutanova K., 2019. Pre-Training of Deep Bidirectional Transformers for Language Understanding // Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Vol. 1. Long and Short Papers. P. 4171–4186. DOI: https://doi.org/10.18653/v1/N19-1423</mixed-citation></ref><ref id="ref11"><mixed-citation xml:lang="ru">Khoong E. C., Rodriguez J. A., 2022. A Research Agenda for Using Machine Translation in Clinical Medicine // Journal of General Internal Medicine. Vol. 37, iss. 5. P. 1275–1277. 
DOI: 10.1007/s11606-021-07164-y</mixed-citation></ref><ref id="ref12"><mixed-citation xml:lang="ru">Lankford S., Afli H., Way A., 2021. Transformers for Low-Resource Languages: Is Féidir Linn! // Proceedings of the 18th Biennial Machine Translation Summit Virtual USA, August 16–20. Vol. 1. MT Research Track. P. 48–61. DOI: https://doi.org/10.48550/arXiv.2403.01985</mixed-citation></ref><ref id="ref13"><mixed-citation xml:lang="ru">Liu X., Sun T., He J., Wu J., Wu L., Zhang X., Jiang H., Cao Z., Huang X., Qiu X., 2022. Towards Efficient NLP: A Standard Evaluation and a Strong Baseline // Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Seattle : Association for Computational Linguistics. P. 3288–3303.</mixed-citation></ref><ref id="ref14"><mixed-citation xml:lang="ru">Peris Á., Casacuberta F., 2019. Online Learning for Effort Reduction in Interactive Neural Machine Translation // Computer Speech &amp; Language. Vol. 58. P. 98–126. DOI: https://doi.org/10.48550/arXiv.1802.03594</mixed-citation></ref><ref id="ref15"><mixed-citation xml:lang="ru">Popović M., 2017. chrF++: Words Helping Character n-Grams // Proceedings of the Second Conference on Machine Translation. Copenhagen : [s. n.]. P. 612–618.</mixed-citation></ref><ref id="ref16"><mixed-citation xml:lang="ru">Sennrich R., Haddow B., Birch A., 2015. Neural Machine Translation of Rare Words with Subword Units. arXiv:1508.07909v5 [cs.CL]. DOI: https://doi.org/10.48550/arXiv.1508.07909</mixed-citation></ref><ref id="ref17"><mixed-citation xml:lang="ru">Tars M., Tättar A., Fišel M., 2022. Cross-Lingual Transfer From Large Multilingual Translation Models to Unseen Under-Resourced Languages // Baltic Journal of Modern Computing. Vol. 10, iss. 3. P. 435–446. DOI: https://doi.org/10.22364/bjmc.2022.10.3.16</mixed-citation></ref><ref id="ref18"><mixed-citation xml:lang="ru">Toral A., 2019. 
Post-Editese: An Exacerbated Translationese // Proceedings of Machine Translation Summit XVII. Vol. 1. Research Track. Dublin : European Association for Machine Translation. P. 273–281.</mixed-citation></ref><ref id="ref19"><mixed-citation xml:lang="ru">Zhu C., Yu H., Cheng Sh., Luo W., 2020. Language-Aware Interlingua for Multi-Lingual Neural Machine Translation // Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg : Association for Computational Linguistics. P. 1650–1655.</mixed-citation></ref><ref id="ref20"><mixed-citation xml:lang="ru">Zhuang F., Qi Z., Duan K., Xi D., Zhu Y., Zhu H., Xiong H., He Q., 2021. A Comprehensive Survey on Transfer Learning // Proceedings of the IEEE. Vol. 109, iss. 1. P. 43–76. DOI: 10.1109/JPROC.2020.3004555</mixed-citation></ref><ref id="ref21"><mixed-citation xml:lang="en">Belyaeva L.N., 2016. Lingvisticheskiye tekhnologii v sovremennom setevom prostranstve: language worker v industrii lokalizatsii [Linguistic Technologies in the Modern Network Space: Language Worker in the Localization Industry]. Saint Petersburg, Kn. dom Publ. 134 p.</mixed-citation></ref><ref id="ref22"><mixed-citation xml:lang="en">Belyaeva L.N., 2022. Mashinnyy perevod v sovremennoy tekhnologii protsessa perevoda [Machine Translation in Modern Translation Technology]. Izvestiya RGPU im. A.I. Gercena [Izvestia: Herzen University Journal of Humanities &amp; Sciences], no. 203, pp. 22-30.</mixed-citation></ref><ref id="ref23"><mixed-citation xml:lang="en">Belyaeva L.N., Kamshilova O.N., Shubina N.L., 2023. Nauchnaya statya v tekhnologicheskom prostranstve mashinnogo perevoda: pravila i procedury redaktirovaniya: ucheb. posobie [Scientific Article in the Technological Space of Machine Translation: Editing Rules and Procedures. Textbook]. Saint Petersburg, Kn. dom Publ. 90 p.</mixed-citation></ref><ref id="ref24"><mixed-citation xml:lang="en">Nuriev V.A., 2019. 
Arkhitektura sistemy neyronnogo mashinnogo perevoda [Architecture of a Machine Translation System]. Informatika i ee primeneniya [Informatics and Applications], vol. 13, no. 3, pp. 90-96. DOI: https://doi.org/10.14357/19922264190313</mixed-citation></ref><ref id="ref25"><mixed-citation xml:lang="en">Rarenko M.B., 2021. Mashinnyy perevod: ot perevoda «po pravilam» k neyronnomu perevodu (Obzor) [Machine Translation: From Translation “According to the Rules” to Neural Translation (Review)]. Sotsialnye i gumanitarnye nauki. Otechestvennaya i zarubezhnaya literatura. Seriya 6. Yazykoznanie: RZh [Social Sciences and Humanities. Domestic and Foreign Literature. Series 6. Linguistics. Abstract Journal. INION RAN], no. 3, pp. 70-79. DOI: https://doi.org/10.31249/ling/2021.03.05</mixed-citation></ref><ref id="ref26"><mixed-citation xml:lang="en">Almansoori A., Al Mansoori S., Alshamsi M., Salloum S.A., Shaalan K., 2020. Development of Machine Translation Models: A Systematic Review. International Journal of Control and Automation, vol. 13, no. 2, pp. 1462-1483.</mixed-citation></ref><ref id="ref27"><mixed-citation xml:lang="en">Araabi A., Monz C., Niculae V., 2022. How Effective Is Byte Pair Encoding for Out-Of-Vocabulary Words in Neural Machine Translation? URL: https://arxiv.org/abs/2208.05225v1</mixed-citation></ref><ref id="ref28"><mixed-citation xml:lang="en">Brottrager J., Stahl A., Arslan A., Brandes U., Weitin T., 2022. Modeling and Predicting Literary Reception. Journal of Computational Literary Studies, vol. 1, iss. 1, pp. 1-27. DOI: 10.26083/tuprints-00023250</mixed-citation></ref><ref id="ref29"><mixed-citation xml:lang="en">Dankers V., Bruni E., Hupkes D., 2022. The Paradox of the Compositionality of Natural Language: A Neural Machine Translation Case Study. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics. Vol. 1: Long Papers, pp. 4154-4175. 
DOI: https://doi.org/10.48550/arXiv.2108.05885</mixed-citation></ref><ref id="ref30"><mixed-citation xml:lang="en">Devlin J., Chang M.-W., Lee K., Toutanova K., 2019. Pre-Training of Deep Bidirectional Transformers for Language Understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Vol. 1. Long and Short Papers, pp. 4171-4186. DOI: https://doi.org/10.18653/v1/N19-1423</mixed-citation></ref><ref id="ref31"><mixed-citation xml:lang="en">Khoong E.C., Rodriguez J.A., 2022. A Research Agenda for Using Machine Translation in Clinical Medicine. Journal of General Internal Medicine, vol. 37, iss. 5, pp. 1275-1277. DOI: 10.1007/s11606-021-07164-y</mixed-citation></ref><ref id="ref32"><mixed-citation xml:lang="en">Lankford S., Afli H., Way A., 2021. Transformers for Low-Resource Languages: Is Féidir Linn! Proceedings of the 18th Biennial Machine Translation Summit Virtual USA, August 16–20. Vol. 1. MT Research Track, pp. 48-61. DOI: https://doi.org/10.48550/arXiv.2403.01985</mixed-citation></ref><ref id="ref33"><mixed-citation xml:lang="en">Liu X., Sun T., He J., Wu J., Wu L., Zhang X., Jiang H., Cao Z., Huang X., Qiu X., 2022. Towards Efficient NLP: A Standard Evaluation and a Strong Baseline. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Seattle, Association for Computational Linguistics, pp. 3288-3303.</mixed-citation></ref><ref id="ref34"><mixed-citation xml:lang="en">Peris Á., Casacuberta F., 2019. Online Learning for Effort Reduction in Interactive Neural Machine Translation. Computer Speech &amp; Language, vol. 58, pp. 98-126. DOI: https://doi.org/10.48550/arXiv.1802.03594</mixed-citation></ref><ref id="ref35"><mixed-citation xml:lang="en">Popović M., 2017. chrF++: Words Helping Character n-Grams. Proceedings of the Second Conference on Machine Translation. 
Copenhagen, s.n., pp. 612-618.</mixed-citation></ref><ref id="ref36"><mixed-citation xml:lang="en">Sennrich R., Haddow B., Birch A., 2015. Neural Machine Translation of Rare Words with Subword Units. arXiv:1508.07909v5 [cs.CL]. DOI: https://doi.org/10.48550/arXiv.1508.07909</mixed-citation></ref><ref id="ref37"><mixed-citation xml:lang="en">Tars M., Tättar A., Fišel M., 2022. Cross-Lingual Transfer from Large Multilingual Translation Models to Unseen Under-Resourced Languages. Baltic Journal of Modern Computing, vol. 10, iss. 3, pp. 435-446. DOI: https://doi.org/10.22364/bjmc.2022.10.3.16</mixed-citation></ref><ref id="ref38"><mixed-citation xml:lang="en">Toral A., 2019. Post-Editese: An Exacerbated Translationese. Proceedings of Machine Translation Summit XVII. Vol. 1. Research Track. Dublin, European Association for Machine Translation, pp. 273-281.</mixed-citation></ref><ref id="ref39"><mixed-citation xml:lang="en">Zhu C., Yu H., Cheng Sh., Luo W., 2020. Language-Aware Interlingua for Multi-Lingual Neural Machine Translation. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, Association for Computational Linguistics, pp. 1650-1655.</mixed-citation></ref><ref id="ref40"><mixed-citation xml:lang="en">Zhuang F., Qi Z., Duan K., Xi D., Zhu Y., Zhu H., Xiong H., He Q., 2021. A Comprehensive Survey on Transfer Learning. Proceedings of the IEEE, vol. 109, iss. 1, pp. 43-76. DOI: 10.1109/JPROC.2020.3004555</mixed-citation></ref></ref-list></back></article>
