I worked on a research project, "Text processing of mixed Russian-Kazakh language", as part of my graduation diploma. The research was carried out with a view to commercial use of the results in development at the company Dialog Systems. Below are some of the tasks I solved during this research.
- I reviewed the scientific literature on the topic; a review article based on that survey is available here: https://github.com/sash00k/kaz-rus-old/blob/master/report_about_problem.pdf
- I wrote a parser that collects mixed-language text samples through an API
- I cleaned, formatted, and evaluated the quality of the collected texts; I tokenized them and categorized them by the percentage of Kazakh words
- I tested hypotheses about the applicability of classical text vectorization models by applying the same models to samples with different proportions of Kazakh
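
The categorization step can be illustrated with a minimal sketch. It is not the project's actual code: the function names (`tokenize`, `kazakh_share`, `bucket`), the bucket edges, and the heuristic itself are assumptions made for illustration. The heuristic counts a word as Kazakh only if it contains a letter that exists in the Kazakh Cyrillic alphabet but not in the Russian one, so it likely underestimates the true share, since many Kazakh words use only letters shared with Russian.

```python
import re
from bisect import bisect_right

# Letters of the Kazakh Cyrillic alphabet that are absent from the Russian one.
KAZAKH_ONLY_LETTERS = set("әғқңөұүһі")

# Assumption: a token is a run of Russian or Kazakh Cyrillic letters.
TOKEN_RE = re.compile(r"[а-яёәғқңөұүһі]+")


def tokenize(text):
    """Lowercase the text and split it into Cyrillic word tokens."""
    return TOKEN_RE.findall(text.lower())


def kazakh_share(text):
    """Return the fraction of tokens containing a Kazakh-only letter."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    kazakh = sum(1 for token in tokens if KAZAKH_ONLY_LETTERS & set(token))
    return kazakh / len(tokens)


def bucket(share, edges=(0.25, 0.5, 0.75)):
    """Map a share in [0, 1] to a bucket index; the edges are hypothetical."""
    return bisect_right(edges, share)


if __name__ == "__main__":
    sample = "Привет, қалайсың?"
    share = kazakh_share(sample)
    print(tokenize(sample), share, bucket(share))
```

A dictionary lookup or a trained language identifier would probably be more accurate than this letter-based rule; the sketch only shows the shape of the tokenization, share estimate, and bucketing.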
10 Oct 2021 - 29 Sep 2022