New approach to searching for string median and visualization of string clusters
https://doi.org/10.22405/2226-8383-2019-20-2-93-107
About the Authors
Dmitry Viktorovich Gorbachev
Russian Federation
Evgenii Petrovich Ofitserov
Russian Federation
References
1. Gusfield D. Algorithms on strings, trees, and sequences: Computer science and computational biology. New York: Cambridge University Press, 1997.
2. Ofitserov E.P. Statistical model of T-cell receptor peripheral selection // Izvestiya of Tula State University. Technical sciences. 2017. No. 2. P. 138–143.
3. Ofitserov E.P. Deep model of T-cell receptor selection // Izvestiya of Tula State University. Technical sciences. 2017. No. 12, part 2. P. 350–355.
4. Ofitserov E.P. Motif based sequence classification // Chebyshevskii Sbornik. 2018. Vol. 19, no. 1. P. 187–199. (In Russ.) https://doi.org/10.22405/2226-8383-2018-19-1-187-199
5. Ofitserov E.P. Software package for solving machine learning problems on string data using soft edit distance // Izvestiya of Tula State University. Technical sciences. 2019. No. 5. P. 370–376.
6. Tou J.T., Gonzalez R.C. Pattern recognition principles. Addison-Wesley Publishing Company, 1974.
7. Boyd S., Vandenberghe L. Convex Optimization. Cambridge University Press, 2004.
8. Casacuberta F., de Antonio M. A greedy algorithm for computing approximate median strings // In: VII Simposium Nacional de Reconocimiento de Formas y Análisis de Imágenes. 1997. P. 193–198.
9. De la Higuera C., Casacuberta F. Topology of strings: Median string is NP-complete // Theoretical Computer Science. 2000. Vol. 230, no. 1–2. P. 39–48. doi: 10.1016/s0304-3975(97)00240-5
10. Hayashida M., Koyano H. Finding median and center strings for a probability distribution on a set of strings under Levenshtein distance based on integer linear programming // In: Fred A., Gamboa H. (eds) Biomedical Engineering Systems and Technologies. BIOSTEC 2016. Communications in Computer and Information Science. Springer, Cham. 2017. Vol. 690. P. 108–121.
11. Kruzslicz F. Improved greedy algorithm for computing approximate median strings // Acta Cybernetica. 1999. Vol. 14, no. 2. P. 331–339.
12. Martínez-Hinarejos C.D., Juan A., Casacuberta F. Use of median string for classification // Proc. 15th Int. Conf. on Pattern Recognition. ICPR-2000, Barcelona, Spain. 2000. Vol. 2. P. 903–906. doi: 10.1109/ICPR.2000.906220
13. Van der Maaten L., Hinton G. Visualizing data using t-SNE // J. Mach. Learn. Res. 2008. Vol. 9. P. 2579–2605.
14. Ofitserov E., Tsvetkov V., Nazarov V. Soft edit distance for differentiable comparison of symbolic sequences // arXiv:1904.12562. 2019.
15. Olivares-Rodríguez C., Oncina J. A stochastic approach to median string computation // In: da Vitoria Lobo N. et al. (eds) Structural, Syntactic, and Statistical Pattern Recognition. SSPR/SPR 2008. Lecture Notes in Computer Science. Springer, Berlin, Heidelberg. 2008. Vol. 5342. P. 431–440.
16. Ruder S. An overview of gradient descent optimization algorithms // arXiv:1609.04747. 2016.
17. Sammon J.W. A nonlinear mapping for data structure analysis // IEEE Transactions on Computers. 1969. Vol. 18, no. 5. P. 401–409.
18. Torgerson W.S. Multidimensional scaling I: Theory and method // Psychometrika. 1952. Vol. 17. P. 401–419.
19. Wang L., Jiang T. On the complexity of multiple sequence alignment // J. Computat. Biol. 1994. Vol. 1, no. 4. P. 337–348.
For citations:
Gorbachev D.V., Ofitserov E.P. New approach to searching for string median and visualization of string clusters. Chebyshevskii Sbornik. 2019;20(2):93-107. (In Russ.) https://doi.org/10.22405/2226-8383-2019-20-2-93-107