
Chebyshevskii Sbornik



https://doi.org/10.22405/2226-8383-2018-19-1-187-199

Abstract

Sequence classification problems often arise in areas such as bioinformatics and natural language processing. In recent years the best results in this field have been achieved by deep learning methods, especially by architectures based on recurrent neural networks (RNNs). However, a common problem of such models is their lack of interpretability, i.e., of the ability to extract from the data the key features that most influence the model's decision. Meanwhile, using less complex neural networks reduces predictive performance, which limits the use of state-of-the-art machine learning methods in many subject areas. In this work we propose a novel interpretable deep learning architecture based on the extraction of principal sets of short substrings, or sequence motifs. The presence of an extracted motif in the input sequence serves as a marker for a certain class. The key component of the proposed solution is the differential alignment algorithm we developed, which provides a smooth analog of classical string comparison methods such as the Levenshtein edit distance and Smith–Waterman local alignment. Unlike previous works on motif-based classification, which used CNNs for shift-invariant search, our model extracts motifs in a way that is invariant to both shifts and gaps.
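The differential alignment algorithm itself is given in the full paper. As a rough illustration of the general idea only, and not of the authors' actual construction, the hard minimum in the classical Levenshtein recurrence can be replaced by a smoothed soft-minimum, making the resulting distance differentiable in its cost terms; the names `softmin`, `soft_levenshtein`, and the parameter `gamma` below are illustrative:

```python
import math

def softmin(values, gamma=0.1):
    """Smooth minimum: tends to min(values) as gamma -> 0,
    but remains differentiable in each input."""
    m = min(values)  # shift for numerical stability
    return m - gamma * math.log(
        sum(math.exp(-(v - m) / gamma) for v in values)
    )

def soft_levenshtein(x, y, gamma=0.1):
    """Smoothed Levenshtein distance: the standard dynamic-programming
    recurrence with min replaced by softmin."""
    n, m_len = len(x), len(y)
    D = [[0.0] * (m_len + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = float(i)       # delete all of x[:i]
    for j in range(1, m_len + 1):
        D[0][j] = float(j)       # insert all of y[:j]
    for i in range(1, n + 1):
        for j in range(1, m_len + 1):
            sub = 0.0 if x[i - 1] == y[j - 1] else 1.0
            D[i][j] = softmin(
                [D[i - 1][j] + 1.0,        # deletion
                 D[i][j - 1] + 1.0,        # insertion
                 D[i - 1][j - 1] + sub],   # match / substitution
                gamma,
            )
    return D[n][m_len]
```

For small `gamma` the result approaches the exact edit distance (e.g. `soft_levenshtein("kitten", "sitting", gamma=0.01)` is close to 3), while gradients with respect to the substitution costs remain well defined, which is what makes such relaxations usable inside a neural network.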

About the Author

E. P. Ofitserov
Tula State University
Russian Federation

Ofitserov Evgeny Petrovich — department of applied mathematics and computer science



References

1. Hochreiter, S. & Schmidhuber, J. 1997, "Long short-term memory", Neural computation, vol. 9, no. 8, pp. 1735–1780.

2. Cho, K. et al. 2014, "Learning phrase representations using RNN encoder–decoder for statistical machine translation", arXiv:1406.1078.

3. Chung, J. et al. 2014, "Empirical evaluation of gated recurrent neural networks on sequence modeling", arXiv:1412.3555.

4. Karpathy, A., Johnson, J. & Fei-Fei, L. 2015, "Visualizing and understanding recurrent networks", arXiv:1506.02078.

5. Strobelt, H. et al. 2018, "LSTMVis: a tool for visual analysis of hidden state dynamics in recurrent neural networks", IEEE transactions on visualization and computer graphics, vol. 24, no. 1, pp. 667–676.

6. Zeng, H. et al. 2016, "Convolutional neural network architectures for predicting DNA–protein binding", Bioinformatics, vol. 32, no. 12, pp. i121–i127.

7. Zhou, J. & Troyanskaya, O. G. 2015, "Predicting effects of noncoding variants with deep learning–based sequence model", Nature methods, vol. 12, no. 10, pp. 931–934.

8. Lanchantin, J. et al. 2016, "Deep Motif: visualizing genomic sequence classifications", arXiv:1605.01133.

9. Quang, D. & Xie, X. 2016, "DanQ: a hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences", Nucleic acids research, vol. 44, no. 11, p. e107.

10. Levenshtein, V. I. 1965, "Binary codes capable of correcting deletions, insertions and reversals", Doklady Akademii Nauk SSSR, vol. 163, no. 4, pp. 845–848. [in Russian]

11. Smith, T. F. & Waterman, M. S. 1981, "Comparison of biosequences", Advances in applied mathematics, vol. 2, no. 4, pp. 482–489.

12. Gotoh, O. 1982, "An improved algorithm for matching biological sequences", Journal of molecular biology, vol. 162, no. 3, pp. 705–708.

13. Manavski, S. A. & Valle, G. 2008, "CUDA compatible GPU cards as efficient hardware accelerators for Smith–Waterman sequence alignment", BMC bioinformatics, vol. 9, suppl. 2, p. S10.

14. Ioffe, S. & Szegedy, C. 2015, "Batch normalization: accelerating deep network training by reducing internal covariate shift", arXiv:1502.03167.

15. Hahnloser, R. H. R. et al. 2000, "Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit", Nature, vol. 405, no. 6789, pp. 947–951.



For citations:


Ofitserov E. P. Chebyshevskii Sbornik. 2018;19(1):187-199. (In Russ.) https://doi.org/10.22405/2226-8383-2018-19-1-187-199



This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 2226-8383 (Print)