Alignment-Free Z-Curve Genomic Cepstral Coefficients and Machine Learning for Classification of Viruses
Abstract
Accurate detection of pathogenic viruses has become imperative because viral diseases constitute a major threat to human health and wellbeing on a global scale. However, both traditional and recent techniques for viral detection suffer from various setbacks. In addition, some of the existing alignment-free methods are limited with respect to viral detection accuracy. In this paper, we present the development of an alignment-free, digital signal processing based method for pathogenic viral detection named Z-Curve Genomic Cepstral Coefficients (ZCGCC). To evaluate the method, ZCGCC were computed from twenty-six pathogenic viral strains extracted from the ViPR corpus. A Naïve Bayesian classifier, which is a popular machine learning method, was experimentally trained and validated using the extracted ZCGCC and other alignment-free methods in the literature. Comparative results show that the proposed ZCGCC gives good accuracy (93.0385%) and improved performance over existing alignment-free methods.
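The abstract does not spell out how ZCGCC are computed, so the following is only a minimal sketch of the general idea under stated assumptions: the standard three-component Z-curve mapping of a nucleotide sequence, followed by a real cepstrum (inverse FFT of the log magnitude spectrum) of each component. The function names, the number of retained coefficients, and the handling of ambiguous bases are hypothetical choices made for illustration and should not be read as the authors' exact procedure.

```python
import numpy as np

# Per-base increments of the standard Z-curve components (x, y, z):
#   x: purine (A, G) vs pyrimidine (C, T)
#   y: amino (A, C) vs keto (G, T)
#   z: weak (A, T) vs strong (C, G) hydrogen bonding
Z_STEPS = {
    "A": (1, 1, 1),
    "C": (-1, 1, -1),
    "G": (1, -1, -1),
    "T": (-1, -1, 1),
}


def z_curve(seq):
    """Return an (N, 3) array of cumulative Z-curve coordinates.

    Assumption: bases other than A/C/G/T contribute a zero step.
    """
    deltas = np.array(
        [Z_STEPS.get(base, (0, 0, 0)) for base in seq.upper()], dtype=float
    )
    return np.cumsum(deltas, axis=0)


def real_cepstrum(signal, n_coeffs):
    """Real cepstrum: inverse FFT of the log magnitude spectrum."""
    spectrum = np.fft.fft(signal)
    # Small offset avoids log(0) for empty frequency bins.
    log_magnitude = np.log(np.abs(spectrum) + 1e-12)
    cepstrum = np.fft.ifft(log_magnitude).real
    return cepstrum[:n_coeffs]


def zcgcc_sketch(seq, n_coeffs=4):
    """Concatenate the first n_coeffs cepstral values of each Z-curve axis.

    n_coeffs is an illustrative choice, not a value taken from the paper.
    """
    coords = z_curve(seq)
    return np.concatenate(
        [real_cepstrum(coords[:, axis], n_coeffs) for axis in range(3)]
    )


if __name__ == "__main__":
    features = zcgcc_sketch("ATGCGTACGTTAGCCGATAGGCTA")
    print(features.shape)
```

A fixed-length vector of this kind could then be passed to any Naïve Bayes implementation for training and validation. Keeping the feature length independent of genome length is what makes the approach alignment-free.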
Keywords
Alignment-free, Bayesian, Classifier, Naïve, Pathogenic, Virus, ViPR, ZCGCC
Acknowledgement
Funding to present this work at IWBBIO 2018 was provided by the Covenant University Centre for Research, Innovation and Development, Canaanland, Ota, Nigeria.