
Alignment-Free Z-Curve Genomic Cepstral Coefficients and Machine Learning for Classification of Viruses

  • Emmanuel Adetiba
  • Oludayo O. Olugbara
  • Tunmike B. Taiwo
  • Marion O. Adebiyi
  • Joke A. Badejo
  • Matthew B. Akanle
  • Victor O. Matthews
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10813)

Abstract

Accurate detection of pathogenic viruses has become imperative because viral diseases pose a serious threat to human health and wellbeing on a global scale. However, both traditional and recent techniques for viral detection suffer from various setbacks. In addition, some of the existing alignment-free methods are limited in viral detection accuracy. In this paper, we present the development of an alignment-free, digital signal processing based method for pathogenic viral detection named Z-Curve Genomic Cepstral Coefficients (ZCGCC). To evaluate the method, ZCGCC were computed from twenty-six pathogenic viral strains extracted from the ViPR corpus. A Naïve Bayesian classifier, which is a popular machine learning method, was experimentally trained and validated using the extracted ZCGCC and other alignment-free methods in the literature. Comparative results show that the proposed ZCGCC gives good accuracy (93.0385%) and improved performance over the existing alignment-free methods.
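
As a rough illustration of the idea summarised above, the following minimal sketch maps a DNA sequence to its three Z-curve components and takes the first few real cepstral coefficients of each. It is not the authors' implementation: the abstract does not give the exact ZCGCC pipeline, so the use of the plain real cepstrum, the number of coefficients (`n_coeffs`), the handling of ambiguous bases, and all function names are assumptions made for illustration only.

```python
import cmath
import math


def z_curve(seq):
    """Return the cumulative Z-curve components (x, y, z) of a DNA string."""
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    xs, ys, zs = [], [], []
    for base in seq.upper():
        if base not in counts:
            continue  # assumption: ambiguous symbols such as N are skipped
        counts[base] += 1
        a, c, g, t = counts["A"], counts["C"], counts["G"], counts["T"]
        xs.append((a + g) - (c + t))  # purine vs pyrimidine
        ys.append((a + c) - (g + t))  # amino vs keto
        zs.append((a + t) - (c + g))  # weak vs strong hydrogen bonding
    return xs, ys, zs


def real_cepstrum(signal, n_coeffs, eps=1e-12):
    """First n_coeffs real cepstral coefficients: IDFT(log|DFT(signal)|).

    Uses a naive O(n^2) DFT, so it is only suitable for short sequences.
    """
    n = len(signal)
    if n == 0:
        raise ValueError("signal must not be empty")
    spectrum = [
        sum(signal[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n))
        for m in range(n)
    ]
    # eps avoids log(0) when a spectral bin is exactly zero
    log_mag = [math.log(abs(s) + eps) for s in spectrum]
    return [
        sum(log_mag[m] * cmath.exp(2j * math.pi * m * q / n) for m in range(n)).real / n
        for q in range(n_coeffs)
    ]


def zc_cepstral_features(seq, n_coeffs=4):
    """Concatenate the cepstral coefficients of the x, y and z components."""
    features = []
    for component in z_curve(seq):
        features.extend(real_cepstrum(component, n_coeffs))
    return features
```

A feature vector built this way (for example `zc_cepstral_features(genome, n_coeffs=4)`, which has length 12) could then be passed to any standard classifier, such as a Naïve Bayes model. For genomes of realistic length, the naive DFT would need to be replaced by an FFT routine.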

Keywords

Alignment-free, Bayesian, Classifier, Naïve, Pathogenic, Virus, ViPR, ZCGCC

Notes

Acknowledgement

Funding to present this work at IWBBIO 2018 was provided by the Covenant University Centre for Research, Innovation and Development, Canaanland, Ota, Nigeria.

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Electrical and Information Engineering, College of Engineering, Covenant University, Ota, Nigeria
  2. HRA, Institute for Systems Science, Durban University of Technology, Durban, South Africa
  3. ICT and Society Research Group, Durban University of Technology, Durban, South Africa
  4. Department of Computer and Information Science, College of Science and Technology, Covenant University, Ota, Nigeria
