Covenant University Repository

DEVELOPMENT OF A COMPUTATIONAL PIPELINE FOR NEXT-GENERATION SEQUENCING DATA ANALYSES USING NEXTFLOW AND DOCKER

OWOLABI, PAUL JESUSANMI and Covenant University, Theses (2023) DEVELOPMENT OF A COMPUTATIONAL PIPELINE FOR NEXT-GENERATION SEQUENCING DATA ANALYSES USING NEXTFLOW AND DOCKER. Masters thesis, COVENANT UNIVERSITY.


Abstract

Major advances in genomics, particularly the introduction of high-throughput sequencing and the evolution of genotyping platforms, have led to the emergence of big data in the biological sciences and a growing need to make sense of these data. This has fostered the development of methods and tools for genomic data analysis, especially for disease conditions, with the aim of uncovering genotype-phenotype relationships in those conditions. Because of the growing complexity and volume of next-generation sequencing (NGS) data, there is a growing need to develop pipelines that can handle these data while automating most of the analysis steps. The aim of this study was to develop a computational pipeline for the analysis of NGS data using Nextflow and Docker. Since different steps and tools are involved in the analysis of whole-genome and whole-exome sequencing data, this aim was achieved by developing scripts for selected genome analysis tools, building a computational pipeline from the selected tools, and performing unit and integration testing of the pipeline. The pipeline, which was built on the framework of the well-established GATK best-practices workflow, integrated the following tools: FastQC, MultiQC, Jellyfish, GenomeScope 2.0, BWA, GATK and SnpEff. These tools performed the different steps of the NGS analysis: quality control, genome size and heterozygosity estimation, alignment (mapping), variant calling and annotation. Nextflow was employed as the workflow management system, and Docker was used to containerise all the tools and their software dependencies. The developed pipeline was then tested to verify its utility in NGS data analysis.
Pipeline development is important in genomics research because it can help improve the quality and reliability of research outcomes and facilitate the sharing and comparison of data across studies and research groups. A pipeline that supports quick and simple genome analysis will help in uncovering biologically meaningful or clinically significant variants. It is expected that the outcome of this study will significantly impact studies into the genetic basis of human diseases and precision medicine.

Item Type: Thesis (Masters)
Uncontrolled Keywords: Next-generation Sequencing, Genomic analysis, Variant calling, Nextflow, Docker, Pipeline
Subjects: Q Science > Q Science (General)
Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Q Science > QH Natural history > QH301 Biology
Divisions: UNSPECIFIED
Depositing User: AKINWUMI
Date Deposited: 03 Oct 2023 11:11
Last Modified: 03 Oct 2023 11:11
URI: http://eprints.covenantuniversity.edu.ng/id/eprint/17324
