NDIFON, NAOMI SIJE-OKIM and Covenant University, Theses Masters (2024) DEVELOPMENT OF A COMPUTATIONAL PIPELINE FOR THE IDENTIFICATION OF NON-CODING RNAs FROM NEXT GENERATION SEQUENCING DATA. Masters thesis, Covenant University.
Abstract
Recent advances in genomics have revealed the critical roles that non-coding RNAs play in disease occurrence, progression, and population disparities in patient treatment outcomes. With the evolution of Next Generation Sequencing (NGS) techniques and the generation of genomic big data, researchers are increasingly able to explore the functions of these non-coding RNAs. However, efficient exploration requires user-friendly computational tools that streamline and centralize data analysis, particularly for identifying non-coding RNAs within large volumes of NGS data. Current computational pipelines for non-coding RNA identification are often limited to detecting a single class of non-coding RNA and do not integrate the latest standalone tools. Consequently, they make workflows inefficient, because diverse non-coding RNA classes cannot be analyzed comprehensively within a single framework. The aim of this study was to develop a computational pipeline for identifying multiple classes of non-coding RNAs, namely microRNAs (miRNAs), long non-coding RNAs (lncRNAs) and circular RNAs (circRNAs), from NGS data. This aim was achieved by developing scripts for the selected software tools and incorporating these scripts as individual processes within a unified Nextflow script. The software tools integrated into the pipeline include miRDeep2, mirnovo and sRNAtoolbox for the identification of miRNAs; CIRI and KNIFE for the identification of circRNAs; and PLEK and LncDC for the identification of lncRNAs. Nextflow was used as the scientific workflow management system, and Docker was used to containerize all the integrated tools and their software dependencies for ease of use and reproducibility across different computing environments.
The pipeline was then evaluated using the test data provided with each of the individual software tools, and it successfully identified all the reported miRNAs, lncRNAs and circRNAs, demonstrating its effectiveness. Beyond reducing execution time, the pipeline streamlines the analysis of non-coding RNAs and eliminates the need for separate software installation and environment setup, thereby reducing the user's workload.
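As a rough illustration of the structure the abstract describes (one process per integrated tool, grouped by non-coding RNA class), the following Python sketch builds a list of processes for the requested classes. The tool groupings come from the abstract; the names `TOOLS_BY_CLASS` and `plan_processes` are hypothetical, and the sketch is not the thesis's Nextflow script, which is not reproduced in this record.

```python
from typing import Dict, List, Tuple

# Tool groupings as listed in the abstract. The dictionary layout and the
# class labels used as keys are illustrative assumptions.
TOOLS_BY_CLASS: Dict[str, List[str]] = {
    "miRNA": ["miRDeep2", "mirnovo", "sRNAtoolbox"],
    "circRNA": ["CIRI", "KNIFE"],
    "lncRNA": ["PLEK", "LncDC"],
}


def plan_processes(requested_classes: List[str]) -> List[Tuple[str, str]]:
    """Return one (ncRNA class, tool) pair per process, in request order."""
    plan: List[Tuple[str, str]] = []
    for rna_class in requested_classes:
        if rna_class not in TOOLS_BY_CLASS:
            raise ValueError(f"unsupported ncRNA class: {rna_class}")
        for tool in TOOLS_BY_CLASS[rna_class]:
            plan.append((rna_class, tool))
    return plan


if __name__ == "__main__":
    # Print the processes that would be scheduled for two of the classes.
    for rna_class, tool in plan_processes(["miRNA", "circRNA"]):
        print(f"{rna_class}: {tool}")
```

In a workflow manager such as Nextflow, each pair would correspond to a separate process running in its own container; the sketch only models the grouping, not the execution.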
Item Type: | Thesis (Masters) |
---|---|
Uncontrolled Keywords: | Next Generation Sequencing, Non-coding RNA, Nextflow, Docker, Computational Pipeline, micro RNAs, long non-coding RNAs, circular RNAs |
Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science |
Divisions: | Faculty of Engineering, Science and Mathematics > School of Electronics and Computer Science |
Depositing User: | nwokealisi |
Date Deposited: | 24 Sep 2024 11:26 |
Last Modified: | 24 Sep 2024 11:26 |
URI: | http://eprints.covenantuniversity.edu.ng/id/eprint/18437 |