Covenant University Repository

An Extended Feature Representation Technique for Predicting Sequenced-based Host-pathogen Protein-protein Interaction

Emmanuel, Jerry and Isewon, Itunuoluwa and Olasehinde, G. I and Oyelade, O. J. (2023) An Extended Feature Representation Technique for Predicting Sequenced-based Host-pathogen Protein-protein Interaction. Current Bioinformatics, xx (x).

Abstract

Background: Machine learning models for sequence-based Protein-Protein Interaction prediction typically require amino acid sequences to be converted into feature vectors. Two approaches to this transformation appear in the literature: the Independent Protein Feature (IPF) and Merged Protein Feature (MPF) extraction methods. Most studies have adopted the IPF approach, while others have preferred the MPF method, in which host and pathogen sequences are concatenated before feature encoding.

Objective: The coexistence of these two approaches raises the question of which should be adopted for improved Host-Pathogen Protein-Protein Interaction (HPPPI) prediction. This work therefore introduces the Extended Protein Feature (EPF) method.

Methods: The proposed method combines the predictive capabilities of IPF and MPF: it extracts essential features, handles multicollinearity, and removes features with zero importance. EPF, IPF, and MPF were tested on bacteria, parasite, virus, and plant HPPPI datasets with six machine learning models: Random Forest (RF), Support Vector Machine (SVM), Multilayer Perceptron (MLP), Naïve Bayes (NB), Logistic Regression (LR), and Deep Forest (DF).

Results: MPF exhibited the lowest performance overall, whereas IPF performed better with decision-tree-based models such as RF and DF. In contrast, EPF improved performance with SVM, LR, NB, and MLP and yielded competitive results with DF and RF.

Conclusion: The EPF approach developed in this study exhibits substantial improvements in four of the six models evaluated. This suggests that EPF is competitive with IPF and is particularly well suited to traditional machine learning models.
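
The difference between the three representations can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the sequence encoder, so amino acid composition is assumed here purely for illustration, and the function names (`aac`, `ipf`, `mpf`, `epf_candidates`) are hypothetical. IPF is assumed to encode each protein separately and join the resulting vectors; MPF follows the abstract's description of concatenating the sequences before encoding. Treating EPF as IPF and MPF features placed side by side is likewise an assumption, and the subsequent multicollinearity handling and zero-importance feature removal described in the abstract are omitted.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aac(sequence):
    """Amino acid composition: fraction of each standard residue.

    Assumed encoder for illustration; the abstract does not specify one.
    """
    counts = Counter(sequence.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def ipf(host, pathogen):
    """Independent features: encode each protein separately, then join the vectors."""
    return aac(host) + aac(pathogen)


def mpf(host, pathogen):
    """Merged features: concatenate the two sequences first, then encode once."""
    return aac(host + pathogen)


def epf_candidates(host, pathogen):
    """Candidate extended vector (assumption): IPF and MPF features side by side.

    Feature selection (collinearity filtering, dropping zero-importance
    features) would follow this step and is not shown.
    """
    return ipf(host, pathogen) + mpf(host, pathogen)


# Toy sequences, not real proteins.
host_seq = "MKTAYIAKQR"
pathogen_seq = "GAVLLKKE"

print(len(ipf(host_seq, pathogen_seq)))             # 40 features
print(len(mpf(host_seq, pathogen_seq)))             # 20 features
print(len(epf_candidates(host_seq, pathogen_seq)))  # 60 features
```

Under these assumptions, IPF keeps the host and pathogen signals in separate positions of the vector, while MPF blends them into a single composition; the combined vector carries both views and is therefore larger, which is consistent with the abstract's emphasis on pruning correlated and unimportant features afterwards.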

Item Type: Article
Uncontrolled Keywords: Protein-Protein Interaction; Feature Representation; Host-Pathogen Interaction; Machine Learning; Protein Sequence; Feature Vectors
Subjects: Q Science > QA Mathematics > QA76 Computer software
Q Science > QH Natural history
Q Science > QH Natural history > QH301 Biology
Divisions: Faculty of Engineering, Science and Mathematics > School of Electronics and Computer Science
Depositing User: Patricia Nwokealisi
Date Deposited: 31 Jul 2024 12:32
Last Modified: 31 Jul 2024 12:32
URI: http://eprints.covenantuniversity.edu.ng/id/eprint/18342
