k-Means Walk: Unveiling Operational Mechanism of a Popular Clustering Approach for Microarray Data

Osamor, V. C. and Adebiyi, E. F. and Enekwa, E. H. (2013) k-Means Walk: Unveiling Operational Mechanism of a Popular Clustering Approach for Microarray Data. Journal of Computer Science and System Biology, 6 (1). pp. 35-42.

Abstract

Because data analysis with a computational model has a profound influence on the interpretation of the final results, users of such tools need a basic understanding of the underlying model in order to design experiments well. Despite the wide variation in clustering techniques, cluster analysis has become a generic name in bioinformatics for discovering the natural grouping(s) of a set of patterns, points or sequences. The aim of this paper is to analyze k-means through a step-by-step "k-means walk" guided by graphics, to give a clear understanding of the operational mechanism of the k-means algorithm. A scatter plot was created from theoretical microarray gene expression data, a simplified view of the data from a typical microarray experiment. We designated the first three data points as the initial centroids and applied the Euclidean distance metric in the k-means algorithm, so that these three data points served as the reference points for forming each cluster. Before each subsequent iteration, a test is conducted to determine whether any centroid has shifted. After convergence, we were able to trace which data points fell in the same cluster. We observed that, as both the dimension of the data and the gene list grow for the hybridization matrix of microarray data, the computational implementation of the k-means algorithm becomes more demanding. Understanding this approach should stimulate new ideas for further development and improvement of the k-means clustering algorithm, especially within the confines of the biology of diseases and beyond. The major advantage, however, will be improved cluster output for the interpretation of microarray experimental results, and a better understanding that helps bioinformaticians and algorithm experts tweak the k-means algorithm for improved clustering run-time.
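The walk described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `kmeans_walk`, the tolerance `tol`, the iteration cap `max_iter` and the toy two-dimensional data are all assumptions made for illustration. Only the choices stated in the abstract are taken from the source: the first three data points serve as initial centroids, distances are Euclidean, and a centroid-shift test decides whether another iteration is needed.

```python
import math


def kmeans_walk(points, k=3, max_iter=100, tol=1e-9):
    """Hypothetical sketch of a k-means walk.

    The first k points are taken as the initial centroids, as in the
    abstract; tol and max_iter are illustrative assumptions.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        # under the Euclidean distance.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                mean = tuple(sum(col) / len(members) for col in zip(*members))
                new_centroids.append(mean)
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        # Shift test: stop once no centroid has moved more than tol.
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            break
    return labels, centroids


# Made-up two-condition expression values for eight genes (not from the paper).
points = [
    (1.0, 1.0), (5.0, 5.0), (9.0, 1.0), (1.2, 0.8),
    (5.1, 4.9), (8.8, 1.2), (0.9, 1.1), (4.8, 5.2),
]
labels, centroids = kmeans_walk(points)
print(labels)
print(centroids)
```

Genes that share a label after the loop stops are the ones that end up in the same cluster. Each iteration computes one distance per point, per centroid, per dimension, which is one way to see why the cost grows with both the gene list and the number of dimensions.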

Item Type:	Article
Subjects:	Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions:	Faculty of Engineering, Science and Mathematics > School of Electronics and Computer Science
Depositing User:	Mr Adewole Adewumi
Date Deposited:	26 Mar 2015 20:38
Last Modified:	26 Mar 2015 20:38
URI:	http://eprints.covenantuniversity.edu.ng/id/eprint/4336
