Covenant University Repository

OsamorSoft: clustering index for comparison and quality validation in high throughput dataset

Osamor, I. P. and Osamor, V. C. (2020) OsamorSoft: clustering index for comparison and quality validation in high throughput dataset. Journal of Big Data, 7.


Differences in the results produced by different k-means clustering algorithms call for a simplified approach to validating the quality of the clusters obtained. These differences arise partly from the way each algorithm selects its first seed or centroid (randomly, sequentially, or by some other principle), which tends to influence the final result. Popular external cluster quality validation and comparison models require the computation of various clustering indexes, such as Rand, Jaccard, Fowlkes and Mallows, the Morey and Agresti Adjusted Rand Index (ARIMA), and the Hubert and Arabie Adjusted Rand Index (ARIHA). In the literature, ARIHA has been adjudged a good measure of cluster validity. Based on ARIHA as a popular clustering quality index, we developed OsamorSoft, which comprises DNA_Omatrix and OsamorSpreadSheet, as a tool for cluster quality validation in high-throughput analysis. The proposed method will help to bridge the gap created by the small number of user-friendly tools available for externally evaluating the ever-increasing number of clustering algorithms. Our implementation was tested with clusters created by four k-means algorithms using malaria microarray data. Furthermore, our results yielded a compact 4-stage set of OsamorSpreadSheet statistics that OsamorSoft, our easy-to-use Java GUI and spreadsheet-based tool, uses for cluster quality comparison. It is recommended that a framework be developed to facilitate the simplified integration and automation of several other cluster validity indexes for comparative analysis of big data problems.
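
To make the indexes named in the abstract concrete, the following is a minimal Python sketch of how Rand, Jaccard, Fowlkes and Mallows, and the Hubert and Arabie Adjusted Rand Index can be computed from two label assignments of the same items. It is not the OsamorSoft implementation (which the abstract describes as a Java GUI and spreadsheet tool); the function names `pair_counts` and `external_indexes` are hypothetical, and the formulas are the standard pair-counting definitions. The Morey and Agresti index is omitted from the sketch.

```python
from collections import Counter
from math import comb, sqrt


def pair_counts(labels_a, labels_b):
    """Count item pairs from the contingency table of two clusterings.

    Returns (same_both, same_a, same_b, total), where each value is a
    number of unordered item pairs.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same items")
    n = len(labels_a)
    joint = Counter(zip(labels_a, labels_b))  # contingency table cells
    rows = Counter(labels_a)                  # cluster sizes in A
    cols = Counter(labels_b)                  # cluster sizes in B
    same_both = sum(comb(c, 2) for c in joint.values())
    same_a = sum(comb(c, 2) for c in rows.values())
    same_b = sum(comb(c, 2) for c in cols.values())
    return same_both, same_a, same_b, comb(n, 2)


def external_indexes(labels_a, labels_b):
    """Return Rand, Jaccard, Fowlkes-Mallows and Hubert-Arabie ARI.

    Degenerate inputs (fewer than two items, every item in its own
    cluster, or a single cluster in both partitions) are not handled
    here and raise ZeroDivisionError.
    """
    s, ra, cb, total = pair_counts(labels_a, labels_b)
    # Pairs on which the clusterings agree: together in both, or apart in both.
    agreements = total + 2 * s - ra - cb
    expected = ra * cb / total   # expected value of s under chance
    max_index = (ra + cb) / 2    # upper bound used by Hubert and Arabie
    return {
        "rand": agreements / total,
        "jaccard": s / (ra + cb - s),
        "fowlkes_mallows": s / sqrt(ra * cb),
        "ari_ha": (s - expected) / (max_index - expected),
    }


if __name__ == "__main__":
    # Two made-up label vectors for six items (illustrative data only).
    result = external_indexes([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2])
    for name, value in result.items():
        print(f"{name}: {value:.4f}")
```

Unlike the plain Rand index, the adjusted index subtracts the agreement expected by chance, so it is close to 0 for unrelated partitions and equals 1 only for identical ones.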

Item Type: Article
Uncontrolled Keywords: Clustering index, Algorithms, OsamorSoft, Validation, Rand, Automation
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Faculty of Engineering, Science and Mathematics > School of Electronics and Computer Science
Depositing User: Mrs Patricia Nwokealisi
Date Deposited: 04 Apr 2022 12:48
Last Modified: 04 Apr 2022 12:48
