An Efficient K-means Algorithm: Generating Clusters Dynamically in MapReduce Framework

Chadha, Anupama (2022) An Efficient K-means Algorithm: Generating Clusters Dynamically in MapReduce Framework. In: Novel Research Aspects in Mathematical and Computer Science Vol. 3. B P International, pp. 54-66. ISBN 978-93-5547-722-4

Full text not available from this repository.

Abstract

Background: K-Means is a widely used partition-based clustering algorithm that organizes an input dataset into a predefined number of clusters. Its simplicity and its speed on massive data have made it very popular. The growth in the volume of electronic data has driven modifications to data clustering algorithms so that they can process such data. The performance of K-Means can be improved further by running it in a distributed computing environment: the MapReduce paradigm provides such an environment and makes the algorithm more efficient in terms of time. K-Means has a major limitation: the number of clusters, ‘K’, must be specified in advance as an input to the algorithm. Without thorough domain knowledge, or for a new and unknown dataset, estimating and fixing the number of clusters in advance typically leads to “forced” clustering of the data, and the proper clustering does not emerge.
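To make the pairing of K-Means with MapReduce concrete, the following is a minimal single-machine sketch of one standard K-Means iteration written in a map/reduce style. It is not taken from the chapter: the function names (`map_phase`, `reduce_phase`, `kmeans_iteration`) are hypothetical, and a real deployment would run the map and reduce steps on a framework such as Hadoop rather than in plain Python.

```python
from collections import defaultdict
from math import dist


def map_phase(points, centroids):
    """Map: emit (index of nearest centroid, point) for every input point."""
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        yield nearest, p


def reduce_phase(pairs, centroids):
    """Reduce: average the points grouped under each centroid index."""
    groups = defaultdict(list)
    for idx, p in pairs:
        groups[idx].append(p)
    # Assumption: a centroid that received no points keeps its old position.
    new_centroids = list(centroids)
    for idx, members in groups.items():
        new_centroids[idx] = tuple(sum(c) / len(members) for c in zip(*members))
    return new_centroids


def kmeans_iteration(points, centroids):
    """One K-Means iteration: assign points (map), then recompute means (reduce)."""
    return reduce_phase(map_phase(points, centroids), centroids)
```

Note that K is fixed here by the length of `centroids`, which is exactly the limitation described above. In a distributed run, each mapper would process one split of the data and the reducers would be keyed by centroid index.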

Method: In this paper, we introduce a new algorithm based on K-Means that takes only a numerical dataset as input and generates an appropriate number of clusters at run time using the MapReduce programming style.
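The abstract does not say how the number of clusters is determined at run time. Purely as an illustration of what generating clusters dynamically can look like, the sketch below uses a distance threshold: a point that lies farther than the threshold from every existing centroid starts a new cluster. This is an assumed stand-in technique, not the chapter's algorithm, and the function name `assign_or_spawn` and the `threshold` parameter are hypothetical.

```python
from math import dist


def assign_or_spawn(points, centroids, threshold):
    """Assign each point to its nearest centroid, or open a new cluster.

    Assumption (not from the chapter): a point farther than `threshold`
    from every existing centroid becomes the centroid of a new cluster.
    """
    centroids = list(centroids)
    labels = []
    for p in points:
        if centroids:
            idx = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            if dist(p, centroids[idx]) <= threshold:
                labels.append(idx)
                continue
        # No centroid is close enough: this point seeds a new cluster.
        centroids.append(tuple(p))
        labels.append(len(centroids) - 1)
    return centroids, labels
```

Called with an empty centroid list, the function needs no K at all; the number of clusters that emerges depends on the threshold and on the order in which points are seen, which is one reason the choice of criterion matters in any such scheme.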

Findings: The new algorithm not only removes the need to supply the value of K in advance but also reduces computation time by using the MapReduce framework.

Item Type: Book Section
Subjects: Pustakas > Computer Science
Depositing User: Unnamed user with email support@pustakas.com
Date Deposited: 11 Oct 2023 05:42
Last Modified: 11 Oct 2023 05:42
URI: http://archive.pcbmb.org/id/eprint/1092
