Chadha, Anupama (2022) An Efficient K-means Algorithm: Generating Clusters Dynamically in MapReduce Framework. In: Novel Research Aspects in Mathematical and Computer Science Vol. 3. B P International, pp. 54-66. ISBN 978-93-5547-722-4
Full text not available from this repository.
Abstract
Background: K-Means is a widely used partition-based clustering algorithm that organizes an input dataset into a predefined number of clusters. Its simplicity and its speed on massive data have made it very popular. The growth in the volume of electronic data has prompted modifications to clustering algorithms so that they can process such data. The performance of K-Means can be improved further by running it in a distributed computing environment: the MapReduce paradigm can supply that environment and reduce running time. K-Means has a major limitation: the number of clusters, ‘K’, must be specified in advance as an input to the algorithm. Without thorough domain knowledge, or for a new and unknown dataset, estimating and fixing the number of clusters in advance typically leads to “forced” clustering of the data, and the proper clustering does not emerge.
Method: In this paper, we introduce a new algorithm based on K-Means that takes only a numerical dataset as input and generates an appropriate number of clusters on the fly, using the MapReduce programming style.
Findings: The new algorithm not only removes the need to supply the value of K up front but also reduces computation time by using the MapReduce framework.
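For readers unfamiliar with how K-Means maps onto MapReduce, the following is a minimal single-process sketch of one iteration of standard K-Means written as a map step and a reduce step. It is an illustration only and not the algorithm proposed in the chapter: K is still fixed by the initial centroids here, and the function names (`map_phase`, `reduce_phase`, `kmeans_iteration`) and the sample data are assumptions made for this example.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def map_phase(points, centroids):
    """Map: emit (index of nearest centroid, point) for every point."""
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        yield nearest, p


def reduce_phase(pairs, centroids):
    """Reduce: replace each centroid with the mean of its assigned points."""
    groups = defaultdict(list)
    for idx, p in pairs:
        groups[idx].append(p)
    new_centroids = list(centroids)  # centroids with no points stay unchanged
    for idx, members in groups.items():
        new_centroids[idx] = tuple(sum(c) / len(members) for c in zip(*members))
    return new_centroids


def kmeans_iteration(points, centroids):
    """One standard K-Means iteration (hypothetical helper, fixed K)."""
    return reduce_phase(map_phase(points, centroids), centroids)


# Illustrative data only; not taken from the chapter.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
centroids = kmeans_iteration(points, [(0.0, 0.0), (10.0, 10.0)])
```

In a real MapReduce deployment the map step would run in parallel over partitions of the dataset and the reduce step would run once per centroid key; the sketch keeps both in one process to show only the data flow.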
Item Type: | Book Section |
---|---|
Subjects: | Pustakas > Computer Science |
Depositing User: | Unnamed user with email support@pustakas.com |
Date Deposited: | 11 Oct 2023 05:42 |
Last Modified: | 11 Oct 2023 05:42 |
URI: | http://archive.pcbmb.org/id/eprint/1092 |