Devi D, Renuka and TA, Swetha Margaret (2024) Streamlining Big Data Mining with Cutting-Edge Approaches. In: Research Updates in Mathematics and Computer Science Vol. 2. B P International, pp. 150-175. ISBN 978-81-971889-7-8
Full text not available from this repository.

Abstract
Big data refers to any collection of data so large and complex that conventional database management systems and data-processing tools cannot handle it. Feature selection serves primarily to reduce the processing burden of data mining models. To expedite the processing of large data volumes, parallel processing is implemented using the MapReduce (MR) technique: the MR model is applied to big datasets, which are divided into smaller partitions. However, existing algorithms often fall short of significantly improving classifier performance. This research advocates using the MR method to perform feature selection in parallel, thereby improving performance. Additionally, to augment classifier efficacy, this study introduces an approach that combines Online Feature Selection (OFS) with an Accelerated Bat Algorithm (ABA) within a framework that pre-processes features in advance, without prior knowledge of the feature space. The proposed OFS-ABA method is designed to select relevant and non-redundant features within the MapReduce framework. Furthermore, an Ensemble Incremental Deep Multiple Layer Perceptron (EIDMLP) classifier is employed to classify dataset samples; the outputs of the homogeneous IDMLP classifiers are aggregated by the EIDMLP classifier. The proposed feature selection method and classifier are evaluated extensively on three high-dimensional datasets. The results indicate that the MR-OFS-ABA method outperforms existing feature selection methods such as PSO, APSO, and ASAMO (Accelerated Simulated Annealing and Mutation Operator). The performance of the EIDMLP classifier is also compared with existing classifiers, including Naïve Bayes (NB), Hoeffding Tree (HT), and Fuzzy Minimal Consistent Class Subset Coverage (FMCCSC)-KNN (K-Nearest Neighbour). The methodology is applied to three datasets, and the results are compared across four classifiers and three state-of-the-art feature selection algorithms. Overall, the findings of this research demonstrate improved accuracy and reduced processing time.
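As a rough illustration of the MapReduce-style feature-selection framing summarised above, the Python sketch below scores features independently on each data partition (the map step) and then aggregates the per-partition scores to keep the top-k features (the reduce step). It deliberately substitutes a simple correlation-based relevance score for the paper's OFS-ABA search; the names `score_partition`, `reduce_scores`, and `k` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def score_partition(X_part, y_part):
    """Map step: score each feature on one data partition.

    Uses absolute Pearson correlation with the label as a simple
    relevance proxy (the paper's OFS-ABA search is not reproduced here).
    """
    scores = np.zeros(X_part.shape[1])
    for j in range(X_part.shape[1]):
        col = X_part[:, j]
        if col.std() == 0 or y_part.std() == 0:
            continue
        scores[j] = abs(np.corrcoef(col, y_part)[0, 1])
    return scores

def reduce_scores(partition_scores, k):
    """Reduce step: average per-partition scores and keep the top-k features."""
    mean_scores = np.mean(partition_scores, axis=0)
    return np.argsort(mean_scores)[::-1][:k]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1200, 50))                   # toy high-dimensional data
    y = (X[:, 3] + 0.5 * X[:, 7] > 0).astype(float)   # labels depend on features 3 and 7

    # Split rows into partitions, as a MapReduce job would shard the dataset.
    partitions = np.array_split(np.arange(X.shape[0]), 4)
    partition_scores = [score_partition(X[idx], y[idx]) for idx in partitions]

    selected = reduce_scores(partition_scores, k=5)
    print("Selected feature indices:", selected)
```

The ensemble step can be pictured in the same spirit: the minimal sketch below aggregates the outputs of several homogeneous base classifiers by majority vote, standing in for (but not reproducing) the EIDMLP combination of IDMLP members.

```python
import numpy as np

def ensemble_predict(member_predictions):
    """Combine class predictions from homogeneous base classifiers by majority vote."""
    member_predictions = np.asarray(member_predictions)  # shape: (n_members, n_samples)
    n_classes = member_predictions.max() + 1
    return np.apply_along_axis(
        lambda col: np.bincount(col, minlength=n_classes).argmax(),
        axis=0, arr=member_predictions)

# Example: three base classifiers voting on four samples.
print(ensemble_predict([[0, 1, 1, 2],
                        [0, 1, 2, 2],
                        [1, 1, 2, 0]]))  # -> [0 1 2 2]
```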
| Field | Value |
| --- | --- |
| Item Type | Book Section |
| Subjects | Pustakas > Mathematical Science |
| Depositing User | Unnamed user with email support@pustakas.com |
| Date Deposited | 03 Apr 2024 09:36 |
| Last Modified | 03 Apr 2024 09:36 |
| URI | http://archive.pcbmb.org/id/eprint/1933 |