Fast and Robust Majority Voting Algorithm for Big Data Regression Model

Authors

  • Hassan S. Uraibi

DOI:

https://doi.org/10.55562/jrucs.v54i1.616

Keywords:

Big data, Divide and conquer, Robust variable selection, Majority Vote

Abstract

The majority voting approach is a divide-and-conquer technique capable of analyzing and understanding big data and supporting decisions based on it. Big regression data is massive not only in volume, where P and n tend to infinity, but also in intensity and complexity. For instance, collecting massive data from different subpopulations results in a heterogeneity problem that inevitably leads to the presence of outliers. Moreover, handling such a volume of data exceeds the capacity of standard analytic tools. Therefore, the processing requires an algorithm with two main properties: speed and accuracy. Unfortunately, the majority voting approach is not reliable when outliers are present in the data. Furthermore, the curse of dimensionality causes a multicollinearity problem, which yields misleading results. This paper proposes a fast and robust majority voting approach, a new version of the original approach built on the following steps. The first step is to divide the design matrix vertically into a number of blocks to overcome the curse of ultra-dimensionality. Voting on the best subset of variables should use a robust variable selection method that serves as a dimension reduction procedure and is resistant to the presence of outliers. The second step is to aggregate all the best subsets into one linear regression model and then apply the majority vote algorithm to obtain the best variables. A simulation study compares the performance of the proposed technique with Big-lasso, which is known as the fastest method in the current statistical literature. The results show that the proposed algorithm outperforms Big-lasso: it is both faster and more accurate.
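
The two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation, and several parts are assumptions made for illustration only: the abstract does not name the robust variable selection method, so Spearman rank screening (which is resistant to outliers in the response) stands in for it; the voters are taken to be random half-samples of the rows; and the names `column_blocks`, `select_in_block`, `majority_vote_selection`, `block_size`, `n_voters`, and `threshold` are hypothetical. The aggregation of the selected subsets into a single linear regression model is simplified here to pooling the selected column indices.

```python
import random


def ranks(values):
    """Return 0-based ranks; ties are broken by position (fine for continuous data)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    for rank, i in enumerate(order):
        r[i] = float(rank)
    return r


def spearman(x, y):
    """Spearman rank correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


def column_blocks(p, block_size):
    """Vertically partition the column indices 0..p-1 into consecutive blocks."""
    return [list(range(s, min(s + block_size, p))) for s in range(0, p, block_size)]


def select_in_block(X, y, block, rows, threshold):
    """Assumed stand-in for a robust selector: keep the columns of this block
    whose absolute Spearman correlation with y exceeds the threshold."""
    ys = [y[i] for i in rows]
    keep = []
    for j in block:
        xs = [X[i][j] for i in rows]
        if abs(spearman(xs, ys)) > threshold:
            keep.append(j)
    return keep


def majority_vote_selection(X, y, block_size=5, n_voters=11, threshold=0.3, seed=0):
    """Pool the per-block selections of each voter and keep the variables
    chosen by more than half of the voters."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    votes = [0] * p
    for _ in range(n_voters):
        # Assumption: each voter sees a random half of the rows.
        rows = rng.sample(range(n), n // 2)
        for block in column_blocks(p, block_size):
            for j in select_in_block(X, y, block, rows, threshold):
                votes[j] += 1
    return [j for j in range(p) if votes[j] > n_voters / 2]
```

Calling `majority_vote_selection(X, y)` with `X` as a list of rows and `y` as a list of responses returns the indices of the columns that win the vote. Because each block is screened independently, the per-block work could be distributed across workers, which is where the divide-and-conquer speed-up would come from.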

Published

2024-01-14

How to Cite

Fast and Robust Majority Voting Algorithm for Big Data Regression Model. (2024). Journal of Al-Rafidain University College For Sciences (Print ISSN: 1681-6870, Online ISSN: 2790-2293), 54(1), 490-497. https://doi.org/10.55562/jrucs.v54i1.616