IJACEN An Improved Partitioning Mechanism For Optimizing Massive Data Analysis Using Map Reduce

Journal Paper

Paper Title : An Improved Partitioning Mechanism For Optimizing Massive Data Analysis Using Map Reduce

Author : Bhavana Bangar, Priya Bhakre, Jyoti Dangde, Megha Ghodake

Article Citation : Bhavana Bangar, Priya Bhakre, Jyoti Dangde, Megha Ghodake (2015), "An Improved Partitioning Mechanism For Optimizing Massive Data Analysis Using Map Reduce", International Journal of Advance Computational Engineering and Networking (IJACEN), pp. 134-137, Volume-3, Issue-12

Abstract : Big data is a popular term for data sets that are very large and complex to handle. Traditional databases cannot be used to process such data, which may be structured or unstructured. Many companies and users have started to move their data to cloud storage in order to simplify data management and reduce data maintenance costs. In most companies the data is too big or moves too fast, and it exceeds current processing capacity. Despite these problems, big data can help companies improve operations and make faster and more intelligent decisions. MapReduce is a programming model and an associated implementation for processing and generating large data sets with a parallel, distributed algorithm on a cluster. The MapReduce model has two parts: the first is Map and the second is Reduce. The Map function lets the different nodes of the distributed cluster divide the work among themselves, and the Reduce function combines the cluster's results into one final output. The unbalanced load caused by data skew (i.e., data that is not evenly distributed) can be avoided by using data sampling. Data sampling is a statistical analysis technique used to select, manipulate, and analyze a representative subset of data points in order to identify patterns and trends in the larger data set being examined. Load balancing is used to optimize resource use, maximize throughput, minimize response time, and avoid overloading any single resource. How evenly the partitioner distributes the data depends on how large and representative the sample is and on how well the partitioning mechanism analyzes the samples. This project proposes an improved partitioning algorithm that addresses unbalanced load and memory consumption and improves the partitioning mechanism.

Index Terms: Big Data, Hadoop, HDFS, MapReduce, Data sampling, Partitioning.
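
The abstract's point that sampling can reduce the load imbalance caused by data skew can be illustrated with a small sketch. The code below is not the authors' algorithm, which this page does not describe. It is a generic sample-based range partitioner, and every name in it (`sample_split_points`, `range_partition`, `equal_width_partition`, `NUM_REDUCERS`, the synthetic key distribution) is a hypothetical choice made for illustration. It compares the per-reducer load under naive equal-width key ranges with the load when range boundaries are taken from a random sample of the keys.

```python
import bisect
import random
from collections import Counter

NUM_REDUCERS = 4  # hypothetical cluster size, chosen for illustration


def sample_split_points(keys, num_reducers, sample_size, seed=0):
    """Pick num_reducers - 1 range boundaries from a random sample of keys.

    Assumes keys is a non-empty list of comparable values.
    """
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    # Evenly spaced quantiles of the sorted sample become the boundaries.
    return [sample[(i * len(sample)) // num_reducers]
            for i in range(1, num_reducers)]


def range_partition(key, split_points):
    """Reducer index for key: the number of boundaries that are <= key."""
    return bisect.bisect_right(split_points, key)


def equal_width_partition(key, max_key, num_reducers):
    """Naive baseline: cut the key range into equally wide intervals."""
    width = (max_key + 1) / num_reducers
    return min(int(key / width), num_reducers - 1)


def reducer_loads(keys, partition_fn, num_reducers):
    """Count how many records each reducer would receive."""
    counts = Counter(partition_fn(k) for k in keys)
    return [counts.get(i, 0) for i in range(num_reducers)]


# Synthetic skewed input: most keys are small, a few are very large.
rng = random.Random(42)
keys = [int(rng.expovariate(1 / 100)) for _ in range(10_000)]

split_points = sample_split_points(keys, NUM_REDUCERS, sample_size=1_000)
sampled_loads = reducer_loads(
    keys, lambda k: range_partition(k, split_points), NUM_REDUCERS)
equal_width_loads = reducer_loads(
    keys, lambda k: equal_width_partition(k, max(keys), NUM_REDUCERS),
    NUM_REDUCERS)

print("equal-width loads:", equal_width_loads)
print("sample-based loads:", sampled_loads)
```

On skewed input like this, the equal-width baseline sends most records to the first reducer, while the sample-based boundaries spread them more evenly. A larger or more representative sample should generally give boundaries closer to the true quantiles, which is the dependence the abstract describes.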

Type : Research paper

Published : Volume-3, Issue-12


Published on 2015-12-23

