Time estimation for large scale of data processing in Hadoop MapReduce scenario

2011

The appearance of MapReduce technology triggered a boom in the IT industry. Large companies such as Google, Yahoo and Facebook use this technology to facilitate their data processing [12]. As a representative technology for processing large datasets in parallel, MapReduce has received considerable attention from various organizations. Handling large problems inevitably requires a large amount of resources, so organizing those resources effectively becomes an important problem. A common strategy is to gain some experience before deploying large-scale experiments. Following this idea, this master's thesis aims to provide learning models for MapReduce. These models help us accumulate experience from small-scale experiments and ultimately allow us to estimate the execution time of large-scale experiments.
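To make the idea of learning from small runs concrete, the sketch below fits a straight line of execution time against input size from a few small experiments and extrapolates it to a larger input. This is only an illustration under a loudly stated assumption (that job time grows roughly linearly with input size); it is not the learning model developed in this thesis, and the function names and sample numbers are hypothetical.

```python
# Hypothetical sketch: ordinary least-squares fit of job time vs. input size,
# assuming a roughly linear relationship. Not the thesis's actual model.

def fit_linear(sizes, times):
    """Return (slope, intercept) of the least-squares line time = intercept + slope * size."""
    if len(sizes) != len(times) or len(sizes) < 2:
        raise ValueError("need at least two (size, time) pairs of equal length")
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(times) / n
    # Sum of squared deviations of the sizes and cross-deviations with the times.
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, times))
    if sxx == 0:
        raise ValueError("input sizes must not all be identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_time(model, size):
    """Extrapolate the fitted line to a (larger) input size."""
    slope, intercept = model
    return intercept + slope * size


if __name__ == "__main__":
    # Made-up measurements from small runs: input size (GB) and job time (s).
    small_sizes = [1, 2, 4, 8]
    small_times = [30, 50, 90, 170]
    model = fit_linear(small_sizes, small_times)
    print(estimate_time(model, 64))  # prints 1290.0
```

Real MapReduce jobs often deviate from a single straight line (for example, once the cluster's task slots are saturated), so a practical model would need to be validated against measured runs before its extrapolations are trusted.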

Master's thesis in Information and Communication Technology (IKT590), 2011, University of Agder, Grimstad
