Load-balancing by applying a Bayesian learning automata (BLA) scheme in a non-stationary web-crawler network
Master thesis
Permanent link
http://hdl.handle.net/11250/137506
Date of publication
2010
Abstract
Distributed Web-Crawlers, i.e., Web-Crawler Networks, have retrieved massive
amounts of web data for centralized search indexes over the last decade. Companies
such as Google, Yahoo and, more recently, Integrasco have built their business models upon this retrieval.
However, load-balancing Web-Crawler Networks has proved difficult. Especially
difficult are the geographically distributed Web-Crawler Networks that have recently
emerged. In geographically distributed Web-Crawler Networks, the load-balancing algorithm
must account for both constantly changing (non-stationary) capacity and
location-aware retrieval. The novel approach in this thesis is to apply the machine-learning
technique BLA-Kalman to dynamically load-balance Web-Crawler Networks.
We apply the technique by combining the domain of load-balancing with machine-learning
concepts and Web-Crawler Networks. A prototype algorithm named KALMAN-BLAWLB
is designed and tested in a simulated environment. We measure how fairly
KALMAN-BLAWLB is able to load-balance, its system utilization and its scalability. KALMAN-BLAWLB
outperforms all the algorithms that we were able to test in the simulated environment.
Finally, we conclude that KALMAN-BLAWLB is able to load-balance fairly and
achieve a decent system utilization, and that it is scalable, but further tests are needed to confirm large-scale usage.
Description
Master's thesis in information and communication technology 2010 – University of Agder, Grimstad