
dc.contributor.author: Gøytil, Thomas Hansen
dc.contributor.author: Lundberg, Andreas
dc.date.accessioned: 2010-11-30T14:17:43Z
dc.date.available: 2010-11-30T14:17:43Z
dc.date.issued: 2010
dc.identifier.uri: http://hdl.handle.net/11250/137494
dc.description: Master's thesis in information and communication technology, 2010, University of Agder, Grimstad [en_US]
dc.description.abstract: Achieving efficient use of available resources is an important problem in the field of web mining. Monitoring and analyzing the web is extremely resource demanding, so more efficient use of resources often translates directly into improved web monitoring coverage and accuracy. One important subproblem is to reduce the memory consumption of the URL cache in a web crawler system. Using the space-efficient Bloom filter data structure as the URL cache reduces memory consumption. However, the Bloom filter introduces false positives, which lead to the loss of valuable web content when the filter is used as a URL cache in a web crawler system. To address this problem of false positives, this thesis proposes three novel strategies, namely a temporal, a spatial and a spatio-temporal strategy, each aiming to reduce the false positive rate introduced by the Bloom filter. During testing and evaluation of the strategies, we found that both the spatial and the temporal strategy are able to reduce the false positive rate of the Bloom filter. These two strategies were then combined to test whether the false positive probability could be decreased further. Testing and evaluation of the combined strategy show that it does yield a reduction in the false positive probability. [en_US]
dc.language.iso: eng [en_US]
dc.publisher: University of Agder [en_US]
dc.title: A novel spatio-temporal scheme for reducing the rate of false positives in bloom filter based URL-caching [en_US]
dc.type: Master thesis [en_US]
dc.source.pagenumber: 108 [en_US]
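
The abstract's central point, that a Bloom filter used as a URL cache saves memory but sometimes reports an unseen URL as already seen, can be illustrated with a minimal sketch. This is not the thesis's implementation, and it does not reproduce the temporal, spatial or spatio-temporal strategies, which this record does not describe. The class name `BloomURLCache`, the helper `measure_false_positive_rate`, the SHA-256 double-hashing scheme, the `example.com` URLs and all parameter values are assumptions chosen only for illustration.

```python
import hashlib


class BloomURLCache:
    """Illustrative Bloom filter acting as a crawler's 'already seen' URL cache."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # one bit per filter slot

    def _positions(self, url: str):
        # Assumption of this sketch: derive the k bit positions by double
        # hashing two 64-bit halves of a single SHA-256 digest.
        digest = hashlib.sha256(url.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, url: str) -> None:
        # Mark the URL as seen by setting all of its bit positions.
        for pos in self._positions(url):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, url: str) -> bool:
        # True means "possibly seen" (false positives can occur);
        # False means "definitely not seen" (no false negatives).
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(url)
        )


def measure_false_positive_rate(cache, seen_count: int, probe_count: int) -> float:
    """Insert seen_count URLs, then probe with URLs that were never inserted."""
    for i in range(seen_count):
        cache.add(f"http://example.com/seen/{i}")
    false_hits = sum(
        1 for i in range(probe_count) if f"http://example.com/unseen/{i}" in cache
    )
    return false_hits / probe_count


if __name__ == "__main__":
    # A deliberately small filter (4 bits per URL) so false positives are visible.
    cache = BloomURLCache(num_bits=8_000, num_hashes=3)
    rate = measure_false_positive_rate(cache, seen_count=2_000, probe_count=10_000)
    print(f"observed false positive rate: {rate:.3f}")
```

In a crawler, every false positive corresponds to a URL that is skipped although it was never fetched, which is the content loss the abstract refers to. Raising `num_bits` lowers the rate at the cost of memory.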

