Vis enkel innførsel

dc.contributor.authorMyrer, Svein Arild
dc.contributor.authorOlsen, Morten Goodwin
dc.date.accessioned2007-06-28T12:19:06Z
dc.date.issued2005
dc.identifier.urihttp://hdl.handle.net/11250/137178
dc.descriptionMasteroppgave i informasjons- og kommunikasjonsteknologi 2005 - Høgskolen i Agder, Grimstaden
dc.description.abstractThere is no doubt that the World Wide Web has lived up to it’s hype of being the world’s central information highway through the past years. An increasing amount of versatile services keeps finding their way onto the Web as information providers continue to embrace the possibilities that the Web can offer. Especially the possibility of producing dynamic content has been an accelerant factor and is the reason why we now conveniently can participate in online auctions or see the latest development of our favorite stocks in near real-time from our own living rooms. However, for automated data mining applications that deploy crawlers to continuously capture the information provided by this new breed of services, the highly dynamic nature of the content is not convenient at all. As a matter of fact, a complete new set of challenges emerges where traditional crawling strategies are shown to be sub-optimal. Accordingly a new class of methods for crawling operations are clearly needed. Nonetheless, the problem area has so far been given limited attention in literature. In this thesis we address the new problem area of monitoring highly dynamic data sources of different importance. We use the concept of an incremental web crawler as a basis for our novel approach where we consider the incremental crawling task as a continuous learning problem where scheduling of monitoring tasks is combined with parameter estimation in an on-line manner. By mapping the problem to two variants of the so called knapsack problem we propose two solutions based on a machine learning technique known as learning automata. We show empirically that our proposed solutions continuously improve their performance through a learning process and that they are capable of operating in non-stationary environments. We also show their performance in comparison to alternative algorithms where, most notably, our schemes are shown to outdo the traditional uniform crawling scheme by factors up to 550% in certain situations.en
dc.format.extent1074894 bytes
dc.format.mimetypeapplication/pdf
dc.language.isoengen
dc.publisherHøgskolen i Agder
dc.publisherAgder University College
dc.subject.classificationIKT590
dc.titleIncremental web crawling as a competitive game of learning automataen
dc.typeMaster thesisen
dc.subject.nsiVDP::Matematikk og naturvitenskap: 400::Informasjons- og kommunikasjonsvitenskap: 420::Algoritmer og beregnbarhetsteori: 422
dc.subject.nsiVDP::Matematikk og naturvitenskap: 400::Informasjons- og kommunikasjonsvitenskap: 420::Simulering, visualisering, signalbehandling, bildeanalyse: 429


Tilhørende fil(er)

Thumbnail

Denne innførselen finnes i følgende samling(er)

Vis enkel innførsel