
dc.contributor.author: Zhu, Lida
dc.date.accessioned: 2009-03-12T14:16:03Z
dc.date.issued: 2008
dc.identifier.uri: http://hdl.handle.net/11250/137039
dc.description: Master's thesis in information and communication technology, 2008, Universitetet i Agder, Grimstad [en]
dc.description.abstract: This thesis presents a solution for classifying web sites by geographical attribute codes (NUTS) and economic activity attribute codes (NACE) with high accuracy. We use keyword-based document classification methods, which have shown good performance. Each document is assigned a class label from a set of predefined categories, based on a pool of pre-classified sample documents. The solution has three steps. First, it removes stop words and skips HTML tags, which identifies the informative terms and discards non-informative or redundant ones to improve classification accuracy. Second, it uses mutual information for feature selection to reduce the dimensionality of the feature space and produce vectors for classification. Finally, it applies the Naïve Bayes and Decision Tree algorithms to perform the classification and compares their performance. In the experiment, the system classified web sites into NACE categories with a maximum accuracy of 97% on 46 web pages, and the best NUTS classification accuracy was 93% on 223 web pages. [en]
dc.format.extent: 806386 bytes
dc.format.mimetype: application/pdf
dc.language.iso: eng [en]
dc.publisher: Universitetet i Agder / Agder University [en]
dc.subject.classification: IKT590
dc.title: Automatic Categorization of Web Sites [en]
dc.type: Master thesis [en]
dc.subject.nsi: VDP::Mathematics and natural science: 400::Information and communication science: 420::Knowledge based systems: 425 [en]
dc.source.pagenumber: 69 [en]
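
The pipeline described in the abstract (skip HTML tags, remove stop words, select features by mutual information, classify with Naïve Bayes) can be illustrated with a minimal sketch. This is not the thesis implementation: the stop-word list, the sample pages, the class labels "fishing" and "software", and all function names are hypothetical, and the Decision Tree classifier mentioned in the abstract is not shown.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser

# Tiny illustrative stop-word list (an assumption, not the list used in the thesis).
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to", "is", "for", "we", "our"}


class _TextExtractor(HTMLParser):
    """Collects text nodes and ignores the tags themselves."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def tokenize(html):
    """Skip HTML tags, lowercase the text, and drop stop words."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    words = re.findall(r"[a-z]+", " ".join(parser.parts).lower())
    return [w for w in words if w not in STOP_WORDS]


def mutual_information(docs, labels, term):
    """Mutual information (in bits) between term presence and class label."""
    n = len(docs)
    mi = 0.0
    for present in (True, False):
        p_t = sum(1 for d in docs if (term in d) == present) / n
        for c in set(labels):
            joint = sum(1 for d, y in zip(docs, labels)
                        if (term in d) == present and y == c) / n
            if joint > 0:
                mi += joint * math.log2(joint / (p_t * labels.count(c) / n))
    return mi


def select_features(docs, labels, k):
    """Keep the k terms with the highest mutual information."""
    vocab = sorted({w for d in docs for w in d})
    return sorted(vocab, key=lambda t: mutual_information(docs, labels, t),
                  reverse=True)[:k]


def train_naive_bayes(docs, labels, features):
    """Multinomial Naive Bayes with Laplace smoothing over the selected features."""
    classes = sorted(set(labels))
    prior = {c: math.log(labels.count(c) / len(labels)) for c in classes}
    likelihood = {}
    for c in classes:
        counts = Counter(w for d, y in zip(docs, labels) if y == c
                         for w in d if w in features)
        total = sum(counts.values()) + len(features)
        likelihood[c] = {w: math.log((counts[w] + 1) / total) for w in features}
    return prior, likelihood


def classify(model, tokens):
    """Return the class with the highest log-posterior score."""
    prior, likelihood = model
    scores = {c: prior[c] + sum(likelihood[c][w] for w in tokens if w in likelihood[c])
              for c in prior}
    return max(scores, key=scores.get)


# Hypothetical pre-classified sample pages (stand-ins for NACE-labelled sites).
pages = [
    ("<html><body><h1>Fresh fish</h1><p>We sell salmon and cod.</p></body></html>", "fishing"),
    ("<p>Fishing boats and salmon farming in the fjord</p>", "fishing"),
    ("<div>Software development and web hosting</div>", "software"),
    ("<p>We build software for the web</p>", "software"),
]
docs = [tokenize(html) for html, _ in pages]
labels = [label for _, label in pages]
features = select_features(docs, labels, k=3)
model = train_naive_bayes(docs, labels, features)
print(classify(model, tokenize("<p>Salmon from the fjord</p>")))
```

With only four sample pages the selected features are simply the terms that occur in every page of one class and in no page of the other; a realistic run would need a much larger pre-classified pool, as the abstract notes.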

