dc.contributor.author: Chen, Xiongjie
dc.contributor.author: Hvistendahl, Erik Victor
dc.date.accessioned: 2010-01-19T12:40:52Z
dc.date.available: 2009
dc.date.issued: 2009
dc.identifier.uri: http://hdl.handle.net/11250/137068
dc.description: Master's thesis in information and communication technology 2009 – University of Agder, Grimstad [en]
dc.description.abstract: Many papers have been written on mining data from structured web pages. However, few if any of them focus on retrieving specific parts of discussion board postings. A discussion board page contains a set of postings, which can be considered data-records. Our goal is to provide insight into a specific approach for identifying the locations of author, content and date+time, which are parts of a complete discussion board posting data-record. Our approach combines a Naive Bayes pattern classifier, structure classification and grammar to identify the sought-after elements. We give a thorough evaluation of our Naive Bayes classifier and its components, in addition to how combinations of the different parts of our approach affected the overall result. Our best results for identifying the location of the individual elements were 94% for author, 76% for content and 86% for date+time, and 60% for getting every element of each post correct. While the result for complete posts is not very good, it depends heavily on the other results. We believe our approach shows promise and that, with further development and refinement, it will be a viable method for automatic extraction of data from online discussion boards. [en]
dc.format.extent: 833608 bytes
dc.format.mimetype: application/pdf
dc.language.iso: eng [en]
dc.publisher: University of Agder [en]
dc.title: Automatic data extraction from online discussion boards [en]
dc.type: Master thesis [en]
dc.source.pagenumber: 95 [en]

