
dc.contributor.author: Lei, Xiaoqi
dc.date.accessioned: 2011-10-05T06:55:58Z
dc.date.available: 2011-10-05T06:55:58Z
dc.date.issued: 2011
dc.identifier.uri: http://hdl.handle.net/11250/137531
dc.description: Master's thesis in information and communication technology, IKT590, 2011, Universitetet i Agder, Grimstad [en_US]
dc.description.abstract: With the growing influence and importance of social media, the need to categorize the authors of openly available social media text by their geographic origin is becoming more urgent than ever. Several methods have been developed for this purpose, for instance classifying authors by their language, by their time zone, or by the geographic terms used in the text. This thesis explores a unique classifier for determining the geographic background of social media users: the Native Language Classifier, which infers an author's native language from text the author has written in English. The classifier was trained on an English corpus of 6 million words written by 800 authors from 4 language backgrounds: Chinese, Russian, Spanish and French. In a test on 200 users (50 from each language group), the classifier achieved an overall accuracy of 75% in assigning authors to the correct language background by combining the results of word-level n-gram algorithms, character-level n-gram algorithms, and a spell-checking algorithm. This should be valuable both to social media analysts and to text classification researchers. Beyond the classification result, the test also yielded some interesting observations that reveal regularities behind the languages. The method developed in this thesis may therefore also become a useful tool for researchers analyzing the features of these languages. [en_US]
dc.language.iso: eng [en_US]
dc.publisher: Universitetet i Agder / University of Agder [en_US]
dc.title: Determining geographic origin of social media users with Bayesian Analysis of common syntactical and spelling errors when using foreign languages [en_US]
dc.type: Master thesis [en_US]
dc.source.pagenumber: 34 [en_US]
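
The abstract names character-level n-grams and the title names Bayesian analysis, but this record does not describe the thesis's actual implementation. As an illustration only, a naive Bayes classifier over character n-grams, one common way to realize that kind of approach, might look like the sketch below. The class name `NgramNaiveBayes`, the helper `char_ngrams`, the add-one smoothing, and the toy strings with placeholder labels are all assumptions made for this example and are not taken from the thesis.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Return all overlapping character n-grams of the lowercased text."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class NgramNaiveBayes:
    """Hypothetical multinomial naive Bayes over character n-grams."""

    def __init__(self, n=3):
        self.n = n
        self.counts = defaultdict(Counter)  # label -> n-gram counts
        self.docs = Counter()               # label -> number of training texts
        self.vocab = set()                  # all n-grams seen in training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.counts[label].update(grams)
            self.docs[label] += 1
            self.vocab.update(grams)
        return self

    def log_scores(self, text):
        """Log prior plus add-one-smoothed log likelihood for each label."""
        total_docs = sum(self.docs.values())
        vocab_size = len(self.vocab)
        grams = char_ngrams(text, self.n)
        scores = {}
        for label in self.docs:
            total = sum(self.counts[label].values())
            score = math.log(self.docs[label] / total_docs)
            for gram in grams:
                score += math.log(
                    (self.counts[label][gram] + 1) / (total + vocab_size)
                )
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy, made-up strings with placeholder labels; not data from the thesis.
clf = NgramNaiveBayes(n=3).fit(["aaa aab aaa", "zzz zzy zzz"], ["A", "B"])
print(clf.predict("aab aaa"))
```

In a setting like the one the abstract describes, the labels would be native-language groups and the training texts would be English writing by authors from each group. Combining this score with word-level n-gram and spell-checking scores is left out of the sketch.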

