Stochastic Learning-Based Estimation Methods for Pattern Recognition and Its Application to Topic Detection and Tracking

Stensby, Aleksander Mølsæther

dc.contributor.author	Stensby, Aleksander Mølsæther
dc.date.accessioned	2009-03-12T13:49:09Z
dc.date.issued	2008
dc.identifier.uri	http://hdl.handle.net/11250/137027
dc.description	Master's thesis in Information and Communication Technology 2008, University of Agder, Grimstad	en
dc.description.abstract	Every Pattern Recognition (PR) problem involves a training and a testing phase. In the training phase, the system is presented with samples from which the distribution of the features (also called the class-conditional distribution) is estimated. Traditional PR systems assume that the class-conditional distributions are stationary, that is, that they do not change with time. Recently, Oommen and his co-authors presented a strategy, the Stochastic Learning Weak Estimator (SLWE), by which the parameters of a binomial/multinomial distribution can be estimated when the distribution is non-stationary. In this thesis, we propose a selection of performance indexes that take into account crucial characteristics of non-stationary environments. Furthermore, we use the proposed indexes to perform a more extensive empirical evaluation of this strategy, and compare it with traditional estimation algorithms, such as the maximum likelihood estimator (MLE), operating in non-stationary environments. The purpose is to bring out the respective strengths and weaknesses of the competing approaches. This thesis also considers the design and implementation of PR systems dealing with such non-stationary environments. In particular, we concentrate on language classification in multilingual Word of Mouth discussions. Unlike in traditional PR systems, in our method the training is achieved by learning the N-gram characteristics of every language, while the testing invokes the SLWE, because the documents being classified contain passages written in different languages, interspersed with each other, and the user does not know where one language stops and the next one starts. Our empirical testing demonstrates that the proposed method classifies multilingual documents with high overall accuracy. We show that our method scales well with the dimensionality of the feature space, and that it is robust to textual errors in the testing data. 
Finally, and most importantly, the classifier performs very well on segments of moderate size (15-20 words), with an overall accuracy of 0.989, and adequately on shorter segments (10 words per segment), with an accuracy of 0.9596. We therefore believe that our results provide additional insight into the performance of the SLWE and the MLE in non-stationary environments. We also believe that the proposed language classification technique will benefit applications dealing with Pattern Recognition in multilingual text documents.	en
dc.format.extent	2316266 bytes
dc.format.mimetype	application/pdf
dc.language.iso	eng	en
dc.publisher	Universitetet i Agder / Agder University	en
dc.subject.classification	IKT590
dc.title	Stochastic Learning-Based Estimation Methods for Pattern Recognition and Its Application to Topic Detection and Tracking	en
dc.type	Master thesis	en
dc.subject.nsi	VDP::Mathematics and natural science: 400::Information and communication science: 420::System development and system design: 426	en
dc.source.pagenumber	141	en
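The abstract contrasts the SLWE with the MLE on non-stationary binomial/multinomial data. As a hedged illustration only, the following Python sketch implements the multinomial SLWE update as it is usually stated in the SLWE literature: on observing symbol i, every probability is multiplied by a user-chosen λ in (0, 1), and the observed symbol additionally receives 1 − λ, so the estimate remains a valid distribution while older observations are forgotten geometrically. The MLE is the plain relative frequency of each symbol. The function names (slwe_update, mle_estimate, track), the value λ = 0.95 and the synthetic switching source are illustrative assumptions, not code or parameters taken from the thesis.

```python
import random


def slwe_update(probs, observed, lam):
    """One multinomial SLWE step: scale every component by lam, then give
    the freed probability mass (1 - lam) to the observed symbol."""
    new = [lam * p for p in probs]
    new[observed] += 1.0 - lam
    return new


def mle_estimate(counts):
    """Maximum likelihood estimate: relative frequency of each symbol."""
    total = sum(counts)
    return [c / total for c in counts]


def track(sequence, num_symbols, lam=0.95):
    """Run both estimators over a symbol sequence and return the final
    (SLWE estimate, MLE estimate) pair."""
    slwe = [1.0 / num_symbols] * num_symbols  # uniform (uninformed) start
    counts = [0] * num_symbols
    for x in sequence:
        slwe = slwe_update(slwe, x, lam)
        counts[x] += 1
    return slwe, mle_estimate(counts)


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical non-stationary binomial source:
    # P(symbol 0) is 0.8 for the first 1000 samples, then 0.2.
    seq = [0 if rng.random() < 0.8 else 1 for _ in range(1000)]
    seq += [0 if rng.random() < 0.2 else 1 for _ in range(1000)]
    slwe, mle = track(seq, 2, lam=0.95)
    print(f"SLWE estimate of P(0): {slwe[0]:.3f} (current true value 0.2)")
    print(f"MLE  estimate of P(0): {mle[0]:.3f} (averaged over both regimes)")
```

Because the SLWE weights recent observations more heavily, its estimate of P(0) should settle near the post-switch value, whereas the MLE averages over both regimes and should end up near 0.5; in this sketch, λ trades tracking speed against the variance of the estimate.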

Associated file(s)

File name: stensby_master_thesis.pdf
Size: 2.208Mb
Format: PDF


This item appears in the following collection(s)

Master's theses in Information and Communication Technology [505]
MM500, IKT590, IKT591
