dc.description.abstract | Every Pattern Recognition (PR) problem involves a training and a testing phase. In the training
phase, the system is presented with samples, from which the distribution of the features (also called
the class-conditional distribution) is estimated. Traditional PR systems assume that the
class-conditional distributions are stationary, that is, that they do not change with time. Recently,
Oommen and his co-authors have presented a strategy, the Stochastic Learning Weak Estimator (SLWE),
by which the parameters of a binomial/multinomial distribution can be estimated when the distribution is non-stationary.
In this thesis, we propose a selection of performance indexes that take into account crucial
characteristics of non-stationary environments. Furthermore, we use the proposed indexes to perform
a more extensive empirical evaluation of the presented strategy, and compare it with traditional
estimation algorithms operating in non-stationary environments. The purpose is to bring
forward the unique strengths/weaknesses of the competing approaches.
This thesis will consider the design and implementation of PR systems dealing with such non-stationary
environments. In particular, we shall concentrate on the application domain that deals
with language classification in multilingual Word of Mouth discussions. Unlike traditional PR
systems, one novel feature of our method is that the training is achieved by learning the N-gram
characteristics of every language. The testing, however, invokes the SLWE because the sample
documents being classified contain parts written in different languages, interspersed with each
other, without the user knowing where one language stops and the next begins. Our
empirical testing demonstrates that our proposed method is capable of classifying multilingual
documents with high overall accuracy. We show that our method scales well with regard to the
dimensionality of the feature space, and that it is resistant to textual errors in the testing data.
Finally, and more importantly, the classifier performs extremely well when classifying segments of
moderate size (15-20 words), with a reported overall classifier accuracy of 0.989, and adequately
for shorter segments (10 words per segment), yielding an accuracy of 0.9596.
Thus, we believe that our results provide additional insight into the performance of the SLWE
and the MLE when operating in non-stationary environments. Furthermore, it is our opinion that
our proposed technique for language classification will be of benefit in applications dealing with
Pattern Recognition in multilingual text documents. | en |