Online failure prediction in UNIX systems
Abstract
This thesis investigates whether an existing performance monitoring system for UNIX
servers can be enhanced with the capability to predict upcoming failures, using generic
UNIX operating system performance metrics such as used server memory, CPU utilization
and I/O traffic as input data for machine learning and pattern recognition. We survey
possible research methods, classified by the input data they process, and propose a
novel approach to symptom-based failure prediction. To keep the solution generic enough
to run on any UNIX computer, we used only open source software. We evaluate two
classifiers, Naive Bayes and Logistic Regression, on input data in both standard and
vectorized format. Furthermore, we use the forward stepwise selection search algorithm
to find an optimal generic set of variables (features) that improves classification
quality. Our empirical tests show that the proposed method predicts symptoms with high
overall accuracy, but the uncertain quality of the monitored performance data used as
input makes it difficult to ascertain whether the symptoms are actually failures. By
applying the search algorithm for feature selection and vectorizing the input data set,
we reduced classification time by an order of magnitude. In our opinion, the proposed
technique for online failure prediction will benefit performance monitoring applications
and contribute new insight to the research field of online failure prediction.
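The combination of a classifier with forward stepwise feature selection can be illustrated with a small sketch. This is not the thesis implementation. It assumes a Gaussian Naive Bayes model, validation accuracy as the selection criterion, and a tiny hypothetical data set whose metric names `cpu`, `mem` and `io` stand in for monitored UNIX performance metrics. All function names and values are illustrative only. Starting from an empty feature set, each round adds the single feature that most improves validation accuracy, and the search stops when no remaining feature helps.

```python
import math
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, float]

def fit_gaussian_nb(rows: Sequence[Row], labels: Sequence[int], features: Sequence[str]):
    """Estimate per-class priors and per-feature (mean, variance) for Gaussian Naive Bayes."""
    model = {}
    for c in sorted(set(labels)):  # sorted so tie-breaking between classes is deterministic
        cls_rows = [r for r, y in zip(rows, labels) if y == c]
        stats = {}
        for f in features:
            vals = [r[f] for r in cls_rows]
            mean = sum(vals) / len(vals)
            var = sum((v - mean) ** 2 for v in vals) / len(vals) + 1e-9  # avoid zero variance
            stats[f] = (mean, var)
        model[c] = (len(cls_rows) / len(rows), stats)
    return model

def predict(model, row: Row, features: Sequence[str]) -> int:
    """Return the class with the highest log posterior (ties go to the first class)."""
    best_class, best_score = None, -math.inf
    for c, (prior, stats) in model.items():
        score = math.log(prior)
        for f in features:
            mean, var = stats[f]
            score += -0.5 * math.log(2 * math.pi * var) - (row[f] - mean) ** 2 / (2 * var)
        if score > best_score:
            best_class, best_score = c, score
    return best_class

def accuracy(model, rows: Sequence[Row], labels: Sequence[int], features: Sequence[str]) -> float:
    hits = sum(predict(model, r, features) == y for r, y in zip(rows, labels))
    return hits / len(rows)

def forward_stepwise_selection(train, train_y, valid, valid_y, candidates) -> Tuple[List[str], float]:
    """Greedily add the feature that most improves validation accuracy."""
    selected: List[str] = []
    best_acc = accuracy(fit_gaussian_nb(train, train_y, selected), valid, valid_y, selected)
    remaining = list(candidates)
    while remaining:
        scored = []
        for f in remaining:
            trial = selected + [f]
            model = fit_gaussian_nb(train, train_y, trial)
            scored.append((accuracy(model, valid, valid_y, trial), f))
        acc, f = max(scored, key=lambda t: t[0])  # first feature with the best accuracy
        if acc <= best_acc:
            break  # no remaining feature improves validation accuracy
        selected.append(f)
        remaining.remove(f)
        best_acc = acc
    return selected, best_acc

def to_rows(data):
    """Split hypothetical (cpu, mem, io, label) tuples into feature dicts and labels."""
    rows = [{"cpu": c, "mem": m, "io": i} for c, m, i, _ in data]
    labels = [y for *_, y in data]
    return rows, labels

# Hypothetical samples: label 1 marks a "symptom" interval, driven here only by memory use.
TRAIN = [(20, 30, 5, 0), (80, 40, 9, 0), (50, 35, 7, 0),
         (20, 90, 5, 1), (80, 85, 9, 1), (50, 95, 7, 1)]
VALID = [(60, 33, 6, 0), (30, 42, 8, 0), (60, 88, 6, 1), (30, 92, 8, 1)]

if __name__ == "__main__":
    train, train_y = to_rows(TRAIN)
    valid, valid_y = to_rows(VALID)
    selected, acc = forward_stepwise_selection(train, train_y, valid, valid_y, ["cpu", "mem", "io"])
    print(selected, acc)  # ['mem'] 1.0
```

Because the search is greedy, it evaluates only one new feature per candidate per round rather than every subset, which keeps it cheap but means the chosen set is not guaranteed to be the best possible subset. A Logistic Regression model could be swapped in for `fit_gaussian_nb` and `predict` without changing the selection loop.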
Description
Master's thesis in Information and Communication Technology, IKT590, 2011, University of Agder, Grimstad