Towards Detecting Textual Plagiarism Using Machine Learning Methods
Abstract
Textual plagiarism is the act of passing off someone else’s text as one’s own.
State-of-the-art plagiarism detection performs well, but it often relies on a
series of manually chosen metric thresholds to decide whether an author has
plagiarized. These thresholds are tuned to a single data set and are not optimal
for every situation or form of plagiarism. Because the detection methodologies
are complex, adjusting them properly also requires a professional who is
familiar with the algorithms. Machine learning methods trained on a
pre-classified data set allow teachers and examiners with no knowledge of the
methodology to use a plagiarism detection tool tailored to their needs.
This thesis demonstrates that a methodology using machine learning, with no
need to set thresholds, can match, and in some cases surpass, the top
methodologies in the current state of the art. With more work, future
methodologies may outperform the best of both the commercial and the freely
available methodologies.
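The contrast drawn above, between hand-set thresholds and a model learned from a pre-classified data set, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the two overlap metrics, the toy training pairs, the choice of logistic regression, and the names `features`, `train`, and `plagiarism_probability`. The point is only that the decision boundary is fitted from labelled examples rather than typed in by hand.

```python
import math

def word_ngrams(text, n):
    """Return the set of word n-grams in a lowercased text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def features(suspicious, source):
    """Jaccard overlap of word unigrams and trigrams (illustrative metrics)."""
    feats = []
    for n in (1, 3):
        a, b = word_ngrams(suspicious, n), word_ngrams(source, n)
        union = a | b
        feats.append(len(a & b) / len(union) if union else 0.0)
    return feats

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(examples, epochs=2000, lr=0.5):
    """Fit logistic regression weights by stochastic gradient descent.

    `examples` holds (suspicious, source, label) triples, label 1 = plagiarized.
    No threshold on any single metric is set by hand.
    """
    X = [features(s, t) for s, t, _ in examples]
    y = [label for _, _, label in examples]
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def plagiarism_probability(model, suspicious, source):
    """Score a new text pair with the learned weights."""
    w, b = model
    x = features(suspicious, source)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical pre-classified data set (toy examples, not from the thesis).
SOURCE = "the quick brown fox jumps over the lazy dog"
EXAMPLES = [
    ("the quick brown fox jumps over the lazy dog", SOURCE, 1),
    ("a quick brown fox leaps over the lazy dog", SOURCE, 1),
    ("stock markets fell sharply on monday morning", SOURCE, 0),
    ("the weather in the north is cold", SOURCE, 0),
]

model = train(EXAMPLES)
print(plagiarism_probability(model, "a quick brown fox leaps over the lazy dog", SOURCE))
```

Retraining on a different labelled collection would adapt the model to that collection without anyone editing a threshold, which is the property the abstract argues for. A realistic system would need far more data and richer features than this sketch uses.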
Description
Master's thesis in information and communication technology - Universitetet i Agder, 2015