dc.contributor.author: Lien, Henrik
dc.date.accessioned: 2021-10-19T11:25:14Z
dc.date.available: 2021-10-19T11:25:14Z
dc.date.issued: 2021
dc.identifier.citation: Lien, H. (2021) An exploration of semi-supervised text classification (Master's thesis). University of Agder, Grimstad. [en_US]
dc.identifier.uri: https://hdl.handle.net/11250/2823871
dc.description: Master's thesis in Information and Communication Technology (IKT590) [en_US]
dc.description.abstract: Obtaining labeled data to train natural language machine learning algorithms is often expensive and time-consuming, whereas unlabeled data is usually free and easy to obtain. Supervised learning frequently requires a large amount of labeled data to achieve good text classification performance. Semi-supervised learning (SSL) for text classification is an exciting area of research. SSL exploits both unlabeled and labeled data to achieve better classification performance than labeled data alone, and is particularly useful when labeled data is limited. This thesis explores the impact of different parameters on SSL with unsupervised pre-training and supervised fine-tuning for a text classification task. Key to this work is the study of hyperparameters, including the amount of preprocessing data and model size. The experiments cover smaller and larger models, including feed-forward, recurrent, and seq2seq models. The thesis uses SSL performance as its metric: the difference in text classification performance of a model trained with the SSL approach compared with the supervised learning approach. SSL performance is thus an intuitive measure for investigating the benefits of SSL. [en_US]
dc.language.iso: eng [en_US]
dc.publisher: University of Agder [en_US]
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International [*]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/deed.no [*]
dc.subject: IKT590 [en_US]
dc.title: An exploration of semi-supervised text classification [en_US]
dc.type: Master thesis [en_US]
dc.rights.holder: © 2021 Henrik Lien [en_US]
dc.subject.nsi: VDP::Teknologi: 500::Informasjons- og kommunikasjonsteknologi: 550 [en_US]
dc.source.pagenumber: 118 [en_US]


Unless otherwise stated, this item is licensed under Attribution-NonCommercial-NoDerivatives 4.0 International.