Show simple item record

dc.contributor.author	Bhattarai, Bimal
dc.contributor.author	Granmo, Ole-Christoffer
dc.contributor.author	Lei, Jiao
dc.date.accessioned	2024-03-14T11:12:34Z
dc.date.available	2024-03-14T11:12:34Z
dc.date.created	2023-06-23T17:04:04Z
dc.date.issued	2023
dc.identifier.citation	Bhattarai, B., Granmo, O.-C. & Lei, J. (2023). An Interpretable Knowledge Representation Framework for Natural Language Processing with Cross-Domain Application. In J. Kamps et al. (Eds.), Lecture Notes in Computer Science (LNCS, 13980, pp. 167–181). Springer Cham.	en_US
dc.identifier.isbn	978-3-031-28244-7
dc.identifier.issn	1611-3349
dc.identifier.uri	https://hdl.handle.net/11250/3122393
dc.description	Author's accepted manuscript	en_US
dc.description.abstract	Data representation plays a crucial role in natural language processing (NLP), forming the foundation for most NLP tasks. Indeed, NLP performance highly depends upon the effectiveness of the preprocessing pipeline that builds the data representation. Many representation learning frameworks, such as Word2Vec, encode input data based on local contextual information that interconnects words. Such approaches can be computationally intensive, and their encoding is hard to explain. We here propose an interpretable representation learning framework utilizing the Tsetlin Machine (TM). The TM is an interpretable logic-based algorithm that has exhibited competitive performance in numerous NLP tasks. We employ the TM clauses to build a sparse propositional (boolean) representation of natural language text. Each clause is a class-specific propositional rule that links words semantically and contextually. Through visualization, we illustrate how the resulting data representation provides semantically more distinct features, better separating the underlying classes. As a result, the following classification task becomes less demanding, benefiting simple machine learning classifiers such as the Support Vector Machine (SVM). We evaluate our approach using six NLP classification tasks and twelve domain adaptation tasks. Our main finding is that the accuracy of our proposed technique significantly outperforms the vanilla TM, approaching the competitive accuracy of deep neural network (DNN) baselines. Furthermore, we present a case study showing how the representations derived from our framework are interpretable. (We use an asynchronous and parallel version of the Tsetlin Machine, available at https://github.com/cair/PyTsetlinMachineCUDA.)	en_US
dc.language.iso	eng	en_US
dc.publisher	Springer Cham	en_US
dc.relation.ispartof	Advances in Information Retrieval. ECIR 2023
dc.relation.ispartofseries	Lecture Notes in Computer Science;13980
dc.title	An Interpretable Knowledge Representation Framework for Natural Language Processing with Cross-Domain Application	en_US
dc.type	Chapter	en_US
dc.type	Peer reviewed	en_US
dc.description.version	acceptedVersion	en_US
dc.rights.holder	© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG	en_US
dc.subject.nsi	VDP::Teknologi: 500::Informasjons- og kommunikasjonsteknologi: 550	en_US
dc.source.pagenumber	167-181	en_US
dc.identifier.doi	https://doi.org/10.1007/978-3-031-28244-7_11
dc.identifier.cristin	2157591
dc.relation.project	Universitetet i Agder: CAIR	en_US
cristin.qualitycode	1
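The abstract describes mapping text to the outputs of learned TM clauses, where each clause is a conjunction of word literals, and then feeding that sparse boolean vector to a simple classifier such as an SVM. A minimal sketch of this idea in plain Python follows; it is not the authors' code, and the vocabulary and clauses here are hypothetical, hand-written examples rather than learned rules:

```python
# Sketch (not the authors' implementation): evaluating propositional TM-style
# clauses over binarized bag-of-words input to build a boolean representation.

def booleanize(text, vocab):
    """Binarize a text as word-presence features over a fixed vocabulary."""
    words = set(text.lower().split())
    return {w: (w in words) for w in vocab}

def eval_clause(features, positive, negated):
    """A clause fires iff all positive literals hold and no negated literal does."""
    return all(features[w] for w in positive) and not any(features[w] for w in negated)

def clause_representation(text, vocab, clauses):
    """Map a text to the boolean vector of clause outputs (the new features)."""
    feats = booleanize(text, vocab)
    return [int(eval_clause(feats, pos, neg)) for pos, neg in clauses]

# Hypothetical clauses for a tiny sentiment-like illustration:
vocab = ["good", "bad", "movie", "not"]
clauses = [
    ({"good"}, {"not"}),  # fires when "good" appears without "not"
    ({"bad"}, set()),     # fires when "bad" appears
]

print(clause_representation("a good movie", vocab, clauses))      # [1, 0]
print(clause_representation("not a good movie", vocab, clauses))  # [0, 0]
```

In the paper's pipeline these clause outputs replace the raw bag-of-words features, and a downstream classifier (e.g. an SVM) is trained on them; here the clauses are illustrative only, whereas the framework learns them from data.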

