Show simple item record

dc.contributor.author	Bhattarai, Bimal
dc.contributor.author	Granmo, Ole-Christoffer
dc.contributor.author	Lei, Jiao
dc.date.accessioned	2024-03-14T11:12:34Z
dc.date.available	2024-03-14T11:12:34Z
dc.date.created	2023-06-23T17:04:04Z
dc.date.issued	2023
dc.identifier.citation	Bhattarai, B., Granmo, O.-C. & Lei, J. (2023). An Interpretable Knowledge Representation Framework for Natural Language Processing with Cross-Domain Application. In J. Kamps et al. (Eds.), Lecture Notes in Computer Science (LNCS, 13980, pp. 167–181). Springer Cham.	en_US
dc.identifier.isbn	978-3-031-28244-7
dc.identifier.issn	1611-3349
dc.identifier.uri	https://hdl.handle.net/11250/3122393
dc.description	Author's accepted manuscript	en_US
dc.description.abstract	Data representation plays a crucial role in natural language processing (NLP), forming the foundation for most NLP tasks. Indeed, NLP performance highly depends upon the effectiveness of the preprocessing pipeline that builds the data representation. Many representation learning frameworks, such as Word2Vec, encode input data based on local contextual information that interconnects words. Such approaches can be computationally intensive, and their encoding is hard to explain. We here propose an interpretable representation learning framework utilizing the Tsetlin Machine (TM). The TM is an interpretable logic-based algorithm that has exhibited competitive performance in numerous NLP tasks. We employ the TM clauses to build a sparse propositional (boolean) representation of natural language text. Each clause is a class-specific propositional rule that links words semantically and contextually. Through visualization, we illustrate how the resulting data representation provides semantically more distinct features, better separating the underlying classes. As a result, the following classification task becomes less demanding, benefiting simple machine learning classifiers such as the Support Vector Machine (SVM). We evaluate our approach using six NLP classification tasks and twelve domain adaptation tasks. Our main finding is that the accuracy of our proposed technique significantly outperforms the vanilla TM, approaching the competitive accuracy of deep neural network (DNN) baselines. Furthermore, we present a case study showing how the representations derived from our framework are interpretable. (We use an asynchronous and parallel version of the Tsetlin Machine, available at https://github.com/cair/PyTsetlinMachineCUDA.)	en_US
dc.language.iso	eng	en_US
dc.publisher	Springer Cham	en_US
dc.relation.ispartof	Advances in Information Retrieval. ECIR 2023
dc.relation.ispartofseries	Lecture Notes in Computer Science;13980
dc.title	An Interpretable Knowledge Representation Framework for Natural Language Processing with Cross-Domain Application	en_US
dc.type	Chapter	en_US
dc.type	Peer reviewed	en_US
dc.description.version	acceptedVersion	en_US
dc.rights.holder	© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG	en_US
dc.subject.nsi	VDP::Teknologi: 500::Informasjons- og kommunikasjonsteknologi: 550	en_US
dc.source.pagenumber	167-181	en_US
dc.identifier.doi	https://doi.org/10.1007/978-3-031-28244-7_11
dc.identifier.cristin	2157591
dc.relation.project	Universitetet i Agder: CAIR	en_US
cristin.qualitycode	1
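The abstract describes mapping text to the outputs of learned TM clauses, where each clause is a conjunction of word literals, and then feeding that sparse boolean vector to a simple classifier such as an SVM. A minimal sketch of this idea in plain Python follows; it is not the authors' code, and the vocabulary and clauses here are hypothetical, hand-written examples rather than learned rules:

```python
# Sketch (not the authors' implementation): evaluating propositional TM-style
# clauses over binarized bag-of-words input to build a boolean representation.

def booleanize(text, vocab):
    """Binarize a text as word-presence features over a fixed vocabulary."""
    words = set(text.lower().split())
    return {w: (w in words) for w in vocab}

def eval_clause(features, positive, negated):
    """A clause fires iff all positive literals hold and no negated literal does."""
    return all(features[w] for w in positive) and not any(features[w] for w in negated)

def clause_representation(text, vocab, clauses):
    """Map a text to the boolean vector of clause outputs (the new features)."""
    feats = booleanize(text, vocab)
    return [int(eval_clause(feats, pos, neg)) for pos, neg in clauses]

# Hypothetical clauses for a tiny sentiment-like illustration:
vocab = ["good", "bad", "movie", "not"]
clauses = [
    ({"good"}, {"not"}),  # fires when "good" appears without "not"
    ({"bad"}, set()),     # fires when "bad" appears
]

print(clause_representation("a good movie", vocab, clauses))      # [1, 0]
print(clause_representation("not a good movie", vocab, clauses))  # [0, 0]
```

In the paper's pipeline these clause outputs replace the raw bag-of-words features, and a downstream classifier (e.g. an SVM) is trained on them; here the clauses are illustrative only, whereas the framework learns them from data.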

