Robust Interpretable Text Classification against Spurious Correlations Using AND-rules with Negation

Yadav, Rohan Kumar; Lei, Jiao; Granmo, Ole-Christoffer; Goodwin, Morten

Peer reviewed, Conference object

Accepted version

Open

Proceeding.pdf (1.018Mb)

Permanent link

https://hdl.handle.net/11250/3057374

Publication date

2022

Original version

Yadav, R. K., Lei, J., Granmo, O.-C. & Goodwin, M. (2022). Robust Interpretable Text Classification against Spurious Correlations Using AND-rules with Negation. International Joint Conferences on Artificial Intelligence, 4439-4446. https://doi.org/10.24963/ijcai.2022/616

Abstract

State-of-the-art natural language processing models have raised the bar for performance on a variety of tasks in recent years. However, concerns are rising over their sensitivity to distribution biases that reside in the training and testing data. This issue severely degrades the performance of the models when they are exposed to out-of-distribution and counterfactual data. The root cause seems to be that many machine learning models are prone to learning shortcuts, modelling simple correlations rather than more fundamental and general relationships. As a result, such text classifiers tend to perform poorly when a human makes minor modifications to the data, which raises questions regarding their robustness. In this paper, we employ a rule-based architecture called the Tsetlin Machine (TM), which learns both simple and complex correlations by ANDing features and their negations. As such, it generates explainable AND-rules using negated and non-negated reasoning. Here, we explore how non-negated reasoning can be more prone to distribution biases than negated reasoning. We further leverage this finding by adapting the TM architecture to mainly perform negated reasoning using the specificity parameter s. As a result, the AND-rules become robust to spurious correlations and can also correctly predict counterfactual data. Our empirical investigation of the model's robustness uses the specificity s to control the degree of negated reasoning. Experiments on publicly available Counterfactually-Augmented Data demonstrate that the negated clauses are robust to spurious correlations and outperform Naive Bayes, SVM, and Bi-LSTM by up to 20%, and ELMo by almost 6%, on counterfactual test data.
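
To make the notion of an AND-rule with negation concrete, the following is a minimal sketch of how a single clause over bag-of-words features could be evaluated. It is not the authors' implementation and does not include Tsetlin Machine learning or the specificity parameter s; the function name `clause_fires`, the example rules, and the example sentences are all hypothetical and chosen only for illustration.

```python
from typing import Iterable, Set


def clause_fires(tokens: Iterable[str], required: Set[str], forbidden: Set[str]) -> bool:
    """Evaluate one AND-rule over a bag of words.

    The clause is true only if every word in `required` is present
    (non-negated literals) AND every word in `forbidden` is absent
    (negated literals).
    """
    present = set(tokens)
    return required <= present and present.isdisjoint(forbidden)


# Hypothetical clauses voting for a "positive" class (invented, not learned).
non_negated_required = {"great"}                    # rule: great
negated_forbidden = {"boring", "terrible", "awful"} # rule: NOT boring AND NOT terrible AND NOT awful

# Hypothetical review and a minimally edited counterfactual version of it.
original = "a great movie with a moving story".split()
counterfactual = "a terrible movie with a moving story".split()

for name, tokens in [("original", original), ("counterfactual", counterfactual)]:
    print(
        name,
        "| non-negated rule fires:", clause_fires(tokens, non_negated_required, set()),
        "| negated rule fires:", clause_fires(tokens, set(), negated_forbidden),
    )
```

The sketch only shows the rule semantics: a non-negated clause depends on specific words being present, whereas a negated clause depends on specific words being absent. How the TM is steered towards one form or the other through s is described in the paper itself.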