Combining unsupervised, supervised and rule-based learning: the case of detecting patient allergies in electronic health records

Berge, Geir Thore; Granmo, Ole-Christoffer; Tveit, Tor Oddbjørn; Ruthjersen, Anna Linda; Sharma, Jivitesh

dc.contributor.author	Berge, Geir Thore
dc.contributor.author	Granmo, Ole-Christoffer
dc.contributor.author	Tveit, Tor Oddbjørn
dc.contributor.author	Ruthjersen, Anna Linda
dc.contributor.author	Sharma, Jivitesh
dc.date.accessioned	2023-11-02T12:04:37Z
dc.date.available	2023-11-02T12:04:37Z
dc.date.created	2023-10-10T10:14:52Z
dc.date.issued	2023
dc.identifier.citation	Berge, G. T., Granmo, O-C., Tveit, T. O., Ruthjersen, A. L., Sharma, J. (2023). Combining unsupervised, supervised and rule-based learning: the case of detecting patient allergies in electronic health records. BMC Medical Informatics and Decision Making, 23, 1-25.	en_US
dc.identifier.issn	1472-6947
dc.identifier.uri	https://hdl.handle.net/11250/3100270
dc.description.abstract	Background: Data mining of electronic health records (EHRs) has huge potential for improving clinical decision support and for helping healthcare deliver precision medicine. Unfortunately, the rule-based and machine learning-based approaches used for natural language processing (NLP) in healthcare today all struggle with various shortcomings related to performance, efficiency, or transparency. Methods: In this paper, we address these issues by presenting a novel method for NLP that implements unsupervised learning of word embeddings, semi-supervised learning for simplified and accelerated clinical vocabulary and concept building, and deterministic rules for fine-grained control of information extraction. The clinical language is automatically learnt, and vocabulary, concepts, and rules supporting a variety of NLP downstream tasks can further be built with only minimal manual feature engineering and tagging required from clinical experts. Together, these steps create an open processing pipeline that gradually refines the data in a transparent way, which greatly improves the interpretable nature of our method. Data transformations are thus made transparent and predictions interpretable, which is imperative for healthcare. The combined method also has other advantages, like potentially being language independent, demanding few domain resources for maintenance, and being able to cover misspellings, abbreviations, and acronyms. To test and evaluate the combined method, we have developed a clinical decision support system (CDSS) named Information System for Clinical Concept Searching (ICCS) that implements the method for clinical concept tagging, extraction, and classification. Results: In empirical studies the method shows high performance (recall 92.6%, precision 88.8%, F-measure 90.7%), and has demonstrated its value to clinical practice.
Here we employ a real-life EHR-derived dataset to evaluate the method’s performance on the task of classification (i.e., detecting patient allergies) against a range of common supervised learning algorithms. The combined method achieves state-of-the-art performance compared to the alternative methods we evaluate. We also perform a qualitative analysis of common word embedding methods on the task of word similarity to examine their potential for supporting automatic feature engineering for clinical NLP tasks. Conclusions: Based on the promising results, we suggest more research should be aimed at exploiting the inherent synergies between unsupervised, supervised, and rule-based paradigms for clinical NLP.	en_US
dc.language.iso	eng	en_US
dc.publisher	BioMed Central (BMC)	en_US
dc.rights	Attribution 4.0 International	*
dc.rights.uri	http://creativecommons.org/licenses/by/4.0/deed.no	*
dc.title	Combining unsupervised, supervised and rule-based learning: the case of detecting patient allergies in electronic health records	en_US
dc.type	Peer reviewed	en_US
dc.type	Journal article	en_US
dc.description.version	publishedVersion	en_US
dc.rights.holder	© 2023 The Author(s)	en_US
dc.subject.nsi	VDP::Medical disciplines: 700	en_US
dc.subject.nsi	VDP::Social science: 200::Library and information science: 320::Information and communication systems: 321	en_US
dc.subject.nsi	VDP::Technology: 500::Information and communication technology: 550	en_US
dc.source.pagenumber	25	en_US
dc.source.volume	23	en_US
dc.source.journal	BMC Medical Informatics and Decision Making	en_US
dc.identifier.doi	https://doi.org/10.1186/s12911-023-02271-8
dc.identifier.cristin	2183205
dc.source.articlenumber	188	en_US
cristin.qualitycode	1

Associated file(s)

Filename: Article.pdf
Size: 4.020Mb
Format: PDF

This item appears in the following collection(s)

Publications from CRIStin [4037]
Scientific Publications in Health and Nursing Science [492]
Scientific Publications in Information Systems [147]

Unless otherwise stated, this item is licensed under Attribution 4.0 International