Show simple item record

dc.contributor.author	Aaby, Pernille
dc.contributor.author	Biermann, Daniel
dc.contributor.author	Yazidi, Anis
dc.contributor.author	Borges Moreno e Mello, Gustavo
dc.contributor.author	Palumbo, Fabrizio
dc.date.accessioned	2024-03-12T14:02:25Z
dc.date.available	2024-03-12T14:02:25Z
dc.date.created	2024-01-26T20:17:56Z
dc.date.issued	2023
dc.identifier.citation	Aaby, P., Biermann, D., Yazidi, A., Borges Moreno e Mello, G. & Palumbo, F. (2023). Exploring Multilingual Word Embedding Alignments in BERT Models: A Case Study of English and Norwegian. Lecture Notes in Computer Science (LNCS), 14381, 47-58.	en_US
dc.identifier.isbn	978-3-031-47993-9
dc.identifier.issn	1611-3349
dc.identifier.uri	https://hdl.handle.net/11250/3122000
dc.description	Author's accepted manuscript	en_US
dc.description.abstract	Contextual language models, such as transformers, can solve a wide range of language tasks, from text classification to question answering and machine translation. Like many deep learning models, their performance depends heavily on the quality and amount of data available for training. This poses a problem for low-resource languages, such as Norwegian, which cannot provide the necessary amount of training data. In this article, we investigate the use of multilingual models as a step toward overcoming the data sparsity problem for minority languages. Specifically, we study how words are represented by multilingual BERT models across two languages of interest: English and Norwegian. Our analysis shows that multilingual models encode English-Norwegian word pairs similarly; the multilingual model automatically aligns semantics across languages without supervision. Additionally, our analysis shows that a word's embedding encodes information about the language to which it belongs. We therefore believe that, in pre-trained multilingual models, knowledge from one language can be transferred to another without direct supervision, helping to solve the data sparsity problem for minor languages.	en_US
dc.language.iso	eng	en_US
dc.publisher	Springer	en_US
dc.relation.ispartof	Artificial Intelligence XL: 43rd SGAI International Conference on Artificial Intelligence, AI 2023
dc.title	Exploring Multilingual Word Embedding Alignments in BERT Models: A Case Study of English and Norwegian	en_US
dc.type	Chapter	en_US
dc.type	Peer reviewed	en_US
dc.description.version	acceptedVersion	en_US
dc.rights.holder	© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023	en_US
dc.subject.nsi	VDP::Teknologi: 500::Informasjons- og kommunikasjonsteknologi: 550	en_US
dc.source.pagenumber	47-58	en_US
dc.source.volume	14381	en_US
dc.source.journal	Lecture Notes in Computer Science (LNCS)	en_US
dc.identifier.doi	https://doi.org/10.1007/978-3-031-47994-6_4
dc.identifier.cristin	2235774
cristin.qualitycode	1
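The abstract above describes comparing how a multilingual BERT model represents English-Norwegian word pairs. A standard way to quantify such cross-lingual alignment is cosine similarity between embedding vectors. The sketch below illustrates that measure only; the vectors are randomly generated stand-ins (the names `emb_en_dog`, `emb_no_hund`, `emb_en_car` are hypothetical), not actual mBERT outputs, which would require running the model itself.

```python
import numpy as np

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors, in [-1, 1]."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Made-up 768-dimensional vectors standing in for mBERT embeddings.
rng = np.random.default_rng(0)
emb_en_dog = rng.normal(size=768)                        # e.g. English "dog"
emb_no_hund = emb_en_dog + 0.1 * rng.normal(size=768)    # e.g. Norwegian "hund", modeled as a nearby point
emb_en_car = rng.normal(size=768)                        # an unrelated word

sim_pair = cosine_similarity(emb_en_dog, emb_no_hund)
sim_unrelated = cosine_similarity(emb_en_dog, emb_en_car)
# In a well-aligned multilingual embedding space, translation pairs score
# much higher than unrelated words: sim_pair >> sim_unrelated.
```

In practice the vectors would come from a pre-trained multilingual BERT model rather than a random generator; the similarity computation itself is unchanged.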


Files in this item

This item appears in the following Collection(s)
