Show simple item record

dc.contributor.author	Aaby, Pernille
dc.contributor.author	Biermann, Daniel
dc.contributor.author	Yazidi, Anis
dc.contributor.author	Borges Moreno e Mello, Gustavo
dc.contributor.author	Palumbo, Fabrizio
dc.date.accessioned	2024-03-12T14:02:25Z
dc.date.available	2024-03-12T14:02:25Z
dc.date.created	2024-01-26T20:17:56Z
dc.date.issued	2023
dc.identifier.citation	Aaby, P., Biermann, D., Yazidi, A., Borges Moreno e Mello, G. & Palumbo, F. (2023). Exploring Multilingual Word Embedding Alignments in BERT Models: A Case Study of English and Norwegian. Lecture Notes in Computer Science (LNCS), 14381, 47-58.	en_US
dc.identifier.isbn	978-3-031-47993-9
dc.identifier.issn	1611-3349
dc.identifier.uri	https://hdl.handle.net/11250/3122000
dc.description	Author's accepted manuscript	en_US
dc.description.abstract	Contextual language models, such as transformers, can solve a wide range of language tasks, from text classification to question answering and machine translation. Like many deep learning models, their performance depends heavily on the quality and amount of data available for training. This poses a problem for low-resource languages, such as Norwegian, which cannot provide the necessary amount of training data. In this article, we investigate the use of multilingual models as a step toward overcoming the data sparsity problem for minority languages. Specifically, we study how words are represented by multilingual BERT models across two languages of interest: English and Norwegian. Our analysis shows that multilingual models encode English-Norwegian word pairs similarly; the multilingual model automatically aligns semantics across languages without supervision. Additionally, our analysis shows that a word's embedding encodes information about the language to which it belongs. We therefore believe that, in pre-trained multilingual models, knowledge from one language can be transferred to another without direct supervision, helping to solve the data sparsity problem for minor languages.	en_US
dc.language.iso	eng	en_US
dc.publisher	Springer	en_US
dc.relation.ispartof	Artificial Intelligence XL: 43rd SGAI International Conference on Artificial Intelligence, AI 2023
dc.title	Exploring Multilingual Word Embedding Alignments in BERT Models: A Case Study of English and Norwegian	en_US
dc.type	Chapter	en_US
dc.type	Peer reviewed	en_US
dc.description.version	acceptedVersion	en_US
dc.rights.holder	© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023	en_US
dc.subject.nsi	VDP::Teknologi: 500::Informasjons- og kommunikasjonsteknologi: 550	en_US
dc.source.pagenumber	47-58	en_US
dc.source.volume	14381	en_US
dc.source.journal	Lecture Notes in Computer Science (LNCS)	en_US
dc.identifier.doi	https://doi.org/10.1007/978-3-031-47994-6_4
dc.identifier.cristin	2235774
cristin.qualitycode	1
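The abstract above describes comparing how a multilingual BERT model represents English-Norwegian word pairs. A standard way to quantify such cross-lingual alignment is cosine similarity between embedding vectors. The sketch below illustrates that measure only; the vectors are randomly generated stand-ins (the names `emb_en_dog`, `emb_no_hund`, `emb_en_car` are hypothetical), not actual mBERT outputs, which would require running the model itself.

```python
import numpy as np

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors, in [-1, 1]."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Made-up 768-dimensional vectors standing in for mBERT embeddings.
rng = np.random.default_rng(0)
emb_en_dog = rng.normal(size=768)                        # e.g. English "dog"
emb_no_hund = emb_en_dog + 0.1 * rng.normal(size=768)    # e.g. Norwegian "hund", modeled as a nearby point
emb_en_car = rng.normal(size=768)                        # an unrelated word

sim_pair = cosine_similarity(emb_en_dog, emb_no_hund)
sim_unrelated = cosine_similarity(emb_en_dog, emb_en_car)
# In a well-aligned multilingual embedding space, translation pairs score
# much higher than unrelated words: sim_pair >> sim_unrelated.
```

In practice the vectors would come from a pre-trained multilingual BERT model rather than a random generator; the similarity computation itself is unchanged.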


Files in this item

This item appears in the following Collection(s)
