CArDIS: A Swedish Historical Handwritten Character and Word Dataset

Yavariabdi, Amir; Kusetogullari, Huseyin; Celik, Turgay; Thummanapally, Shivani; Rijwan, Sakib; Hall, Johan

dc.contributor.author	Yavariabdi, Amir
dc.contributor.author	Kusetogullari, Huseyin
dc.contributor.author	Celik, Turgay
dc.contributor.author	Thummanapally, Shivani
dc.contributor.author	Rijwan, Sakib
dc.contributor.author	Hall, Johan
dc.date.accessioned	2022-09-28T10:53:30Z
dc.date.available	2022-09-28T10:53:30Z
dc.date.created	2022-08-25T11:01:06Z
dc.date.issued	2022
dc.identifier.citation	Yavariabdi, A., Kusetogullari, H., Celik, T., Thummanapally, S., Rijwan, S. & Hall, J. (2022). CArDIS: A Swedish Historical Handwritten Character and Word Dataset. IEEE Access, 10, 55338-55349. doi:10.1109/ACCESS.2022.3175197	en_US
dc.identifier.issn	2169-3536
dc.identifier.uri	https://hdl.handle.net/11250/3022126
dc.description.abstract	This paper introduces a new publicly available image-based Swedish historical handwritten character and word dataset named Character Arkiv Digital Sweden (CArDIS) (https://cardisdataset.github.io/CARDIS/). The samples in CArDIS are collected from 64,084 Swedish historical documents written by several anonymous priests between 1800 and 1900. The character dataset contains 116,000 Swedish alphabet images in RGB color space with 29 classes, whereas the word dataset contains 30,000 image samples of ten popular Swedish names as well as 1,000 region names in Sweden. To examine the performance of different machine learning classifiers on the CArDIS dataset, three different experiments are conducted. In the first experiment, classifiers such as Support Vector Machine (SVM), Artificial Neural Networks (ANN), k-Nearest Neighbor (k-NN), Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), and Random Forest (RF) are trained on existing character datasets, namely Extended Modified National Institute of Standards and Technology (EMNIST), IAM, and CVL, and tested on the CArDIS dataset. In the second and third experiments, the same classifiers as well as two pre-trained classifiers, VGG-16 and VGG-19, are trained and tested on the CArDIS character and word datasets. The experiments show that the machine learning methods trained on existing handwritten character datasets struggle to recognize characters efficiently on the CArDIS dataset, proving that characters in CArDIS contain unique features and characteristics. Moreover, in the last two experiments, the deep learning-based classifiers provide the best recognition rates.	en_US
dc.language.iso	eng	en_US
dc.publisher	IEEE	en_US
dc.rights	Attribution 4.0 International	*
dc.rights.uri	http://creativecommons.org/licenses/by/4.0/deed.no	*
dc.title	CArDIS: A Swedish Historical Handwritten Character and Word Dataset	en_US
dc.title.alternative	CArDIS: A Swedish Historical Handwritten Character and Word Dataset	en_US
dc.type	Peer reviewed	en_US
dc.type	Journal article	en_US
dc.description.version	publishedVersion	en_US
dc.rights.holder	© 2022 Author(s)	en_US
dc.subject.nsi	VDP::Teknologi: 500	en_US
dc.source.pagenumber	55338-55349	en_US
dc.source.volume	10	en_US
dc.source.journal	IEEE Access	en_US
dc.identifier.doi	10.1109/ACCESS.2022.3175197
dc.identifier.cristin	2045917
cristin.qualitycode	1

Associated file(s)

Filename: Article.pdf
Size: 2.308Mb
Format: PDF

This item appears in the following collection(s)

Publications from CRIStin [4038]
Scientific Publications in Engineering Sciences [706]

Unless otherwise stated, this item is licensed under Attribution 4.0 International