Please use this identifier to cite or link to this item:
https://scholarhub.balamand.edu.lb/handle/uob/596
Title: | Handwriting recognition of historical documents with few labeled data |
Authors: | Chammas, Edgar; Mokbel, Chafic; Likforman-Sulem, Laurence |
Affiliations: | Department of Electrical Engineering |
Keywords: | Training data; Microsoft Windows; Data modeling |
Subjects: | Training; Writing; Image segmentation; Mathematical models |
Issue Date: | 2018 |
Part of: | 2018 13th IAPR International Workshop on Document Analysis Systems (DAS) |
Start page: | 43 |
End page: | 48 |
Conference: | International Workshop on Document Analysis Systems (DAS) (13th : 24-27 April 2018 : Vienna, Austria) |
Abstract: | Historical documents present many challenges for offline handwriting recognition systems, among them the segmentation and labeling steps. Carefully annotated text lines are needed to train a handwritten text recognition (HTR) system. In some scenarios, transcripts are available only at the paragraph level, with no text-line information. In this work, we demonstrate how to train an HTR system with few labeled data. Specifically, we train a deep convolutional recurrent neural network (CRNN) system on only 10% of the manually labeled text-line data from a dataset and propose an incremental training procedure that covers the rest of the data. Performance is further increased by augmenting the training set with specially crafted multi-scale data. We also propose a model-based normalization scheme that accounts for the variability in writing scale at the recognition phase. We apply this approach to the publicly available READ dataset. Our system achieved the second-best result in the ICDAR2017 competition [1]. |
Type: | Conference Paper |
Appears in Collections: | Department of Electrical Engineering |