Title: Handwriting recognition of historical documents with few labeled data
Authors: Chammas, Edgar
Mokbel, Chafic 
Likforman-Sulem, Laurence
Affiliations: Department of Electrical Engineering 
Keywords: Training data
Microsoft Windows
Data modeling
Subjects: Training
Image segmentation
Mathematical models
Issue Date: 2018
Part of: 2018 13th IAPR International Workshop on Document Analysis Systems (DAS)
Start page: 43
End page: 48
Conference: International Workshop on Document Analysis Systems (DAS) (13th : 24-27 April 2018 : Vienna, Austria) 
Abstract: Historical documents present many challenges for offline handwriting recognition systems, among them the segmentation and labeling steps. Carefully annotated text lines are needed to train a handwritten text recognition (HTR) system. In some scenarios, however, transcripts are available only at the paragraph level, with no text-line information. In this work, we demonstrate how to train an HTR system with little labeled data. Specifically, we train a deep convolutional recurrent neural network (CRNN) on only 10% of the manually labeled text-line data in a dataset and propose an incremental training procedure that covers the rest of the data. Performance is further improved by augmenting the training set with specially crafted multi-scale data. We also propose a model-based normalization scheme that accounts for variability in writing scale at the recognition phase. We apply this approach to the publicly available READ dataset. Our system achieved the second-best result in the ICDAR2017 competition [1].
Type: Conference Paper
Appears in Collections: Department of Electrical Engineering
