Handwritten word recognition using web resources and recurrent neural networks

Oprean, Cristina; Likforman-Sulem, Laurence; Popescu, Adrian; Mokbel, Chafic

UOBScholar Hub

Please use this identifier to cite or link to this item: https://scholarhub.balamand.edu.lb/handle/uob/2057

DC Field	Value	Language
dc.contributor.author	Oprean, Cristina	en_US
dc.contributor.author	Likforman-Sulem, Laurence	en_US
dc.contributor.author	Popescu, Adrian	en_US
dc.contributor.author	Mokbel, Chafic	en_US
dc.date.accessioned	2020-12-23T09:05:35Z	-
dc.date.available	2020-12-23T09:05:35Z	-
dc.date.issued	2015	-
dc.identifier.uri	https://scholarhub.balamand.edu.lb/handle/uob/2057	-
dc.description.abstract	Handwriting recognition systems usually rely on static dictionaries and language models. Full coverage of these dictionaries is generally not achieved when dealing with unrestricted document corpora due to the presence of Out-Of-Vocabulary (OOV) words. We propose an approach which uses the World Wide Web as a corpus to improve dictionary coverage. We exploit the very large and freely available Wikipedia corpus in order to obtain dynamic dictionaries on the fly. We rely on recurrent neural network (RNN) recognizers, with and without linguistic resources, to detect words that are non-reliably recognized within a word sequence. Such words are labeled as non-anchor words (NAWs) and include OOVs and In-Vocabulary words recognized with low confidence. To recognize a non-anchor word, a dynamic dictionary is built by selecting words from the Web resource based on their string similarity with the NAW image, and their linguistic relevance in the NAW context. Similarity is evaluated by computing the edit distance between the sequence of characters generated by the RNN recognizer exploited as a filler model, and the Wikipedia words. Linguistic relevance is based on an N-gram language model estimated from the Wikipedia corpus. Experiments conducted on a word-segmented version of the publicly available RIMES database show that the proposed approach can improve recognition accuracy compared to systems based on static dictionaries only. The proposed approach shows even better behavior as the proportion of OOVs increases, in terms of both accuracy and dictionary coverage.	en_US
dc.language.iso	eng	en_US
dc.subject	Handwritten word recognition	en_US
dc.subject	Out-of-vocabulary word recognition	en_US
dc.subject	Web resources	en_US
dc.subject	Dynamic dictionary	en_US
dc.subject	Recurrent neural networks	en_US
dc.title	Handwritten word recognition using web resources and recurrent neural networks	en_US
dc.type	Journal Article	en_US
dc.contributor.affiliation	Department of Electrical Engineering	en_US
dc.description.volume	18	en_US
dc.description.issue	4	en_US
dc.description.startpage	287	en_US
dc.description.endpage	301	en_US
dc.date.catalogued	2019-05-28	-
dc.description.status	Published	en_US
dc.identifier.ezproxyURL	http://ezsecureaccess.balamand.edu.lb/login?url=https://link.springer.com/article/10.1007/s10032-015-0251-1	en_US
dc.identifier.OlibID	192137	-
dc.relation.ispartoftext	International journal on document analysis and recognition (IJDAR)	en_US
dc.provenance.recordsource	Olib	en_US
Appears in Collections:	Department of Electrical Engineering
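
The abstract describes building a dynamic dictionary for a non-anchor word by combining string similarity (edit distance to the recognizer's character sequence) with linguistic relevance (an N-gram model over the web corpus). The following is a minimal sketch of that idea, not the authors' implementation. The function names, the toy corpus, the `max_dist` and `top_k` parameters, the add-one smoothing, and the ranking rule (edit distance first, bigram probability as tie-breaker) are all assumptions made for illustration.

```python
import math
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit costs, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def bigram_log_prob(prev_word, word, unigrams, bigrams, vocab_size):
    """Add-one smoothed bigram log-probability (the smoothing is an assumption)."""
    return math.log((bigrams[(prev_word, word)] + 1)
                    / (unigrams[prev_word] + vocab_size))


def build_dynamic_dictionary(char_seq, prev_word, corpus_tokens,
                             max_dist=2, top_k=3):
    """Select corpus words close to char_seq, ranked by distance, then context.

    char_seq      -- character string hypothesised for the non-anchor word
    prev_word     -- the word preceding it in the sequence (its context)
    corpus_tokens -- tokenised text standing in for the web resource
    """
    unigrams = Counter(corpus_tokens)
    bigrams = Counter(zip(corpus_tokens, corpus_tokens[1:]))
    vocab_size = len(unigrams)

    candidates = []
    for word in unigrams:
        dist = edit_distance(char_seq, word)
        if dist <= max_dist:
            lp = bigram_log_prob(prev_word, word, unigrams, bigrams, vocab_size)
            # Smaller distance first; higher log-probability breaks ties.
            candidates.append((dist, -lp, word))

    candidates.sort()
    return [word for _, _, word in candidates[:top_k]]


if __name__ == "__main__":
    tokens = "the cat sat on the car the cat ran".split()
    print(build_dynamic_dictionary("cat", "the", tokens, max_dist=1))
```

In this toy run, "car" and "sat" are both one edit away from the hypothesised string, and the bigram count after "the" decides their order. A real system would weight the two scores rather than sort lexicographically; that design choice is left open here.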
