Building annotated written and spoken Arabic LRs in NEMLAR project

Yaseen, Mustafa; Attia, M; Maegaard, Bente; Choukri, Khalid; Mokbel, Chafic

UOBScholar Hub

Please use this identifier to cite or link to this item: https://scholarhub.balamand.edu.lb/handle/uob/425

DC Field	Value	Language
dc.contributor.author	Yaseen, Mustafa	en_US
dc.contributor.author	Attia, M	en_US
dc.contributor.author	Maegaard, Bente	en_US
dc.contributor.author	Choukri, Khalid	en_US
dc.contributor.author	Mokbel, Chafic	en_US
dc.date.accessioned	2020-12-23T08:30:11Z	-
dc.date.available	2020-12-23T08:30:11Z	-
dc.date.issued	2006	-
dc.identifier.uri	https://scholarhub.balamand.edu.lb/handle/uob/425	-
dc.description.abstract	The NEMLAR project (Network for Euro-Mediterranean LAnguage Resource and human language technology development and support; www.nemlar.org) is supported by the EC, with partners from Europe and the Middle East. Its objective is to build a network of specialized partners to promote and support the development of Arabic Language Resources (LRs) in the Mediterranean region. The project focused on identifying the state of the art of LRs in the region, assessing priority requirements through consultations with language industry and communication players, and establishing a protocol for developing and identifying a Basic Language Resource Kit (BLARK) for Arabic, together with its first-priority requirements. The BLARK is defined as the minimal set of language resources necessary for any pre-competitive research and education, as well as for the development of crucial components for any future NLP industry. Following the identification of high-priority resources, the NEMLAR partners agreed to focus on and produce three main resources: 1) an annotated Arabic written corpus of about 500K words, 2) an Arabic speech corpus for TTS applications of 2x5 hours, and 3) a 40-hour Modern Standard Arabic broadcast news speech corpus. For each of the three LRs, the underlying linguistic models and corpus assumptions, technical specifications, methodologies for collecting and building the resource, and validation and verification mechanisms were defined and applied.	en_US
dc.format.extent	6 p.	en_US
dc.language.iso	eng	en_US
dc.title	Building annotated written and spoken Arabic LRs in NEMLAR project	en_US
dc.type	Conference Paper	en_US
dc.relation.conference	International Conference on Language Resources and Evaluation (LREC06) (5th : May, 2006 : Genoa, Italy)	en_US
dc.contributor.affiliation	Department of Electrical Engineering	en_US
dc.description.startpage	533	en_US
dc.description.endpage	538	en_US
dc.date.catalogued	2019-05-23	-
dc.description.status	Published	en_US
dc.identifier.OlibID	192011	-
dc.identifier.openURL	http://lrec-conf.org/proceedings/lrec2006/pdf/131_pdf.pdf	en_US
dc.relation.ispartoftext	Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC06)	en_US
dc.provenance.recordsource	Olib	en_US
Appears in Collections:	Department of Electrical Engineering
