Multi speaker text to speech transfer learning

Adra, Mira

UOBScholar Hub

UOB Libraries created the UOBScholar Hub, an Institutional Repository (IR) for archiving and collecting all the research output of the UOB community. It aims to improve the visibility, usage and impact of research conducted at UOB. Materials included are: academic journal articles, conference papers and presentations, books and book chapters, ongoing research papers, reports and patents.

Please use this identifier to cite or link to this item: https://scholarhub.balamand.edu.lb/handle/uob/6697

DC Field	Value	Language
dc.contributor.advisor	Mokbel, Chafic	en_US
dc.contributor.author	Adra, Mira	en_US
dc.date.accessioned	2023-03-07T08:27:24Z	-
dc.date.available	2023-03-07T08:27:24Z	-
dc.date.issued	2023	-
dc.identifier.uri	https://scholarhub.balamand.edu.lb/handle/uob/6697	-
dc.description	Includes bibliographical references (p. 35-36)	en_US
dc.description.abstract	Speech synthesis is experiencing a breakthrough: rapid advances in artificial intelligence have moved it from a robotic, standard voice to a more human-like voice with emotional inflections across multiple speakers and languages. Tacotron has recently been used extensively for this kind of text-to-speech synthesis. Accordingly, in this thesis I study the possibility of performing multi-speaker text-to-speech (TTS) transfer learning with Tacotron 2 in French, to overcome the need for multiple machines, one per speaker. This is achieved by fine-tuning the Tacotron 2 training process so that it can learn the multiple speakers available in our dataset. For this purpose, we use publicly available French datasets that are already annotated. The main challenges such models face are data efficiency, the quality of the speakers' audio files, and speaker variability, since each speaker may have a different accent or speaking rate. Despite these challenges, our model produced adequate results when presented with only a few hours of speech from new speakers of different genders.	en_US
dc.description.statementofresponsibility	by Mira Adra	en_US
dc.format.extent	1 online resource (36 pages) : ill., tables	en_US
dc.language.iso	eng	en_US
dc.rights	This object is protected by copyright, and is made available here for research and educational purposes. Permission to reuse, publish, or reproduce the object beyond the personal and educational use exceptions must be obtained from the copyright holder	en_US
dc.subject	Text to speech, Transfer learning, Multi-speaker, Tacotron 2, French	en_US
dc.subject.lcsh	Speech synthesis	en_US
dc.subject.lcsh	Artificial intelligence	en_US
dc.subject.lcsh	Automatic speech recognition	en_US
dc.subject.lcsh	Machine learning	en_US
dc.subject.lcsh	Dissertations, Academic	en_US
dc.subject.lcsh	University of Balamand--Dissertations	en_US
dc.title	Multi speaker text to speech transfer learning	en_US
dc.type	Thesis	en_US
dc.contributor.corporate	University of Balamand	en_US
dc.contributor.department	Department of Computer Engineering	en_US
dc.contributor.faculty	Faculty of Engineering	en_US
dc.contributor.institution	University of Balamand	en_US
dc.date.catalogued	2023-03-07	-
dc.description.degree	MS in Computer Engineering	en_US
dc.description.status	Published	en_US
dc.identifier.ezproxyURL	http://ezsecureaccess.balamand.edu.lb/login?url=http://olib.balamand.edu.lb/projects_and_theses/301370.pdf	en_US
dc.identifier.OlibID	301370	-
dc.provenance.recordsource	Olib	en_US
Appears in Collections:	UOB Theses and Projects


Record view(s): 121 (checked on Nov 21, 2024)
