Investigating the effect of speech features and the number of HMM mixtures in the quality HMM-based synthesizers

Barakat, M. S.; Gadallah, M. E.; Nazmy, T.; El Arif, T.

doi:10.21608/iceeng.2008.34635

The International Conference on Electrical Engineering
Article 182, Volume 6, 6th International Conference on Electrical Engineering ICEENG 2008, May 2008, Pages 1-12
Document Type: Original Article
Authors
M. S. Barakat¹; M. E. Gadallah²; T. Nazmy³; T. El Arif³
¹Modern Academy.
²Military Technical College.
³Faculty of Computer and Information Sciences, Ain Shams University.
Abstract
Statistical parametric speech synthesis based on hidden Markov models (HMMs) has grown in popularity over the last few years. In this approach, the system simultaneously models the spectrum, excitation, and duration of speech using context-dependent HMMs and generates speech waveforms from the HMMs themselves. This paper describes the HMM-based speech synthesis system and applies it to the Arabic language, using a small training speech database as an example, and shows that the resulting model database has the advantage of being small (it can be less than 1 MB). Experiments show that using Mel-cepstral coefficients as the spectral parameters of speech waveforms for training gives better results than using LPC or PARCOR coefficients. Experiments also show that, with this relatively small amount of training data, increasing the number of Gaussian mixtures leads to poor generalization of the HMMs, which causes perceivable discontinuities and clicks in the synthesized speech.
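One way to see why more mixture components can hurt when training data is scarce is to count free parameters against available training frames. The sketch below is not taken from the paper: it assumes a diagonal-covariance Gaussian mixture for each HMM state, and the frame count, number of states, and feature dimension are hypothetical values chosen only for illustration.

```python
def gmm_free_parameters(num_mixtures: int, feature_dim: int) -> int:
    """Free parameters of one state's diagonal-covariance Gaussian mixture."""
    means = num_mixtures * feature_dim
    variances = num_mixtures * feature_dim
    weights = num_mixtures - 1  # mixture weights sum to one
    return means + variances + weights


def frames_per_parameter(total_frames: int, num_states: int,
                         num_mixtures: int, feature_dim: int) -> float:
    """Average number of training frames available per free parameter."""
    total_params = num_states * gmm_free_parameters(num_mixtures, feature_dim)
    return total_frames / total_params


if __name__ == "__main__":
    # Hypothetical setup (assumption, not from the paper):
    # 5 minutes of speech at a 5 ms frame shift gives 60,000 frames.
    total_frames = 60_000
    num_states = 200   # assumed number of tied HMM states
    feature_dim = 25   # assumed spectral feature dimension

    for m in (1, 2, 4, 8):
        ratio = frames_per_parameter(total_frames, num_states, m, feature_dim)
        print(f"{m} mixture(s): {ratio:.2f} frames per parameter")
```

Under these assumed figures, each doubling of the mixture count roughly halves the data available per parameter, which is consistent with the poor generalization the abstract reports for a small training database.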
Keywords
Hidden Markov Model (HMM); speech synthesis

