Investigating the effect of speech features and the number of HMM mixtures in the quality HMM-based synthesizers
The International Conference on Electrical Engineering
Article 182, Volume 6, 6th International Conference on Electrical Engineering ICEENG 2008, May 2008, Pages 1-12
Document Type: Original Article
DOI: 10.21608/iceeng.2008.34635
Authors
M. S. Barakat (1); M. E. Gadallah (2); T. Nazmy (3); T. El Arif (3)
(1) Modern Academy.
(2) Military Technical College.
(3) Faculty of Computer and Information Sciences, Ain Shams University.
Abstract
Statistical parametric speech synthesis based on hidden Markov models (HMMs) has grown in popularity over the last few years. In this approach, the system simultaneously models the spectrum, excitation, and duration of speech using context-dependent HMMs and generates speech waveforms from the HMMs themselves. This paper describes an HMM-based speech synthesis system and applies it to Arabic, using a small training speech database as an example. The resulting model database is compact (it can be less than 1 MB). Experiments show that using Mel-cepstral coefficients as the spectral parameters for training gives better results than using LPC or PARCOR coefficients. Experiments also show that, with this relatively small amount of training data, increasing the number of Gaussian mixtures leads to poor generalization of the HMMs, which produces perceivable discontinuities and clicks in the synthesized speech.
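One way to see why adding Gaussian mixtures strains a small training set is to count the output-distribution parameters that must be estimated. The following is a minimal sketch, assuming diagonal-covariance Gaussians; the function name, the state count, and the feature dimension are illustrative assumptions and are not taken from the paper. Transition probabilities and duration models are ignored.

```python
def gmm_hmm_param_count(num_states: int, num_mixtures: int, feature_dim: int) -> int:
    """Count free output-distribution parameters of one HMM whose states
    use diagonal-covariance Gaussian mixtures (hypothetical helper)."""
    # Each mixture component has feature_dim means and feature_dim variances.
    per_component = 2 * feature_dim
    # Mixture weights sum to one, so each state has (num_mixtures - 1) free weights.
    per_state = num_mixtures * per_component + (num_mixtures - 1)
    return num_states * per_state


if __name__ == "__main__":
    # Assumed values for illustration only: 5 emitting states, 25-dimensional features.
    for m in (1, 2, 4, 8):
        count = gmm_hmm_param_count(num_states=5, num_mixtures=m, feature_dim=25)
        print(f"{m} mixture(s): {count} parameters per model")
```

The count grows roughly linearly with the number of mixtures while the amount of training data stays fixed, so each parameter is estimated from fewer frames. This is consistent with the generalization problem the abstract reports.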
Keywords
Hidden Markov Model (HMM); speech synthesis