Assessment of disordered voices based on an optimized glottal source model

This paper proposes a method for the assessment of disordered voices. A feature named the mean opening quotient (MOQ), obtained from an estimate of the glottal source, is used as an acoustic cue that summarizes the degree of severity of the voice disorder. The analysis method uses the empirical mode decomposition algorithm to estimate the glottal source excitation signal from the speech signal. The logarithm of the magnitude spectrum of the speech signal is decomposed into oscillatory modes, called intrinsic mode functions, which are clustered into two classes: the spectral envelope and the harmonic component. Combining the phase information with the estimated harmonic component yields an estimate of the glottal source signal. A parametric model is then fitted to the estimated glottal source excitation signal, and the optimal model parameters, from which the MOQ is defined, are obtained with a genetic algorithm. The method is tested on a corpus of natural speech comprising the vowel [a] uttered by 22 normophonic speakers and 229 speakers with different degrees of dysphonia. Experimental results show that the proposed method is effective for assessing the degree of severity of the voice disorder.
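The model-fitting step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the parametric model or give the exact definition of the MOQ, so the sketch assumes a Rosenberg-type pulse as a stand-in model, takes the quotient to be the fraction of each period during which the glottis is open, and averages it over cycles. The function names, parameter bounds, and genetic algorithm settings are all hypothetical, and the "estimated" glottal cycles are synthetic.

```python
import numpy as np

# Assumed search bounds for (open quotient, opening fraction of the open phase).
LOW = np.array([0.3, 0.5])
HIGH = np.array([0.9, 0.9])


def rosenberg_pulse(oq, am, n=200):
    """One period of a Rosenberg-type glottal flow pulse (stand-in model)."""
    t = np.arange(n) / n              # normalized time in [0, 1)
    tp = oq * am                      # end of the opening phase
    tn = oq * (1.0 - am)              # duration of the closing phase
    g = np.zeros(n)
    opening = t <= tp
    closing = (t > tp) & (t <= tp + tn)
    g[opening] = 0.5 * (1.0 - np.cos(np.pi * t[opening] / tp))
    g[closing] = np.cos(0.5 * np.pi * (t[closing] - tp) / tn)
    return g


def fit_pulse_ga(target, rng, pop_size=60, generations=80,
                 p_cross=0.8, p_mut=0.2, sigma=0.02):
    """Fit (oq, am) to one glottal cycle with a small real-coded GA."""
    def cost(ind):
        model = rosenberg_pulse(ind[0], ind[1], len(target))
        return np.mean((model - target) ** 2)

    pop = rng.uniform(LOW, HIGH, size=(pop_size, 2))
    for _ in range(generations):
        costs = np.array([cost(ind) for ind in pop])
        elite = pop[np.argmin(costs)].copy()
        # Binary tournament selection.
        a = rng.integers(pop_size, size=pop_size)
        b = rng.integers(pop_size, size=pop_size)
        parents = np.where((costs[a] < costs[b])[:, None], pop[a], pop[b])
        # Arithmetic crossover between each parent and its neighbour.
        mates = np.roll(parents, 1, axis=0)
        w = rng.uniform(size=(pop_size, 1))
        do_cross = rng.uniform(size=(pop_size, 1)) < p_cross
        children = np.where(do_cross, w * parents + (1.0 - w) * mates, parents)
        # Gaussian mutation applied gene by gene.
        do_mut = rng.uniform(size=children.shape) < p_mut
        children = children + do_mut * rng.normal(0.0, sigma, size=children.shape)
        pop = np.clip(children, LOW, HIGH)
        pop[0] = elite                # elitism: keep the best individual
    costs = np.array([cost(ind) for ind in pop])
    return pop[np.argmin(costs)]


rng = np.random.default_rng(0)
true_oq = [0.5, 0.6, 0.7]             # synthetic cycles, not data from the paper
fitted_oq = []
for oq in true_oq:
    observed = rosenberg_pulse(oq, 0.7) + rng.normal(0.0, 0.01, 200)
    fitted_oq.append(fit_pulse_ga(observed, rng)[0])

moq = float(np.mean(fitted_oq))       # assumed definition: mean over cycles
print(f"mean quotient over {len(fitted_oq)} cycles: {moq:.3f}")
```

In practice the target cycles would come from the glottal source estimate rather than from the model itself, and the cost function and bounds would need to match whichever parametric model is actually used.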

Turkish Journal of Electrical Engineering and Computer Sciences
  • ISSN: 1300-0632
  • Publication frequency: 6 issues per year
  • Publisher: TÜBİTAK