Chong Yen FOOK, Hariharan MUTHUSAMY, Lim Sin CHEE, Sazali Bin YAACOB, Abdul Hamid Bin ADOM

Comparison of speech parameterization techniques for the classification of speech disfluencies

Stuttering assessment by manual classification of speech disfluencies is subjective, inconsistent, time-consuming, and prone to error. This paper compares the effectiveness of three speech feature extraction methods for classifying two types of speech disfluency, repetitions and prolongations, in recorded disfluent speech samples. The three methods are mel-frequency cepstral coefficients (MFCCs), linear predictive coding (LPC)-based cepstral parameters, and perceptual linear predictive (PLP) analysis. Three classifiers are employed: a k-nearest neighbor (k-NN) classifier, a linear discriminant analysis (LDA)-based classifier, and a support vector machine (SVM). Speech samples are taken from the University College London Archive of Stuttered Speech (UCLASS), and stuttered events are identified through manual segmentation. Ten-fold cross-validation is used to assess the reliability of the classification results. The study also investigates how two parameters of the LPC- and PLP-based methods, the LPC order and the frame length, affect the classification results. The experimental results indicate that the proposed method can help speech-language pathologists classify speech disfluencies.
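The abstract names the LPC order and the frame length as the two parameters whose effect was studied. The sketch below shows where each one enters an LPC-based cepstral (LPCC) front end. It is a minimal illustration, not the authors' implementation, and assumes the autocorrelation method with a Hamming window, the Levinson-Durbin recursion, and the standard LPC-to-cepstrum recursion. The function names and the parameters `frame_len`, `hop`, `p`, and `n_ceps` are hypothetical. The abstract does not state the paper's exact windowing, frame overlap, or number of coefficients.

```python
import math

# Hypothetical LPCC front end: framing -> Hamming window -> autocorrelation
# -> Levinson-Durbin (LPC order p) -> LPC-to-cepstrum recursion.

def frame_signal(x, frame_len, hop):
    """Split x into frames of frame_len samples, starting every hop samples."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]

def hamming(frame):
    """Apply a Hamming window (frame must have at least 2 samples)."""
    n = len(frame)
    return [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
            for i, s in enumerate(frame)]

def autocorr(frame, p):
    """Biased autocorrelation r[0..p] of one windowed frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(p + 1)]

def levinson_durbin(r, p):
    """Order-p predictor a[1..p] with x[n] ~ sum_k a[k] * x[n-k]; returns (a, error)."""
    a = [0.0] * (p + 1)  # a[0] unused
    err = r[0]           # error energy of the order-0 predictor
    for i in range(1, p + 1):
        if err <= 0.0:   # degenerate (e.g. all-zero) frame: remaining coefficients stay 0
            break
        # reflection coefficient for order i
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:], err

def lpc_to_cepstrum(a, n_ceps):
    """Cepstral coefficients c[1..n_ceps] of the all-pole model 1 / (1 - sum_k a[k] z^-k)."""
    p = len(a)
    c = [0.0] * (n_ceps + 1)  # c[0] (gain term) is not computed here
    for m in range(1, n_ceps + 1):
        acc = a[m - 1] if m <= p else 0.0
        for k in range(max(1, m - p), m):
            acc += (k / m) * c[k] * a[m - k - 1]
        c[m] = acc
    return c[1:]

def lpcc_features(x, frame_len, hop, p, n_ceps=None):
    """One LPCC vector per frame; frame_len and p are the two parameters under study."""
    if n_ceps is None:
        n_ceps = p
    feats = []
    for frame in frame_signal(x, frame_len, hop):
        a, _ = levinson_durbin(autocorr(hamming(frame), p), p)
        feats.append(lpc_to_cepstrum(a, n_ceps))
    return feats
```

For example, `lpcc_features(x, frame_len=256, hop=128, p=12)` returns one 12-dimensional vector per frame. Repeating such calls over a grid of `p` and `frame_len` values, and passing the resulting vectors to a k-NN, LDA, or SVM classifier, is one way to set up the kind of parameter study the abstract describes. A larger `p` lets the all-pole model fit more spectral peaks. A longer `frame_len` gives a more stable spectral estimate at the cost of time resolution.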

Keywords:

Disfluent speech, mel-frequency cepstral coefficient, linear predictive coding, perceptual linear predictive analysis, support vector machine

