Review of distinctive phonetic features and the Arabic share in related modern research

Most research in digital speech technology has traditionally been conducted in only a few languages, such as English, French, Spanish, or Chinese. Numerous studies using distinctive phonetic features (DPFs) with different techniques and algorithms have been carried out during the last three decades, mainly in English, Japanese, and other languages of industrialized countries. DPF elements are based on a technique used by linguists and by digital speech and language experts to distinguish between phones by considering the lowest-level features present during phonation. These studies have investigated performance, outcomes, and theories, especially with regard to digital speech recognition. The aim of this paper is to present the background of DPF theories and their usefulness for digital speech and language processing. In addition, we outline Arabic phonology in comparison with two well-known languages to extend current knowledge of this narrow discipline. Finally, this work reviews research on DPF strategies for digital speech and language processing that uses computing and engineering techniques and theories. Based on the literature search conducted for this paper, we conclude that although Arabic is a very important and old Semitic language, it has so far suffered from a lack of modern research resources and theories on DPF elements.
