‘Listening’ To Dyslexic Children’s Reading: The Transcription And Segmentation Accuracy For ASR

Authors

  • Husniza Husni, Human-Centered Computing Research Lab, School of Computing, Universiti Utara Malaysia, 06010 Sintok, Kedah, Malaysia
  • Nik Nurhidayat Nik Him, Human-Centered Computing Research Lab, School of Computing, Universiti Utara Malaysia, 06010 Sintok, Kedah, Malaysia
  • Mohamed M. Radi, Emirates Canadian University College, Umm Al Quwain, United Arab Emirates
  • Yuhanis Yusof, Human-Centered Computing Research Lab, School of Computing, Universiti Utara Malaysia, 06010 Sintok, Kedah, Malaysia
  • Siti Sakira Kamaruddin, Human-Centered Computing Research Lab, School of Computing, Universiti Utara Malaysia, 06010 Sintok, Kedah, Malaysia

Keywords:

Automatic Transcription and Phonetic Labelling, Automatic Speech Recognition, Dyslexic Children's Reading

Abstract

Dyslexic children read with many highly phonetically similar errors, which pose a challenge for automatic speech recognition (ASR). Listening to such errors is difficult even for a human; enabling a computer to ‘listen’ to dyslexic children’s reading is more challenging still, as the computer must be ‘taught’ to recognize the readings and to adapt to the highly phonetically similar errors the children make when reading. The difficulty increases further when the read speech must be segmented and labelled for processing prior to training an ASR. Hence, this paper presents and discusses the effects of highly phonetically similar errors on automatic transcription and segmentation accuracy, and how that accuracy is influenced by the spoken pronunciations. A total of 585 recordings of dyslexic children’s reading are used for manual transcription, forced alignment, and training. The ASR engine built with automatic transcription and phonetic labelling obtained its optimum result of 23.9% word error rate (WER) and 18.1% FAR, which is almost identical to the ASR engine built with manual transcription (23.7% WER and 17.9% FAR).
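
The comparison above is reported in terms of WER. As a minimal illustrative sketch only (not the authors’ evaluation code), the following Python function computes WER as the Levenshtein distance between the reference and hypothesis word sequences divided by the reference length; the Malay example sentence and the ‘bapa’/‘papa’ reading substitution are hypothetical, chosen to mimic a phonetically similar reading error.

# Sketch: word error rate (WER) via Levenshtein alignment of
# reference vs. hypothesis word sequences (illustrative only).
def wer(reference: str, hypothesis: str) -> float:
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

if __name__ == "__main__":
    # Hypothetical example: the child reads "bapa" as the
    # phonetically similar "papa" -> 1 substitution out of 4 words.
    print(f"WER = {wer('saya suka bapa saya', 'saya suka papa saya'):.1%}")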

Published

2017-09-15

How to Cite

Husni, H., Nik Him, N. N., M. Radi, M., Yusof, Y., & Kamaruddin, S. S. (2017). ‘Listening’ To Dyslexic Children’s Reading: The Transcription And Segmentation Accuracy For ASR. Journal of Telecommunication, Electronic and Computer Engineering (JTEC), 9(2-11), 45–49. Retrieved from https://jtec.utem.edu.my/jtec/article/view/2736
