‘Listening’ To Dyslexic Children’s Reading: The Transcription And Segmentation Accuracy For ASR
Keywords: Automatic Transcription and Phonetic Labelling, Automatic Speech Recognition, Dyslexic Children Reading

Abstract
Dyslexic children read with many highly phonetically similar errors, which pose a challenge for automatic speech recognition (ASR). Listening to these highly phonetically similar errors is difficult even for a human. Enabling a computer to ‘listen’ to dyslexic children’s reading is more challenging still, as we have to ‘teach’ the computer to recognize the readings as well as to adapt to the highly phonetically similar errors the children make when reading. The difficulty is compounded when segmenting and labelling the read speech for processing prior to training an ASR. Hence, this paper presents and discusses the effects of highly phonetically similar errors on automatic transcription and segmentation accuracy, and how this accuracy is influenced by the spoken pronunciations. A total of 585 files of dyslexic children’s reading were used for manual transcription, forced alignment, and training. The ASR engine trained with automatic transcription and phonetic labelling obtained an optimum result of 23.9% word error rate (WER) and 18.1% FAR, close to that of the ASR engine trained with manual transcription (23.7% WER and 17.9% FAR).

References
T. Athanaselis, S. Bakamidis, I. Dologlou, E. N. Argyriou, A. Symvonis, “Making assistive reading tools user friendly: A new platform for Greek dyslexic students empowered by automatic speech recognition,” Multimedia Tools and Applications, vol. 68, no. 3, pp. 681-699, 2014.
M. Taileb, R. Al-Saggaf, A. Al-Ghamdi, M. Al-Zebaidi, S. Al-Sahafi, “YUSR: Speech recognition software for dyslexics”. Design, User Experience, and Usability, Health, Learning, Playing, Cultural, and Cross-Cultural User Experience, vol. 8013, pp. 296-303, 2013.
J. S. Pedersen, L. B. Larsen, “A Speech Corpus for Dyslexic Reading Training,” in Proceedings of the International Conference on Language Resources and Evaluation (LREC), European Language Resources Association, pp. 2820-2823, 2010.
H. Husniza, J. Zulikha, “Dyslexic children's reading pattern as input for ASR: Data, analysis, and pronunciation model,” Journal of Information and Communication Technology, vol. 8, pp. 1-13, 2009.
X. Li, L. Deng, Y. C. Ju, A. Acero, “Automatic children's reading tutor on hand-held devices,” in Proceedings of Annual Conference of the International Speech Communication Association, vol. 9, pp. 1733-1736, 2008.
C. Cucchiarini, H. Strik, “Automatic phonetic transcription: An overview,” in Proceedings of the International Congress of Phonetic Sciences (ICPhS), Barcelona, vol. 15, pp. 347–350, 2003.
J. P. Goldman, “EasyAlign: An automatic phonetic alignment tool under Praat,” in Proceedings of Annual Conference of the International Speech Communication Association, Florence, vol. 12, pp. 3233-3236, 2011.
A. Dupuis, “Automatic transcription of audio files and why manual transcription may be better,” Retrieved March 23, 2015, from: http://www.researchware.com/company/blog/368-automatictranscription.html, 2011.
K. Yu, M. Gales, L. Wang, P. C. Woodland, “Unsupervised training and directed manual transcription for LVCSR,” Speech Communication, vol. 52, no. 7, pp. 652-663, 2010.
M. Dinarelli, A. Moschitti, G. Riccardi, “Concept Segmentation and Labeling for Conversational Speech,” in Proceedings of Annual Conference of the International Speech Communication Association, vol. 10, pp. 2747-2750, 2009.
T. J. Hazen, “Automatic alignment and error correction of human generated transcripts for long speech recordings,” in Proceedings of International Conference on Spoken Language Processing, Pittsburgh, vol. 9, pp. 1606-1609, 2006.
T. Bauer, L. Hitzenberger, L. Hennecle, “Effects of manual phonetic transcriptions on recognition accuracy of street names,” in Proceedings of the Internationales Symposium für Informationswissenschaft (ISI), vol. 8, pp. 21-25, 2002.
J. Yuan, N. Ryant, M. Liberman, A. Stolcke, V. Mitra, W. Wang, “Automatic phonetic segmentation using boundary models,” in Proceedings of Interspeech Annual Conference of the International Speech Communication Association, pp. 2306-2310, 2013.
H. Husniza, Y. Yuhanis, K. Siti Sakira, “Spoken Malay language influence on automatic transcription and segmentation,” in Proceedings of the International Conference on Computing and Informatics, ICOCI, Sarawak, Malaysia, vol. 4, pp. 132-137, 2013.
B. Schuppler, M. Ernestus, O. Scharenborg, L. Boves, “Acoustic reduction in conversational Dutch: A quantitative analysis based on automatically generated segmental transcriptions,” Journal of Phonetics, vol. 39, no. 1, pp. 96-109, 2011.
C. Van Bael, L. Boves, H. van den Heuvel, H. Strik, “Automatic Phonetic Transcription of Large Speech Corpora,” Computer Speech & Language, vol. 21, no. 4, pp. 652-668, 2007.
J. P. Hosom, “A Comparison of speech recognizers created using manually-aligned and automatically-aligned training data,” Technical Report CSE-00-02, Oregon Graduate Institute of Science and Technology, Center for Spoken Language Understanding, Beaverton, 2002.
F. Cangemi, F. Cutugno, B. Ludusan, D. Seppi, D. Van Compernolle, “Automatic speech segmentation for Italian (ASSI): Tools, models, evaluation, and applications,” in Proceedings of the Associazione Italiana di Scienze della Voce (AISV), Lecce, Italy, vol. 7, pp. 337-344, 2011.
E. A. Kaur, E. T. Singh, “Segmentation of continuous Punjabi speech signal into syllables,” in Proceedings of the World Congress on Engineering and Computer Science, vol. 1, pp. 20-22, 2010.
V. Silber, N. Geri, “Can automatic speech recognition be satisfying for audio/video search? Keyword-focused analysis of Hebrew automatic and manual transcription,” Online Journal of Applied Knowledge Management, vol. 2, no. 1, pp. 104-121, 2014.
M. Sperber, “Efficient speech transcription through respeaking,” Master’s Thesis, Karlsruhe Institute of Technology Department of Computer Science, 2012.
J. D. Williams, I. D. Melamed, T. Alonso, B. Hollister, J. Wilpon, “Crowd-sourcing for difficult transcription of speech,” in Proceedings of IEEE Workshop, Automatic Speech Recognition and Understanding (ASRU), pp. 535-540, 2011.
J. L. Hieronymus, “ASCII Phonetic Symbols for the World’s Languages: Worldbet,” Bell Laboratories manuscript, 1993.
H. Husniza, “Automatic speech recognition model for dyslexic children reading in Bahasa Melayu,” Doctoral dissertation, Universiti Utara Malaysia, 2010.
D. Gibbon, “Part 1: Spoken language system and corpus design,” in Handbook of standards and resources for spoken language systems, Berlin: Mouton de Gruyter, 1997.
M. Frikha, A. B. Hamida, “A comparative survey of ANN and hybrid HMM/ANN architectures for robust speech recognition,” American Journal of Intelligent Systems, vol. 2, no. 1, pp. 1-8, 2012.
H. F. Ong, A. M. Ahmad, “Malay Language Speech Recognizer with Hybrid Hidden Markov Model and Artificial Neural Network (HMM/ANN),” International Journal of Information and Education Technology, vol. 1, no. 2, pp. 114-119, 2011.
R. Fadhilah, R. N. Ainon, “Isolated Malay speech recognition using Hidden Markov models,” in Proceedings of the International Conferences on Computer and Communication Engineering, pp. 721-725, 2008.
H. A. Bourlard, N. Morgan, “Connectionist speech recognition: A hybrid approach,” Springer Science & Business Media, vol. 247, 2012.
H. Sarma, N. Saharia, U. Sharma, “Development of Assamese speech corpus and automatic transcription using HTK,” Advances in Signal Processing and Intelligent Recognition Systems, vol. 264, pp. 119-132, 2014.
F. Schiel, “Automatic phonetic transcription of non-prompted speech,” in Proceedings of the International Congress of Phonetic Sciences, pp. 607-610, 1999.
S. Rapp, “Automatic phonemic transcription and linguistic annotation from known text with Hidden Markov Models: An Aligner for German,” 1995.
M. Melby-Lervåg, S.-A. H. Lyster, C. Hulme, “Phonological skills and their role in learning to read: a meta-analytic review,” Psychological Bulletin, vol. 138, pp. 322-352, 2012.
F.R. Vellutino, J.M. Fletcher, M.J. Snowling, D.M. Scanlon, “Specific reading disability (dyslexia): what have we learned in the past four decades?” Journal of Child Psychology & Psychiatry, vol. 45, pp. 2–40, 2004.
F. Morken, T. Helland, K. Hugdahl, K. Specht, “Reading in dyslexia across literacy development: A longitudinal study of effective connectivity,” Neuroimage, vol. 144, pp. 92-100, 2017.
L. Caroline, L. Cupples, “Thinking outside the boxes: Using current reading models to assess and treat developmental surface dyslexia,” Neuropsychological Rehabilitation, vol. 27, pp. 149-195, 2017.
T. Baumann, C. Kennington, J. Hough, D. Schlangen, “Recognising conversational speech: What an incremental ASR should do for a dialogue system and how to get there,” Lecture Notes in Electrical Engineering, vol. 999, pp. 421-432, 2016.
J. Kennedy, S. Lemaignan, C. Montassier, P. Lavalade, B. Irfan, F. Papadopoulos, E. Senft, T. Belpaeme, “Child Speech Recognition in Human-Robot Interaction: Evaluations and Recommendations,” 2016.
H. Husni, Z. Jamaludin, “ASR technology for children with dyslexia: Enabling immediate intervention to support reading in Bahasa Melayu,” US-China Education Review, vol. 6, no. 6, pp. 64-70, 2009.