Mobile Application for Improving Speech and Text Data Collection Approach


  • Sarah Samson Juan, Institute of Social Informatics and Technological Innovations, Universiti Malaysia Sarawak, Sarawak, MALAYSIA.
  • Jennifer Fiona Wilfred Busu, Faculty of Computer Science and Information Technology, Universiti Malaysia Sarawak, Sarawak, MALAYSIA.


Keywords: Mobile Application, Data Collection Tools, Corpus Development


This paper describes our work in developing a mobile application for collecting speech and text data for languages. The application is built to help linguists and researchers simplify the task of collecting data from native speakers living in remote interior areas. Researchers typically rely on several pieces of equipment to capture audio or text in hard-to-reach places; with this mobile application, they need to carry only one device, which eases their logistical difficulties. The mobile app, named Kalaka, is designed to let users store details of native speakers, record speech and enter speech transcripts, all on one platform. Kalaka is built on the Android platform, and data stored on the mobile device can be transferred to cloud storage over WiFi networks. Usability tests show that all participants in the evaluation were able to use the application to record their voices and save texts. We also received positive feedback on the mobile application from our survey: more than half of the respondents said they felt confident using Kalaka and would use the system frequently.
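The workflow described in the abstract (store speaker details, record speech, attach a transcript, then transfer to cloud storage when WiFi is available) can be pictured with a small data-model sketch. This is an illustration only, written in Python rather than Android code, and it is not taken from Kalaka: the class names, the field names and the `sync` function are all hypothetical, and the upload step is simulated by setting a flag.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Speaker:
    # Hypothetical fields; the abstract does not list which speaker details are stored.
    speaker_id: str
    name: str
    native_language: str


@dataclass
class Recording:
    speaker_id: str
    audio_path: str                    # location of the audio file on the device
    transcript: Optional[str] = None   # text entered for this recording, if any
    uploaded: bool = False             # True once marked as copied to cloud storage


@dataclass
class Session:
    speaker: Speaker
    recordings: List[Recording] = field(default_factory=list)

    def add_recording(self, audio_path: str, transcript: Optional[str] = None) -> Recording:
        """Attach a new recording (and optional transcript) to this speaker."""
        rec = Recording(self.speaker.speaker_id, audio_path, transcript)
        self.recordings.append(rec)
        return rec

    def pending_uploads(self) -> List[Recording]:
        """Recordings that are still only on the device."""
        return [r for r in self.recordings if not r.uploaded]


def sync(session: Session, wifi_available: bool) -> int:
    """Mark pending recordings as uploaded when WiFi is available; return how many."""
    if not wifi_available:
        return 0
    pending = session.pending_uploads()
    for rec in pending:
        rec.uploaded = True  # assumption: a real app would send the file to cloud storage here
    return len(pending)
```

In this sketch, recordings stay on the device until `sync` is called with `wifi_available=True`, which mirrors the idea of collecting data offline in the field and transferring it later.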






How to Cite

Juan, S. S., & Busu, J. F. W. (2017). Mobile Application for Improving Speech and Text Data Collection Approach. Journal of Telecommunication, Electronic and Computer Engineering (JTEC), 9(3-11), 79–83. Retrieved from