Enhanced Affixation Word Stemmer with Stemming Error Reducer to Solve Affixation Stemming Errors

Authors

  • Mohamad Nizam Kassim, Strategic Research, CyberSecurity Malaysia, The Mines Resort City, 43300 Seri Kembangan, Malaysia
  • Mohd Aizaini Maarof, Faculty of Computing, Universiti Teknologi Malaysia, 81310 Skudai Johore, Malaysia
  • Anazida Zainal, Faculty of Computing, Universiti Teknologi Malaysia, 81310 Skudai Johore, Malaysia
  • Amirudin Abdul Wahab, Strategic Research, CyberSecurity Malaysia, The Mines Resort City, 43300 Seri Kembangan, Malaysia

Abstract

A word stemming algorithm (or word stemmer) is an important preprocessing component in information retrieval and text categorization that reduces derived words to their respective root words. Most existing Malay word stemmers adopt a rule-based affix removal method combined with dictionary lookup to stem affixation words. Although many stemming approaches have been proposed in past research, existing Malay word stemmers still suffer from affixation stemming errors due to the complexity of Malay morphology. These stemming errors can be classified into four types: overstemming, understemming, unstemmed words, and special variations and exceptions. This paper therefore presents an enhanced affixation word stemmer that aims to solve these stemming errors, and it also examines the root causes of these errors in existing Malay stemmers. The experimental results indicate that the enhanced word stemmer is able to stem prefixation, suffixation, confixation and infixation words with better stemming accuracy by using an enhanced Rule Application Order and a Stemming Errors Reducer.
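To make the general idea of rule-based affix removal with dictionary lookup concrete, the following is a minimal Python sketch. It is not the stemmer proposed in this paper: the tiny root-word dictionary, the affix lists, the order in which rules are tried, and the names `ROOT_WORDS`, `PREFIXES`, `SUFFIXES`, `candidates` and `stem` are all assumptions chosen for illustration only.

```python
# Illustrative sketch only: a hand-picked dictionary and affix lists,
# NOT the resources or rule order used in the paper.
ROOT_WORDS = {"makan", "mak", "jalan", "main", "ajar"}
PREFIXES = ["ber", "per", "me", "di"]
SUFFIXES = ["kan", "an", "i"]


def candidates(word):
    """Yield possible roots: prefix removed, suffix removed, then both (confix)."""
    for prefix in PREFIXES:
        if word.startswith(prefix):
            yield word[len(prefix):]
    for suffix in SUFFIXES:
        if word.endswith(suffix):
            yield word[:-len(suffix)]
    for prefix in PREFIXES:
        for suffix in SUFFIXES:
            long_enough = len(word) > len(prefix) + len(suffix)
            if long_enough and word.startswith(prefix) and word.endswith(suffix):
                yield word[len(prefix):-len(suffix)]


def stem(word):
    """Return a root word if a rule plus dictionary lookup finds one, else the input."""
    word = word.lower()
    # Look the word up first, so a root such as "makan" is not cut down to "mak".
    if word in ROOT_WORDS:
        return word
    for candidate in candidates(word):
        # Accept a stripped form only if the dictionary confirms it is a root word.
        if candidate in ROOT_WORDS:
            return candidate
    # No rule produced a known root: the word is left unstemmed.
    return word


for w in ["makan", "makanan", "berjalan", "permainan", "belajar"]:
    print(w, "->", stem(w))
```

In this sketch, checking the dictionary before stripping anything keeps "makan" from being overstemmed to "mak", while "permainan" reaches "main" only through the confix rule. The word "belajar" is returned unchanged because the prefix variant "bel-" is not in the assumed prefix list, which is the kind of special variation that a plain rule list misses.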

Published

2016-06-01

How to Cite

Kassim, M. N., Maarof, M. A., Zainal, A., & Abdul Wahab, A. (2016). Enhanced Affixation Word Stemmer with Stemming Error Reducer to Solve Affxation Stemming Errors. Journal of Telecommunication, Electronic and Computer Engineering (JTEC), 8(3), 37–41. Retrieved from https://jtec.utem.edu.my/jtec/article/view/999