Enhanced Affixation Word Stemmer with Stemming Error Reducer to Solve Affixation Stemming Errors
Abstract
A word stemming algorithm (or word stemmer) is an important preprocessing component in information retrieval and text categorization that aims to reduce derived words to their respective root words. Most existing Malay word stemmers adopt a rule-based affix removal method and dictionary lookup to stem affixation words. Although many stemming approaches have been proposed in past research, existing Malay word stemmers still suffer from affixation stemming errors due to the complexity of Malay morphology. These stemming errors can be classified into over-stemming, under-stemming, unstemmed words, and special variations and exceptions. Hence, this paper presents an enhanced affixation word stemmer that aims to solve these stemming errors. This paper also examines the root causes of these stemming errors in existing Malay stemmers. The experimental results indicate that the enhanced word stemmer is able to stem prefixation, suffixation, confixation and infixation words with better stemming accuracy by using an enhanced Rule Application Order and a Stemming Errors Reducer.
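To make the over-stemming problem concrete, the following is a minimal sketch of rule-based affix removal with and without a dictionary lookup. It is not the stemmer proposed in the paper: the affix lists, the tiny root-word set, and the function names `naive_stem` and `dictionary_stem` are illustrative assumptions only, and a real Malay stemmer would need far more rules and a full root-word dictionary.

```python
# Illustrative sketch only: toy affix lists and a toy root-word dictionary
# (assumptions, not the rule set or dictionary used in the paper).
ROOT_WORDS = {"main", "makan", "berat"}
PREFIXES = ("ber", "per", "di", "ter")
SUFFIXES = ("kan", "an", "i")


def naive_stem(word):
    """Strip the first matching prefix, then the first matching suffix."""
    for prefix in PREFIXES:
        if word.startswith(prefix):
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix):
            word = word[:-len(suffix)]
            break
    return word


def dictionary_stem(word):
    """Accept a stripped form only if it is a known root word."""
    if word in ROOT_WORDS:
        return word  # already a root word, so nothing is removed
    # Candidates: the word itself plus every prefix-stripped form ...
    stems = [word]
    stems += [word[len(p):] for p in PREFIXES if word.startswith(p)]
    # ... plus every suffix-stripped form of those candidates.
    stems += [s[:-len(x)] for s in list(stems) for x in SUFFIXES if s.endswith(x)]
    for stem in stems:
        if stem in ROOT_WORDS:
            return stem
    return word  # no known root found: the word is left unstemmed


print(naive_stem("berat"))           # at     (over-stemmed: "berat" is a root word)
print(dictionary_stem("berat"))      # berat
print(naive_stem("dimakan"))         # ma     (over-stemmed: the root is "makan")
print(dictionary_stem("dimakan"))    # makan
print(dictionary_stem("permainan"))  # main   (confix per-...-an removed)
```

In this sketch, the dictionary check is what stops the rules from removing letters that merely look like an affix. The final fallback also shows how a word can remain unstemmed when neither the rules nor the dictionary cover it.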
TRANSFER OF COPYRIGHT AGREEMENT
The manuscript is herewith submitted for publication in the Journal of Telecommunication, Electronic and Computer Engineering (JTEC). It has not been published before, and it is not under consideration for publication in any other journals. It contains no material that is scandalous, obscene, libelous or otherwise contrary to law. When the manuscript is accepted for publication, I, as the author, hereby agree to transfer to JTEC, all rights including those pertaining to electronic forms and transmissions, under existing copyright laws, except for the following, which the author(s) specifically retain(s):
- All proprietary rights other than copyright, such as patent rights
- The right to make further copies of all or part of the published article for my use in classroom teaching
- The right to reuse all or part of this manuscript in a compilation of my own works or in a textbook of which I am the author; and
- The right to make copies of the published work for internal distribution within the institution that employs me
I agree that copies made under these circumstances will continue to carry the copyright notice that appears in the original published work. I agree to inform my co-authors, if any, of the above terms. I certify that I have obtained written permission for the use of text, tables, and/or illustrations from any copyrighted source(s), and I agree to supply such written permission(s) to JTEC upon request.