Evaluation of Transformer-Based Models for Sentiment Analysis in Bahasa Malaysia
DOI: https://doi.org/10.54554/jtec.2025.17.01.004
Keywords: Transformer-based models, Sentiment analysis, Bahasa Malaysia, Natural Language Processing
Abstract
This study investigates the application of advanced Transformer-based models, namely BERT, DistilBERT, BERT-multilingual, ALBERT, and BERT-CNN, for sentiment analysis in Bahasa Malaysia, addressing challenges unique to social media text such as mixed-language usage and abbreviated expressions. Using the Malaya dataset to ensure linguistic diversity and domain coverage, the research incorporates robust preprocessing techniques, including synonym mapping and sentiment-aware tokenization, to enhance feature extraction. Under rigorous evaluation, BERT-CNN achieved the highest accuracy (96.3%), followed by BERT-multilingual (89.84%) and BERT (89.5%). DistilBERT and ALBERT delivered competitive accuracy (88.96% and 88.76%, respectively) with reduced computational requirements, highlighting the trade-off between performance and efficiency. The study emphasizes optimized strategies for handling the challenges of positive sentiment classification and demonstrates the efficacy of Transformer architectures for nuanced sentiment detection in low-resource languages. These findings contribute to advancing Natural Language Processing (NLP) toward scalable sentiment analysis across domains.
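The abstract's best-performing model is a BERT-CNN hybrid. Below is a minimal sketch of that general design, assuming the common formulation in which contextual token embeddings from a pretrained BERT encoder are passed through parallel 1D convolutions and global max-pooling before a sentiment head; the encoder checkpoint, kernel sizes, filter count, and two-class label set are illustrative assumptions, not the authors' exact configuration.

```python
# Hypothetical BERT-CNN hybrid for sentiment classification.
# Assumptions (not from the paper): bert-base-multilingual-cased encoder,
# kernel sizes (3, 4, 5), 100 filters each, binary positive/negative labels.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class BertCNNClassifier(nn.Module):
    def __init__(self, encoder_name="bert-base-multilingual-cased",
                 num_labels=2, kernel_sizes=(3, 4, 5), num_filters=100):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        # One Conv1d per kernel size, sliding over the token dimension.
        self.convs = nn.ModuleList(
            nn.Conv1d(hidden, num_filters, k) for k in kernel_sizes
        )
        self.dropout = nn.Dropout(0.1)
        self.classifier = nn.Linear(num_filters * len(kernel_sizes), num_labels)

    def forward(self, input_ids, attention_mask):
        # (batch, seq_len, hidden) contextual embeddings from BERT.
        hidden_states = self.encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state
        # Conv1d expects (batch, channels, seq_len).
        x = hidden_states.transpose(1, 2)
        # Convolve, apply ReLU, then global max-pool each feature map.
        pooled = [torch.relu(conv(x)).max(dim=2).values for conv in self.convs]
        features = self.dropout(torch.cat(pooled, dim=1))
        return self.classifier(features)

# Minimal usage on a Bahasa Malaysia example sentence.
tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = BertCNNClassifier()
batch = tokenizer(["Filem ini sangat bagus!"], return_tensors="pt",
                  padding=True, truncation=True)
logits = model(batch["input_ids"], batch["attention_mask"])
print(logits.shape)  # torch.Size([1, 2]) -> positive/negative scores
```

In this formulation the CNN layer acts as an n-gram feature detector over BERT's contextual embeddings, which is one plausible reason such hybrids can outperform a plain classification head on noisy, code-mixed social media text.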
License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) License.