Statistical Validation of ACO-KNN Algorithm for Sentiment Analysis

Authors

  • Siti Rohaidah Ahmad
  • Azuraliza Abu Bakar
  • Mohd Ridzwan Yaakub
  • Nurhafizah Moziyana Mohd Yusop

Keywords:

Feature Selection, Sentiment Analysis, Statistical Analysis, Ant Colony Optimization

Abstract

This paper proposes a hybrid of the ant colony optimization (ACO) and k-nearest neighbour (KNN) algorithms as a feature selection technique for selecting relevant features from customer review datasets. Information gain (IG), genetic algorithm (GA), and rough set attribute reduction (RSAR) were used as baseline algorithms in a performance comparison with the proposed algorithm. The paper also discusses the significance test used to evaluate the performance differences between the ACO-KNN, IG-GA, and IG-RSAR algorithms. A dependency relation algorithm was used to identify the actual features commented on by customers by linking product features to sentiment words in customers' sentences. The study evaluated the performance of the ACO-KNN algorithm using precision, recall, and F-score, and validated the results using parametric statistical significance tests. The evaluation showed that the ACO-KNN algorithm performed significantly better than the baseline algorithms. In addition, the experimental results indicate that ACO-KNN can be used as a feature selection technique in sentiment analysis to obtain a high-quality, optimal feature subset that represents the actual data in customer reviews.
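The evaluation described in the abstract can be illustrated with a minimal Python sketch. It computes precision, recall, and F-score from raw counts, then a paired t statistic over matched scores from two algorithms. The abstract says only that parametric significance tests were used, so the paired t-test here is an assumption, not necessarily the authors' exact test. The function names and the score lists are hypothetical placeholders, not results from the paper.

```python
import math
from statistics import mean, stdev


def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F-score) from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees of freedom) for a paired t-test on two matched score lists.

    Assumption: a paired t-test stands in for the unspecified parametric test.
    """
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two matched lists with at least two scores each")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    sd = stdev(diffs)  # sample standard deviation of the paired differences
    if sd == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


if __name__ == "__main__":
    # Hypothetical F-scores per dataset, made up purely for illustration.
    proposed = [0.80, 0.82, 0.78, 0.85, 0.81]
    baseline = [0.75, 0.79, 0.74, 0.80, 0.78]
    t, df = paired_t_statistic(proposed, baseline)
    print(f"t = {t:.3f} with {df} degrees of freedom")
```

To decide significance, the resulting t value would be compared against the critical value of the t distribution for the given degrees of freedom at the chosen significance level.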

Published

2017-09-15

How to Cite

Ahmad, S. R., Abu Bakar, A., Yaakub, M. R., & Mohd Yusop, N. M. (2017). Statistical Validation of ACO-KNN Algorithm for Sentiment Analysis. Journal of Telecommunication, Electronic and Computer Engineering (JTEC), 9(2-11), 165–170. Retrieved from https://jtec.utem.edu.my/jtec/article/view/2757