Increasing Accuracy of C4.5 Algorithm by Applying Discretization and Correlation-based Feature Selection for Chronic Kidney Disease Diagnosis

Authors

  • N. Cahyani, Department of Computer Science, FMIPA, Universitas Negeri Semarang, Sekaran, Gunung Pati, Semarang, Central Java 50229, Indonesia.
  • M. A. Muslim, Department of Computer Science, FMIPA, Universitas Negeri Semarang, Sekaran, Gunung Pati, Semarang, Central Java 50229, Indonesia.

Keywords:

C4.5 Algorithm, Classification, Chronic Kidney Disease, Correlation-based Feature Selection, Discretization.

Abstract

Data mining is a technique for searching a database for hidden information in order to find interesting patterns. In the health sector, data mining can be used to diagnose a disease from a patient's medical records. This research used a Chronic Kidney Disease (CKD) dataset obtained from the UCI Machine Learning Repository. In this dataset, almost half of the attributes are continuous numeric types. Continuous attributes can lower accuracy because their range of values is unlimited, so they need to be transformed into discrete values. In some cases, using all attributes can also produce low accuracy, because some attributes are irrelevant and have no correlation with the target class. These attributes therefore need to be selected in advance to obtain more accurate results. Classification is one technique in data mining, and C4.5 is one classification algorithm. The purpose of this study is to increase the accuracy of the C4.5 algorithm for chronic kidney disease diagnosis by applying discretization and Correlation-based Feature Selection (CFS). Discretization is used to handle continuous values, while CFS is used for attribute selection. The experiment was conducted with WEKA (Waikato Environment for Knowledge Analysis). C4.5 alone has an accuracy of 97%, C4.5 with discretization reaches 97.25%, and C4.5 with discretization and CFS reaches 97.5%, an increase of 0.5 percentage points over C4.5 alone.
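The abstract names the two preprocessing steps but not their mechanics, and it does not state which discretization method or subset-search strategy was used. The following is a minimal Python sketch of how discretization and CFS fit together, not the authors' WEKA pipeline. It assumes equal-width binning (WEKA's supervised Discretize filter defaults to the Fayyad-Irani MDL method instead), measures correlation with symmetrical uncertainty, and uses a simple greedy forward search in place of the best-first search commonly paired with WEKA's CfsSubsetEval. CFS rates a subset of k features by the merit k * r_cf / sqrt(k + k(k-1) * r_ff), where r_cf is the mean feature-class correlation and r_ff is the mean feature-feature correlation, so it favors features that predict the class but are not redundant with each other. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def equal_width_bins(values, n_bins=3):
    """Discretize continuous values into integer bins 0..n_bins-1 of equal width.

    Equal-width binning is an assumption for illustration; the source does not
    say which discretization method was used.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # constant column: everything lands in bin 0
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def entropy(xs):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a correlation measure in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)


def cfs_merit(subset, features, target):
    """CFS merit: k * mean(r_cf) / sqrt(k + k(k-1) * mean(r_ff))."""
    k = len(subset)
    if k == 0:
        return 0.0
    r_cf = sum(symmetrical_uncertainty(features[f], target) for f in subset) / k
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = (sum(symmetrical_uncertainty(features[a], features[b]) for a, b in pairs)
            / len(pairs)) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def greedy_cfs(features, target):
    """Greedy forward search: add the feature that most raises the merit; stop when none does."""
    selected, best = [], 0.0
    remaining = list(features)
    while remaining:
        scores = {f: cfs_merit(selected + [f], features, target) for f in remaining}
        candidate = max(scores, key=scores.get)
        if scores[candidate] <= best:
            break
        selected.append(candidate)
        best = scores[candidate]
        remaining.remove(candidate)
    return selected, best


# Hypothetical toy data (not the CKD dataset): a binary class and three columns.
target = [0, 0, 0, 0, 1, 1, 1, 1]
raw = [1.0, 1.2, 1.1, 0.9, 5.0, 5.2, 4.9, 5.1]
features = {
    "informative": equal_width_bins(raw, n_bins=2),
    "copy": equal_width_bins(raw, n_bins=2),  # redundant duplicate of "informative"
    "noise": [0, 1, 0, 1, 0, 1, 0, 1],        # independent of the class
}
selected, merit = greedy_cfs(features, target)
print(selected, merit)  # ['informative'] 1.0
```

On this toy data the duplicate column leaves the merit unchanged (2 * 1 / sqrt(2 + 2 * 1) = 1), so the search stops after one feature and the noise column is never chosen. In WEKA the corresponding components would be the Discretize filter, the CfsSubsetEval attribute evaluator, and J48, WEKA's implementation of C4.5.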

Published

2020-03-31

How to Cite

Cahyani, N., & Muslim, M. A. (2020). Increasing Accuracy of C4.5 Algorithm by Applying Discretization and Correlation-based Feature Selection for Chronic Kidney Disease Diagnosis. Journal of Telecommunication, Electronic and Computer Engineering (JTEC), 12(1), 25–32. Retrieved from https://jtec.utem.edu.my/jtec/article/view/4922