Increasing Accuracy of C4.5 Algorithm by Applying Discretization and Correlation-based Feature Selection for Chronic Kidney Disease Diagnosis

Authors

  • N. Cahyani Department of Computer Science, FMIPA, Universitas Negeri Semarang, Sekaran, Gunung Pati, Semarang, Central Java 50229, Indonesia.
  • M.A Muslim Department of Computer Science, FMIPA, Universitas Negeri Semarang, Sekaran, Gunung Pati, Semarang, Central Java 50229, Indonesia.

Keywords:

C4.5 Algorithm, Classification, Chronic Kidney Disease, Correlation-based Feature Selection, Discretization.

Abstract

Data mining is a technique for uncovering hidden information in a database in order to find interesting patterns. In the health sector, data mining can be used to diagnose a disease from a patient's medical records. This research used the Chronic Kidney Disease (CKD) dataset obtained from the UCI Machine Learning Repository. In this dataset, almost half of the attributes are continuous numeric types. Continuous attributes can lower accuracy because their range of values is unbounded, so they need to be transformed into discrete values. In certain cases, using all attributes can also produce low accuracy, because some attributes are irrelevant and uncorrelated with the target class. These attributes therefore need to be filtered out in advance to obtain more accurate results. Classification is one data mining technique, and C4.5 is one classification algorithm. The purpose of this study is to increase the accuracy of the C4.5 algorithm by applying discretization and Correlation-based Feature Selection (CFS) for chronic kidney disease diagnosis. Discretization is used to handle continuous values, while CFS is used for attribute selection. The experiment was conducted with WEKA (Waikato Environment for Knowledge Analysis). Applying discretization and CFS to C4.5 increased accuracy by 0.5 percentage points: C4.5 alone achieved an accuracy of 97%, C4.5 with discretization achieved 97.25%, and C4.5 with discretization and CFS achieved 97.5%.
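The two preprocessing steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' WEKA workflow: the abstract does not say which discretization method was used, so equal-width binning is assumed here purely for illustration. The CFS merit follows the standard formulation, with symmetrical uncertainty as the correlation measure between discrete attributes. All function names and the toy values are hypothetical and are not taken from the CKD dataset.

```python
import math
from collections import Counter


def equal_width_bins(values, n_bins=3):
    """Map continuous values to bin indices 0..n_bins-1.

    Equal-width binning is an assumption made for this sketch.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # Clamp so that the maximum value falls into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * (H(X) + H(Y) - H(X, Y)) / (H(X) + H(Y)), in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    hxy = entropy(list(zip(x, y)))
    return 2.0 * (hx + hy - hxy) / (hx + hy)


def cfs_merit(features, target):
    """CFS merit of a feature subset: k * r_cf / sqrt(k + k * (k - 1) * r_ff).

    features: list of discrete columns; target: discrete class column.
    r_cf is the mean feature-class correlation and r_ff the mean
    feature-feature correlation, both measured by symmetrical uncertainty.
    """
    k = len(features)
    r_cf = sum(symmetrical_uncertainty(f, target) for f in features) / k
    if k == 1:
        r_ff = 0.0
    else:
        pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
        r_ff = sum(
            symmetrical_uncertainty(features[i], features[j]) for i, j in pairs
        ) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


# Hypothetical toy data (not from the CKD dataset).
creatinine = [0.8, 1.0, 1.1, 4.2, 5.6, 6.0]   # continuous attribute
noise = [0, 1, 0, 1, 0, 1]                    # weakly informative attribute
label = ["notckd", "notckd", "notckd", "ckd", "ckd", "ckd"]

creatinine_binned = equal_width_bins(creatinine, n_bins=2)
merit_single = cfs_merit([creatinine_binned], label)
merit_both = cfs_merit([creatinine_binned, noise], label)
```

In this toy setup the subset containing only the binned attribute scores a higher merit than the subset that also includes the weakly informative one, which is the behaviour CFS relies on when it discards attributes that add little correlation with the class. A C4.5-style classifier would then be trained on the selected, discretized attributes.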

Published

2020-03-31

How to Cite

Cahyani, N., & Muslim, M. (2020). Increasing Accuracy of C4.5 Algorithm by Applying Discretization and Correlation-based Feature Selection for Chronic Kidney Disease Diagnosis. Journal of Telecommunication, Electronic and Computer Engineering (JTEC), 12(1), 25–32. Retrieved from https://jtec.utem.edu.my/jtec/article/view/4922