Increasing Accuracy of C4.5 Algorithm by Applying Discretization and Correlation-based Feature Selection for Chronic Kidney Disease Diagnosis
Keywords:
C4.5 Algorithm, Classification, Chronic Kidney Disease, Correlation-based Feature Selection, Discretization.
Abstract
Data mining is a technique for uncovering hidden information in a database in order to find interesting patterns. In the health sector, data mining can be used to diagnose a disease from a patient's medical records. This research used the Chronic Kidney Disease (CKD) dataset obtained from the UCI Machine Learning Repository. Almost half of the attributes in this dataset are continuous numeric attributes. Continuous attributes can lower accuracy because they can take an unbounded number of values, so they need to be transformed into discrete ones. In some cases, using all attributes also yields low accuracy, because some of them are irrelevant and uncorrelated with the target class. Such attributes need to be filtered out in advance to obtain more accurate results. Classification is one technique in data mining, and C4.5 is one classification algorithm. The purpose of this study is to increase the accuracy of the C4.5 algorithm for chronic kidney disease diagnosis by applying discretization and Correlation-based Feature Selection (CFS). Discretization is used to handle continuous values, while CFS is used for attribute selection. The experiment was conducted with WEKA (Waikato Environment for Knowledge Analysis). Applying discretization and CFS to C4.5 increased accuracy by 0.5 percentage points: C4.5 alone achieved an accuracy of 97%, C4.5 with discretization achieved 97.25%, and C4.5 with discretization and CFS achieved 97.5%.
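The two preprocessing steps named in the abstract can be illustrated with a minimal sketch. It is not the WEKA pipeline used in the study: the abstract does not state which discretization method was applied, so equal-width binning is an assumption here, and the function names `equal_width_bins` and `cfs_merit` are hypothetical. The merit function follows the standard CFS heuristic, which rewards subsets whose features correlate with the class and penalizes subsets whose features correlate with each other.

```python
import math


def equal_width_bins(values, n_bins):
    """Map each continuous value to a bin index in 0..n_bins-1.

    Equal-width binning is an assumption for illustration only; the
    study's actual discretization method is not given in the abstract.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: everything falls into a single bin.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # Clamp the maximum value into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def cfs_merit(feature_class_corr, mean_feature_feature_corr):
    """Standard CFS merit of a feature subset with k features:

        merit = k * mean(r_cf) / sqrt(k + k * (k - 1) * mean(r_ff))

    feature_class_corr: one feature-class correlation per feature.
    mean_feature_feature_corr: average pairwise feature-feature correlation.
    """
    k = len(feature_class_corr)
    mean_rcf = sum(feature_class_corr) / k
    return (k * mean_rcf) / math.sqrt(k + k * (k - 1) * mean_feature_feature_corr)
```

With two features that each correlate 0.6 with the class, the merit is higher when the features are uncorrelated with each other than when they are fully redundant, which is the behaviour that lets CFS discard attributes adding no new information.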
License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)