Increasing Accuracy of C4.5 Algorithm by Applying Discretization and Correlation-based Feature Selection for Chronic Kidney Disease Diagnosis
Keywords: C4.5 Algorithm, Classification, Chronic Kidney Disease, Correlation-based Feature Selection, Discretization.
Data mining is a technique for extracting hidden information from a database in order to find interesting patterns. In the health sector, data mining can be used to diagnose a disease from a patient's medical records. This research used the Chronic Kidney Disease (CKD) dataset obtained from the UCI Machine Learning Repository. Almost half of the attributes in this dataset are continuous numeric attributes. Continuous attributes can lower accuracy because they can take an unbounded number of distinct values, so they need to be transformed into discrete ones. In addition, using all attributes can produce low accuracy when some of them are irrelevant and have no correlation with the target class, so attributes need to be selected in advance to get more accurate results. Classification is one of the techniques in data mining, and C4.5 is one of its algorithms. The purpose of this study is to increase the accuracy of the C4.5 algorithm by applying discretization and Correlation-based Feature Selection (CFS) for chronic kidney disease diagnosis. Discretization is used to handle continuous values, while CFS is used for attribute selection. The experiment was conducted with WEKA (Waikato Environment for Knowledge Analysis). Applying discretization and CFS to C4.5 increased accuracy by 0.5 percentage points: C4.5 alone achieved an accuracy of 97%, C4.5 with discretization achieved 97.25%, and C4.5 with discretization and CFS achieved 97.5%.
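The two preprocessing steps named in the abstract can be illustrated with a minimal Python sketch. It is not the WEKA pipeline used in the study. The abstract does not say which discretization method was applied, so equal-width binning is used here purely as an assumption. The merit function follows the standard CFS heuristic, k * r_cf / sqrt(k + k(k-1) * r_ff), where r_cf is the mean feature-class correlation and r_ff the mean feature-feature correlation. The function names and the example values are hypothetical.

```python
import math


def equal_width_bins(values, n_bins):
    """Map each continuous value to a bin index in [0, n_bins - 1].

    Equal-width binning is an assumption made for illustration only;
    the study's actual discretization method is not stated in the abstract.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant attribute carries no information; put everything in bin 0.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would fall just past the last bin, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def cfs_merit(feature_class_corrs, feature_feature_corrs):
    """CFS merit of a feature subset.

    feature_class_corrs: one correlation per feature with the class.
    feature_feature_corrs: one correlation per pair of features in the subset
    (empty when the subset has a single feature).
    """
    k = len(feature_class_corrs)
    r_cf = sum(feature_class_corrs) / k
    if feature_feature_corrs:
        r_ff = sum(feature_feature_corrs) / len(feature_feature_corrs)
    else:
        r_ff = 0.0
    return (k * r_cf) / math.sqrt(k + k * (k - 1) * r_ff)


if __name__ == "__main__":
    # Hypothetical continuous readings, not taken from the CKD dataset.
    print(equal_width_bins([0.0, 2.5, 5.0, 7.5, 10.0], 2))
    # Two features, each correlated 0.6 with the class:
    print(cfs_merit([0.6, 0.6], [1.0]))  # fully redundant pair
    print(cfs_merit([0.6, 0.6], [0.0]))  # uncorrelated pair scores higher
```

The merit rewards subsets whose features correlate with the class and penalizes subsets whose features correlate with each other, which is why the redundant pair scores no better than a single feature.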
TRANSFER OF COPYRIGHT AGREEMENT
The manuscript is herewith submitted for publication in the Journal of Telecommunication, Electronic and Computer Engineering (JTEC). It has not been published before, and it is not under consideration for publication in any other journals. It contains no material that is scandalous, obscene, libelous or otherwise contrary to law. When the manuscript is accepted for publication, I, as the author, hereby agree to transfer to JTEC, all rights including those pertaining to electronic forms and transmissions, under existing copyright laws, except for the following, which the author(s) specifically retain(s):
- All proprietary rights other than copyright, such as patent rights
- The right to make further copies of all or part of the published article for my use in classroom teaching
- The right to reuse all or part of this manuscript in a compilation of my own works or in a textbook of which I am the author; and
- The right to make copies of the published work for internal distribution within the institution that employs me
I agree that copies made under these circumstances will continue to carry the copyright notice that appears in the original published work. I agree to inform my co-authors, if any, of the above terms. I certify that I have obtained written permission for the use of text, tables, and/or illustrations from any copyrighted source(s), and I agree to supply such written permission(s) to JTEC upon request.