The Utilization of Decision Tree Algorithm In Order to Predict Heart Disease

Mia Mia, Anis Fitri Nur Masruriyah, Adi Rizky Pratama

Abstract


Data on heart disease patients obtained from the Ministry of Health of the Republic of Indonesia in 2020 show that heart disease has increased every year and is the leading cause of death in Indonesia, particularly among people of productive age. If people with heart disease are not treated properly, they may die prematurely, even during their productive years. This study therefore builds a predictive model intended to help medical personnel address this health problem. The Random Forest and Decision Tree (C4.5) classification algorithms were applied to cardiac patient data to create the predictive model. The data showed that the heart disease classes were imbalanced, so the ADASYN and SMOTE oversampling techniques were used to handle the imbalance. The study showed that applying ADASYN and SMOTE oversampling with the C4.5 algorithm and the Random Forest classifier had a significant effect on the prediction results. The models were evaluated with 10-fold cross-validation, and the confusion matrix was used to compute precision, recall, F1-score, and accuracy. SMOTE combined with Random Forest (SMOTE + RF) was one of the best-performing combinations, with an accuracy of 93.58%, compared with 90.51% for Random Forest without SMOTE and 93.55% for SMOTE + ADASYN. Applying SMOTE was shown to overcome the data imbalance problem and gave better classification results than applying ADASYN.
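The abstract names two techniques without showing how they work: SMOTE oversampling and the confusion-matrix metrics. As a rough illustration only, the sketch below places each synthetic minority sample at a random point on the line segment between a minority sample and one of its k nearest minority-class neighbours (the core SMOTE idea), and computes precision, recall, F1-score, and accuracy from binary predictions. It is a minimal pure-Python sketch, not the authors' implementation; the function names `smote` and `confusion_metrics`, the default `k=5`, and the use of Euclidean distance are assumptions made for this example. In practice a library implementation, such as the `SMOTE` and `ADASYN` classes in imbalanced-learn, would normally be used.

```python
import math
import random

# Minimal illustrative sketch, not the authors' implementation.
# The names `smote` and `confusion_metrics`, the default k=5, and the use of
# Euclidean distance are assumptions made for this example.


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority-class samples by SMOTE-style interpolation.

    minority: list of numeric feature vectors belonging to the minority class.
    Each synthetic sample lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of `base` (Euclidean), excluding itself
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def confusion_metrics(y_true, y_pred, positive=1):
    """Precision, recall, F1-score and accuracy from a binary confusion matrix."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs)
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


if __name__ == "__main__":
    minority = [[1.0, 2.0], [2.0, 3.0], [3.0, 3.5]]
    print(smote(minority, n_new=3, k=2))
    m = confusion_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0, 0])
    print(m["accuracy"])  # 0.75
```

When oversampling is combined with 10-fold cross-validation, it is generally applied only to the training folds of each split, so that no synthetic sample is derived from the data used for testing.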

Keywords


ADASYN; C4.5; Heart Disease; Random Forest; SMOTE

DOI: http://dx.doi.org/10.38101/sisfotek.v12i2.551

JURNAL SISFOTEK GLOBAL

Organized by: Research Center and Community Development
Published by: Institut Teknologi dan Bisnis Bina Sarana Global
Jl. Aria Santika No.43A, Margasari, Kec. Karawaci, Kota Tangerang, Banten 15114
Phone: +62 552 2727
Email: lppm@global.ac.id

This work is licensed under a Creative Commons Attribution 4.0 International License.
Based on a work at https://journal.global.ac.id/index.php/sisfotek/index.