RANDOM OVER-UNDER SAMPLING AND RANDOM FOREST METHODS FOR STROKE PREDICTION CLASSIFICATION

Research, 01 Apr 2022


Data mining plays an important role in medical prediction, including the prediction of stroke. One problem in stroke detection is the imbalanced class distribution of the dataset. Class imbalance can be addressed with resampling methods such as oversampling, undersampling, and hybrid approaches that combine the two. This study proposes applying Random Over-Under Sampling (ROUS) with Random Forest to improve the classification performance of stroke detection on a dataset from Kaggle. Because the data still contained missing values, those values were replaced first. Once the missing values had been replaced, resampling was applied to balance the dataset. The test results show that classification without resampling yields an average accuracy of 90%, but its AUC and Kappa remain low. Random Forest achieved better accuracy than the other methods tested. After resampling with Random Over-Under Sampling + Random Forest, performance improved: accuracy increased by 3.6438, Kappa by 0.5359, and AUC by 0.0426. These results show that applying Random Over-Under Sampling to the Random Forest algorithm can effectively improve accuracy, Kappa, and AUC in the classification of stroke prediction.
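The two ideas in the abstract, hybrid random resampling and the gap between accuracy and Kappa on imbalanced data, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the toy 90/10 class split, and the choice to resample every class to the mean class size are all assumptions made for illustration; the abstract does not state which sampling ratio was used.

```python
import random
from collections import Counter


def random_over_under_sample(rows, labels, seed=0):
    """Resample every class to the mean class size (illustrative target only)."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    # Hypothetical target size: the mean number of rows per class.
    target = len(rows) // len(by_class)
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        if len(members) > target:
            # Undersample the larger class: draw without replacement.
            chosen = rng.sample(members, target)
        else:
            # Oversample the smaller class: keep originals, add random duplicates.
            chosen = members + rng.choices(members, k=target - len(members))
        out_rows.extend(chosen)
        out_labels.extend([label] * target)
    return out_rows, out_labels


def cohen_kappa(y_true, y_pred):
    """Cohen's Kappa: agreement beyond what label frequencies alone would give."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        return 0.0  # Kappa is undefined here; treated as 0 in this sketch.
    return (observed - expected) / (1 - expected)


# Toy data: 90 negative and 10 positive cases (made-up numbers).
labels = [0] * 90 + [1] * 10
rows = [[i] for i in range(100)]

# A classifier that always predicts the majority class looks accurate,
# yet it agrees with the truth no better than chance.
always_negative = [0] * 100
accuracy = sum(t == p for t, p in zip(labels, always_negative)) / len(labels)
kappa = cohen_kappa(labels, always_negative)

balanced_rows, balanced_labels = random_over_under_sample(rows, labels)
print(accuracy, kappa, Counter(balanced_labels))
```

In practice the balanced rows would then be passed to a Random Forest learner, for example scikit-learn's `RandomForestClassifier`. Resampling is normally applied to the training split only, so that the test data keep their original class distribution.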

