CROSS-PROJECT DEFECT PREDICTION ON THE AEEEM DATASET USING HYBRID SMOTE–TOMEK AND ENSEMBLE LEARNING

research
  • 11 Mar
  • 2026

This study aims to improve the performance of cross-project defect prediction (CPDP) on the AEEEM dataset by applying a hybrid preprocessing approach that combines normalization, dimensionality reduction with PCA, class balancing with SMOTE–Tomek, and decision-threshold adjustment. Experiments were conducted on the five AEEEM projects (EQ, JDT, LC, ML, and PDE) under two main scenarios: single-source CPDP and multi-source CPDP. The models used are Random Forest and Support Vector Machine (SVM), and performance is evaluated with the F1-score and AUC metrics. The experimental results show that the multi-source approach generally yields more stable performance than the single-source approach. In addition, hybrid preprocessing significantly improves the F1-score over the baseline, especially on datasets with a low defect ratio. An ablation study confirms that the performance gain comes from the combination of class balancing and threshold tuning, not from any single component.
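To make the two scenarios and the threshold-adjustment step concrete, the following is a minimal sketch in plain Python, not the thesis implementation. Only the five project names come from the abstract. The function names, the threshold grid (0.05 to 0.95 in steps of 0.05), and the choice of F1 as the tuning objective are assumptions for illustration. In particular, the abstract does not say which data the threshold is tuned on; in a CPDP setting it would normally be chosen on source-project data, since target labels are not available at training time.

```python
from itertools import permutations

# Project names are taken from the abstract; everything else is illustrative.
PROJECTS = ["EQ", "JDT", "LC", "ML", "PDE"]


def single_source_pairs(projects):
    """Every ordered (source, target) pair: train on one project, test on another."""
    return list(permutations(projects, 2))


def multi_source_splits(projects):
    """Leave-one-project-out: train on all other projects, test on the held-out one."""
    return [([p for p in projects if p != target], target) for target in projects]


def f1_score(y_true, y_pred):
    """F1 for the positive (defective) class; returns 0.0 when it is undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def tune_threshold(y_true, scores, candidates=None):
    """Return the (threshold, F1) pair that maximizes F1 over a candidate grid.

    Assumed grid: 0.05, 0.10, ..., 0.95. Ties keep the lowest threshold.
    """
    if candidates is None:
        candidates = [i / 100 for i in range(5, 96, 5)]
    best_t, best_f1 = 0.5, -1.0
    for t in candidates:
        preds = [1 if s >= t else 0 for s in scores]
        f1 = f1_score(y_true, preds)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


if __name__ == "__main__":
    print(len(single_source_pairs(PROJECTS)), "single-source pairs")
    print(len(multi_source_splits(PROJECTS)), "multi-source splits")
    # Toy validation labels and predicted defect probabilities (made up).
    labels = [0, 0, 1, 1, 0, 1]
    probs = [0.1, 0.2, 0.3, 0.4, 0.35, 0.8]
    print(tune_threshold(labels, probs))
```

With five projects, the single-source scenario produces 20 ordered source-target pairs and the multi-source scenario produces 5 leave-one-project-out splits. On the toy scores, the default cut-off of 0.5 flags only one of the three defective instances, while a lower threshold recovers all three at the cost of one false positive. This is the kind of trade-off that threshold tuning exploits when defective modules are rare.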

