Software plays an important role in many aspects of life. Quality-assurance tasks such as software testing are therefore fundamental, but testing is expensive in both cost and time, so it is important for a software development company to assure software quality at the lowest possible cost. Naïve Bayes has shown good performance in software defect prediction, achieving an average probability of 71 percent; it is also a simple classifier, and its training takes less time than that of other machine-learning algorithms. The NASA datasets, which are public and freely available to researchers, are widely used in developing software defect prediction methods. Previous research identifies two main problems in software defect prediction: attribute noise and class imbalance (defective modules are typically a small minority of the data). Applying SMOTE (Synthetic Minority Over-Sampling Technique) has proven effective for handling class imbalance: because it synthesizes new minority-class (positive) examples rather than duplicating existing ones, it avoids the overfitting that plain oversampling of the minority class can cause. Information gain is used for attribute selection to handle attribute noise. The experiments compare the performance of Naïve Bayes before and after applying SMOTE and information gain, and also against other classifiers such as the OneR and SVM algorithms. The results show that applying SMOTE and information gain can handle the class-imbalance and attribute-noise problems in software defect prediction.
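As a concrete illustration of the pipeline described above, the following is a minimal sketch in Python using scikit-learn and imbalanced-learn. It is an illustration under stated assumptions, not the implementation used in this thesis: the file name cm1.csv and the label column "defective" are hypothetical placeholders, mutual_info_classif stands in for information gain (the two coincide for discrete attributes), and k=10 is an arbitrary number of retained attributes.

import pandas as pd
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline  # sampler-aware pipeline
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB

# Hypothetical NASA MDP dataset: static code metrics plus a binary label.
data = pd.read_csv("cm1.csv")                 # placeholder path
X = data.drop(columns=["defective"]).values   # placeholder label column
y = data["defective"].values                  # 1 = defective (minority class)

# SMOTE synthesizes new minority-class examples instead of duplicating them,
# SelectKBest with mutual information approximates information-gain-based
# attribute selection, and Gaussian Naive Bayes performs the classification.
model = Pipeline([
    ("smote", SMOTE(random_state=42)),
    ("select", SelectKBest(score_func=mutual_info_classif, k=10)),
    ("nb", GaussianNB()),
])

# AUC under stratified 10-fold cross-validation; the imblearn Pipeline applies
# SMOTE to the training folds only, so the test folds stay untouched.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
auc = cross_val_score(model, X, y, scoring="roc_auc", cv=cv)
print(f"Naive Bayes + SMOTE + information gain: mean AUC = {auc.mean():.3f}")

The sampler-aware pipeline matters for validity: applying SMOTE before splitting would let synthetic copies of test-fold neighbors leak into the training data. AUC is reported rather than accuracy because accuracy is misleading on imbalanced data (Ling & Zhang, 2003).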
Catal, C. (2011). Software fault prediction: A literature review and current trends. Expert Systems with Applications, 38(4), 4626–4636. doi:10.1016/j.eswa.2010.10.024
Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: Synthetic Minority Over-sampling Technique. Journal of Artificial Intelligence Research, 16, 321–357.
Dawson, C. W. (2009). Projects in Computing and Information Systems: A Student’s Guide (2nd ed.). Great Britain: Pearson Education.
De Carvalho, A. B., Pozo, A., & Vergilio, S. R. (2010). A symbolic fault-prediction model based on multiobjective particle swarm optimization. Journal of Systems and Software, 83(5), 868–882.
Demsar, J. (2006). Statistical Comparisons of Classifiers over Multiple Data Sets. The Journal of Machine Learning Research, 7, 1–30.
Domingos, P., & Pazzani, M. (1997). On the Optimality of the Simple Bayesian Classifier under Zero-One Loss. Machine Learning, 29(2-3), 103–130.
Elish, K. O., & Elish, M. O. (2008). Predicting defect-prone software modules using support vector machines. Journal of Systems and Software, 81(5), 649–660.
Friedman, M. (1937). The Use of Ranks to Avoid the Assumption of Normality Implicit in the Analysis of Variance. Journal of the American Statistical Association, 32, 675–701.
Gao, K., & Khoshgoftaar, T. M. (2011). Software Defect Prediction for High-Dimensional and Class-Imbalanced Data. Proceedings of the 23rd International Conference on Software Engineering & Knowledge Engineering.
Guyon, I., & Elisseeff, A. (2003). An Introduction to Variable and Feature Selection. Journal of Machine Learning Research, 3, 1157–1182.
Hall, T., Beecham, S., Bowes, D., Gray, D., & Counsell, S. (2012). A Systematic Literature Review on Fault Prediction Performance in Software Engineering. IEEE Transactions on Software Engineering, 38(6), 1276–1304.
Jain, M., & Richariya, V. (2012). An Improved Techniques Based on Naïve Bayesian for Attack Detection. International Journal of Emerging Technology and Advanced Engineering, 2(1), 324–331.
Kabir, M. M., Shahjahan, M., & Murase, K. (2012). A new hybrid ant colony optimization algorithm for feature selection. Expert Systems With Applications, 39(3), 3747–3763.
Khoshgoftaar, T. M., & Gao, K. (2009). Feature Selection with Imbalanced Data for Software Defect Prediction. 2009 International Conference on Machine Learning and Applications, 235–240.
Kohavi, R. (1995). A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection. Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI), 1137–1143.
Lessmann, S., Baesens, B., Mues, C., & Pietsch, S. (2008). Benchmarking Classification Models for Software Defect Prediction: A Proposed Framework and Novel Findings. IEEE Transactions on Software Engineering, 34(4), 485–496.
Ling, C. X. (2003). Using AUC and Accuracy in Evaluating Learning Algorithms, 1–31.
Ling, C. X., & Zhang, H. (2003). AUC: a statistically consistent and more discriminating measure than accuracy. Proceedings of the 18th International Joint Conference on Artificial Intelligence.
Liu, H., & Yu, L. (2005). Toward Integrating Feature Selection Algorithms for Classification and Clustering. IEEE Transactions on Knowledge and Data Engineering, 17(4), 491–502. doi:10.1109/TKDE.2005.66
Liu, Y., Yu, X., Huang, J. X., & An, A. (2011). Combining integrated sampling with SVM ensembles for learning from imbalanced datasets. Information Processing & Management, 47(4), 617–631.
McCabe, T. J. (1976). A Complexity Measure. IEEE Transactions on Software Engineering, SE-2(4), 308–320.
McDonald, M., Musson, R., & Smith, R. (2007). The Practical Guide to Defect Prevention. Redmond, WA: Microsoft Press.
Menzies, T., Greenwald, J., & Frank, A. (2007). Data Mining Static Code Attributes to Learn Defect Predictors. IEEE Transactions on Software Engineering, 33(1), 2–13.
Riquelme, J. C., Ruiz, R., & Moreno, J. (2008). Finding Defective Modules from Highly Unbalanced Datasets. Engineering, 2(1), 67–74.
Rivera, J., & Meulen, R. van der. (2014). Gartner Says Worldwide IT Spending on Pace to Reach $3.8 Trillion in 2014. Retrieved August 01, 2015, from http://www.gartner.com/newsroom/id/2643919
Shepperd, M., Song, Q., Sun, Z., & Mair, C. (2013). Data Quality: Some Comments on the NASA Software Defect Data Sets. IEEE Transactions on Software Engineering, 39(9), 1–13.
Song, Q., Jia, Z., Shepperd, M., Ying, S., & Liu, J. (2011). A General Software Defect-Proneness Prediction Framework. IEEE Transactions on Software Engineering, 37(3), 356–370.
Turhan, B., & Bener, A. (2009). Analysis of Naive Bayes’ assumptions on software fault data: An empirical study. Data & Knowledge Engineering, 68(2), 278–290.
Wahono, R. S., & Suryana, N. (2013). Combining Particle Swarm Optimization based Feature Selection and Bagging Technique for Software Defect Prediction. International Journal of Software Engineering and Its Applications, 7(5), 153–166.
Wang, H., Khoshgoftaar, T. M., Gao, K., & Seliya, N. (2009). High-Dimensional Software Engineering Data and Feature Selection. Proceedings of the 21st IEEE International Conference on Tools with Artificial Intelligence, 83–90.
Wang, T., Li, W., Shi, H., & Liu, Z. (2011). Software Defect Prediction Based on Classifiers Ensemble. Journal of Information & Computational Science, 8(16), 4241–4254.
Wilcoxon, F. (1945). Individual Comparisons by Ranking Methods. Biometrics Bulletin, 1(6), 80–83.
Yap, B. W., Rani, K. A., Aryani, H., Rahman, A., Fong, S., Khairudin, Z., & Abdullah, N. N. (2014). An Application of Oversampling, Undersampling, Bagging and Boosting in Handling Imbalanced Datasets. Proceedings of the First International Conference on Advanced Data and Information Engineering (DaEng-2013), 285, 13–23.