Applying Information Gain to K-Nearest Neighbor for Classifying the Cognitive Level of Questions in Bloom's Taxonomy

research • 21 Apr 2020


Bloom's taxonomy is a classification system used to define and distinguish the different levels of human cognition (thinking, learning, and understanding). The taxonomy was originally designed around three main domains of learning: cognitive, affective, and psychomotor. Today, academics identify the Bloom cognitive level of a question manually, yet few of them do so correctly, so most questions end up miscategorized. K-Nearest Neighbor (KNN) is a simple but effective method for categorizing the cognitive level of questions in Bloom's taxonomy; however, KNN suffers from the large dimensionality of text vectors. To address this problem, Information Gain (IG) is applied to reduce the dimensionality of the text vectors. Several experiments were conducted to find the optimal architecture and produce an accurate classification. Across 10 experiments on the Question Bank dataset, KNN alone achieved a best accuracy of 59.97% and a best kappa of 0.496, while KNN+IG achieved a best accuracy of 66.18% and a best kappa of 0.574. It can therefore be concluded that classifying the cognitive level of questions in Bloom's taxonomy with KNN+IG is more accurate than with KNN alone.
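The pipeline the abstract describes (vectorize question text, reduce the feature space with Information Gain, then classify with KNN) can be sketched in a few lines of Python. This is a minimal illustration assuming scikit-learn, not the paper's actual implementation: the Question Bank dataset, preprocessing, the value of k, and the number of selected features are not given here, so placeholder data and parameters are used, and information gain is approximated with scikit-learn's mutual_info_classif score.

# Minimal sketch of the KNN + Information Gain pipeline (assumptions noted above).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline

# Toy stand-in for the Question Bank: questions labeled with Bloom levels.
questions = [
    "Define the term operating system.",
    "Explain how a binary search works.",
    "Design a database schema for a library.",
    "Evaluate the efficiency of the quicksort algorithm.",
]
levels = ["knowledge", "comprehension", "synthesis", "evaluation"]

pipeline = Pipeline([
    # Turn question text into weighted term vectors (large, sparse dimension).
    ("tfidf", TfidfVectorizer(stop_words="english")),
    # Information-gain-style feature selection: keep the k terms that carry
    # the most information about the class label, shrinking the vector space.
    ("ig", SelectKBest(mutual_info_classif, k=10)),
    # Classify a new question by majority vote among its nearest neighbors.
    ("knn", KNeighborsClassifier(n_neighbors=3)),
])

pipeline.fit(questions, levels)
print(pipeline.predict(["Describe the purpose of an index in a database."]))

On a real train/test split, the two metrics reported in the abstract can be computed with sklearn.metrics.accuracy_score and sklearn.metrics.cohen_kappa_score on the pipeline's predictions.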

