Erdinç UZUN, Cihat ERDOĞAN, Ahmet SAYGILI

A WEB-BASED ASSIGNMENT EVALUATION SYSTEM USING A HIERARCHICAL CLUSTERING MODEL

Assignments play an important role in students' learning process. In a classical assignment evaluation process, the only question is whether the assignment is correct. However, for assignments to contribute more to teaching, plagiarism committed by students must also be taken into account. Detecting plagiarism and its extent is an extremely difficult part of assignment evaluation. This study introduces a web-based application that simplifies this procedure by integrating document similarity measures with a hierarchical clustering model. The application makes it possible to evaluate who submitted similar assignments and how similar those assignments are. For the document similarity calculation, the Cosine, Jaccard and Dice similarity measures were tested. For hierarchical clustering, three algorithms were examined: Single Linkage, Complete Linkage and Average Group. A test dataset was created containing 54 assignments from 18 courses taught by 6 faculty members over two academic terms in previous years. For each assignment, crossing the three document similarity measures with the three clustering algorithms yielded 9 different results, and cophenetic correlation coefficients were calculated to test how well the hierarchical clustering algorithms perform. Analysis of the results showed that the Jaccard measure for document similarity and the Average Group algorithm for hierarchical clustering form the most suitable pair for assignment evaluation.

Keywords:

Hierarchical Clustering, Document Similarity, Local Plagiarism Detection, Software Development
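The abstract names three document similarity measures without defining them. The sketch below uses their standard definitions: Jaccard is |A ∩ B| / |A ∪ B| and Dice is 2|A ∩ B| / (|A| + |B|), both computed on the sets of distinct word tokens, while Cosine is the dot product of two term-frequency vectors divided by the product of their lengths. The tokenizer (lowercased `\w+` runs) and the choice of token sets versus raw term frequencies are assumptions for illustration; the application's actual preprocessing and weighting are not described here.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase word tokens; the application's real preprocessing is not specified.
    return re.findall(r"\w+", text.lower())


def jaccard(doc_a: str, doc_b: str) -> float:
    # |A ∩ B| / |A ∪ B| over the sets of distinct tokens.
    a, b = set(tokenize(doc_a)), set(tokenize(doc_b))
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dice(doc_a: str, doc_b: str) -> float:
    # 2|A ∩ B| / (|A| + |B|) over the sets of distinct tokens.
    a, b = set(tokenize(doc_a)), set(tokenize(doc_b))
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def cosine(doc_a: str, doc_b: str) -> float:
    # Dot product of term-frequency vectors divided by the product of their lengths.
    a, b = Counter(tokenize(doc_a)), Counter(tokenize(doc_b))
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    x = "the cat sat on the mat"
    y = "the cat lay on the rug"
    print(f"Jaccard={jaccard(x, y):.4f} Dice={dice(x, y):.4f} Cosine={cosine(x, y):.4f}")
```

For the two toy sentences, the shared tokens are `the`, `cat` and `on`, so Jaccard is 3/7 ≈ 0.4286, Dice is 6/10 = 0.6 and Cosine is 6/8 = 0.75. Dice and Jaccard are monotonically related (D = 2J / (1 + J)), so they rank document pairs identically; they differ only in the distance values handed to the clustering step.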

A WEB-BASED ASSIGNMENT EVALUATION APPLICATION USING A HIERARCHICAL CLUSTERING MODEL

Assignments are one of the most important parts of students' education. In the classical assignment evaluation process, an assignment is judged only on whether it is correct. However, for assignments to contribute more to education, plagiarism committed by students should also be taken into account. Detecting plagiarism and measuring its extent is an extremely difficult part of assignment evaluation. To facilitate this procedure, this study introduces a web-based application that combines document similarity measures with a hierarchical clustering model. The application makes it possible to evaluate which students submitted similar assignments and how similar those assignments are. The Cosine, Dice and Jaccard similarity measures were investigated for the application's document similarity calculation. On the hierarchical clustering side, three algorithms were examined: Single Linkage, Complete Linkage and Average Group. A test dataset was created covering two academic terms from previous years and containing 54 assignments from 18 courses taught by 6 lecturers. Combining the three document similarity measures with the three hierarchical clustering algorithms yielded 9 different results for each assignment, and cophenetic correlation coefficients were calculated to test how well the hierarchical clustering algorithms perform. Analysis of the results showed that the Jaccard measure for document similarity combined with the Average Group algorithm for hierarchical clustering is the most suitable pair for assignment evaluation.

Keywords:

Hierarchical Clustering, Document Similarity, Plagiarism Detection, Software Development
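The clustering side can be sketched as naive agglomerative clustering over a distance matrix, for example 1 - Jaccard similarity between every pair of submissions (the distance transform is an assumption). The three linkage rules follow their usual definitions: Single Linkage uses the closest pair of documents across two clusters, Complete Linkage the farthest pair, and Average Group the mean over all cross-cluster pairs. The cophenetic distance between two documents is the merge height at which they first share a cluster, and the cophenetic correlation coefficient is the Pearson correlation between original and cophenetic distances over all document pairs; values closer to 1 mean the dendrogram preserves the original distances more faithfully. This is a roughly cubic-time, pure-Python illustration, not the application's implementation; the function names and the toy matrix are hypothetical.

```python
import itertools
import math


def cophenetic_matrix(dist: list[list[float]], linkage: str = "average") -> list[list[float]]:
    """Cluster n documents agglomeratively and return their cophenetic distances.

    dist is a symmetric n x n distance matrix (e.g. 1 - Jaccard similarity);
    linkage is "single", "complete" or "average" (group average).
    """
    n = len(dist)
    clusters = {i: [i] for i in range(n)}  # cluster id -> member document indices
    coph = [[0.0] * n for _ in range(n)]
    next_id = n

    def linkage_distance(a: int, b: int) -> float:
        cross = [dist[i][j] for i in clusters[a] for j in clusters[b]]
        if linkage == "single":
            return min(cross)
        if linkage == "complete":
            return max(cross)
        if linkage == "average":
            return sum(cross) / len(cross)
        raise ValueError(f"unknown linkage: {linkage}")

    while len(clusters) > 1:
        # Merge the two closest clusters under the chosen linkage rule.
        a, b = min(itertools.combinations(clusters, 2), key=lambda p: linkage_distance(*p))
        height = linkage_distance(a, b)
        # Documents split across a and b first meet at this merge height.
        for i in clusters[a]:
            for j in clusters[b]:
                coph[i][j] = coph[j][i] = height
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return coph


def cophenetic_correlation(dist: list[list[float]], coph: list[list[float]]) -> float:
    """Pearson correlation between original and cophenetic distances over all pairs."""
    pairs = list(itertools.combinations(range(len(dist)), 2))
    x = [dist[i][j] for i, j in pairs]
    y = [coph[i][j] for i, j in pairs]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sx = math.sqrt(sum((u - mx) ** 2 for u in x))
    sy = math.sqrt(sum((v - my) ** 2 for v in y))
    return cov / (sx * sy)  # raises ZeroDivisionError if all distances are equal


if __name__ == "__main__":
    # Hypothetical distances between four submissions (1 - similarity).
    D = [
        [0.0, 0.2, 0.7, 0.8],
        [0.2, 0.0, 0.6, 0.9],
        [0.7, 0.6, 0.0, 0.3],
        [0.8, 0.9, 0.3, 0.0],
    ]
    for method in ("single", "complete", "average"):
        c = cophenetic_matrix(D, method)
        print(method, round(cophenetic_correlation(D, c), 4))
```

On this toy matrix every linkage first merges documents 0 and 1 (height 0.2), then 2 and 3 (height 0.3); the final merge happens at 0.6, 0.9 and 0.75 for single, complete and average linkage respectively. The resulting coefficients are all close to 0.93, so a toy example like this cannot reproduce the study's comparison, which relied on its own assignment dataset.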
