Harun DİŞLİ, Ayşe TOSUN

Konvolüsyonel Sinir Ağları İle Kod Klonlarının Tespiti

Similar or identical code fragments created by copying and reusing code during software development are called code clones. Although many studies have been conducted to detect these clones, they generally use string comparison techniques, and very few take advantage of deep learning, a popular research area. This paper proposes a new approach based on a popular and successful image classification method called the convolutional neural network. The method tokenizes each candidate clone pair to generate image files. The convolutional neural network is then used to classify these image data with the labels “clone” or “not clone”. To train and test the network, samples were selected from a database containing six million Java methods. As a result, this approach effectively detects method-level clones with 95% accuracy.

Code Clone Detection with Convolutional Neural Networks

Similar or identical code fragments produced by copying and reusing code within the source code are called code clones. Although many studies have addressed the detection of these clones, they generally use string comparison techniques, and very few take advantage of popular learning-based approaches such as deep learning. This paper proposes a new approach based on a popular and successful image classification technique, the convolutional neural network. The approach tokenizes each candidate clone pair in order to generate image files. A convolutional neural network is then used to classify these image data with the labels “clone” and “not clone”. To train and test the network, clone and non-clone pairs are chosen from a public database containing six million methods. As a result, the approach achieves 99% accuracy and effectively distinguishes clones from non-clones at method granularity, with false alarm rates of 2-5%.
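The tokenize-then-image step described above can be illustrated with a minimal sketch. The abstract does not specify how tokens are mapped to pixels, so everything here is an assumption made for illustration: the regular-expression tokenizer, the function names `tokenize` and `pair_to_image`, the per-pair integer vocabulary, and the fixed row width of 16. It only shows one plausible way a candidate pair could become a small two-row grid that a convolutional network could later consume.

```python
import re

# Hypothetical token pattern: identifiers/keywords, integer literals, or any
# single non-space symbol. A real Java lexer would be more precise.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(method_source):
    """Split a method's source text into a flat list of tokens."""
    return TOKEN_RE.findall(method_source)


def pair_to_image(method_a, method_b, width=16):
    """Encode a candidate clone pair as a 2 x width grid of integer token ids.

    Assumption: ids come from a vocabulary built per pair, in order of first
    appearance, with 0 reserved for padding. Rows longer than `width` are
    truncated; shorter rows are right-padded with 0.
    """
    tokens_a, tokens_b = tokenize(method_a), tokenize(method_b)

    vocab = {}
    for tok in tokens_a + tokens_b:
        vocab.setdefault(tok, len(vocab) + 1)

    def to_row(tokens):
        ids = [vocab[t] for t in tokens][:width]
        return ids + [0] * (width - len(ids))

    return [to_row(tokens_a), to_row(tokens_b)]


# Two methods that differ only in identifier names.
image = pair_to_image(
    "int f(int x) { return x; }",
    "int g(int y) { return y; }",
)
for row in image:
    print(row)
```

In this toy encoding, the two rows share ids wherever the methods use the same token and differ where identifiers were renamed, which is the kind of local pattern a convolutional filter could pick up. A vocabulary shared across the whole dataset would probably be preferable for actual training, since per-pair ids are not comparable between pairs.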

Keywords:

code clone detection, deep learning, convolutional neural network
