Spam Detection in Turkish and English SMS Messages with Orange 3

Short Message Service (SMS) is a fast and effective channel for communication and notification, and it can reach large audiences in a short time. However, these same properties can be exploited for malicious purposes, which creates problems for users: unsolicited, fraudulent, harmful, or misleading messages can be sent. To address these problems and provide a safer environment, this study performs spam detection on Turkish and English SMS messages using the TurkishSMS and UCI SMS Spam collections. Judged by accuracy and error rates, the Neural Network on the TurkishSMS dataset and the Naive Bayes algorithm on the UCI SMS Spam dataset were found to achieve high accuracy and a lower miss rate.
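To make the evaluation criteria concrete, the following is a minimal sketch of a Naive Bayes spam classifier together with the two metrics named in the abstract. It is not the Orange 3 workflow used in the study. The toy messages, the whitespace tokenizer, the Laplace smoothing, and all function names are illustrative assumptions. "Miss rate" is assumed here to mean the fraction of spam messages that are classified as legitimate (the false negative rate).

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def train_naive_bayes(messages, labels):
    """Count messages per class and word occurrences per class."""
    class_counts = Counter(labels)
    word_counts = {label: Counter() for label in class_counts}
    for text, label in zip(messages, labels):
        word_counts[label].update(tokenize(text))
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocabulary


def predict(model, text):
    """Return the class with the highest log-posterior score."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    # Labels are visited in sorted order; an exact tie keeps the first label.
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # words never seen in training are ignored
            # Laplace (add-one) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy_and_miss_rate(y_true, y_pred, positive="spam"):
    """Accuracy over all messages; miss rate = missed spam / all spam."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    spam_total = sum(t == positive for t in y_true)
    missed = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    accuracy = correct / len(y_true)
    miss_rate = missed / spam_total if spam_total else 0.0
    return accuracy, miss_rate


# Hypothetical toy data, not drawn from TurkishSMS or the UCI collection.
train_messages = [
    "win a free prize now",
    "free entry win cash",
    "are we meeting for lunch",
    "call me when you get home",
]
train_labels = ["spam", "spam", "ham", "ham"]

model = train_naive_bayes(train_messages, train_labels)
test_messages = ["free cash prize", "see you at lunch"]
test_labels = ["spam", "ham"]
predictions = [predict(model, m) for m in test_messages]
print(predictions, accuracy_and_miss_rate(test_labels, predictions))
```

Reporting the miss rate alongside accuracy matters because spam is usually the minority class: a filter that labels every message as legitimate can still score a high accuracy while missing all spam.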
