Applying Web Usage Mining to the Analysis of Web Log Files

Today, data growth has reached extraordinary proportions. With advancing technology, data can be collected more easily in many different sectors. Data mining has accelerated the process of turning these masses of data into meaningful information. Although data mining first emerged as knowledge extraction from databases, newly developed methods and technologies now allow its predictive power to be exploited more fully. In this study, support vector machines, one of the classification methods of data mining, were applied to web log files, which are the data source of web usage mining. The data set consists of 812 days of web log files from an e-commerce site. Web log files contain unstructured data, and such data is harder to analyze than structured data. The data therefore had to be cleaned before the analysis, and this step took a long time in the study. The study aimed to determine the tendency of purchase behavior. Classification was performed with support vector machines, and the results were compared with those obtained by logistic regression. Support vector machines were found to classify more accurately in the e-commerce site application.

Keywords:

Web Usage Mining, Support Vector Machines, Internet, Classification

Applying Web Usage Mining for the Analysis of Web Log Files

Today, the volume of data has reached extraordinary levels. With recent advances in technology, collecting data in many different sectors is getting easier. Data mining has accelerated the process of transforming this data into information. Data mining was initially known as information extraction from databases, but with the help of newly developed methods and technologies it is now used increasingly for prediction. In this study, web usage mining is performed by applying data mining classification methods to web log files. The data set consists of an e-commerce web site's log files covering 812 days. Web log files contain unstructured data, which is very difficult to analyze with conventional methods. The data has to be cleaned before it can be analyzed, and this process takes a long time. The aim of the study is to determine the tendency of purchase behavior. The data is first classified with support vector machines, and the results are then compared with those obtained by logistic regression. For the e-commerce web site studied, support vector machines classify more accurately.

Keywords:

Web Usage Mining, Support Vector Machines, Internet, Classification
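
The abstract notes that the raw web log files had to be cleaned before any classification could be done. The following is a minimal sketch of what such a cleaning step might look like, not the procedure used in the study. It assumes the logs are in Common Log Format (the study does not state its log format), treats each IP address as one visitor, and uses a hypothetical `/checkout/complete` path to mark a purchase; the function names `clean_log_lines` and `label_visitors` are illustrative only.

```python
import re
from collections import defaultdict

# Assumption: Common Log Format. The study does not state its log format.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

# Requests for static resources say little about browsing behavior.
STATIC_SUFFIXES = (".css", ".js", ".png", ".jpg", ".gif", ".ico")


def clean_log_lines(lines):
    """Parse raw log lines and keep only successful page requests."""
    records = []
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # drop malformed lines
        record = match.groupdict()
        # Ignore the query string when checking the file extension.
        path = record["path"].split("?", 1)[0].lower()
        if path.endswith(STATIC_SUFFIXES):
            continue  # drop images, stylesheets, scripts
        if record["status"] != "200":
            continue  # drop failed or redirected requests
        records.append(record)
    return records


def label_visitors(records, purchase_path="/checkout/complete"):
    """Count page views per IP and flag visitors who reached the purchase page.

    Both the IP-as-visitor rule and `purchase_path` are hypothetical
    simplifications chosen for this sketch.
    """
    visitors = defaultdict(lambda: {"page_views": 0, "purchased": False})
    for record in records:
        visitor = visitors[record["ip"]]
        visitor["page_views"] += 1
        if record["path"].startswith(purchase_path):
            visitor["purchased"] = True
    return dict(visitors)
```

The per-visitor dictionary returned by `label_visitors` could then serve as the starting point for a feature table, with `purchased` as the class label for a classifier such as a support vector machine or logistic regression.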

