Emrah Bilgiç, Özgür ÇAKIR

Comparing Clusterings: A Store Segmentation Application

This study focuses on pair-counting techniques, one family of measures for comparing clusterings, such as the Rand Index, the Adjusted Rand Index and the Fowlkes-Mallows Index. Its aim is to discuss the properties of these techniques and to demonstrate an application of them in marketing. In the application, the supermarket stores of a retail chain are segmented with cluster analysis using two different approaches. In the first approach, stores are segmented according to the socioeconomic characteristics of their locations and of their potential customers; in the second, they are segmented according to the purchasing behavior of their own customers. Since consumer purchasing behavior is strongly influenced by socioeconomic factors, the study expects the two clusterings to agree. The results show that while the Rand Index indicates agreement between the two clusterings, the Fowlkes-Mallows Index indicates only weak agreement and the Adjusted Rand Index indicates no agreement.

Keywords:

Comparing clusterings, clustering agreement, store segmentation
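The three pair-counting indices named above can be illustrated with a minimal Python sketch. It assumes each clustering is given as a list of cluster labels for the same stores in the same order; the function names are hypothetical helpers written for illustration, not the authors' code. Over all pairs of stores it counts a (pairs grouped together in both clusterings), b and c (pairs grouped together in only one of them) and d (pairs separated in both), then derives RI = (a + d) / (a + b + c + d), FM = a / sqrt((a + b)(a + c)) and the Hubert-Arabie Adjusted Rand Index from the same counts.

```python
from collections import Counter
from math import comb, sqrt


def pair_counts(labels_a, labels_b):
    """Return (a, b, c, d) pair counts for two clusterings of the same items.

    a: pairs together in both clusterings
    b: pairs together in A but apart in B
    c: pairs apart in A but together in B
    d: pairs apart in both clusterings
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same items")
    n = len(labels_a)
    if n < 2:
        raise ValueError("at least two items are needed to form a pair")
    # Contingency table cells n_ij and its marginals (cluster sizes in A and B).
    cells = Counter(zip(labels_a, labels_b))
    sizes_a = Counter(labels_a)
    sizes_b = Counter(labels_b)
    together_both = sum(comb(v, 2) for v in cells.values())
    together_a = sum(comb(v, 2) for v in sizes_a.values())
    together_b = sum(comb(v, 2) for v in sizes_b.values())
    a = together_both
    b = together_a - together_both
    c = together_b - together_both
    d = comb(n, 2) - a - b - c
    return a, b, c, d


def rand_index(labels_a, labels_b):
    a, b, c, d = pair_counts(labels_a, labels_b)
    # Share of all pairs on which the two clusterings agree.
    return (a + d) / (a + b + c + d)


def fowlkes_mallows(labels_a, labels_b):
    a, b, c, _ = pair_counts(labels_a, labels_b)
    # Convention (assumed here): 0.0 when no pair is together in both.
    if a == 0:
        return 0.0
    return a / sqrt((a + b) * (a + c))


def adjusted_rand_index(labels_a, labels_b):
    a, b, c, d = pair_counts(labels_a, labels_b)
    total = a + b + c + d
    # Expected number of pairs together in both, given the cluster sizes.
    expected = (a + b) * (a + c) / total
    max_index = ((a + b) + (a + c)) / 2
    # Equality only occurs for identical trivial partitions
    # (all singletons, or a single cluster), so report perfect agreement.
    if max_index == expected:
        return 1.0
    return (a - expected) / (max_index - expected)


if __name__ == "__main__":
    # Hypothetical toy labels for eight items, not the study's store data.
    x = [0, 0, 1, 1, 2, 2, 3, 3]
    y = [0, 1, 0, 1, 2, 3, 2, 3]
    print(round(rand_index(x, y), 3))  # 0.714
    print(round(fowlkes_mallows(x, y), 3))  # 0.0
    print(round(adjusted_rand_index(x, y), 3))  # -0.167
```

On these toy labels, RI is about 0.71 while FM is 0 and ARI is slightly negative. The many pairs separated in both clusterings (d) inflate RI, which is one way RI can suggest agreement that the chance-corrected ARI does not confirm.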

