Begüm HATTATOĞLU, Çağla ŞENELER, Gökhan ŞAHİN, Fazlı YILDIRIM

CUSTOMER PORTFOLIO OF A CONSUMER GOODS BASED VIRTUAL STORE: IDENTIFYING CUSTOMER SEGMENTS WITH CLUSTER ANALYSIS

Over the last decade, analyzing and identifying customers has become an indispensable need for companies. This research concentrates on discovering a company’s customer segments using different machine learning algorithms, and on benchmarking the algorithms and their parameters to reach the best results. Advances in technology have provided several approaches for exploring and gaining insights from massive amounts of data. Machine learning, one of the most popular of these approaches, was chosen to conduct this empirical study. A dataset with mixed categorical and numeric variables is analyzed with a conventional machine learning algorithm, namely the Hierarchical Agglomerative Clustering (HAC) algorithm with Gower’s distance. Kernel Principal Component Analysis is used for preprocessing because of the presence of categorical variables. The K-prototypes algorithm is chosen as the benchmark algorithm because it suits datasets with mixed categorical and numeric features. Benchmarking verifies the accuracy of the results by evaluating the final clusters. In addition, examining different parameters and comparing their effects on the analysis results indicates how important they are for machine learning algorithms; these effects need to be understood to perform more accurate analyses. The results showed that K-prototypes and HAC yield similar results, indicating that the clusters were mostly divided appropriately. However, the two algorithms’ results differ at a few significant points, which should be examined in further study.

Keywords:

Machine Learning, Kernel Principal Component Analysis, Clustering, Hierarchical Agglomerative Clustering, Gower’s Distance, K-prototypes
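
As an illustration of the main technique named in the abstract, the following is a minimal sketch of hierarchical agglomerative clustering on a Gower distance matrix for mixed numeric and categorical data. It is not the authors' implementation: the toy customer records, the feature names, the equal feature weights, the choice of average linkage, and the cut into two clusters are all assumptions made for this example. Kernel PCA preprocessing and the K-prototypes benchmark are not shown.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform


def gower_distance(numeric, categorical):
    """Pairwise Gower distance for mixed data, with equal feature weights."""
    numeric = np.asarray(numeric, dtype=float)
    categorical = np.asarray(categorical, dtype=str)
    # Numeric features: absolute difference scaled by the feature's range.
    ranges = numeric.max(axis=0) - numeric.min(axis=0)
    ranges[ranges == 0] = 1.0  # a constant feature contributes zero distance
    num_part = np.abs(numeric[:, None, :] - numeric[None, :, :]) / ranges
    # Categorical features: 0 if the categories match, 1 otherwise.
    cat_part = (categorical[:, None, :] != categorical[None, :, :]).astype(float)
    n_features = numeric.shape[1] + categorical.shape[1]
    return (num_part.sum(axis=2) + cat_part.sum(axis=2)) / n_features


# Hypothetical toy customers: (age, monthly spend) and (channel, region).
numeric = [[25, 100], [27, 120], [26, 110], [60, 900], [62, 950], [58, 880]]
categorical = [
    ["web", "north"], ["web", "north"], ["app", "north"],
    ["store", "south"], ["store", "south"], ["store", "south"],
]

D = gower_distance(numeric, categorical)

# SciPy's linkage expects a condensed distance vector, not a square matrix.
# Average linkage is an assumption; the abstract does not name a linkage.
Z = linkage(squareform(D, checks=False), method="average")

# Cut the dendrogram into two clusters (an arbitrary choice for the toy data).
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```

Because Gower's distance scales every feature's contribution to the interval from 0 to 1, numeric and categorical variables can be combined in a single dissimilarity matrix, which is what makes it usable with agglomerative clustering on mixed data.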

