Comparative Performance Analysis of Ellipsoidal Support Vector Clustering on Biomedical Data Sets

The performance of clustering algorithms is very important in biomedical research because they support the early detection and diagnosis of disease and help in taking the necessary precautions. However, most clustering algorithms use the Euclidean distance as their similarity metric. The Euclidean distance treats the variances of the data as equal, so the performance of traditional clustering methods based on it drops considerably when the data contain noise or outliers. To address these problems, this study proposes the Ellipsoidal Support Vector Clustering (ESVC) algorithm, a kernel-based clustering method. ESVC does not require the number of clusters to be specified in advance. Moreover, by using the Mahalanobis similarity measure, it can produce cluster shapes that fit the distribution of the data. The proposed ESVC algorithm was applied to both real biomedical data and synthetic data and then compared with conventional clustering methods. ESVC was observed to perform well in terms of accuracy, specificity, and sensitivity.
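The contrast between the two distance measures mentioned above can be illustrated with a minimal sketch. This is not the ESVC algorithm itself, only a comparison of Euclidean and Mahalanobis distance on a hypothetical elongated cluster; the function names, the cluster center, and the covariance matrix are assumptions chosen for illustration, and NumPy is assumed to be available.

```python
import numpy as np


def euclidean_distance(x, center):
    """Straight-line distance; every feature is weighted equally."""
    diff = np.asarray(x, dtype=float) - np.asarray(center, dtype=float)
    return float(np.sqrt(diff @ diff))


def mahalanobis_distance(x, center, cov):
    """Distance rescaled by the covariance of the data around the center."""
    diff = np.asarray(x, dtype=float) - np.asarray(center, dtype=float)
    # Solve cov @ z = diff rather than forming the inverse explicitly.
    z = np.linalg.solve(np.asarray(cov, dtype=float), diff)
    return float(np.sqrt(diff @ z))


# Hypothetical cluster: variance 9 along the first axis, 1 along the second.
center = np.array([0.0, 0.0])
cov = np.array([[9.0, 0.0],
                [0.0, 1.0]])

along = np.array([3.0, 0.0])   # point displaced along the high-variance axis
across = np.array([0.0, 3.0])  # point displaced along the low-variance axis

print("Euclidean:  ", euclidean_distance(along, center),
      euclidean_distance(across, center))
print("Mahalanobis:", mahalanobis_distance(along, center, cov),
      mahalanobis_distance(across, center, cov))
```

Both points are equally far from the center in the Euclidean sense, but the Mahalanobis distance rates the point displaced along the high-variance axis as much closer than the one displaced along the low-variance axis. That is the property that lets an ellipsoidal boundary follow the spread of the data.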
