Comparative Performance Analysis of Ellipsoidal Support Vector Clustering on Biomedical Data Sets

The performance of clustering algorithms is very important in biomedical research because they help with the early detection and diagnosis of disease and with taking the necessary precautions for patients. However, most clustering algorithms use the Euclidean distance as their similarity metric. The Euclidean distance assumes that the variances of the data are equal, so the performance of traditional clustering methods that rely on it degrades considerably when the data contain noise or outliers. To address these problems, this study proposes the Ellipsoidal Support Vector Clustering (ESVC) algorithm, a kernel-based clustering method. ESVC does not require the number of clusters to be specified in advance. Moreover, by using the Mahalanobis distance as its similarity measure, it can produce cluster shapes that fit the distribution of the data. The proposed ESVC algorithm was applied to both real biomedical data and synthetic data and then compared with conventional clustering methods. ESVC was observed to perform well in terms of accuracy, specificity, and sensitivity.

Keywords:

Biomedical data, clustering, ellipsoidal support vector clustering
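
The contrast the abstract draws between the Euclidean and Mahalanobis distances can be illustrated with a small numerical sketch. This is not the ESVC algorithm itself, only the distance measure it is said to rely on; the cluster center, the covariance matrix, and the two test points are hypothetical values chosen for illustration.

```python
import numpy as np

def euclidean(x, center):
    """Plain Euclidean distance: every direction is weighted equally."""
    return float(np.linalg.norm(x - center))

def mahalanobis(x, center, cov):
    """Mahalanobis distance: differences are scaled by the covariance."""
    diff = x - center
    # Solve cov * z = diff instead of forming the explicit inverse.
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))

# Hypothetical elongated cluster: variance 9 along the first axis,
# variance 0.25 along the second axis (assumed values, not from the paper).
center = np.array([0.0, 0.0])
cov = np.array([[9.0, 0.0],
                [0.0, 0.25]])

along = np.array([3.0, 0.0])   # one standard deviation along the long axis
across = np.array([0.0, 3.0])  # six standard deviations along the short axis

d_e_along = euclidean(along, center)
d_e_across = euclidean(across, center)
d_m_along = mahalanobis(along, center, cov)
d_m_across = mahalanobis(across, center, cov)

print("Euclidean:  ", d_e_along, d_e_across)
print("Mahalanobis:", d_m_along, d_m_across)
```

Both points are equally far from the center in the Euclidean sense, while the Mahalanobis distance treats the point lying across the narrow direction of the cluster as much farther away. This is the property that lets a Mahalanobis-based method follow elongated, ellipsoidal cluster shapes.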
