SSC: Clustering Of Turkish Texts By Spectral Graph Partitioning

There is growing interest in studies on text classification as a result of the exponential increase in the amount of data available. Many studies have been conducted in the field of text clustering, using different approaches. This study introduces Spectral Sentence Clustering (SSC) for text clustering problems, which is an unsupervised method based on graph-partitioning. The study explains how the proposed model proposed can be used in natural language applications to successfully cluster texts. A spectral graph theory method is used to partition the graph into non-intersecting sub-graphs, and an unsupervised and efficient solution is offered for the text clustering problem by providing a physical representation of the texts. Finally, tests have been conducted demonstrating that SSC can be successfully used for text categorization. A clustering success rate of 97.08% was achieved in tests conducted using the TTC-3600 dataset, which contains open-access unstructured Turkish texts, classified into categories. The SSC model proposed performed better compared to a popular k-means clustering algorithm.

SSC: Clustering Of Turkish Texts By Spectral Graph Partitioning

There is growing interest in studies on text classification as a result of the exponential increase in the amount of data available. Many studies have been conducted in the field of text clustering, using different approaches. This study introduces Spectral Sentence Clustering (SSC) for text clustering problems, which is an unsupervised method based on graph-partitioning. The study explains how the proposed model proposed can be used in natural language applications to successfully cluster texts. A spectral graph theory method is used to partition the graph into non-intersecting sub-graphs, and an unsupervised and efficient solution is offered for the text clustering problem by providing a physical representation of the texts. Finally, tests have been conducted demonstrating that SSC can be successfully used for text categorization. A clustering success rate of 97.08% was achieved in tests conducted using the TTC-3600 dataset, which contains open-access unstructured Turkish texts, classified into categories. The SSC model proposed performed better compared to a popular k-means clustering algorithm.

___

[1] Ş. Canberk, G. , Sağıroğlu, Bilgi ve Bilgisayar Güvenliği : Casus Yazılımlar ve Korunma Yöntemleri. Ankara: Grafiker Yayıncılık, 2006.

[2] Osman Durmaz;, “Metin Sınıflandırmada Boyut Azaltmanın Etkisi ve Özellik Seçimi,” Gazi Üniversitesi, 2011.

[3] C. Hark, A. Seyyarer, T. Uçkan, and A. Karci, “Doǧal dil işleme yaklaşimlari ile yapisal olmayan dökümanlarin benzerliǧi,” in IDAP 2017 - International Artificial Intelligence and Data Processing Symposium, 2017.

[4] D. D. Lewis, “Naive (Bayes) at forty: The independence assumption in information retrieval,” in European conference on machine learning, 1998, pp. 4–15.

[5] K. Aas and L. Eikvil, “Text categorisation: A survey.” Technical report, Norwegian computing center, 1999.

[6] N. O. Andrews and E. A. Fox, “Recent Developments in Document Clustering,” 2007.

[7] N. Shah, “Document Clustering : A Detailed Review,” vol. 4, no. 5, pp. 30–38, 2012.

[8] A. Pujari, “Data mining techniques,” 2001.

[9] S. N. Bharath Bhushan and A. Danti, “Classification of text documents based on score level fusion approach,” Pattern Recognit. Lett., vol. 94, pp. 118–126, Jul. 2017.

[10] R. J. Wilson and J. J. Watkins, Graphs: an introductory approach: a first course in discrete mathematics. John Wiley & Sons Inc, 1990.

[11] H. Qiu and E. R. Hancock, “Graph matching and clustering using spectral partitions,” Pattern Recognit., vol. 39, pp. 22–34, 2006.

[12] D. Kılınç, A. Özçift, F. Bozyigit, P. Yıldırım, F. Yücalar, and E. Borandag, “TTC-3600: A new benchmark dataset for Turkish text categorization,” J. Inf. Sci., vol. 43, no. 2, pp. 174–185, Apr. 2017.

[13] C. Hark, A. Seyyarer, T. Uçkan, and A. Karci, “Doǧal dil işleme yaklaşimlari ile yapisal olmayan dökümanlarin benzerliǧi,” in IDAP 2017 - International Artificial Intelligence and Data Processing Symposium, 2017, pp. 1–6.

[14] C. Hark, T. Uçkan, and A. Seyyarer, Abubekir Karci, “Metin Özetleme İçin Çizge Tabanlı Bir Öneri,” in IDAP 2018 - International Artificial Intelligence and Data Processing Symposium, 2018.

[15] A. Karci, “Çizge Algoritmaları ve Çizge Bölmeleme.pdf,” 1998.

[16] U. Von Luxburg, “A Tutorial on Spectral Clustering,” 2007.

[17] B. Slininger, “Fiedler’s Theory of Spectral Graph Partitioning.”

[18] H. Jia, S. Ding, and M. Du, “Self-Tuning p-Spectral Clustering Based on Shared Nearest Neighbors,” Cognit. Comput.

[19] L. Macdonald, “Successive Approximations by the Rayleigh-Ritz Variation Methoti,” 1933.

[20] L. Hagen, … A. K. on computer-aided design of, and U. 1992, “New spectral methods for ratio cut partitioning and clustering,” ieeexplore.ieee.org.

[21] S. Jianbo and M. Jitendra, “Normalized cuts and image segmentation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 8, pp. 888–905, 2000.

[22] A. Ben Ayed, M. Ben Halima, and A. M. Alimi, “Adaptive fuzzy exponent cluster ensemble system based feature selection and spectral clustering,” IEEE Int. Conf. Fuzzy Syst., no. July, 2017.

[23] M. Filippone, F. Camastra, F. Masulli, and S. Rovetta, “A survey of kernel and spectral methods for clustering,” Pattern Recognit., vol. 41, pp. 176–190, 2008.

[24] W. R. Casper and B. Nadiga, “A New Spectral Clustering Algorithm.”

[25] R. M. Alguliev, R. M. Aliguliyev, and M. S. Hajirahimova, “GenDocSum+ MCLR: Generic document summarization based on maximum coverage and less redundancy,” Expert Syst. Appl., vol. 39, no. 16, pp. 12460–12473, 2012.

[26] M. Fiedler, “Algebraic connectivity of graphs,” Czechoslov. Math. J., vol. 23, no. 2, pp. 298–305, 1973.

[27] F. R. K. Chung, “Lectures on Spectral Graph Theory.”

[28] C. T. Zheng, C. Liu, and H. S. Wong, “Corpus-based topic diffusion for short text clustering,” Neurocomputing, vol. 275, pp. 2444–2458, 2018.

[29] J. Xu et al., “Self-Taught convolutional neural networks for short text clustering,” Neural Networks, vol. 88, pp. 22–31, 2017.

[30] F. Beil, M. Ester, B. Bc, and C. Va, “Frequent Term-Based Text Clustering,” 2002.

[31] C. Manning, P. Raghavan, and H. Schütze, “Introduction to information retrieval,” Nat. Lang. Eng., vol. 16, no. 1, pp. 100–103, 2010.

[32] M. Kozlowski and H. Rybinski, “Clustering of semantically enriched short texts,” J. Intell. Inf. Syst., Dec. 2018.

___

Bibtex @araştırma makalesi { politeknik684558, journal = {Politeknik Dergisi}, eissn = {2147-9429}, address = {Gazi Üniversitesi Teknoloji Fakültesi 06500 Teknikokullar - ANKARA}, publisher = {Gazi Üniversitesi}, year = {2021}, volume = {24}, number = {4}, pages = {1433 - 1444}, doi = {10.2339/politeknik.684558}, title = {SSC: Clustering Of Turkish Texts By Spectral Graph Partitioning}, key = {cite}, author = {Uçkan, Taner and Hark, Cengiz and Karci, Ali} }
APA Uçkan, T. , Hark, C. & Karci, A. (2021). SSC: Clustering Of Turkish Texts By Spectral Graph Partitioning . Politeknik Dergisi , 24 (4) , 1433-1444 . DOI: 10.2339/politeknik.684558
MLA Uçkan, T. , Hark, C. , Karci, A. "SSC: Clustering Of Turkish Texts By Spectral Graph Partitioning" . Politeknik Dergisi 24 (2021 ): 1433-1444 <
Chicago Uçkan, T. , Hark, C. , Karci, A. "SSC: Clustering Of Turkish Texts By Spectral Graph Partitioning". Politeknik Dergisi 24 (2021 ): 1433-1444
RIS TY - JOUR T1 - SSC: Clustering Of Turkish Texts By Spectral Graph Partitioning AU - Taner Uçkan , Cengiz Hark , Ali Karci Y1 - 2021 PY - 2021 N1 - doi: 10.2339/politeknik.684558 DO - 10.2339/politeknik.684558 T2 - Politeknik Dergisi JF - Journal JO - JOR SP - 1433 EP - 1444 VL - 24 IS - 4 SN - -2147-9429 M3 - doi: 10.2339/politeknik.684558 UR - Y2 - 2020 ER -
EndNote %0 Politeknik Dergisi SSC: Clustering Of Turkish Texts By Spectral Graph Partitioning %A Taner Uçkan , Cengiz Hark , Ali Karci %T SSC: Clustering Of Turkish Texts By Spectral Graph Partitioning %D 2021 %J Politeknik Dergisi %P -2147-9429 %V 24 %N 4 %R doi: 10.2339/politeknik.684558 %U 10.2339/politeknik.684558
ISNAD Uçkan, Taner , Hark, Cengiz , Karci, Ali . "SSC: Clustering Of Turkish Texts By Spectral Graph Partitioning". Politeknik Dergisi 24 / 4 (Aralık 2021): 1433-1444 .
AMA Uçkan T. , Hark C. , Karci A. SSC: Clustering Of Turkish Texts By Spectral Graph Partitioning. Politeknik Dergisi. 2021; 24(4): 1433-1444.
Vancouver Uçkan T. , Hark C. , Karci A. SSC: Clustering Of Turkish Texts By Spectral Graph Partitioning. Politeknik Dergisi. 2021; 24(4): 1433-1444.
IEEE T. Uçkan , C. Hark ve A. Karci , "SSC: Clustering Of Turkish Texts By Spectral Graph Partitioning", Politeknik Dergisi, c. 24, sayı. 4, ss. 1433-1444, Ara. 2021, doi:10.2339/politeknik.684558
Politeknik Dergisi
  • ISSN: 1302-0900
  • Yayın Aralığı: Yılda 4 Sayı
  • Başlangıç: 1998

65.1b9.6b

Sayıdaki Diğer Makaleler

Sıralı Tip Isı Borulu Isı Değiştiricilerinde MgO+CuO/Su ve MgOAl2O3/Su Nanoakışkanları Kullanılarak Performansın İyileştirilmesi: Karşılaştırmalı Deneysel Çalışma

Çağdaş FİLİZ, Yaşar YETİŞKEN

Grafen Takviyeli Epoksi Nanokompozitlerin Özelliklerinin İncelenmesi

Hazal YILMAZ, Yasin ALTIN, Ayşe BEDELOĞLU

Synthesis and Characterization of Silver Nanoparticles and Coating with Chitosan

Mehmet ATEŞ, Ersen YILMAZ, Bülent KAR, İlknur KARS DURUKAN

AZ31B Magnezyum Alaşımının Sürtünmeli Delme İşlemi Üzerine Deneysel Çalışma

Musa BİLGİN

Üleksit Katkısının Kum-Bentonit Karışımlarının Mühendislik Özelliklerine Etkisi

S. Gizem ALPAYDİN, Yeliz YUKSELEN-AKSOY

Tepki Kuvveti Gözetleyicisi Tabanlı Tork Kontrolü

Erim Can ÖZÇINAR, Regaip Barkan UĞURLU

Experimental and Statistical Analysis of Effect of Si Modification and Grain Refinement on Tensile Properties of A356 Alloy

Muhammet ULUDAĞ

Gri İlişkisel Analiz Yöntemi ile Tufal Katkılı Harçların Basınç Dayanımı ve Porozitesinin Optimizasyonu

Erdinç ARICI, Oğuzhan KELEŞTEMUR

Ramat Geri Kazanım Prosesinde Açığa Çıkan Cüruflardan Gravite Ayırması ile Altın Kazanımı

Eyüp SABAH, Filiz ORUÇ ŞAPÇI

Analysis of Attack Angle Effect on Flow Characteristics Around Torpedo-Like Geometry Placed Near the Free-Surface via CFD

Muammer ÖZGÖREN, Alpaslan KILAVUZ, Tahir DURHASAN, Beşir ŞAHİN, Levent KAVURMACIOĞLU, Hüseyin AKILLI, Fuad SARIĞIGÜZEL