Salih BAL, Efnan ŞORA GÜNAL

A NEW MODEL ON AUTOMATIC TEXT SUMMARIZATION FOR TURKISH

The amount of data available in electronic form grows daily as technology develops, which makes it difficult and time-consuming for users to find the information they need. Automatic text summarization systems were developed to surface the desired information in a text faster than manual summarization allows. This paper proposes a new extractive text summarization model in which a classification approach decides whether each sentence of a given text is included in the summary. The effectiveness of widely used features for automatic text summarization in Turkish is also evaluated using sequential feature selection methods. The evaluations were carried out on Turkish texts in the categories of economy, art, and sports. The experiments supported the performance of the proposed summarization method and showed how effective the individual features are.
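The two ideas in the abstract, classifying each sentence as "in summary" or "not in summary" and ranking features by sequential selection, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the three toy features (sentence position score, title-word overlap, a noise column), the six labeled sentences, the nearest-centroid classifier, and leave-one-out accuracy as the selection criterion.

```python
from statistics import mean

def centroid_predict(train_X, train_y, x):
    """Nearest-centroid classifier: return the label of the closest class mean."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, y in zip(train_X, train_y) if y == label]
        centroid = [mean(col) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def loo_accuracy(X, y, features):
    """Leave-one-out accuracy using only the given feature indices."""
    proj = [[row[j] for j in features] for row in X]
    hits = 0
    for i in range(len(proj)):
        train_X = proj[:i] + proj[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += centroid_predict(train_X, train_y, proj[i]) == y[i]
    return hits / len(proj)

def sequential_forward_selection(X, y, n_features):
    """Greedily add the feature that most improves accuracy; stop when nothing helps."""
    selected, best_score = [], 0.0
    while len(selected) < n_features:
        candidates = [j for j in range(n_features) if j not in selected]
        scored = [(loo_accuracy(X, y, selected + [j]), j) for j in candidates]
        score, j = max(scored, key=lambda t: t[0])
        if score <= best_score:
            break
        selected.append(j)
        best_score = score
    return selected, best_score

# Hypothetical toy data: one row per sentence.
# Columns: [position score, title-word overlap, noise]
X = [
    [0.9, 0.8, 0.5],
    [0.8, 0.7, 0.1],
    [0.7, 0.9, 0.9],
    [0.2, 0.1, 0.4],
    [0.1, 0.3, 0.8],
    [0.3, 0.2, 0.2],
]
# Label 1 = sentence belongs in the summary, 0 = it does not.
y = [1, 1, 1, 0, 0, 0]

selected, score = sequential_forward_selection(X, y, n_features=3)
print("selected feature indices:", selected, "accuracy:", score)
```

The greedy loop stops as soon as adding another feature no longer raises accuracy, so the returned subset is the smallest one this procedure finds for the toy data. Sequential backward selection would follow the same pattern, starting from all features and removing one at a time.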

Keywords:

Text summarization, Feature selection, Classification



Eskişehir Technical University Journal of Science and Technology A - Applied Sciences and Engineering

ISSN: 2667-4211
Publication frequency: 4 issues per year
Established: 2000
Publisher: Eskişehir Technical University
