Event-based summarization of news articles

In recent years, as the amount of digital information available on the Web has grown, the time needed to find relevant information has also increased. Research on automatic text summarization has therefore gained importance as a way to reduce the time spent on searching. The summarization process proposed here is built on event extraction methods and is called event-based extractive single-document summarization. In this method, the important features of event extraction and summarization methods are analyzed and combined to extract summaries from single-source news documents. Among the tested features, six are found to be the most effective in constructing good summaries. The constructed summaries are evaluated on the benchmark Document Understanding Conference (DUC) 2001 and 2002 datasets, and the results outperform most of the other well-known summarization methods.
