Temporal specificity-based text classification for information retrieval

Time is an important aspect in temporal information retrieval (TIR), a subfield of information retrieval (IR). Web search engines like Google or Bing are common examples of IR systems. An important constituent of a search engine is news retrieval, where users present their information needs in the form of temporal queries. Users are usually interested in news documents focusing on a particular time period. Existing search engines rarely fulfill these temporal information requirements because they ignore the temporal information available in the content of news documents, also known as document focus time. Furthermore, information related to multiple time periods in a news document makes the identification of document focus time a challenging task. Therefore, it is necessary to classify news documents based on temporal specificity before the temporal information can be used in the retrieval process. In this study, we formulate the temporal specificity problem as a time-based classification task by classifying news documents into three temporal classes, i.e., high temporal specificity, medium temporal specificity, and low temporal specificity. For such classification, rule-based and temporal specificity score (TSS)-based classification approaches are proposed. In the former approach, news documents are classified using a defined set of rules that are based on temporal features. The latter approach classifies news documents based on a TSS computed from the temporal features. The results of the proposed techniques are compared with four machine learning classification algorithms: Bayes net, support vector machine, random forest, and decision tree. The results show that the proposed rule-based classifier outperforms the four algorithms by achieving 82% accuracy, whereas TSS classification achieves 77% accuracy.
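To make the contrast between the two approaches concrete, the following is a minimal Python sketch. It is not the rule set or the TSS formula used in the study, which this abstract does not specify. It assumes that each document has already been reduced to a list of the years mentioned in its text, that the score is the share of mentions pointing at the most frequent year, and that the cut-offs (0.7 and 0.4, and the distinct-year limits 1 and 3) are purely illustrative. All function names are hypothetical.

```python
from collections import Counter

def temporal_specificity_score(years):
    """Hypothetical score: share of year mentions that refer to the most frequent year."""
    if not years:
        return 0.0
    most_common_count = Counter(years).most_common(1)[0][1]
    return most_common_count / len(years)

def classify_by_score(years, high=0.7, low=0.4):
    """Score-based classification with illustrative (assumed) thresholds."""
    score = temporal_specificity_score(years)
    if score >= high:
        return "high"
    if score >= low:
        return "medium"
    return "low"

def classify_by_rules(years):
    """Rule-based classification on one assumed temporal feature: distinct years."""
    distinct = len(set(years))
    if distinct == 1:
        return "high"    # every mention points at a single period
    if 2 <= distinct <= 3:
        return "medium"  # a few periods share the document
    return "low"         # no mentions, or many unrelated periods

# Hypothetical document whose text mentions these years
doc_years = [2010, 2010, 2010, 2011]
print(classify_by_score(doc_years), classify_by_rules(doc_years))
```

In this sketch the two classifiers can disagree on the same document: the score-based variant weighs how dominant one period is, while the rule-based variant only counts how many periods appear.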
