Combining metadata and co-citations for recommending related papers

Researchers identify relevant documents to keep track of state-of-the-art methods, and this task relies on research paper recommender systems. Approaches proposed for these systems fall into categories such as content-based, collaborative filtering-based, and bibliographic information-based. Content-based approaches exploit the full text of articles and provide more promising results than the other approaches. However, most full-text content is not freely available because of subscription requirements, which limits the scope of content-based approaches. In such scenarios, the best alternative may be to exploit other openly available resources. This research therefore explores the use of metadata and bibliographic information to find related articles. The proposed approach combines metadata with co-citations to find and rank articles related to a query paper: a similarity score is calculated over metadata fields and combined with co-citations. The approach is evaluated on a newly constructed dataset of 5116 articles. The benchmark ranking for each co-cited document set is established by applying Jensen-Shannon divergence (JSD), and results are compared with the state-of-the-art content-based approach in terms of normalized discounted cumulative gain (NDCG). The state-of-the-art content-based approach achieved an NDCG score of 0.86, while the traditional co-citation-based approach scored 0.72. The presented method achieved NDCG scores of 0.73, 0.77, and 0.78 by incorporating the title, co-citation and title, and abstract, respectively, whereas the highest NDCG score of 0.77 was achieved by combining co-citations with metadata. However, better results, with an NDCG score of 0.81, were achieved by incorporating both the title and the abstract. It can therefore be concluded that the proposed approach could be a better alternative in cases where content is unavailable.
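The evaluation pipeline named in the abstract (a JSD-based benchmark ranking, scored with NDCG) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration rather than the paper's actual procedure: whitespace tokenization, base-2 logarithms, the common log2-discount form of DCG, the toy texts, the hypothetical system ranking, and the rule that turns benchmark positions into graded relevance.

```python
import math
from collections import Counter


def term_distribution(text):
    """Relative term frequencies of a lowercased, whitespace-tokenized text."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}


def jsd(p, q):
    """Jensen-Shannon divergence (base 2) between two term distributions."""
    vocab = set(p) | set(q)
    # Mixture distribution M = (P + Q) / 2 over the joint vocabulary.
    m = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in vocab}

    def kl(a, b):
        # Kullback-Leibler divergence; terms with zero probability contribute 0.
        return sum(a[t] * math.log2(a[t] / b[t]) for t in a if a[t] > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def dcg(relevances):
    """Discounted cumulative gain with a log2(rank + 1) discount."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


def ndcg(ranked_relevances):
    """DCG of the given order divided by the DCG of the ideal order."""
    ideal = dcg(sorted(ranked_relevances, reverse=True))
    return dcg(ranked_relevances) / ideal if ideal > 0 else 0.0


# Toy data (hypothetical): a query paper and three co-cited candidates.
query = "co-citation metadata paper recommendation"
candidates = {
    "A": "paper recommendation using co-citation and metadata",
    "B": "link prediction in social networks",
    "C": "metadata for recommendation",
}

# Benchmark ranking: lower divergence from the query means more related.
q_dist = term_distribution(query)
benchmark = sorted(
    candidates, key=lambda k: jsd(q_dist, term_distribution(candidates[k]))
)

# Assumed grading rule: the top benchmark item gets the highest grade.
grades = {doc: len(benchmark) - 1 - i for i, doc in enumerate(benchmark)}

# Hypothetical output of a recommender under evaluation.
system_ranking = ["C", "B", "A"]
score = ndcg([grades[doc] for doc in system_ranking])
print(f"benchmark={benchmark}, NDCG={score:.3f}")
```

In this sketch a system ranking that reproduces the benchmark order scores an NDCG of 1.0, and any deviation near the top of the list is penalized more heavily than a deviation near the bottom.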
