Automated citation sentiment analysis using high order n-grams: a preliminary investigation

Scientific papers are associated with previous research contributions (i.e. books, journal or conference papers, and web resources) through citations. A citation links the citing work to the earlier work it cites. The nature of the citation can be supportive (positive), contrastive (negative), or objective (neutral). Extracting the citing author's sentiment towards the cited scientific articles is an emerging research discipline, because citation sentences differ linguistically from the text found in other domains of sentiment analysis. In this paper, we propose a technique for identifying the sentiment of the citing author towards the cited paper by extracting unigram, bigram, trigram, and pentagram (5-gram) adjective and adverb patterns from the citation text. After POS tagging of the citation text, we use a sentence parser to extract linguistic features comprising adjectives, adverbs, and n-grams. A sentiment score is then assigned to classify each citation as positive, negative, or neutral. In addition, the proposed technique is compared with manually classified citation text and with two commercial tools, namely SEMANTRIA and THEYSAY, to determine their applicability to the citation corpus. These tools use different techniques for determining the sentiment orientation of a sentence. Analysis of the results shows that our proposed approach achieves results comparable to its commercial counterparts, with average precision, recall, and accuracy of 90%, 81.82%, and 85.91%, respectively.
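The pipeline outlined in the abstract (tag, extract adjective and adverb n-grams, score, label) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the tiny lexicon `TOY_LEXICON`, the regular-expression tokenizer, the rule that an n-gram is kept when it contains at least one adjective or adverb, and the sign-based labelling are all assumptions made for illustration. A real system would replace the lexicon lookup with a POS tagger and a sentiment resource.

```python
import re
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy lexicon: word -> (coarse POS tag, polarity).
# Illustrative only; these entries are NOT taken from the paper.
TOY_LEXICON: Dict[str, Tuple[str, int]] = {
    "novel": ("ADJ", 1),
    "effective": ("ADJ", 1),
    "limited": ("ADJ", -1),
    "highly": ("ADV", 0),
    "poorly": ("ADV", -1),
}

OPINION_TAGS = {"ADJ", "ADV"}


def tokenize(sentence: str) -> List[str]:
    """Lowercase the sentence and keep alphabetic tokens only (assumed tokenizer)."""
    return re.findall(r"[a-z]+", sentence.lower())


def ngrams(tokens: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-grams; empty if the sentence is shorter than n."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def is_opinion_word(token: str) -> bool:
    """True if the toy lexicon tags the token as an adjective or adverb."""
    return token in TOY_LEXICON and TOY_LEXICON[token][0] in OPINION_TAGS


def extract_patterns(
    tokens: Sequence[str], orders: Sequence[int] = (1, 2, 3, 5)
) -> Dict[int, List[Tuple[str, ...]]]:
    """Keep the n-grams that contain at least one adjective or adverb.

    The default orders follow the abstract: unigram, bigram, trigram, pentagram.
    """
    return {
        n: [g for g in ngrams(tokens, n) if any(is_opinion_word(t) for t in g)]
        for n in orders
    }


def classify(sentence: str) -> str:
    """Sum the polarities of opinion words and map the sign to a label."""
    score = sum(TOY_LEXICON[t][1] for t in tokenize(sentence) if is_opinion_word(t))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    sentence = "The proposed method is highly effective"
    print(extract_patterns(tokenize(sentence)))
    print(classify(sentence))
```

With this toy lexicon, `classify("The proposed method is highly effective")` returns `"positive"`, and a sentence containing no lexicon words is labelled `"neutral"`.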
