Satire identification in Turkish news articles based on ensemble of classifiers

Social media and microblogging platforms often contain figurative and nonliteral language, including satire. Identifying figurative language is a fundamental task in sentiment analysis: sentiment analysis methods cannot reach high classification accuracy unless elements of figurative language are properly identified. Satire is a kind of figurative language in which irony and humor are used to ridicule or criticize an event or entity. Satirical news is pervasive on social media platforms and can be deceptive and harmful. This paper presents an ensemble scheme for satirical news identification in Turkish news articles. In the presented scheme, linguistic and psychological feature sets are used to extract features in the following categories: linguistic, psychological, personal, spoken categories, and punctuation. In the classification phase, we consider the accuracy rates of five supervised learning algorithms (naive Bayes, logistic regression, support vector machines, random forest, and k-nearest neighbor) combined with three widely used ensemble methods (AdaBoost, bagging, and random subspace). Based on the results, we conclude that the random forest algorithm yielded the highest performance, with a classification accuracy of 96.92% for satire detection in Turkish. With deep learning-based architectures, we achieved a classification accuracy of 97.72% using a recurrent neural network architecture with an attention mechanism.
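To make the ensemble idea concrete, the following is a minimal sketch of one of the three ensemble methods named above, the random subspace method, wrapped around a k-nearest neighbor base learner. It is not the authors' implementation: the class name `RandomSubspaceKNN`, its parameters, and the toy feature values are all hypothetical, and the five columns merely stand in for per-category feature scores (linguistic, psychological, personal, spoken categories, punctuation).

```python
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Plain k-nearest-neighbor vote using squared Euclidean distance."""
    nearest = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


class RandomSubspaceKNN:
    """Hypothetical random subspace ensemble: each member is a k-NN
    classifier that only sees a random subset of the feature columns."""

    def __init__(self, n_members=5, subspace_size=2, k=3, seed=0):
        self.n_members = n_members
        self.subspace_size = subspace_size
        self.k = k
        self.rng = random.Random(seed)
        self.members = []
        self.y = []

    def fit(self, X, y):
        n_features = len(X[0])
        self.y = list(y)
        self.members = []
        for _ in range(self.n_members):
            # Draw a random feature subset and project the training data onto it.
            feats = sorted(self.rng.sample(range(n_features), self.subspace_size))
            projected = [[row[j] for j in feats] for row in X]
            self.members.append((feats, projected))
        return self

    def predict(self, x):
        # Each member votes using only its own feature subset; majority wins.
        votes = [
            knn_predict(projected, self.y, [x[j] for j in feats], self.k)
            for feats, projected in self.members
        ]
        return Counter(votes).most_common(1)[0][0]


# Invented toy data: one row per article, one column per feature category.
X = [
    [0.9, 0.8, 0.9, 0.8, 0.9],
    [0.8, 0.9, 0.8, 0.9, 0.8],
    [0.9, 0.9, 0.8, 0.8, 0.9],
    [0.1, 0.2, 0.1, 0.2, 0.1],
    [0.2, 0.1, 0.2, 0.1, 0.2],
    [0.1, 0.1, 0.2, 0.2, 0.1],
]
y = ["satire"] * 3 + ["news"] * 3

model = RandomSubspaceKNN().fit(X, y)
print(model.predict([0.85] * 5))
print(model.predict([0.15] * 5))
```

Bagging differs only in what is resampled (training rows rather than feature columns), and AdaBoost instead reweights training examples between rounds; the same majority-vote structure would apply to any of the five base learners mentioned in the abstract.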
