Pelin CANBAY, Ebru Akcapinar SEZER, Hayri SEVER

Deep Combination of Stylometry Features for Authorship Analysis

Authorship Analysis (AA) is the process of extracting information about an author from his or her writings. To determine whether two anonymous short texts were written by the same author, we propose combining stylometry features from different categories in separate learning processes. Most previous AA studies use many stylometry features from different categories together at the beginning of a solution, as a pre-processing step. During the learning process, no category-specific operations are performed; all categories used are evaluated equally. In contrast, the proposed approach has a separate learning process for each feature category and combines these processes at the decision phase by using a Combination of Deep Neural Networks (C-DNN). To evaluate the Authorship Verification (AV) performance of the proposed approach, we designed and implemented a problem-specific Deep Neural Network (DNN) for each stylometry category we used. Experiments were conducted on two public English datasets. The results show that the proposed approach significantly improves the generalization ability and robustness of the solutions and achieves better accuracy than the single DNNs.
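The decision-level combination described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two feature categories, the individual features, the layer sizes, the absolute-difference pair encoding, and all names (`CombinedVerifier`, `lexical_features`, `character_features`) are assumptions made for illustration, and the weights are random and untrained. The sketch only shows the data flow: one small network per stylometry category scores a text pair, and a combiner network merges those scores into a single same-author probability.

```python
import math
import random
import string

# Hypothetical feature categories; the abstract does not list the actual ones.
def lexical_features(text):
    words = text.split()
    if not words:
        return [0.0, 0.0]
    avg_len = sum(len(w) for w in words) / len(words)
    type_token_ratio = len({w.lower() for w in words}) / len(words)
    return [avg_len / 10.0, type_token_ratio]

def character_features(text):
    n = max(len(text), 1)
    upper = sum(c.isupper() for c in text) / n
    digit = sum(c.isdigit() for c in text) / n
    punct = sum(c in string.punctuation for c in text) / n
    return [upper, digit, punct]

CATEGORIES = {"lexical": lexical_features, "character": character_features}

def init_mlp(sizes, rng):
    """Random (untrained) dense layers: a list of (weights, biases) pairs."""
    return [
        ([[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)],
         [0.0] * n_out)
        for n_in, n_out in zip(sizes, sizes[1:])
    ]

def forward(layers, x):
    """Forward pass: tanh on hidden layers, sigmoid on the output layer."""
    for i, (weights, biases) in enumerate(layers):
        z = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        is_last = i == len(layers) - 1
        x = [1 / (1 + math.exp(-v)) if is_last else math.tanh(v) for v in z]
    return x

class CombinedVerifier:
    """One sub-network per feature category plus a decision-level combiner."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.sub_nets = {
            name: init_mlp([len(extract("")), 4, 1], rng)
            for name, extract in CATEGORIES.items()
        }
        self.combiner = init_mlp([len(CATEGORIES), 3, 1], rng)

    def category_scores(self, text_a, text_b):
        """Score the pair separately within each feature category."""
        scores = {}
        for name, extract in CATEGORIES.items():
            # Assumed pair encoding: absolute difference of feature vectors.
            diff = [abs(a - b)
                    for a, b in zip(extract(text_a), extract(text_b))]
            scores[name] = forward(self.sub_nets[name], diff)[0]
        return scores

    def same_author_probability(self, text_a, text_b):
        """Combine the per-category scores at the decision phase."""
        scores = self.category_scores(text_a, text_b)
        return forward(self.combiner, [scores[n] for n in CATEGORIES])[0]

verifier = CombinedVerifier(seed=42)
text_a = "Hello there, how are you today?"
text_b = "hi. im fine thx 4 asking"
scores = verifier.category_scores(text_a, text_b)
prob = verifier.same_author_probability(text_a, text_b)
```

In a trained system each sub-network would first be fitted on its own category, which is what allows category-specific treatment before the combiner makes the final decision.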

Keywords:

Forensic Authorship Analysis, Deep Neural Networks, Neural Network Combination, Anonymous Document Pairs



International Journal of Information Security Science

Publication frequency: 4 issues per year
First published: 2012
Publisher: Şeref SAĞIROĞLU
