Deep Combination of Stylometry Features for Authorship Analysis
Authorship Analysis (AA) is the process of extracting information about an author from his/her writings. To analyze whether two anonymous short texts were written by the same author, we propose combining stylometry features from different categories through separate learning processes. Most previous AA studies feed many stylometry features from different categories into a solution together at the outset, as a pre-processing step. During learning, no category-specific operations are performed, and all categories are treated equally. In contrast, the proposed approach runs a separate learning process for each feature category and combines these processes at the decision phase using a Combination of Deep Neural Networks (C-DNN). To evaluate the Authorship Verification (AV) performance of the proposed approach, we designed and implemented a problem-specific Deep Neural Network (DNN) for each stylometry category used. Experiments were conducted on two public English datasets. The results show that the proposed approach significantly improves the generalization ability and robustness of the solutions and achieves higher accuracy than the single DNNs.
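The idea of learning each feature category separately and merging only at the decision phase can be illustrated with a small sketch. This is not the authors' C-DNN: the paper uses a problem-specific DNN per category, whereas this stand-in trains one logistic-regression model per category with plain gradient descent. The three categories (character, lexical, punctuation), their features, the absolute-difference representation of a text pair, and the unweighted averaging of per-category probabilities are all illustrative assumptions, and every function name below is hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Model = Tuple[List[float], float]  # (weights, bias) for one category's model

# Hypothetical stylometry categories (illustrative, not the paper's feature sets).
def char_features(text: str) -> List[float]:
    n = max(len(text), 1)
    return [sum(c.isupper() for c in text) / n,   # uppercase ratio
            sum(c.isdigit() for c in text) / n,   # digit ratio
            sum(c.isspace() for c in text) / n]   # whitespace ratio

def lexical_features(text: str) -> List[float]:
    words = text.split()
    n = max(len(words), 1)
    avg_len = sum(len(w) for w in words) / n
    ttr = len({w.lower() for w in words}) / n     # type-token ratio
    return [avg_len / 10.0, ttr]

def punct_features(text: str) -> List[float]:
    n = max(len(text), 1)
    return [text.count(p) / n for p in ",.;!?"]   # punctuation rates

CATEGORIES: Dict[str, Callable[[str], List[float]]] = {
    "character": char_features,
    "lexical": lexical_features,
    "punctuation": punct_features,
}

def pair_vector(fx: Callable[[str], List[float]], a: str, b: str) -> List[float]:
    # Represent a text pair by the absolute difference of its category features.
    return [abs(x - y) for x, y in zip(fx(a), fx(b))]

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def score(model: Model, x: Sequence[float]) -> float:
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def train_logreg(X: List[List[float]], y: List[int],
                 lr: float = 0.5, epochs: int = 500) -> Model:
    # Stand-in for a per-category DNN: batch gradient descent on log loss.
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = score((w, b), xi) - yi
            gw = [g + err * xj for g, xj in zip(gw, xi)]
            gb += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, gw)]
        b -= lr * gb / len(X)
    return w, b

def train_per_category(pairs: List[Tuple[str, str]],
                       labels: List[int]) -> Dict[str, Model]:
    # Separate learning process per category: each model sees only its own features.
    return {name: train_logreg([pair_vector(fx, a, b) for a, b in pairs], labels)
            for name, fx in CATEGORIES.items()}

def predict_same_author(models: Dict[str, Model], a: str, b: str) -> float:
    # Decision-level fusion: unweighted mean of per-category probabilities.
    probs = [score(m, pair_vector(CATEGORIES[name], a, b))
             for name, m in models.items()]
    return sum(probs) / len(probs)

if __name__ == "__main__":
    pairs = [("I think so, really.", "I think not, really."),
             ("OK!!! see u at 5", "lol ok see u at 6!!!"),
             ("I think so, really.", "OK!!! see u at 5"),
             ("lol ok see u at 6!!!", "I think not, really.")]
    labels = [1, 1, 0, 0]  # toy labels: 1 = same author, 0 = different authors
    models = train_per_category(pairs, labels)
    print(predict_same_author(models, "I guess so, truly.", "OK!! see u at 7"))
```

Averaging is only one possible fusion rule; a weighted sum or a small meta-classifier over the per-category outputs would fit the same structure, since each category's model is trained without seeing the others.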