USING TEXTUAL FEATURES FOR THE DETECTION OF VANDALISM IN WIKIPEDIA: A COMPARATIVE APPROACH IN LOW-RESOURCE LANGUAGE SECTIONS

This study investigates the impact of textual features on the detection of vandalism in low-resource language sections of Wikipedia. Building on textual features applied in previous research, we propose new features that help machine learning-based text classifiers distinguish vandalism and improve vandalism detection rates across languages. These features also enable us to compare the contributions of bots against vandalism, highlighting the differences between bots and human editors with regard to vandalism detection. The proposed feature set is efficient and language-independent, and it performs at a level similar to that of previous feature sets. Three Wikipedia sections are used for this purpose: Simple English (simple), Albanian (sq) and Bosnian (bs). We show that our set of textual features achieves vandalism detection rates similar to, and in some cases better than, those reported in previous research.
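The abstract does not list the individual features, so the following Python sketch only illustrates what language-independent textual features of a Wikipedia revision can look like. It assumes, hypothetically, that features are computed on the words a revision adds (found with a word-level diff from Python's standard `difflib`) and uses character-level ratios of the kind common in earlier vandalism-detection work: upper-case ratio, digit ratio, non-alphanumeric ratio, character diversity, longest repeated-character run, longest added word and size ratio. The function names and exact definitions are illustrative assumptions, not the authors' feature set. Because they rely only on Unicode character classes rather than word lists, they apply unchanged to Simple English, Albanian and Bosnian text.

```python
import difflib


def inserted_text(old: str, new: str) -> str:
    """Words present in `new` but not in `old`, via a word-level diff."""
    old_words, new_words = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    added: list[str] = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # Both "insert" and "replace" opcodes contribute new words.
        if tag in ("insert", "replace"):
            added.extend(new_words[j1:j2])
    return " ".join(added)


def longest_char_run(text: str) -> int:
    """Length of the longest run of one repeated character (e.g. '!!!!!!')."""
    longest = current = 0
    prev = None
    for ch in text:
        current = current + 1 if ch == prev else 1
        longest = max(longest, current)
        prev = ch
    return longest


def ratio(count: int, total: int) -> float:
    """Safe division: 0.0 when there is nothing to measure."""
    return count / total if total else 0.0


def textual_features(old: str, new: str) -> dict[str, float]:
    """Hypothetical language-independent features of one revision (old -> new).

    Assumption: these definitions are illustrative, not the paper's exact set.
    Only Unicode character classes are used, so no per-language word lists
    are needed.
    """
    added = inserted_text(old, new)
    letters = [c for c in added if c.isalpha()]
    n = len(added)
    return {
        "upper_ratio": ratio(sum(c.isupper() for c in letters), len(letters)),
        "digit_ratio": ratio(sum(c.isdigit() for c in added), n),
        "non_alnum_ratio": ratio(
            sum(not c.isalnum() and not c.isspace() for c in added), n
        ),
        "char_diversity": ratio(len(set(added)), n),
        "longest_char_run": float(longest_char_run(added)),
        "longest_word": float(max((len(w) for w in added.split()), default=0)),
        "size_ratio": (len(new) + 1) / (len(old) + 1),
    }


if __name__ == "__main__":
    before = "Tirana is the capital of Albania."
    after = "Tirana is the capital of Albania. LOL!!!!!!"
    for name, value in textual_features(before, after).items():
        print(f"{name}: {value:.3f}")
```

Each revision yields one feature dictionary, which could serve as one row of input to a standard classifier; the classifier the study actually uses is not stated in this abstract.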

___

BibTeX
@article{pap371613,
  author    = {Susuri, Arsim and Hamiti, Mentor and Dika, Agni},
  title     = {Using Textual Features for the Detection of Vandalism in {Wikipedia}: A Comparative Approach in Low-Resource Language Sections},
  journal   = {PressAcademia Procedia},
  publisher = {PressAcademia},
  address   = {Siteler Sok. No.12/18 Maltepe, 34843, Istanbul},
  year      = {2017},
  month     = jun,
  volume    = {5},
  number    = {1},
  pages     = {80--87},
  eissn     = {2459-0762},
  doi       = {10.17261/Pressacademia.2017.575}
}
APA Susuri, A., Hamiti, M., & Dika, A. (2017). Using textual features for the detection of vandalism in Wikipedia: A comparative approach in low-resource language sections. PressAcademia Procedia, 5(1), 80-87. https://doi.org/10.17261/Pressacademia.2017.575