A new algorithm for detection of link spam contributed by zero-out link pages

Link spammers constantly seek new methods and strategies to deceive search engine ranking algorithms, and search engines must in turn develop new approaches to counter them and maintain the integrity of their rankings. In this paper, we propose a methodology to detect link spam contributed by zero-out link (dangling) pages. We randomly selected a target page from live web pages, induced link spam according to our proposed methodology, and applied our algorithm to detect it. Detailed results from amazon.com pages showed a considerable increase in the pages' PageRank after the link spam was induced; the proposed method detected the link spam by using eigenvectors and eigenvalues.
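The abstract does not spell out the detection algorithm, so the following is only a minimal illustrative sketch of the underlying effect, not the authors' method. It computes PageRank as the dominant eigenvector of the Google matrix by power iteration, then compares a target page's score before and after a zero-out link (dangling) page is given a link to it. The four-page graph, the function name `pagerank`, and the damping factor of 0.85 are all assumptions made for illustration.

```python
def pagerank(out_links, damping=0.85, tol=1e-12, max_iter=1000):
    """Approximate the dominant eigenvector of the Google matrix.

    out_links[i] lists the pages that page i links to. An empty list
    marks a zero-out link (dangling) page; its rank is spread uniformly
    over all pages, which is one common convention (an assumption here).
    """
    n = len(out_links)
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        # Rank currently held by dangling pages is redistributed evenly.
        dangling_mass = sum(rank[i] for i in range(n) if not out_links[i])
        base = (1.0 - damping) / n + damping * dangling_mass / n
        new_rank = [base] * n
        # Each non-dangling page splits its rank among its out-links.
        for i, targets in enumerate(out_links):
            for j in targets:
                new_rank[j] += damping * rank[i] / len(targets)
        delta = sum(abs(a - b) for a, b in zip(new_rank, rank))
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical graph: page 3 is the target, page 2 is a dangling page.
clean = [[1, 2], [2, 3], [], [0]]
# Same graph after the dangling page 2 is given a link to the target.
spammed = [[1, 2], [2, 3], [3], [0]]

clean_rank = pagerank(clean)
spam_rank = pagerank(spammed)
print("target PageRank before:", clean_rank[3])
print("target PageRank after: ", spam_rank[3])
```

In this toy graph the target's score rises once the dangling page links to it, because rank that was previously spread over all pages now flows to a single page. A detector along the lines described in the abstract would presumably look for such shifts in the eigenvector, but the exact criterion is not given here.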

___

Turkish Journal of Electrical Engineering and Computer Sciences
  • ISSN: 1300-0632
  • Publication frequency: 6 issues per year
  • Publisher: TÜBİTAK