Removing Web Spam Links from Search Engine ... - NEU SECLAB

cutoff value, the false positives could be lowered to zero. At this rate, almost one out of five spam pages can be detected, improving the results of search engines without removing any valid results.

Acknowledgments

This work has been supported by the Austrian Science Foundation (FWF) under grant P18764, SECoverer FIT-IT Trust in IT-Systems 2. Call, Austria, Secure Business Austria (SBA), and the WOMBAT and FORWARD projects funded by the European Commission in the 7th Framework.

References

[1] A. Bifet, C. Castillo, P.-A. Chirita, and I. Weber. An Analysis of Factors Used in Search Engine Ranking. In Adversarial Information Retrieval on the Web, 2005.
[2] S. Brin and L. Page. The Anatomy of a Large-Scale Hypertextual Web Search Engine. In 7th International World Wide Web Conference (WWW), 1998.
[3] F. Cacheda and Á. Viña. Experiencies retrieving information in the world wide web. In Proceedings of the Sixth IEEE Symposium on Computers and Communications (ISCC 2001), pages 72–79, 2001.
[4] K. Chellapilla and D. Chickering. Improving Cloaking Detection Using Search Query Popularity and Monetizability. In Adversarial Information Retrieval on the Web, 2006.
[5] M. P. Evans. Analysing Google rankings through search engine optimization data. Internet Research, Vol. 17, No. 1, 2007.
[6] D. Fetterly, M. Manasse, and M. Najork. Spam, damn spam, and statistics: Using statistical analysis to locate spam web pages. In WebDB, pages 1–6, 2004.
[7] Google. Zeitgeist: Search patterns, trends, and surprises. http://www.google.com/press/zeitgeist.html.
[8] Google Keeps Tweaking Its Search Engine. http://www.nytimes.com/2007/06/03/business/yourmoney/03google.html?pagewanted=4&_r=1.
[9] Z. Gyöngyi and H. Garcia-Molina. Web Spam Taxonomy. In Adversarial Information Retrieval on the Web, 2005.
[10] C. Karlberger, G. Bayler, C. Kruegel, and E. Kirda. Exploiting Redundancy in Natural Language to Penetrate Bayesian Spam Filters. In First USENIX Workshop on Offensive Technologies (WOOT07), 2007.
[11] Y. Niu, Y.-M. Wang, H. Chen, M. Ma, and F. Hsu. A quantitative study of forum spamming using context-based analysis. In NDSS, 2007.
[12] A. Ntoulas, M. Najork, M. Manasse, and D. Fetterly. Detecting Spam Web Pages through Content Analysis. In 15th International World Wide Web Conference (WWW), 2006.
[13] N. Provos, P. Mavrommatis, M. A. Rajab, and F. Monrose. All your iframes point to us. In 17th USENIX Security Symposium, 2008.
[14] N. Provos, D. McNamee, P. Mavrommatis, K. Wang, and N. Modadugu. The Ghost In The Browser: Analysis of Web-based Malware. In First Workshop on Hot Topics in Understanding Botnets (HotBots '07), 2007.
[15] R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993.
[16] Rahul Mohandas (McAfee Avert Labs). Analysis of Adversarial Code: The role of Malware Kits! http://clubhack.com/2007/files/Rahul-Analysis_of_Adversarial_Code.pdf, December 2007. Last accessed, December 2008.
[17] Google Search Engine Ranking Factors. http://www.seomoz.org/article/search-ranking-factors.
[18] K. Svore, Q. Wu, C. Burges, and A. Raman. Improving Web Spam Classification using Rank-time Features. In Adversarial Information Retrieval on the Web, 2007.
[19] Y.-M. Wang, M. Ma, Y. Niu, and H. Chen. Spam Double-Funnel: Connecting Web Spammers with Advertisers. In 16th International Conference on World Wide Web, 2007.
[20] I. Witten and E. Frank. Data Mining: Practical machine learning tools and techniques. Morgan Kaufmann, 2nd edition, 2005.
[21] B. Wu and B. Davison. Cloaking and Redirection: A Preliminary Study. In Adversarial Information Retrieval on the Web, 2005.
[22] B. Wu and B. D. Davison. Identifying Link Farm Spam Pages. In 14th International World Wide Web Conference (WWW), 2005.

Appendix A: J48 Decision Tree

[Figure 2: tree diagram. The branch layout did not survive text extraction, so only the labels are listed here.
Split nodes: tfreq, filepath, domainname, inlink_yahoo, pagerank_site.
Branch values: 0, 1, 2, 3, 4, 5, -1.
Leaves, in extraction order: True (27.0/1.0), True (15.0/3.0), False (7.0/1.0), True (9.0/1.0), False (187.0/29.0), False (4.0), True (6.0), True (6.0/1.0), False (20.0/5.0), True (8.0/1.0), False (6.0/2.0).]

Figure 2: Generated J48 decision tree. The node labels correspond to the feature extractors listed in Section 4.1.
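The leaf annotations in Figure 2 can be related to the cutoff idea from the conclusion with a small sketch. It rests on several assumptions that this excerpt does not confirm: that the leaves follow the usual Weka J48 notation (instances reaching the leaf / instances misclassified there), that "True" is the class being flagged, and that a cutoff is applied to per-leaf purity. The function names are hypothetical, and the counts are whatever the tree was built from, so this will not reproduce the paper's reported detection rate.

```python
import re

# Leaf labels copied from Figure 2, in extraction order.
LEAVES = [
    "True (27.0/1.0)", "True (15.0/3.0)", "False (7.0/1.0)",
    "True (9.0/1.0)", "False (187.0/29.0)", "False (4.0)",
    "True (6.0)", "True (6.0/1.0)", "False (20.0/5.0)",
    "True (8.0/1.0)", "False (6.0/2.0)",
]

# "<class> (<reached>)" or "<class> (<reached>/<misclassified>)"
LEAF_RE = re.compile(r"(True|False) \((\d+(?:\.\d+)?)(?:/(\d+(?:\.\d+)?))?\)")


def parse_leaf(label):
    """Return (flagged, reached, misclassified) for one leaf label."""
    match = LEAF_RE.fullmatch(label)
    if match is None:
        raise ValueError(f"unrecognized leaf label: {label!r}")
    flagged = match.group(1) == "True"
    reached = float(match.group(2))
    # A missing second number means no instance was misclassified.
    misclassified = float(match.group(3)) if match.group(3) else 0.0
    return flagged, reached, misclassified


def flagged_above_cutoff(cutoff, leaves=LEAVES):
    """Sum (reached, misclassified) over True leaves with purity >= cutoff."""
    total_reached = 0.0
    total_wrong = 0.0
    for label in leaves:
        flagged, reached, wrong = parse_leaf(label)
        purity = (reached - wrong) / reached
        if flagged and purity >= cutoff:
            total_reached += reached
            total_wrong += wrong
    return total_reached, total_wrong


if __name__ == "__main__":
    for cutoff in (0.0, 0.95, 1.0):
        print(cutoff, flagged_above_cutoff(cutoff))
```

Under these assumptions, raising the cutoff to 1.0 keeps only the leaf with no misclassified instances, which mirrors the trade-off described in the conclusion: fewer pages are flagged, but none of the flagged ones are false positives.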
