WWW/Internet - Portal do Software Público Brasileiro


ISBN: 978-972-8939-25-0 © 2010 IADIS

6. FEATURES IN WEB SEARCH ENGINES

Since a series of indexes, hit lists and ranking tables can be available at IS (and also at WS), the SE does not need to create them from scratch. Instead, the SE can take advantage of them. Therefore, the SE should:

i. Create and store data structures at level 3 that subsume data from level 2 and, exceptionally, from level 1.
ii. Update these data structures.
iii. Minimize data transfer among different levels.

Crawling in the new paradigm is significantly different from crawling in the current one. Moreover, crawling could become unnecessary. The main reason is that levels 1 and 2 are responsible for creating and updating their indexes, hit lists and ranking tables on a bottom-up basis. Therefore, the SE does not need to access WS directly. Rather, crawling is replaced by a new, different process in which the SE uses the data structures of IS at level 2.

If standards for level 1 are established, alternative architectures can be implemented for the new paradigm using different data structures and features at levels 2 and 3. Regarding WB, new features could be incorporated so that WB can play a smarter role in information search and retrieval. Nevertheless, the increase in network traffic that a smarter WB could produce is a constraint to be taken into account.

7. CONCLUSIONS AND FUTURE RESEARCH

This paper has addressed the need for standards to crawl, index and rank web contents in order to provide web server software with local indexing features that support a global, distributed indexing scheme. These standards and protocols can become guidelines for creating free software applications, so that every web server can automatically create and manage local indexes. If these functionalities become inherent to WWW protocols and servers, dependence on centralized search services can be reduced.
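The bottom-up scheme described in Section 6 can be illustrated with a minimal sketch. It assumes that each web server (level 1) keeps a simple inverted index of its own pages and that levels 2 and 3 build their indexes only by merging the indexes of the level below, never by fetching pages. The function names, the example URLs and the ranking by raw term count are hypothetical choices made for this illustration; the paper does not prescribe any specific data structure or ranking algorithm.

```python
from collections import defaultdict


def build_local_index(pages):
    """Level 1 (assumed): a web server indexes its own pages.

    Returns a mapping term -> {url: number of occurrences of the term}.
    """
    index = defaultdict(dict)
    for url, text in pages.items():
        for term in text.lower().split():
            index[term][url] = index[term].get(url, 0) + 1
    return index


def merge_indexes(indexes):
    """Levels 2 and 3 (assumed): subsume lower-level indexes.

    Only index entries are transferred between levels, not page content.
    """
    merged = defaultdict(dict)
    for index in indexes:
        for term, postings in index.items():
            merged[term].update(postings)
    return merged


def search(index, term):
    """Return URLs containing the term, highest count first, ties by URL."""
    postings = index.get(term.lower(), {})
    return sorted(postings, key=lambda url: (-postings[url], url))


# Two hypothetical web servers build their local indexes (level 1).
ws_a = build_local_index({
    "http://a.example/1": "open source search",
    "http://a.example/2": "search search engines",
})
ws_b = build_local_index({"http://b.example/1": "distributed search index"})

# An intermediate index (level 2) and a search engine index (level 3)
# are obtained by merging, without crawling the web servers.
level2_index = merge_indexes([ws_a, ws_b])
level3_index = merge_indexes([level2_index])

print(search(level3_index, "search"))
```

In this sketch the search engine never reads a page: it answers queries from data that was created and updated at the lower levels, which is the sense in which crawling becomes unnecessary. A realistic design would also have to address incremental updates and the volume of index data exchanged between levels.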
Future research should evaluate the best algorithms and data representations for these purposes in order to propose preliminary standards.

REFERENCES

Baeza-Yates, R. et al., 2009. On the Feasibility of Multi-site Web Search Engines. Proceedings of ACM 18th Conference on Information and Knowledge Management (CIKM 09). Hong Kong, China, pp. 425-434.
Brin, S. and Page, L., 1998. The Anatomy of a Large-Scale Hypertextual Web Search Engine. Proceedings of the Seventh International World-Wide Web Conference (WWW 1998). Brisbane, Australia.
Evans, M.P. et al., 2005. Search Adaptations and the Challenges of the Web. IEEE Internet Computing, Vol. 9, Issue 3, pp. 19-26.
Gravano et al., 1994. The Effectiveness of GlOSS for the Text-Database Discovery Problem. Proceedings of the 1994 ACM SIGMOD International Conference On Management Of Data. Minneapolis, USA, pp. 126-137.
Manning, C. D. et al., 2008. Introduction to Information Retrieval. Cambridge University Press, New York, USA.
Melnik, S. et al., 2001. Building a Distributed Full-Text Index for the Web. ACM Transactions on Information Systems, Vol. 19, No. 3, July 2001, pp. 217-241.
Spink, A. et al., 2006. Overlap among Major Web Search Engines. Proceedings of the Third International Conference on Information Technology: New Generations (ITNG'06). Las Vegas, USA, pp. 370-374.
