13.07.2015 Views

WWW/Internet - Portal do Software Público Brasileiro

WWW/Internet - Portal do Software Público Brasileiro

WWW/Internet - Portal do Software Público Brasileiro

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

IADIS International Conference <strong>WWW</strong>/<strong>Internet</strong> 2010NEED FOR PROTOCOLS AND STANDARDS FOR ADECENTRALIZED WEB SEARCH PARADIGMSergio R. CoriaUniversity of the Sierra SurCalle Guillermo Rojas Mijangos S/N, Col. Ciudad Universitaria70800 Miahuatlan de P. Diaz, Oax., MexicoABSTRACTThis paper proposes to create a new paradigm on services for information search on the web, involving new protocols andstandards. For more than one decade, the current paradigm has consisted in services that have been provided by a smallnumber of private companies on an advertising business scheme, mainly. This concentration involves a number of risksand disadvantages to <strong>WWW</strong> users. One of the risks is the service vulnerability because of the large dependence on theinfrastructure of one or two providers. Scalability might become into a concern because of the huge increasing ofinformation on the <strong>WWW</strong>. The current paradigm also involves economical and political disadvantages because theproviders decide which websites are allowed to be listed or not as well as their rankings; this can bias the search results.Therefore, this paper aims to suggest general lines for research and development for a new, global, non-for-profit,paradigm for crawling, indexing and searching information on the <strong>WWW</strong>.KEYWORDSWeb search, web crawling, web indexing, protocols, standards.1. INTRODUCTIONSince the 1990 decade, service for information search on the World Wide Web (SISW) has been provided bya small number of private companies on an advertising business basis. A general explanation on how SISWwork can be found in (Manning et al., 2008), and Brin and Page (1998) present a specific description ofGoogle, the most influential SISW at the present time. A historical overview of infrastructure and algorithmsfor information search on <strong>WWW</strong> is presented by Evans et al. (2005).The current paradigm presents a series of technological, economical and political concerns to cyberspaceusers. Concentration in a small number of providers involves, among others, the risk that the service is morevulnerable because of the large dependence on their infrastructure only. Information increasing on the <strong>WWW</strong>can impact on infrastructure scalability. A non-technological issue is the bias in both search results andrankings that can be introduced by the service providers. Spink et al. (2006) show how search results differamong the major providers, suggesting that bias can be involved.Centralization of web contents indexes and focus on profit orientation seem to be two importantweaknesses of the current paradigm. Thus, index decentralization and a non-for-profit approach are desirablecharacteristics of a future, improved, paradigm. Index decentralization needs to be supported by a largenumber of servers; therefore, new protocols and standards to communicate multiple locations can be useful.A non-for-profit approach needs publicly accessible hardware infrastructure to reduce index hosting costs.2. PROTOCOLS AND STANDARDSWeb search depends on creating and updating indexes of the <strong>WWW</strong> contents. This is a highly complex taskthat needs efficient algorithms as well as infrastructure with high storage and speed capabilities. Brin andPage (1998) state that distributed indexing systems, e.g. GlOSS (Gravano, 1994), can be the most efficientand elegant technical solution for indexing. Furthermore, they consider that distributed indexing would373

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!