02.11.2014 Views

untangling_the_web


DOCID: 4046925

UNCLASSIFIED//FOR OFFICIAL USE ONLY

Beyond Search Engines: Specialized Research Tools

Search engines are a good and natural starting place for performing research on the Internet, but they represent only a small portion of the data available on the web and only one way of tapping into that data. It is also important to understand that search engine spiders do not access (and therefore search engines do not index) most data contained in many databases or websites that require registration or payment to enter. For example, search engines do not normally index the data in PeopleData.com (a database) or any information beyond the first page or so of the Chicago Tribune (which requires registration). These types of sites require users to access them directly. The information at these sites is part of the invisible, hidden, or deep web.
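To make the point about spiders concrete, here is a minimal sketch of a link-following crawl over a toy site. Everything in it is an assumption made for illustration: the `SITE` and `DATABASE` structures, the paths, and the `crawl` function are hypothetical and do not describe how any real search engine works. The sketch only shows why content reached through a query form or a registration wall never enters the index of a spider that follows static links.

```python
from collections import deque

# Hypothetical toy site: each "page" lists the links it exposes and whether
# it requires a login. Paths and contents are invented for illustration.
SITE = {
    "/": {"links": ["/about", "/news", "/search", "/members"], "login": False},
    "/about": {"links": [], "login": False},
    "/news": {"links": ["/news/1"], "login": False},
    "/news/1": {"links": [], "login": False},
    "/search": {"links": [], "login": False},  # a query form, no static links
    "/members": {"links": ["/members/archive"], "login": True},
    "/members/archive": {"links": [], "login": True},
}

# Records served only in response to a form query; no page links to them.
DATABASE = {"smith": "/search?q=smith", "jones": "/search?q=jones"}


def crawl(start="/"):
    """Breadth-first, link-following crawl that never logs in or submits forms."""
    indexed, queue = set(), deque([start])
    while queue:
        path = queue.popleft()
        if path in indexed:
            continue
        page = SITE.get(path)
        if page is None or page["login"]:
            continue  # unknown page, or blocked by the registration wall
        indexed.add(path)
        queue.extend(page["links"])
    return indexed


indexed = crawl()
everything = set(SITE) | set(DATABASE.values())
deep = sorted(everything - indexed)  # what the spider never reached
print("indexed:", sorted(indexed))
print("deep web:", deep)
```

In this toy model the spider indexes the public pages, including the search form itself, but the database records and the members-only pages remain out of reach, which mirrors the two examples given above.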

The types of sites and information that are not generally accessible to search engines include:

- information in databases: phone and email directories, Whois registration & DNS data, dictionaries, encyclopedia articles, statistics, legal and medical data, financial information.

- rapidly changing information: news, airline flight information, stock, bond, currency market data, auctions.

- for-fee and subscription services.

- information behind a firewall (corporate, government, educational).

To give you a better idea just how vast the deep web is, consider these points from "The Deep Web: Surfacing Hidden Value"112 by Michael K. Bergman.

112 Michael K. Bergman, "The Deep Web: Surfacing Hidden Value," BrightPlanet, August 2001, (14 November 2006).

