02.11.2014 Views

untangling_the_web


DOCID: 4046925

UNCLASSIFIED//FOR OFFICIAL USE ONLY

Popdex

Backlinks from blogs (as well as the date of linkage) known to the Popdex blog indexer.

The Similar tab is not entirely self-explanatory. Alexa, UCmore, Furl, and Google all try to show related or similar websites, though not in the same way. Alexa shows 'people who visit this page also visit...'; UCmore clusters related pages by topic; Furl is a collaborative bookmarking tool, so it only shows pages bookmarked by the same person (of dubious use); and Google's related pages feature is, in Fagan's and my opinion, of poor quality. Google News will show related news articles, but only if the original article has been indexed by Google News. The Waypath tool looks for blog entries about a website, yet it currently shows no links to http://www.google.com and two hits on http://www.microsoft.com. There is obviously a problem with this specific search.

The Cache tab is much more useful at this time. Fagan has done us all the great service of bringing the search tools that cache webpages together so they can be searched from one convenient interface. Also, URLinfo makes it possible to see Google's cached pages without images, style sheets, or forms by using Google (plain). Openfind is an Asian search engine and does not yet have an English version. I was unable to figure out how their caching works because of the language barrier. For news and blogs, Daypop caches each page it crawls. "Its cache is often the most up-to-date copy of the page, and it shows the exact time that the copy was made."
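To make the idea of a "plain" cached page concrete, here is a minimal Python sketch that removes images, style sheets, and forms from an HTML string. It is only an illustration of the concept, not a description of how Google or URLinfo produce the stripped view; the class name `PlainPageBuilder`, the function `strip_page`, and the exact set of tags dropped are assumptions made for this example.

```python
from html.parser import HTMLParser

# Assumption for this sketch: these tags are dropped together with their contents.
SKIP_BLOCKS = {"style", "form"}
# Assumption for this sketch: these standalone tags are dropped on their own.
SKIP_SINGLE = {"img"}


class PlainPageBuilder(HTMLParser):
    """Re-emits HTML while leaving out images, style sheets, and forms."""

    def __init__(self):
        # Keep character references as written so they can be re-emitted unchanged.
        super().__init__(convert_charrefs=False)
        self.parts = []
        self.skip_depth = 0  # > 0 while inside a <style> or <form> block

    @staticmethod
    def _is_stylesheet_link(tag, attrs):
        return tag == "link" and any(
            name == "rel" and value and "stylesheet" in value.lower()
            for name, value in attrs
        )

    def _dropped(self, tag, attrs):
        return (
            self.skip_depth > 0
            or tag in SKIP_SINGLE
            or self._is_stylesheet_link(tag, attrs)
        )

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_BLOCKS:
            self.skip_depth += 1
        elif not self._dropped(tag, attrs):
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br /> are copied through as written.
        if tag not in SKIP_BLOCKS and not self._dropped(tag, attrs):
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in SKIP_BLOCKS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and tag not in SKIP_SINGLE:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.skip_depth == 0:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self.skip_depth == 0:
            self.parts.append(f"&#{name};")


def strip_page(html_text):
    """Return html_text with images, style sheets, and forms removed."""
    builder = PlainPageBuilder()
    builder.feed(html_text)
    builder.close()
    return "".join(builder.parts)
```

Calling `strip_page` on a saved copy of a page yields markup that keeps the text and links but omits the pieces most likely to trigger additional requests or submissions.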

Here's the low-down on the other general cache tools at Fagan Finder:

Internet Archive

The Internet Archive has been crawling the web and caching pages since 1996. The Wayback Machine allows you to view the copies made during any of those crawls, and also to compare any two versions of the same page.

Google

When Google crawls the web, it stores a copy of each web page. This is the most recent copy. This can also be used as a means of viewing some non-HTML files converted to HTML.

Google (plain)

Google's stripped cache, with images, styles (style sheets), and forms removed.

Gigablast

Gigablast does not provide direct access to its cache. You must follow the link labeled [archived copy]. Gigablast's cache shows the date on which the copy was made.
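As a rough illustration of the two Wayback Machine uses described above (viewing a copy from a given crawl and comparing two versions of a page), the following Python sketch builds a snapshot address and diffs two saved copies. The address layout in `WAYBACK_PREFIX` is an assumption about the public Wayback Machine URL scheme and does not come from the text; the function names are hypothetical, and the comparison runs on text you have already saved rather than fetching anything.

```python
import difflib
from datetime import datetime

# Assumption: snapshots are addressed as <prefix>/<YYYYMMDDhhmmss>/<original URL>.
WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(page_url, when):
    """Build an address asking for the archived copy closest to `when`."""
    return f"{WAYBACK_PREFIX}/{when.strftime('%Y%m%d%H%M%S')}/{page_url}"


def compare_versions(old_text, new_text):
    """Return a unified diff (list of lines) between two saved copies of a page."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile="older",
            tofile="newer",
            lineterm="",
        )
    )
```

For example, `wayback_url("http://www.google.com", datetime(2004, 1, 15))` produces an address for a copy near that date, and `compare_versions` lists the lines removed (prefixed with `-`) and added (prefixed with `+`) between two copies.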

