02.11.2014 Views

untangling_the_web


DOCID: 4046925

UNCLASSIFIED//FOR OFFICIAL USE ONLY

as a true visionary: "Imagine being able to analyze the changes to the English
language over time. Imagine being able to use the hand translated versions of past
books as a way to train automatic translation technologies so we can more
effectively translate any book into any language. Imagine being able to analyze the
interrelation of papers through their footnotes and links to find new patterns of
thought. Each of these projects is already proceeding using the digital holdings of
the Internet Archive by researchers."¹⁰⁵ You have to love this guy.

Microsoft and Yahoo both threw in with Kahle and the Open Content Alliance (OCA)
during 2005, Microsoft in advance of its new Live Book Search. This occurred as
Google was embroiled in not one but two lawsuits to stop its book digitization
project. The OCA has thus far avoided any such suits because it is only indexing
books and other content in the public domain. But Microsoft has made it known it will
not be content to stick with public domain content, which will put Microsoft on the
horns of the same dilemma as Google. It will be interesting to see how OCA and its
members handle copyright and other infringement issues.

Open Content Alliance
http://www.opencontentalliance.org/

While some of its members may view the OCA project as a way to take on Google,
Kahle is not at all unhappy about competition from other digitization projects. Quite
the contrary, he sees his efforts as augmenting more commercial ventures while he
openly seeks to emulate in the public domain Amazon's approach to full-text search.
Any way you look at it, this is great news. Sometimes with all the petty annoyances
in our everyday lives, it is hard to remember we really are witnessing and even
participating in a revolution in human knowledge.

And just how does Kahle envision storing all these treasures? He worked with
Capricorn Technologies to design what is called the PetaBox, basically a very large,
affordable data repository that can store a million gigabytes of data. Capricorn
shipped the first of its PetaBox products to the Internet Archive in June 2005.

All that data is accessible to users in a variety of ways, none more interesting or
useful than the Wayback Machine. Using the Wayback Machine, you may very well
be able to retrieve a page or an entire site even if it disappeared from the web years
ago. Keep in mind that Yahoo also offers an excellent way to search the
Internet Archive to its fullest.
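For readers who would rather script such a lookup than use the web form, here is a minimal sketch in Python. It is not part of the original text. It assumes the Internet Archive's public availability endpoint at https://archive.org/wayback/available, which returns JSON describing the archived snapshot closest to a requested date. The function names (build_query, closest_snapshot, lookup) are hypothetical helpers chosen for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: the Internet Archive's public availability API.
API = "https://archive.org/wayback/available"


def build_query(page_url, timestamp=None):
    """Build the lookup URL; timestamp is an optional YYYYMMDD string."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return API + "?" + urlencode(params)


def closest_snapshot(response_text):
    """Return the archived copy's URL from a JSON response, or None."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


def lookup(page_url, timestamp=None):
    """Query the endpoint over the network and return the snapshot URL."""
    with urlopen(build_query(page_url, timestamp), timeout=10) as resp:
        return closest_snapshot(resp.read().decode("utf-8"))
```

Calling lookup("example.com", "20060101") would ask for the capture nearest to 1 January 2006. A result of None means no archived copy was reported for that address.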

105 Kahle.<br />

