02.11.2014 Views

untangling_the_web


DOCID: 4046925

UNCLASSIFIED//FOR OFFICIAL USE ONLY

which of the search engines is capable of searching for that particular type of file.
Not every search engine on the list searches for every file type.
Also, keep in mind that the Fagan Finder file type search for XML is less precise
than going directly to Google or Yahoo and searching by filetype: in Google and by
originurlextension: in Yahoo. If you use one of these search engines, you can
specify that you only want to search for, say, those files that are .rss by entering the
query [filetype:rss] or [originurlextension:rss]. These queries will return only those
documents in RSS format, not those in XML or RDF. So I recommend using the
Fagan Finder search by file type for file types other than XML, RSS, or RDF.
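The two operators described above can be sketched as a small query builder. This is only an illustration: the function name, the dictionary, and the engine labels are hypothetical, and the only operator syntax assumed is what the text gives, `filetype:` for Google and `originurlextension:` for Yahoo. It builds the query string and does not contact either search engine.

```python
# Operator names are taken from the text; the engine keys are illustrative labels.
FILETYPE_OPERATORS = {
    "google": "filetype",
    "yahoo": "originurlextension",
}


def build_filetype_query(engine: str, extension: str, terms: str = "") -> str:
    """Return a query string that restricts results to one file extension."""
    operator = FILETYPE_OPERATORS[engine.lower()]
    # Accept ".rss" or "RSS" and normalize both to "rss".
    ext = extension.lower().lstrip(".")
    restriction = f"{operator}:{ext}"
    # Optional keywords go in front of the restriction.
    return f"{terms} {restriction}".strip()


print(build_filetype_query("google", "rss"))
print(build_filetype_query("yahoo", ".rss"))
print(build_filetype_query("google", "rdf", "weather feeds"))
```

The strings it produces are what you would type into the search box; an engine name other than the two in the dictionary raises a `KeyError`.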

[Screenshot: Fagan Finder "Search by File Type" page]

File Format: Adobe Portable Document Format, Adobe PostScript, Microsoft Excel, Microsoft PowerPoint, Microsoft Word, Microsoft Works, Microsoft Write, Rich Text Format, Corel WordPerfect, Lotus 1-2-3, Lotus WordPro, Star Office, MacWrite, Macromedia Flash, Text, XML (final entry illegible)

Search Engine: Google, Yahoo!, Gigablast, Teoma, Exalead, Scirus, Sensis

About this Tool: This tool enables easy access to searching for various non-HTML (standard web page) file formats. Certain documents are commonly used for different purposes; for example, many academic papers are in Adobe Portable Document Format. Because this tool makes use of other tools, it is limited by their functionality. Searching in Google for XML files, for example, uses the file extensions xml, rdf and rss, which means that not all XML files are included, and some non-XML files may be included.

File Viewing: Different file formats require different software to view those files. Adobe Portable Document Format, for instance, requires Adobe Reader.

Scirus and Sensis: Scirus is a search engine for scientific information; it includes Adobe Portable Document Format files in addition to standard web pages. It is powered by Fast Search & Transfer, the former owner of the AlltheWeb search engine. Sensis is a search engine which has both "world" and "Australia" options. Restricting by file format is not perfect yet, as some results returned may not be of the requested type.

Fagan Finder Search by File Type<br />

http://www.faganfinder.com/filetype/<br />

URLinfo<br />

http://www.faganfinder.com/urlinfo/<br />

The indefatigable Michael Fagan also introduced a beta version of a new tool,
URLinfo, in mid-2004. URLinfo fills a void created when AlltheWeb effectively shut
down and took with it the useful "URL investigator." While Yahoo now offers Site
Explorer and Google a lame version [info:domain.com], Fagan's URLinfo provides
many more options for exploring a site. As with everything he does, Fagan has gone
all out with URLinfo, almost to the point of providing too many options! However, he
has done a smart thing in keeping the main URLinfo page simple, "hiding" the nearly
85 investigative tools in his toolkit behind a variety of tabs. I think URLinfo is
important and valuable enough to spend time looking at most of the options in some
detail.

