
Untangling the Web

DOCID: 4046925

UNCLASSIFIED//FOR OFFICIAL USE ONLY

5. Learn the search syntax of the search engines you use (never assume). Most search engines use double quotes ("") to enclose a phrase and the plus (+) and minus (-) keys to indicate "must include" and "must exclude," respectively. But these are by no means universal rules, especially when using international or metasearch engines.
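To make these three conventions concrete, here is a minimal Python sketch of how a query written with them might be split into phrases, required terms, and excluded terms. The grammar it accepts (quoted phrases, a leading `+` or `-`, everything else optional) is an assumption for illustration; `parse_query` and `ParsedQuery` are hypothetical names, not any engine's actual parser, and real engines vary exactly as the tip warns.

```python
import re
from dataclasses import dataclass, field

# Hypothetical grammar: an optional + or - sign, followed by either a
# "quoted phrase" or a single whitespace-free word.
TOKEN = re.compile(r'([+-]?)(?:"([^"]*)"|(\S+))')


@dataclass
class ParsedQuery:
    required: list[str] = field(default_factory=list)  # "+" = must include
    excluded: list[str] = field(default_factory=list)  # "-" = must exclude
    optional: list[str] = field(default_factory=list)  # no sign


def parse_query(query: str) -> ParsedQuery:
    parsed = ParsedQuery()
    for sign, phrase, word in TOKEN.findall(query):
        text = phrase if phrase else word  # a quoted phrase stays whole
        if not text:
            continue  # skip empty quotes ("")
        if sign == "+":
            parsed.required.append(text)
        elif sign == "-":
            parsed.excluded.append(text)
        else:
            parsed.optional.append(text)
    return parsed


q = parse_query('"search engine" +syntax -metasearch tips')
print(q.required)  # ['syntax']
print(q.excluded)  # ['metasearch']
print(q.optional)  # ['search engine', 'tips']
```

An engine that reads `-` differently, or ignores quotes, would parse the same string into something else entirely, which is why the syntax has to be checked engine by engine.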

6. The default operator for all major US search engines is now AND. As of February 2002, no major search engine used OR as its default operator. However, most search engines let you use OR in the simple search box; Yahoo and Google, for example, permit OR searches there, but you must capitalize the OR.
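The interaction between an implicit AND and a capitalized OR can be sketched as follows. This is a hypothetical evaluator, assuming whitespace-separated terms, that OR binds adjacent terms more tightly than the implicit AND, and that lowercase "or" is just another word; `parse_groups` and `matches` are illustrative names only, not how Yahoo or Google actually process queries.

```python
def parse_groups(query: str) -> list[set[str]]:
    """Split a query into AND-ed groups; each group is a set of OR-ed terms.

    Assumption: only the uppercase keyword OR is an operator.
    """
    groups: list[set[str]] = []
    join_next = False
    for token in query.split():
        if token == "OR":
            join_next = bool(groups)  # a leading OR has nothing to join to
            continue
        term = token.lower()
        if join_next:
            groups[-1].add(term)  # extend the current OR group
        else:
            groups.append({term})  # start a new group (implicit AND)
        join_next = False
    return groups


def matches(query: str, document: str) -> bool:
    words = set(document.lower().split())
    # Every group must match (AND); within a group, any one term will do (OR).
    return all(group & words for group in parse_groups(query))


doc = "cheap flights to paris"
print(matches("flights london OR paris", doc))  # True
print(matches("flights london or paris", doc))  # False: "london" and "or" are required
```

The lowercase query fails because, under the AND default, "or" becomes one more required word rather than an operator.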

7. Keep in mind that because HTML does not have a "date" tag, "date" can mean many things: the creation date, the last-modified date of the page, or the date the search engine found the page. I do not recommend searching by date except when using weblog, news, or newsgroup search engines.

Understanding statistical interfaces is important, especially for researchers used to Boolean and other non-statistical query languages.

Most search engines use statistical interfaces. The search engine assigns a relative weight to each search term, depending on:

• its rarity in the engine's database
• how frequently the term occurs on the webpage
• whether or not the term appears in the URL
• how close to the top of the page the term appears
• (sometimes) whether or not the term appears in the metatags.
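The weighting factors listed above can be illustrated with a toy scoring function. This is a sketch under stated assumptions: rarity is modeled as a smoothed inverse document frequency, frequency as a raw term count, and the URL, position, and meta-tag bonuses (`URL_BONUS`, `POSITION_BONUS`, `META_BONUS`) are invented constants. No search engine publishes its real weights, so treat the numbers as placeholders.

```python
import math
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    body: str  # visible text, top of the page first
    meta_keywords: str = ""  # hypothetical keywords meta tag content


# Invented bonus values; real engines keep their weights private.
URL_BONUS = 1.5
POSITION_BONUS = 1.0
META_BONUS = 0.5


def term_weight(term: str, page: Page, corpus: list[Page]) -> float:
    term = term.lower()
    words = page.body.lower().split()
    tf = words.count(term)  # how often the term occurs on this page
    if tf == 0:
        return 0.0  # this sketch scores only terms found in the body
    # Rarity: smoothed inverse document frequency over the whole collection.
    containing = sum(1 for p in corpus if term in p.body.lower().split())
    idf = math.log((1 + len(corpus)) / (1 + containing)) + 1
    weight = tf * idf
    # Position: the earlier the first occurrence, the larger the bonus.
    first = words.index(term)
    weight += POSITION_BONUS * (1 - first / len(words))
    if term in page.url.lower():  # simple substring match against the URL
        weight += URL_BONUS
    if term in page.meta_keywords.lower().split():  # the "sometimes" factor
        weight += META_BONUS
    return weight


corpus = [
    Page("http://example.com/cryptology", "cryptology history and the web"),
    Page("http://example.com/a", "the web and search"),
    Page("http://example.com/b", "the search engines and the web"),
]
for term in ("cryptology", "web"):
    print(term, round(term_weight(term, corpus[0], corpus), 2))
# cryptology 4.19
# web 1.2
```

On the same page, the rare term that also appears first and in the URL outweighs the common term several times over.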

When you query the database, the search engine adds up all the weights that match your query terms and returns the documents with the highest weight first. Each search engine has its own algorithm for assigning weights, and the engines tweak these algorithms frequently. In general, rare, unusual terms are easier to find than common ones because of the weighting system. However, remember that "popularity," measured by various means, often trumps any statistical interface.
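Summing the weights and letting popularity override them might look like this. The sketch assumes each document already has per-term weights and an inbound-link count; using `log1p(inbound_links)` as the popularity signal and `popularity_weight` as the blending knob are hypothetical choices, not a description of any real engine's ranking.

```python
import math


def statistical_score(query_terms, term_weights):
    """Add up the weights of the query terms that this document matches."""
    return sum(term_weights.get(t.lower(), 0.0) for t in query_terms)


def rank(query_terms, docs, popularity_weight=0.0):
    """Return document ids, highest combined score first.

    docs maps a document id to (term_weights, inbound_links).
    """
    def score(doc_id):
        weights, links = docs[doc_id]
        popularity = math.log1p(links)  # hypothetical popularity signal
        return statistical_score(query_terms, weights) + popularity_weight * popularity

    return sorted(docs, key=score, reverse=True)


docs = {
    "obscure-page": ({"cryptology": 4.0, "history": 1.0}, 2),
    "popular-page": ({"cryptology": 1.5, "history": 1.5}, 5000),
}
query = ["cryptology", "history"]
print(rank(query, docs))                         # ['obscure-page', 'popular-page']
print(rank(query, docs, popularity_weight=0.5))  # ['popular-page', 'obscure-page']
```

With `popularity_weight=0` the page with the better term weights ranks first; raising it lets the heavily linked page overtake it, which is the effect the paragraph warns about.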
