
untangling_the_web


DOCID: 4046925

UNCLASSIFIED//FOR OFFICIAL USE ONLY

Beyond the use of the OR operator in its simple search, Google does not support boolean search.

While Google assumes that multiple keywords are a phrase, searchers can delimit phrases using double-quotes. For example, if I search on:

[the last king of france]

without double-quotes, Google will ignore the "the" and the "of" in its search. The results I get include many irrelevant hits, such as music from a group called "The Last King" and an article about Lance Armstrong. However, if I enclose the same query in double-quotes, Google will search on exactly the phrase ["the last king of france"], and return a result with the name of the last king of France. Enclosing searches in double-quotes is much more effective for finding precise results than relying on automatic phrase searching.

Google no longer routinely ignores stop words outside double quotes. Each of these searches will now return different results:

[the last king of france] [last king france] ["the last king of france"]

Stop words are English words that are so commonplace they are not included in a search unless the searcher forces Google to do so. The stop words Google recognizes include: a, an, about, and, are, as, at, be, by, com, from, how, I, in, is, it, of, on, or, that, the, this, to, we, what, when, where, which, with, why. There probably are others!

However, Google's handling of stop words is inconsistent. For example, in the query [to be or not to be], Google ignores OR because it may be a logical operator, and it also appears to ignore TO and BE, only searching for NOT. Therefore, you may need to force Google to search for a stop word on occasion. There is a Google hack for forcing Google to search for stop words.
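To make the stop-word behavior described above concrete, here is a minimal Python sketch that drops the listed stop words from a query. It is only an illustration: the helper name strip_stop_words is hypothetical, the word list is copied from the text above, and nothing here reflects how Google actually implements its search.

```python
# Stop words exactly as listed in the text above (lowercased for comparison).
# Assumption: the real list is longer and is not published here.
STOP_WORDS = {
    "a", "an", "about", "and", "are", "as", "at", "be", "by", "com",
    "from", "how", "i", "in", "is", "it", "of", "on", "or", "that",
    "the", "this", "to", "we", "what", "when", "where", "which",
    "with", "why",
}


def strip_stop_words(query: str) -> str:
    """Return the query with the listed stop words removed (hypothetical helper)."""
    kept = [word for word in query.split() if word.lower() not in STOP_WORDS]
    return " ".join(kept)


print(strip_stop_words("the last king of france"))
print(strip_stop_words("to be or not to be"))
```

Under these assumptions, [the last king of france] reduces to [last king france], and [to be or not to be] reduces to the single word "not", which matches the behavior described for that query.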

It is unnecessary to use the plus sign (+) with any terms except stop words because by default Google searches for all keywords. However, there are many times when searchers need to exclude certain terms that are commonly associated with a keyword but irrelevant to their search. That's where the minus sign (-) comes in. Using the minus sign in front of a keyword ensures that Google excludes that term from the search. For example, the results for the search ["pearl harbor" -movie] are very different from the results for ["pearl harbor"].
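The query syntax above (a double-quoted phrase plus minus-prefixed exclusions) can be assembled mechanically. The following Python sketch shows one way to do it; the function name build_query and its parameters are hypothetical, and the sketch only builds the query string a searcher would type, without contacting any search engine.

```python
from typing import Iterable


def build_query(phrase: str, exclude: Iterable[str] = ()) -> str:
    """Build a query string: an exact phrase followed by excluded terms.

    Hypothetical helper for illustration only.
    """
    # Double quotes delimit the exact phrase.
    parts = [f'"{phrase}"']
    # A leading minus sign marks each term to exclude from the results.
    parts.extend(f"-{term}" for term in exclude)
    return " ".join(parts)


print(build_query("pearl harbor", exclude=["movie"]))
```

Calling build_query("pearl harbor") with no exclusions yields just the quoted phrase, so the same helper covers both searches compared above.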

Google's handling of words with diacritical marks such as accents or umlauts is inconsistent. By default, Google will search for terms matching those with and without the diacritic. As Google's Vanessa Fox explains, "When a searcher enters a query that includes a word with accented characters, our algorithms consider web pages that contain versions of that word both with and without the accent. For

