16.01.2013 Views

Microsoft Sharepoint Products and Technologies Resource Kit eBook



590 Part VII: Information Management in SharePoint Products and Technologies

string can be broken back into individual words. SharePoint Portal Server uses word
breakers provided by Windows 2000 Server Indexing Service and also provides word
breakers of its own. The word breakers provided by the Microsoft Windows 2000
Server Indexing Service are for documents that are in Dutch, Italian, Swedish, and
German. SharePoint Portal Server provides word breakers for English, French, Spanish,
Japanese, Thai, Korean, Chinese-Traditional, and Chinese-Simplified.

If multiple languages are used in a single document, SharePoint Portal Server
recognizes that multiple word breakers are needed. If no word breaker is available
for a particular language, the neutral word breaker is used. In this case, words are
broken at neutral characters such as spaces and punctuation marks.
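The fallback behavior described above can be illustrated with a minimal Python sketch. It is not SharePoint Portal Server's implementation: the names `neutral_word_break`, `WORD_BREAKERS`, and `break_words` are hypothetical, and the regular expression is only one plausible reading of "neutral characters such as spaces and punctuation marks."

```python
import re

def neutral_word_break(text):
    """Break text at neutral characters such as spaces and punctuation marks."""
    # \w+ keeps runs of letters, digits, and underscores; anything else is a break.
    return re.findall(r"\w+", text)

# Hypothetical registry mapping a language name to its word breaker function.
# It is left empty here, so every lookup falls back to the neutral breaker.
WORD_BREAKERS = {}

def break_words(text, language):
    """Use the language's word breaker if one is registered, else the neutral one."""
    breaker = WORD_BREAKERS.get(language, neutral_word_break)
    return breaker(text)
```

Calling `break_words("one two;three", "Klingon")` finds no registered breaker for that language and therefore splits the text with the neutral breaker.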

Stemmers

Many words in a language have several inflections. In English, even
a simple word such as “get” can also take the form of “getting,” “got,” or “gotten.”
Because of this, a component is needed to treat the different variations of a word
as the same term. Components that do this are called stemmers. Stemmers
also affect the formats of numbers, dates, and times so that they are handled consistently.
Stemmers are used only in processing queries and are not used in the indexing
process.
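As a rough illustration of query-time stemming, the sketch below expands a query term into all of its known inflections before the index is searched. The lookup table and the names `INFLECTIONS`, `BASE_FORM`, and `expand_query_term` are assumptions made for this example. A real stemmer derives word forms from language rules, and the text does not state that SharePoint Portal Server works by expansion.

```python
# Toy inflection table using the example from the text.
# A real stemmer would derive these forms from language rules.
INFLECTIONS = {
    "get": {"get", "getting", "got", "gotten"},
}

# Reverse lookup from any listed form back to its key in INFLECTIONS.
BASE_FORM = {form: base for base, forms in INFLECTIONS.items() for form in forms}

def expand_query_term(term):
    """Return every known variation of a query term (lowercased)."""
    term = term.lower()
    base = BASE_FORM.get(term, term)
    # Unknown words expand only to themselves.
    return set(INFLECTIONS.get(base, {base}))
```

Because the expansion happens when the query is processed, the index itself can store words exactly as they appear in documents, which matches the statement that stemmers are not used during indexing.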

Noise Word Removal

Some words in a language are not useful for performing searches. For example,
in English, words such as “the” and “a” provide no real search value:
nearly every document contains them, so they cannot distinguish one document
from another in a search query. Words like these are called noise words.
Noise words differ from language to language, so, as with word
breakers, SharePoint Portal Server uses a separate noise word file for each language.
Each file is a list of words that are removed during the indexing process.
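The filtering step can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, and the assumed file format (words separated by whitespace) is a guess, since the text says only that each file is a list of words.

```python
def parse_noise_words(file_text):
    """Build a noise word set from file contents.

    Assumed format: words separated by whitespace (for example, one per line).
    """
    return {word.lower() for word in file_text.split()}

def remove_noise_words(words, noise_words):
    """Drop noise words from a list of words, ignoring case."""
    return [word for word in words if word.lower() not in noise_words]
```

For example, with a noise list containing "the" and "a", the word list `["The", "portal", "indexes", "a", "document"]` is reduced to `["portal", "indexes", "document"]` before it would be indexed.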

Noise Word File Management

By default, SharePoint Portal Server stores noise word files in the
\Program Files\SharePoint Portal Server\DATA\Config directory of the server. The location of
the DATA directory can be changed during the server installation process.

Another set of the same noise word files is copied to the
local_drive\Program Files\SharePoint Portal Server\Data\Applications\Application UID\Config directory.
These files can be used to specify noise words that apply at the application
level instead of at the server or server-farm level. For example, if SharePoint
Portal Server and Microsoft SQL Server are installed on the same server, each
can have different noise word lists. Table 21-1 lists each language supported
and the corresponding noise word file included by default in SharePoint Portal
Server 2003.
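The two directory levels described above suggest a simple lookup order, sketched here in Python. The function name `resolve_noise_file`, its parameters, and the placeholder file name `noise.txt` are hypothetical, and the rule "prefer the application-level file, otherwise use the server-level one" is an assumption made for illustration rather than documented product behavior.

```python
from pathlib import Path

def resolve_noise_file(data_dir, app_uid, filename):
    """Pick the noise word file for an application.

    Assumed rule: use the copy under Applications/<app_uid>/Config if it
    exists; otherwise fall back to the server-level Config directory.
    """
    app_level = Path(data_dir) / "Applications" / app_uid / "Config" / filename
    server_level = Path(data_dir) / "Config" / filename
    return app_level if app_level.is_file() else server_level
```

Here `data_dir` stands for the DATA directory chosen at installation, and `app_uid` stands for the Application UID folder name.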
