16.01.2013 Views

Microsoft Sharepoint Products and Technologies Resource Kit eBook


Chapter 21: The Architecture of the Gatherer

Your users will need to be educated about this process because it can be a bit confusing. For example, assume one of your users creates a search query for the word “SharePoint” on the MSDN (Microsoft Developer Network) library site. After executing the search, the user selects the Alert Me feature and then selects To Be Notified Immediately.

Now, your user believes that he will be notified as soon as content concerning the word “SharePoint” changes in the online MSDN Library. But this is not the case. The gatherer does not place hooks in the content source so that the user is automatically notified on the fly. Instead, the user’s configuration really results in the user being notified within 5 minutes after the PQS plug-in becomes aware of a change in the MSDN Library that matches the user’s alert rule. The PQS plug-in sees changes only when content passes through it during a crawl, so if you are crawling this site on a nightly basis, the user will not be notified until after the next crawl.

Now, refer to the “Frequency of Updates” section later in this chapter. You’ll notice that an incremental crawl is executed on all portal content every 10 minutes by default. So, every 10 minutes, new or modified portal content is sent through the PQS plug-in, and alerts on portal content reach your users much faster than alerts on content that is crawled only once per day. If you crawl a content source incrementally on a once-per-day schedule, which is quite common for many content sources, the effect is that the user is really notified on a daily basis, not immediately, as the user interface (UI) would indicate.
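The timing described above can be put into numbers. The following is a minimal sketch, not part of SharePoint: the function name is hypothetical, the 5-minute and 10-minute figures come from the text, and crawl duration is ignored as a simplifying assumption.

```python
from datetime import timedelta

# From the text: the alert is sent within 5 minutes of the PQS plug-in
# seeing the change.
PQS_NOTIFY_LAG = timedelta(minutes=5)


def worst_case_alert_delay(crawl_interval: timedelta) -> timedelta:
    """Upper bound on alert delay for a change made just after a crawl.

    Assumption: the time the crawl itself takes is ignored, so the change
    waits one full crawl interval and then up to the PQS notification lag.
    """
    return crawl_interval + PQS_NOTIFY_LAG


portal = worst_case_alert_delay(timedelta(minutes=10))   # default portal crawl
nightly = worst_case_alert_delay(timedelta(days=1))      # once-per-day source
print(portal)   # 0:15:00
print(nightly)  # 1 day, 0:05:00
```

Under these assumptions an "immediate" alert on portal content arrives within roughly 15 minutes, while the same alert on a nightly-crawled source can take a little over a day.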

Therefore, a best practice is as follows:

■ If you need fast notifications on content in the portal site, do nothing, as this is accomplished by default.

■ If you need fast notifications on content outside the portal site, you’ll need to match your crawling schedule with the notification schedule.

■ Determine the crawling schedule for your content sources by first determining the frequency at which users need to be notified of new content in that content source.
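One way to apply the last point is to start from the notification latency users need and derive the crawl interval from it. This is a planning sketch only: the content source names and latency requirements are made up for illustration, the helper is hypothetical, and the only figure taken from the text is the 5-minute PQS notification lag.

```python
from datetime import timedelta

PQS_NOTIFY_LAG = timedelta(minutes=5)  # figure from the text


def max_crawl_interval(required_latency: timedelta) -> timedelta:
    """Longest crawl interval that still meets the required alert latency.

    Assumption: crawl duration is ignored, so the budget is simply the
    required latency minus the PQS notification lag.
    """
    budget = required_latency - PQS_NOTIFY_LAG
    if budget <= timedelta(0):
        raise ValueError("required latency must exceed the PQS notification lag")
    return budget


# Hypothetical content sources and how quickly users need to hear of changes.
requirements = {
    "MSDN Library": timedelta(hours=4),
    "Archive file share": timedelta(days=1),
}

schedule = {name: max_crawl_interval(lat) for name, lat in requirements.items()}
for name, interval in schedule.items():
    print(f"{name}: crawl at least every {interval}")
```

A requirement tighter than the PQS lag cannot be met by any crawl schedule, which is why the helper raises an error rather than returning a zero or negative interval.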

AutoCat Plug-In<br />

The Auto Categorization (AutoCat) plug-in is the component that makes the Topic Assistant feature possible. This feature is used to automatically categorize content external to the portal site and assign it to the Topic area in accordance with how you have trained the Assistant.

Building the Catalogs<br />

Once the data stream has passed through the plug-ins, the data is placed in one of two areas. First, the metadata for the documents, including the exact URL of each document, is placed in the SPS.EDB file. Second, the actual content of the documents is first written to word lists in RAM, very quickly written to shadow indexes on disk, and then merged into a Master Index on a nightly basis.
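The three-stage path from word lists to shadow indexes to the Master Index can be illustrated with a toy inverted index. This is a conceptual sketch, not the gatherer's actual implementation: the class, its methods, the flush threshold, and the example URLs are all hypothetical, the "on-disk" tiers are plain in-memory dictionaries, and the assumption that a query consults all three tiers is ours rather than the text's.

```python
from collections import defaultdict


class ToyIndex:
    """Toy tiered inverted index: word list -> shadow indexes -> master."""

    def __init__(self, flush_threshold=2):
        self.flush_threshold = flush_threshold  # hypothetical tuning knob
        self.word_list = defaultdict(set)       # newest documents, in memory
        self.pending_docs = 0
        self.shadow_indexes = []                # stand-ins for small disk indexes
        self.master = defaultdict(set)          # stand-in for the Master Index

    def add_document(self, url, text):
        """Index a document into the in-memory word list."""
        for word in text.lower().split():
            self.word_list[word].add(url)
        self.pending_docs += 1
        if self.pending_docs >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Persist the current word list as a new shadow index."""
        if self.word_list:
            self.shadow_indexes.append(dict(self.word_list))
        self.word_list = defaultdict(set)
        self.pending_docs = 0

    def master_merge(self):
        """Fold every shadow index into the master (the nightly step)."""
        self.flush()
        for shadow in self.shadow_indexes:
            for word, urls in shadow.items():
                self.master[word] |= urls
        self.shadow_indexes = []

    def search(self, word):
        """Look the word up in all three tiers and combine the hits."""
        word = word.lower()
        hits = set(self.master.get(word, ()))
        for shadow in self.shadow_indexes:
            hits |= shadow.get(word, set())
        hits |= self.word_list.get(word, set())
        return hits


index = ToyIndex(flush_threshold=2)
index.add_document("http://example.com/a", "SharePoint gatherer architecture")
index.add_document("http://example.com/b", "SharePoint alerts")
index.add_document("http://example.com/c", "nightly merge")
index.master_merge()
print(sorted(index.search("sharepoint")))
```

The design point the sketch tries to capture is that small, frequent writes keep new content searchable quickly, while the expensive consolidation into a single index is deferred to a scheduled merge.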
