16.01.2013 Views

Microsoft Sharepoint Products and Technologies Resource Kit eBook

Chapter 22: Managing External Content in Microsoft Office SharePoint Portal Server 2003

Third, you’ll need to ensure that you create enough content sources to pull in the information your users need to perform their jobs and collaborate effectively, but not so many that you can’t manage them. Content source management can become a real issue in information-intensive environments. Hardware resources can be unnecessarily taxed, and without proper planning you can actually degrade the performance of your indexing server when you try to crawl a content source.

For example, one procedure that occurs every night on nearly every server in most environments is the backup procedure. There are others, such as antivirus scanning and disk defragmenting, but for the purposes of this discussion we’ll focus on the backup procedure. Because the crawling function is highly processor- and RAM-intensive on both the crawling server and the content source server, a best practice is to plan your crawl schedules around the other servers’ activities.

So, if you have 30 content sources that you want to crawl, a best practice is to ascertain (as best you can) the regular routines that are run on each content source server and when they are run, and then schedule the index update builds during times when that server is not taxed by routines or client demand. Doing this can be a tall order, but if those servers are in different time zones, having this information might widen the crawling window for you. Therefore, you’ll need to think through what content really needs to be crawled and indexed versus how many content sources you’ll have, when the sources can be crawled, and how you’ll do this without creating bottlenecks on either server during peak crawling periods.
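The scheduling idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a SharePoint Portal Server feature: the helper names (`to_utc_intervals`, `free_windows`), the sample backup and antivirus times, and the UTC-6 offset are all hypothetical. The sketch converts each server's busy periods to UTC and reports the gaps that remain available for crawling.

```python
MINUTES_PER_DAY = 24 * 60


def to_utc_intervals(start_local, end_local, utc_offset_hours):
    """Convert a local busy window (minutes since local midnight) into
    one or two non-wrapping UTC intervals within a single day.

    All names and values here are hypothetical examples."""
    shift = int(utc_offset_hours * 60)
    start = (start_local - shift) % MINUTES_PER_DAY
    end = (end_local - shift) % MINUTES_PER_DAY
    if start < end:
        return [(start, end)]
    # The window crosses midnight UTC, so split it into two pieces.
    return [(start, MINUTES_PER_DAY), (0, end)]


def free_windows(busy):
    """Return the gaps in a UTC day that no busy interval covers."""
    merged = []
    for start, end in sorted(busy):
        if start == end:
            continue  # skip zero-length pieces produced by splitting
        if merged and start <= merged[-1][1]:
            # Overlapping or adjacent: extend the previous interval.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))

    gaps = []
    cursor = 0
    for start, end in merged:
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < MINUTES_PER_DAY:
        gaps.append((cursor, MINUTES_PER_DAY))
    return gaps


# Hypothetical content source server at UTC-6:
# backup 01:00-03:00 local, antivirus scan 22:00-23:30 local.
busy = []
busy += to_utc_intervals(1 * 60, 3 * 60, -6)
busy += to_utc_intervals(22 * 60, 23 * 60 + 30, -6)

for start, end in free_windows(busy):
    print(f"crawl window (UTC): {start // 60:02d}:{start % 60:02d}"
          f" to {end // 60:02d}:{end % 60:02d}")
```

Running the same calculation for each content source server, and for the indexing server itself, gives a rough picture of where crawl schedules can be placed without colliding with backups or other nightly routines.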

Default Content Sources

SharePoint Portal Server 2003 installs with several default content sources, as follows:

■ Portal Site Content (This Portal). An incremental update of portal site content is conducted every 10 minutes in the background, and an incremental (inclusive) update is performed each night. Portal site content includes content hosted in areas plus linked content via portal listings.

■ People Content (People). An incremental update of people content occurs every 60 minutes in the background. This content source crawls people profiles and personal sites and includes both public and private documents. However, remember that permissions are applied to result sets before they are presented to the user so that the user will see only documents for which the user has permissions.

■ Non–Portal Site Content (Site Directory). An incremental update of non–portal site content is conducted every night to ensure that all non–portal site content is included in the index. Non–portal site content includes all site collections created in the Sites Directory, but it can include other content sources as configured by the portal administrator or administrators.
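The People Content entry above notes that permissions are applied to result sets before they are presented to the user. A minimal sketch of that idea follows. It is not the SharePoint Portal Server implementation or API: the `IndexedDocument` type, the `trim_results` function, and the sample users and documents are hypothetical, and a real system would check permissions against the source's own access control lists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedDocument:
    """Hypothetical index entry: a title plus the users allowed to read it."""
    title: str
    allowed_users: frozenset


def trim_results(results, user):
    """Keep only the documents the given user is permitted to see."""
    return [doc for doc in results if user in doc.allowed_users]


# Hypothetical result set mixing public and private documents.
results = [
    IndexedDocument("Team roster", frozenset({"alice", "bob"})),
    IndexedDocument("Private notes", frozenset({"bob"})),
]

for doc in trim_results(results, "alice"):
    print(doc.title)
```

The point of the sketch is that the index can hold both public and private documents, while each user's result set is filtered at query time.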
