16.01.2013 Views

Microsoft SharePoint Products and Technologies Resource Kit eBook


Part VII: Information Management in SharePoint Products and Technologies

You can create as many content sources as you need, but once you get above a few hundred content sources, the user interface will become unworkable. However, you can still manage this many and more by using the command-line parameters. The SharePoint Portal Server object model is the recommended method of working with content sources in large server farms with hundreds or thousands of content sources.

In smaller installations, the more content sources you have, the more scheduling you’ll need to manage because you won’t want all your content sources firing at the same time to index their respective data. While it is not unusual for organizations to have hundreds of locations from which they could potentially crawl content, it is also conceivable that creating several hundred or even thousands of content sources would present significant resource needs and (perhaps) administrative difficulties.
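
The scheduling concern above can be illustrated with a minimal sketch that spreads crawl start times evenly across a maintenance window, so that no two content sources begin indexing at the same moment. This is plain Python for planning purposes only, not a SharePoint Portal Server API: the function name `stagger_crawl_starts`, the content source names, and the eight-hour window are all hypothetical, and the resulting times would still have to be entered into the product's own scheduling settings.

```python
from datetime import datetime, timedelta


def stagger_crawl_starts(source_names, window_start, window_hours):
    """Return {name: start_time}, spacing starts evenly across the window.

    Hypothetical planning helper; it does not talk to SharePoint.
    """
    if not source_names:
        return {}
    # Divide the window into equal slots, one per content source.
    step = timedelta(hours=window_hours) / len(source_names)
    return {
        name: window_start + index * step
        for index, name in enumerate(source_names)
    }


# Illustrative content source names (assumptions, not from the book).
sources = ["HR file share", "Sales site", "Engineering wiki", "Public website"]
schedule = stagger_crawl_starts(sources, datetime(2024, 1, 1, 22, 0), 8)

for name, start in schedule.items():
    print(f"{start:%Y-%m-%d %H:%M}  {name}")
```

With four sources and an eight-hour window, each crawl starts two hours after the previous one. Even spacing is only one possible policy; a real plan might weight slots by the expected size of each source.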

Therefore, you’ll need to balance several competing needs as you work with SharePoint Portal Server. First, you’ll need to ensure that you crawl only data that you really need in your index. For example, if you need to crawl 100 documents that are located in a file share hosting 3,000 documents, crawling all 3,000 documents to get the data from the 100 documents into your index would be foolish. Your index would be filled with data that your users wouldn’t want appearing in their result sets. Indexing needless information only clutters your index and the result sets, leading to a less positive end-user experience when they use the portal site’s search functionality.

So, what would you do? A best practice is to move these 100 documents into their own file share and crawl that file share individually. Remember the old adage: garbage in, garbage out. If you fill your index with needless information, the result sets will not be tight and pinpointed toward what the user is really after. Keeping your indexes clean and trim will help lead to a positive end-user experience when they input a search query. Remember that the goal of creating content sources is to enable your users to find information quickly and easily. A cluttered index leads to a result set that forces your end users to hunt through the list to find the information they are after. Such hunting is neither quick nor easy, and it’s likely to “turn off” your end users from wanting to use the search features in the portal site. This is why, later in this chapter, we’ll spend time looking at ways to craft the result set for your end users.
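
As a rough sketch of the "separate file share" practice described above, the following Python helper stages only a named subset of files into a dedicated folder that could then be shared and crawled on its own. Everything here is an assumption for illustration: the function name `stage_documents`, the folder layout, and the choice to copy rather than move (so the originals stay in place until the new location has been checked). It uses only the Python standard library and does not interact with SharePoint Portal Server.

```python
import shutil
from pathlib import Path


def stage_documents(needed, source_root, crawl_root):
    """Copy the listed files (paths relative to source_root) into crawl_root.

    Hypothetical helper: returns the list of destination paths it created.
    """
    source_root = Path(source_root)
    crawl_root = Path(crawl_root)
    copied = []
    for relative_name in needed:
        source_file = source_root / relative_name
        if not source_file.is_file():
            continue  # skip names that are missing from the source share
        target_file = crawl_root / relative_name
        # Recreate the subfolder structure under the dedicated crawl folder.
        target_file.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source_file, target_file)  # copy2 also keeps timestamps
        copied.append(target_file)
    return copied
```

A call such as `stage_documents(["contracts/2003-q1.doc"], "big_share", "crawl_only")` (paths are made up) would leave `crawl_only` holding just the documents worth indexing. Replacing `shutil.copy2` with `shutil.move` would match the book's wording of moving the documents instead of copying them.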

Second, you’ll need to ensure that content sources are not created on a whim or at every request from a user. It is conceivable that employees from your sales department will decide that they’d like to have a host of sales-oriented articles on a website indexed for their own use. While SharePoint Portal Server can certainly handle this task, it might not be wise to set up content sources at the request of each user because you could end up creating many additional content sources that are really unnecessary. A best practice to guard against this is to create a SharePoint Portal Server planning team that lives in perpetuity and that approves the creation of new content sources.
