27.01.2014 Views

NIST 800-44 Version 2 Guidelines on Securing Public Web Servers


GUIDELINES ON SECURING PUBLIC WEB SERVERS

EmailSiphon and Cherry Picker are bots specifically designed to crawl Web sites for electronic mail (e-mail) addresses to add to spam mailing lists. These are common examples of bots that may have a negative impact on a Web site or its users.

Many spambots crawl Web sites for login forms, either to create free e-mail addresses from which to send spam or to spam blogs, guestbooks, wikis, and forums in order to boost the search engine rankings of a particular Web site.

Screen scrapers retrieve content from Web sites to put up a copy on another server. These copies can be used for phishing or for attempting to generate ad revenue by having users visit the copy.

Some malicious bots crawl Web sites looking for vulnerable applications containing sensitive data (e.g., Social Security Numbers [SSN], credit card data).

Bots can present a challenge to Webmasters' administration of their servers because:

- Web servers often contain directories that do not need to be indexed.
- Organizations might not want part of their site appearing in search engines.
- Web servers often contain temporary pages that should not be indexed.
- Organizations operating the Web server are paying for bandwidth and want to exclude robots and spiders that do not benefit their goals.
- Bots are not always well written or well intentioned and can hit a Web site with extremely rapid requests, causing a reduction in responsiveness or outright DoS for legitimate users.
- Bots may uncover information that the Webmaster would prefer remained secret or at least unadvertised (e.g., e-mail addresses).
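
The rapid-request problem described above can be made concrete with a small sketch. It assumes the server can observe a client identifier and a timestamp for each request; the class name `RequestRateMonitor`, its parameters, and the thresholds are hypothetical and do not come from the guideline. It only illustrates how a burst of requests from one client might be recognized, not how it should be handled.

```python
from collections import defaultdict, deque


class RequestRateMonitor:
    """Flag clients that exceed max_requests within window_seconds.

    Hypothetical helper for illustration; thresholds are assumptions.
    """

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Maps a client identifier to the times of its recent requests.
        self._hits = defaultdict(deque)

    def record(self, client, timestamp):
        """Record one request; return True if the client is over the limit."""
        hits = self._hits[client]
        hits.append(timestamp)
        # Drop requests that have fallen out of the sliding window.
        while hits and hits[0] <= timestamp - self.window_seconds:
            hits.popleft()
        return len(hits) > self.max_requests


monitor = RequestRateMonitor(max_requests=3, window_seconds=1.0)
burst = [monitor.record("bot", t) for t in (0.0, 0.1, 0.2, 0.3)]
slow = [monitor.record("user", t) for t in (0.0, 5.0)]
```

In this run the fourth request from `"bot"` arrives inside the same one-second window and is flagged, while the two widely spaced requests from `"user"` are not.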

Fortunately, Web administrators or the Webmaster can influence the behavior of most bots on their Web site. A series of agreements called the Robots Exclusion Protocol (REP) has been created. Although REP is not an official Internet standard, it is supported by most well-written and well-intentioned bots, including those used by most major search engines.

Web administrators who wish to limit bots’ actions on their Web server need to create a plain text file named “robots.txt.” The file must always have this name, and it must reside in the Web server’s root document directory. In addition, only one file is allowed per Web site. Note that the robots.txt file is a standard that is voluntarily supported by bot programmers, so malicious bots (such as EmailSiphon and Cherry Picker) often ignore this file. 28
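
Because the file name and location are fixed, a bot can work out where to look from any page address on the site. The sketch below shows one way to do that with Python's standard `urllib.parse` module; the function name `robots_url` and the example addresses are illustrative assumptions, not part of the guideline.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the robots.txt address for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep only the scheme and host (with any port). The path is always
    # /robots.txt, and any query string or fragment is discarded.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://www.example.com/docs/reports/index.html"))
```

Every page on `www.example.com` maps to the same `http://www.example.com/robots.txt`, which reflects the one-file-per-site rule.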

The robots.txt file is a simple text file that contains some keywords and file specifications. Each line of the file is either blank or consists of a single keyword and its related information. The keywords are used to tell robots which portions of a Web site are excluded.
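
As a rough illustration of this line-oriented format, the sketch below defines a small sample file, splits it into keyword and value pairs, and asks Python's standard `urllib.robotparser` module how a compliant bot would interpret it. The `User-agent` and `Disallow` keywords are the ones commonly used under REP; the paths, the bot names, and the helper functions `keyword_lines` and `may_fetch` are assumptions made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the paths and bot name are invented.
ROBOTS_TXT = """\
User-agent: *
Disallow: /tmp/
Disallow: /private/

User-agent: ExampleBot
Disallow: /
"""


def keyword_lines(text):
    """Split each non-blank line into a (keyword, value) pair."""
    pairs = []
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines only separate groups of rules
        keyword, _, value = line.partition(":")
        pairs.append((keyword.strip(), value.strip()))
    return pairs


# Let the standard-library parser interpret the same text.
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_fetch(user_agent, path):
    """Report whether a compliant bot would request this path."""
    return parser.can_fetch(user_agent, path)
```

With this sample, `may_fetch("GoodBot", "/index.html")` is true, `may_fetch("GoodBot", "/tmp/scratch.html")` is false, and `may_fetch("ExampleBot/1.0", "/index.html")` is false because that bot is excluded from the whole site. These answers only describe what a cooperating bot would do; nothing in the file prevents a bot from ignoring it.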

28 Other methods for controlling malicious bots exist; however, they are changing constantly as the malicious bot operators and Web administrators develop new methods of counteracting each other’s techniques. Given the constantly changing nature of this area, discussion of these techniques is beyond the scope of this document. More information is available at http://www.onguardonline.gov/spam.html.
