NIST 800-44 Version 2 Guidelines on Securing Public Web Servers
EmailSiphon and Cherry Picker are bots specifically designed to crawl Web sites for electronic mail (e-mail) addresses to add to spam mailing lists. These are common examples of bots that may have a negative impact on a Web site or its users.
Many spambots crawl Web sites for login forms to create free e-mail addresses from which to send spam, or to spam blogs, guestbooks, wikis, and forums to boost the search engine rankings of a particular Web site.
Screen scrapers retrieve content from Web sites to put up a copy on another server. These copies can be used for phishing or for attempting to generate ad revenue by having users visit the copy.
Some malicious bots crawl Web sites looking for vulnerable applications containing sensitive data (e.g., Social Security Numbers [SSN], credit card data).
Bots can present a challenge to Webmasters’ administration of their servers because:
Web servers often contain directories that do not need to be indexed.
Organizations might not want part of their site appearing in search engines.
Web servers often contain temporary pages that should not be indexed.
Organizations operating the Web server are paying for bandwidth and want to exclude robots and spiders that do not benefit their goals.
Bots are not always well written or well intentioned and can hit a Web site with extremely rapid requests, causing a reduction in responsiveness or outright DoS for legitimate users.
Bots may uncover information that the Webmaster would prefer remained secret or at least unadvertised (e.g., e-mail addresses).
Fortunately, Web administrators or the Webmaster can influence the behavior of most bots on their Web
site. A series of agreements called the Robots Exclusion Protocol (REP) has been created. Although
REP is not an official Internet standard, it is supported by most well-written and well-intentioned bots,
including those used by most major search engines.
Web administrators who wish to limit bots’ actions on their Web server need to create a plain text file
named “robots.txt.” The file must always have this name, and it must reside in the Web server’s root
document directory. In addition, only one file is allowed per Web site. Note that the robots.txt file is a
standard that is voluntarily supported by bot programmers, so malicious bots (such as EmailSiphon and
Cherry Picker) often ignore this file. [28]
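The location rule above (fixed file name, Web server root, one file per site) can be illustrated with a minimal Python sketch that derives, from any page address, the single place a cooperating bot would look for the file. The function name `robots_txt_url` and the host `www.example.com` are illustrative assumptions and do not come from this guideline.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that serves page_url.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(page_url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"not an absolute URL: {page_url!r}")
    # Keep only the scheme and host; the page's path, query string, and
    # fragment are dropped because the file always sits at the site root.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("http://www.example.com/docs/index.html?page=2"))
# prints: http://www.example.com/robots.txt
```

Because the path is always replaced with `/robots.txt`, a copy of the file placed in a subdirectory would not be found by a bot that follows this convention.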
The robots.txt file is a simple text file that contains some keywords and file specifications. Each line of
the file is either blank or consists of a single keyword and its related information. The keywords are used
to tell robots which portions of a Web site are excluded.
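To make the keyword-per-line format concrete, the following sketch feeds a small sample file to Python's standard `urllib.robotparser` module and asks which URLs a cooperating bot may fetch. The `User-agent` and `Disallow` keywords are the commonly used REP keywords rather than names quoted from this passage. The sample paths, the bot names `ExampleBot` and `AnyCrawler`, and the helper `build_parser` are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: each non-blank line is one keyword
# followed by its related information.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /tmp/
Disallow: /private/

User-agent: ExampleBot
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)

# A bot with no dedicated record falls under the "*" record.
print(parser.can_fetch("AnyCrawler", "http://www.example.com/index.html"))      # True
print(parser.can_fetch("AnyCrawler", "http://www.example.com/tmp/draft.html"))  # False

# ExampleBot has its own record that excludes the whole site.
print(parser.can_fetch("ExampleBot", "http://www.example.com/index.html"))      # False
```

This check only models what a well-behaved bot would do; as noted above, a malicious bot can simply ignore the file.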
[28] Other methods for controlling malicious bots exist; however, they are changing constantly as the malicious bot operators and
Web administrators develop new methods of counteracting each other’s techniques. Given the constantly changing nature
of this area, discussion of these techniques is beyond the scope of this document. More information is available at
http://www.onguardonline.gov/spam.html.