01.09.2015 Views

Web Application Penetration Testing

2. “robots.txt” saved as “www.google.com-robots.txt”
3. Sending Allow: URIs of www.google.com to web proxy i.e. 127.0.0.1:8080
/catalogs/about sent
/catalogs/p? sent
/news/directory sent
...
4. Done.
cmlh$
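
The run above extracts the Allow: entries from a saved robots.txt and forwards each one to a web proxy. A minimal Python sketch of the extraction step follows. It is not the tool that produced the output above; the function names, the sample file content and the example.com host are assumptions made for illustration.

```python
from typing import List


def extract_allow_paths(robots_txt: str) -> List[str]:
    """Return the path of every non-empty 'Allow:' directive, in file order."""
    paths = []
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and padding
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "allow" and value.strip():
            paths.append(value.strip())
    return paths


def build_urls(base_url: str, paths: List[str]) -> List[str]:
    """Prefix each path with the site root, leaving the path text untouched."""
    return [base_url.rstrip("/") + path for path in paths]


# Hypothetical robots.txt content, loosely modelled on the paths shown above.
SAMPLE = """\
User-agent: *
Disallow: /search
Allow: /catalogs/about
Allow: /catalogs/p?
Allow: /news/directory
"""

if __name__ == "__main__":
    for url in build_urls("http://www.example.com/", extract_allow_paths(SAMPLE)):
        print(url)
```

To route the resulting requests through an intercepting proxy such as 127.0.0.1:8080, one option is to open each URL with an opener built from the standard library's `urllib.request.ProxyHandler`.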

Analyze robots.txt using Google Webmaster Tools

Web site owners can use the Google “Analyze robots.txt” function, part of “Google Webmaster Tools” (https://www.google.com/webmasters/tools), to analyze their website. This tool can assist with testing, and the procedure is as follows:

1. Sign into Google Webmaster Tools with a Google account.
2. On the dashboard, enter the URL of the site to be analyzed.
3. Choose between the available methods and follow the on-screen instructions.
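
As a local complement to the procedure above, a robots.txt file can also be checked offline. The sketch below uses Python's standard `urllib.robotparser` module; it is not the Google tool, and its rule evaluation may differ from that of Google's crawler. The sample file content, the example.com host and the `is_allowed` helper are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Allow: /catalogs/about
Disallow: /catalogs
"""


def is_allowed(robots_txt, url, user_agent="*"):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse from memory, no network access
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/catalogs/about", "/catalogs/private", "/news/directory"):
        url = "http://www.example.com" + path
        verdict = "allowed" if is_allowed(SAMPLE_ROBOTS_TXT, url) else "disallowed"
        print(path, verdict)
```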

META Tag

<META> tags are located within the HEAD section of each HTML document and should be consistent across a web site, because a robot/spider/crawler is likely to start from a document link other than the webroot, i.e. a “deep link” [5].

If there is no “<META NAME="ROBOTS" ... >” entry then the “Robots Exclusion Protocol” defaults to “INDEX,FOLLOW”. Therefore, the other two valid entries defined by the “Robots Exclusion Protocol” are prefixed with “NO...”, i.e. “NOINDEX” and “NOFOLLOW”.
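
The defaulting rule described above can be sketched in Python with the standard `html.parser` module. The class and function names are hypothetical, and the sketch looks only at NOINDEX and NOFOLLOW in a `<META NAME="ROBOTS">` tag, treating their absence as the INDEX,FOLLOW default.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of every <META NAME="ROBOTS" CONTENT="..."> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lower case.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            self.directives += [
                d.strip().upper() for d in content.split(",") if d.strip()
            ]


def robots_meta_policy(html):
    """Return (index, follow) booleans, defaulting to INDEX,FOLLOW."""
    parser = RobotsMetaParser()
    parser.feed(html)
    index = "NOINDEX" not in parser.directives
    follow = "NOFOLLOW" not in parser.directives
    return index, follow


if __name__ == "__main__":
    page = '<html><head><META NAME="ROBOTS" CONTENT="NOINDEX, FOLLOW"></head></html>'
    print(robots_meta_policy(page))
```

Running the same check against several pages of a site, including deep links, is one way to spot inconsistent tags.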

Web spiders/robots/crawlers can intentionally ignore the “<META NAME="ROBOTS" ... >” tag.
