22.01.2015 Views

MAS 801 It's a Discreetly Discrete World – Mathematics in ... - Spms

Crawler

• Program that browses the WWW in a methodical, automated fashion
• The process is called crawling or spidering
• Downloads a copy of all the visited pages for later use by an indexer
• Characteristics of the WWW that make crawling difficult
    • The WWW has a large volume
        • Can only download a fraction of the web pages in a given time; need to prioritize which page to download
    • The WWW has a fast rate of change
        • By the time the pages of a site are downloaded, new pages may have been added, or the content of pages may have changed
• Behaviour of a crawler is governed by its policies
    • Selection policy: which page to download
    • Revisit policy: when to check for changes
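
The two policies can be illustrated with a small sketch. It is not taken from the slides: the toy link graph `TOY_WEB`, the in-link-count priority, the fixed revisit interval, and the function names `crawl`, `inlink_count`, and `pages_due` are all assumptions chosen for illustration, and nothing is fetched over the network. The selection policy is modelled as a priority function over a frontier of known but unvisited URLs; the revisit policy is modelled as a simple age threshold.

```python
import heapq
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical in-memory "web": URL -> outgoing links (no real network access).
TOY_WEB: Dict[str, List[str]] = {
    "a": ["b", "c"],
    "b": ["c", "d"],
    "c": ["d"],
    "d": [],
}


def inlink_count(url: str) -> int:
    """Assumed selection score: number of pages in TOY_WEB that link to `url`."""
    return sum(url in links for links in TOY_WEB.values())


def crawl(seed: str,
          fetch_links: Callable[[str], List[str]],
          priority: Callable[[str], int],
          budget: int) -> List[str]:
    """Visit at most `budget` pages, always taking the highest-priority URL next."""
    # heapq is a min-heap, so priorities are negated to pop the largest first.
    frontier: List[Tuple[int, str]] = [(-priority(seed), seed)]
    seen: Set[str] = {seed}
    visited: List[str] = []
    while frontier and len(visited) < budget:
        _, url = heapq.heappop(frontier)
        visited.append(url)  # stands in for "download a copy for the indexer"
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-priority(link), link))
    return visited


def pages_due(last_fetched: Dict[str, float], now: float, interval: float) -> List[str]:
    """Assumed revisit policy: re-check pages whose copy is at least `interval` old."""
    return sorted(url for url, t in last_fetched.items() if now - t >= interval)


if __name__ == "__main__":
    # With a budget of 3, page "b" is skipped in favour of better-linked pages.
    print(crawl("a", lambda u: TOY_WEB.get(u, []), inlink_count, budget=3))
    print(pages_due({"a": 0.0, "b": 50.0}, now=100.0, interval=60.0))
```

The budget parameter reflects the large-volume constraint (only a fraction of pages can be downloaded), while the age threshold is one crude way to respond to the fast rate of change. A real crawler would not know in-link counts in advance and would estimate page importance and change frequency from the data it has gathered so far.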
