MAS 801 It's a Discreetly Discrete World – Mathematics in ... - Spms
Crawler
• Program that browses the WWW in a methodical, automated fashion
  • The process is called crawling or spidering
  • Downloads a copy of all the visited pages for later use by an indexer
• Characteristics of the WWW that make crawling difficult
  • The WWW has a large volume
    • Can only download a fraction of the web pages in a given time; need to prioritize which pages to download
  • The WWW has a fast rate of change
    • By the time the pages of a site are downloaded, new pages may have been added, or the content of pages may have changed
• Behaviour of a crawler is governed by its policies
  • Selection policy: which pages to download
  • Revisit policy: when to check for changes
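
The two policies above can be illustrated with a minimal Python sketch. It is not taken from the lecture: the in-memory `web` dictionary, the `scores` table, and the functions `crawl` and `due_for_revisit` are all hypothetical names chosen for illustration, and no real network access takes place. The selection policy is modelled as a priority queue over the crawl frontier with a fixed download budget; the revisit policy is modelled as a simple fixed time interval.

```python
import heapq
from typing import Callable, Dict, List, Tuple

# Hypothetical in-memory "web": URL -> (page content, outgoing links).
Web = Dict[str, Tuple[str, List[str]]]


def crawl(web: Web, seeds: List[str], priority: Callable[[str], float],
          budget: int) -> Dict[str, str]:
    """Selection policy: download at most `budget` pages, highest priority first."""
    # heapq is a min-heap, so priorities are negated to pop the largest first.
    frontier = [(-priority(url), url) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    store: Dict[str, str] = {}  # copies kept for later use by an indexer
    while frontier and len(store) < budget:
        _, url = heapq.heappop(frontier)
        if url not in web:
            continue  # dead link: nothing to download
        content, links = web[url]
        store[url] = content
        for link in links:
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-priority(link), link))
    return store


def due_for_revisit(last_visit: Dict[str, float], now: float,
                    interval: float) -> List[str]:
    """Revisit policy (assumed): recheck pages not visited for `interval` time units."""
    return sorted(url for url, t in last_visit.items() if now - t >= interval)


# Toy data, invented for this example.
web: Web = {
    "a": ("page A", ["b", "c"]),
    "b": ("page B", ["d"]),
    "c": ("page C", []),
    "d": ("page D", []),
}
scores = {"a": 1.0, "b": 0.2, "c": 0.9, "d": 0.5}

# With a budget of 2, only the seed and its highest-scoring link are fetched.
downloaded = crawl(web, ["a"], lambda u: scores.get(u, 0.0), budget=2)
print(list(downloaded))

# Page "a" was last seen at time 0 and "c" at time 50; at time 100 with an
# interval of 60, only "a" is due for a recheck.
print(due_for_revisit({"a": 0.0, "c": 50.0}, now=100.0, interval=60.0))
```

The budget stands in for the limited fraction of the web that can be downloaded in a given time. A real crawler would replace the fixed interval with an estimate of how often each page changes.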