03.02.2014 Views

php|architect's Guide to Web Scraping with PHP - Wind Business ...



Appendix B

Multiprocessing

When a PHP script is executed using the CLI SAPI (i.e. from a command line), that instance of execution exists as a process in the local operating system. The Process Control (pcntl) extension makes it possible for PHP scripts to perform what is called a process fork. This entails the original PHP process (then called the parent process) creating a copy of itself (appropriately called the child process) that includes everything from the data associated with the process to the current point of execution following the fork instruction.
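As a rough illustration of the fork described above, the following sketch forks once and shows that the child works on its own copy of the parent's data. The function name `forkOnce`, the variable `$value`, and the exit code 7 are assumptions made for this example only; it presumes the CLI SAPI with the pcntl extension loaded.

```php
<?php
// Minimal sketch, assuming the CLI SAPI with the pcntl extension available.

/**
 * Forks once. Returns the child's exit code and the parent's copy of $value.
 * The name forkOnce and the exit code 7 are illustrative, not from the book.
 */
function forkOnce(): array
{
    $value = 'set before fork';   // copied into the child at fork time

    $pid = pcntl_fork();
    if ($pid === -1) {
        throw new RuntimeException('Fork failed');
    }

    if ($pid === 0) {
        // Child: execution resumes right after the fork call,
        // operating on its own copy of $value.
        $value = 'changed in child';
        exit(7);
    }

    // Parent: $pid holds the child's process ID. Wait for the child to end.
    pcntl_waitpid($pid, $status);

    // The child's assignment never reaches the parent's copy of $value.
    return [pcntl_wexitstatus($status), $value];
}
```

Calling `forkOnce()` in the parent returns the child's exit code together with the unchanged string, which is one way to see that the two processes do not share variables after the fork.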

Once the fork completes, both processes exist and execute independently of each other. The parent process can fork itself multiple times in succession and retains a limited awareness of its child processes. In particular, it is notified when any given child process terminates.
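One possible way for a parent to learn that its children have terminated is to block in `pcntl_wait()`, which returns the process ID of whichever child exits next. The sketch below takes that approach rather than installing a signal handler; the function name `forkAndReap` and the use of each child's loop index as its exit code are assumptions for illustration.

```php
<?php
// Sketch only, assuming the CLI SAPI with the pcntl extension available.

/**
 * Forks $count children in succession and returns their exit codes keyed by PID.
 * Each child exits with its loop index purely as a stand-in for real work.
 */
function forkAndReap(int $count): array
{
    $started = 0;
    for ($i = 0; $i < $count; $i++) {
        $pid = pcntl_fork();
        if ($pid === -1) {
            throw new RuntimeException('Fork failed');
        }
        if ($pid === 0) {
            exit($i);      // child: terminate immediately with its index
        }
        $started++;        // parent: one more child to wait for
    }

    $codes = [];
    while ($started > 0) {
        // Blocks until any one child terminates, then returns that child's PID.
        $pid = pcntl_wait($status);
        if ($pid === -1) {
            break;         // no children left to wait for
        }
        $codes[$pid] = pcntl_wexitstatus($status);
        $started--;
    }

    return $codes;
}
```

The order in which children are reported depends on when each one exits, so the keys of the returned array should not be assumed to follow the order of the forks.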

Because each child process is a copy of its parent process, the number of child processes that can be forked is limited by the hardware resources of the local system. CPU will merely limit the speed at which all child processes can complete. Memory, however, can become a bottleneck if the local system’s RAM is exceeded and it has to resort to using swap space for storage.

This restriction creates the desire to fork as many processes as possible to complete a job in parallel without hitting any resource limits. This is generally achieved by using a predetermined cap on the number of processes to fork. Once any given child process completes, however, it may be desirable to create a new child process
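The capped approach described above might be sketched as follows: the parent forks until it reaches the cap, then waits for any child to finish before forking the next one. The function name `runWithCap`, its parameters, and the returned peak count are hypothetical and exist only to make the behavior observable; this is a sketch under those assumptions, not the book's own implementation.

```php
<?php
// Sketch only, assuming the CLI SAPI with the pcntl extension available.

/**
 * Runs $work($job) in a child process for every job, never keeping more than
 * $cap children alive at once. Returns the peak number of concurrent children.
 */
function runWithCap(array $jobs, int $cap, callable $work): int
{
    if ($cap < 1) {
        throw new InvalidArgumentException('Cap must be at least 1');
    }

    $running = 0;
    $peak = 0;

    foreach ($jobs as $job) {
        if ($running >= $cap) {
            // At the cap: block until one child exits, which frees a slot.
            pcntl_wait($status);
            $running--;
        }

        $pid = pcntl_fork();
        if ($pid === -1) {
            throw new RuntimeException('Fork failed');
        }
        if ($pid === 0) {
            $work($job);   // child: do the job, then terminate
            exit(0);
        }

        $running++;
        $peak = max($peak, $running);
    }

    // Wait for the children that are still running.
    while ($running > 0) {
        pcntl_wait($status);
        $running--;
    }

    return $peak;
}
```

With five jobs and a cap of two, for instance, the parent never has more than two children alive at once. A suitable cap depends on how much memory each child needs, so any particular value is a judgment call for the system at hand.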
