03.02.2014 Views

php|architect's Guide to Web Scraping with PHP - Wind Business ...


Rolling Your Own

• The stream_get_contents function is used to read all available data from the connection, in this case the response to the request.

• The fclose function is used to explicitly terminate the connection.

Depending on the nature and requirements of the project, not all facets of a request may be known at one time. In this situation, it is desirable to encapsulate request metadata in a data structure such as an associative array or an object. From this, a central unit of logic can be used to read that metadata and construct a request in the form of a string based on it.
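One way to picture such a central unit of logic is the minimal sketch below. The function name buildRequest and the array keys ('method', 'path', 'headers', 'body') are assumptions made for illustration, not part of any library, and example.com stands in for a real host.

```php
<?php
// Hypothetical helper: builds a raw HTTP/1.1 request string from an
// associative array of request metadata. The keys read here are
// assumptions chosen for this sketch.
function buildRequest(array $meta): string
{
    $method  = strtoupper($meta['method'] ?? 'GET');
    $path    = $meta['path'] ?? '/';
    $headers = $meta['headers'] ?? [];
    $body    = $meta['body'] ?? '';

    // A non-empty body needs Content-Length so the server knows where it ends.
    if ($body !== '') {
        $headers['Content-Length'] = strlen($body);
    }

    // Request line first, then one "Name: value" line per header.
    $lines = ["$method $path HTTP/1.1"];
    foreach ($headers as $name => $value) {
        $lines[] = "$name: $value";
    }

    // A blank line separates the headers from the (possibly empty) body.
    return implode("\r\n", $lines) . "\r\n\r\n" . $body;
}

// Metadata can be assembled piece by piece before the request is built.
$request = buildRequest([
    'method'  => 'get',
    'path'    => '/index.php',
    'headers' => ['Host' => 'example.com', 'Connection' => 'close'],
]);
```

Because the metadata lives in an array, individual headers or the path can be filled in whenever they become known, and the string is only produced at the point where the request is written to the connection.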

Manually constructing requests within a string, as shown in the example above, is also not ideal for readability. If exact requests are known ahead of time and do not vary, an alternative approach is storing them in a data source of some type, then retrieving them at runtime and sending them over the connection as they are. Whether it is possible to take this approach depends on the level of variance in requests going between the web scraping application and the target application.
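As a rough sketch of the stored-request approach, the function below reads a request back exactly as it was saved. Keeping one request per text file in a directory is an assumption made here for simplicity; a database table would serve equally well. The name loadStoredRequest is hypothetical.

```php
<?php
// Hypothetical loader: returns a stored request verbatim so it can be
// written to the connection without modification. One ".txt" file per
// request inside $dir is an assumption of this sketch.
function loadStoredRequest(string $dir, string $name): string
{
    $file = $dir . DIRECTORY_SEPARATOR . $name . '.txt';

    // Fail loudly rather than send an empty or partial request.
    if (!is_readable($file)) {
        throw new RuntimeException("No stored request named '$name'");
    }

    return file_get_contents($file);
}
```

The returned string can then be passed directly to fwrite on an open stream. Note that the stored files must preserve the CRLF line endings and the blank line that terminates the headers.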

If the need arises to manually build query strings or URL-encoded POST request bodies, the http_build_query function allows this to be done using associative arrays.
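A brief illustration of http_build_query follows. The field names, values, and URL are made up for the example; the separator is passed explicitly so the result does not depend on the arg_separator.output ini setting.

```php
<?php
// Hypothetical form fields; the names and values are illustrative only.
$fields = [
    'q'    => 'web scraping',
    'page' => 2,
];

// By default spaces are encoded as "+" (PHP_QUERY_RFC1738). The second
// argument is a prefix for numeric keys and is unused here.
$query = http_build_query($fields, '', '&');
// $query is now "q=web+scraping&page=2"

// The same string can be appended to a URL as a query string...
$url = 'http://example.com/search.php?' . $query;

// ...or sent as the body of an application/x-www-form-urlencoded POST.
$body = $query;
```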

Parsing Responses

Once you’ve received a response, the next step is obtaining the data you need from it. Taking the response from the last example, let’s examine what this might look like.
