10.07.2015 Views

Beginning Web Development With Perl : From Novice to ... - Nabo


CHAPTER 5 ■ LWP MODULES

For this example, assume that you have a browser object called $browser and a proxy server called proxy.example.com. If you want to set the HTTP proxy server for use within the program, the invocation of the proxy() method looks like this:

    $browser->proxy("http","http://proxy.example.com");

It’s quite common for a proxy server to be used only for URLs that are outside the local network, while hosts inside the network are reached directly. For these cases, LWP includes a no_proxy() method that accepts a list of domains for which no proxy server should be used. Assume that you have a server located at local.example.com for which you want direct access, as opposed to access through the proxy. The no_proxy() method call looks like this:

    $browser->no_proxy("local.example.com");

Calling no_proxy() with an empty list clears out the list of hosts:

    $browser->no_proxy();

Removing HTML Tags from a Page

As you’ve undoubtedly seen if you’ve followed the examples in this chapter, the content that comes back from a GET request is the raw, uncensored HTML (and other language) content from the web server. To say that this is difficult for a human to read and interpret is an understatement. Unfortunately, there is no surefire method for extracting the useful text from a web page. However, you have some options for retrieving the text from a page.

For example, Listing 5-6 shows the Get.pl example shown earlier in the chapter, but modified to use HTML::FormatText to produce output that is more human-friendly.

Listing 5-6. Using HTML::FormatText to Retrieve the Text from a Page

    #!/usr/bin/perl -w
    use strict;
    use HTML::TreeBuilder;
    use HTML::FormatText;
    use LWP::Simple;

    # Fetch the page; get() returns undef on failure
    my $webpage = get("http://www.braingia.org/");
    die "Could not retrieve the page\n" unless defined $webpage;

    # Build a parse tree from the HTML, then signal the end of input
    my $htmltree = HTML::TreeBuilder->new->parse($webpage);
    $htmltree->eof();

    # Render the tree as plain text
    my $output = HTML::FormatText->new();
    print $output->format($htmltree);
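Because the listing above depends on a live web server, it can be handy to try the same technique on a fixed string first. The following is a minimal sketch, not part of the original chapter: the sample markup in $html is hypothetical, new_from_content() is used in place of new->parse() so the tree is finalized in one call, and the leftmargin and rightmargin values are arbitrary choices.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;
use HTML::FormatText;

# Hypothetical sample markup standing in for a fetched page
my $html = <<'END_HTML';
<html><head><title>Sample</title></head>
<body><h1>Heading</h1><p>Some <b>bold</b> text.</p></body></html>
END_HTML

# Parse the string and finalize the tree in a single call
my $tree = HTML::TreeBuilder->new_from_content($html);

# leftmargin and rightmargin control the width of the text output;
# the values here are illustrative assumptions
my $formatter = HTML::FormatText->new(leftmargin => 0, rightmargin => 72);
my $text = $formatter->format($tree);
print $text;

# Release the memory held by the parse tree
$tree->delete;
```

The printed result should contain the heading and paragraph text with the tags removed. Swapping the here-document for the return value of get() gives the networked version again.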
