=head1 NAME

lwptut -- An LWP Tutorial

=head1 DESCRIPTION

LWP (short for "Library for WWW in Perl") is a very popular group of
Perl modules for accessing data on the Web. Like most Perl
module-distributions, each of LWP's component modules comes with
documentation that is a complete reference to its interface. However,
there are so many modules in LWP that it's hard to know where to start
looking for information on how to do even the simplest, most common
things.

Really introducing you to using LWP would require a whole book -- a book
that just happens to exist, called I<Perl & LWP>. But this article
should give you a taste of how you can go about some common tasks with
LWP.


=head2 Getting documents with LWP::Simple

If you just want to get what's at a particular URL, the simplest way
to do it is with LWP::Simple's functions.

In a Perl program, you can call its C<get($url)> function. It will try
getting that URL's content. If it works, then it'll return the
content; but if there's some error, it'll return undef.

  my $url = 'http://www.npr.org/programs/fa/?todayDate=current';
    # Just an example: the URL for the most recent /Fresh Air/ show

  use LWP::Simple;
  my $content = get $url;
  die "Couldn't get $url" unless defined $content;

  # Then go do things with $content, like this:

  if($content =~ m/jazz/i) {
    print "They're talking about jazz today on Fresh Air!\n";
  }
  else {
    print "Fresh Air is apparently jazzless today.\n";
  }

The handiest variant on C<get> is C<getprint>, which is useful in Perl
one-liners. If it can get the page whose URL you provide, it sends it
to STDOUT; otherwise it complains to STDERR.

  % perl -MLWP::Simple -e "getprint 'http://www.cpan.org/RECENT'"

That is the URL of a plain text file that lists new files in CPAN in
the past two weeks.
You can easily make it part of a tidy little
shell command, like this one that mails you the list of new
C<Acme::> modules:

  % perl -MLWP::Simple -e "getprint 'http://www.cpan.org/RECENT'"  \
     | grep "/by-module/Acme" | mail -s "New Acme modules! Joy!" $USER

There are other useful functions in LWP::Simple, including one function
for running a HEAD request on a URL (useful for checking links, or
getting the last-revised time of a URL), and two functions for
saving/mirroring a URL to a local file. See L<the LWP::Simple
documentation|LWP::Simple> for the full details, or chapter 2 of I<Perl
& LWP> for more examples.



=for comment
 ##########################################################################



=head2 The Basics of the LWP Class Model

LWP::Simple's functions are handy for simple cases, but its functions
don't support cookies or authorization, don't support setting header
lines in the HTTP request, and generally don't support reading header lines
in the HTTP response (notably the full HTTP error message, in case of an
error). To get at all those features, you'll have to use the full LWP
class model.

While LWP consists of dozens of classes, the main two that you have to
understand are L<LWP::UserAgent> and L<HTTP::Response>. LWP::UserAgent
is a class for "virtual browsers" which you use for performing requests,
and L<HTTP::Response> is a class for the responses (or error messages)
that you get back from those requests.

The basic idiom is C<< $response = $browser->get($url) >>, or more fully
illustrated:

  # Early in your program:

  use LWP 5.64; # Loads all important LWP classes, and makes
                #  sure your version is reasonably recent.

  my $browser = LWP::UserAgent->new;

  ...

  # Then later, whenever you need to make a get request:
  my $url = 'http://www.npr.org/programs/fa/?todayDate=current';

  my $response = $browser->get( $url );
  die "Can't get $url -- ", $response->status_line
   unless $response->is_success;

  die "Hey, I was expecting HTML, not ", $response->content_type
   unless $response->content_type eq 'text/html';
     # or whatever content-type you're equipped to deal with

  # Otherwise, process the content somehow:

  if($response->decoded_content =~ m/jazz/i) {
    print "They're talking about jazz today on Fresh Air!\n";
  }
  else {
    print "Fresh Air is apparently jazzless today.\n";
  }

There are two objects involved: C<$browser>, which holds an object of
class LWP::UserAgent, and then the C<$response> object, which is of
class HTTP::Response. You really need only one browser object per
program; but every time you make a request, you get back a new
HTTP::Response object, which will have some interesting attributes:

=over

=item *

A status code indicating success or failure
(which you can test with C<< $response->is_success >>).

=item *

An HTTP status line that is hopefully informative if there's failure
(which you can see with C<< $response->status_line >>,
returning something like "404 Not Found").

=item *

A MIME content-type like "text/html", "image/gif",
"application/xml", etc., which you can see with
C<< $response->content_type >>.

=item *

The actual content of the response, in C<< $response->decoded_content >>.
If the response is HTML, that's where the HTML source will be; if
it's a GIF, then C<< $response->decoded_content >> will be the binary
GIF data.

=item *

And dozens of other convenient and more specific methods that are
documented in the docs for L<HTTP::Response>, and its superclasses
L<HTTP::Message> and L<HTTP::Headers>.


=back



=for comment
 ##########################################################################



=head2 Adding Other HTTP Request Headers

The most commonly used syntax for requests is C<< $response =
$browser->get($url) >>, but in truth, you can add extra HTTP header
lines to the request by adding a list of key-value pairs after the URL,
like so:

  $response = $browser->get( $url, $key1, $value1, $key2, $value2, ... );

For example, here's how to send some commonly used headers, in case
you're dealing with a site that would otherwise reject your request:


  my @ns_headers = (
   'User-Agent' => 'Mozilla/4.76 [en] (Win98; U)',
   'Accept' => 'image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/png, */*',
   'Accept-Charset' => 'iso-8859-1,*,utf-8',
   'Accept-Language' => 'en-US',
  );

  ...

  $response = $browser->get($url, @ns_headers);

If you weren't reusing that array, you could just go ahead and do this:

  $response = $browser->get($url,
   'User-Agent' => 'Mozilla/4.76 [en] (Win98; U)',
   'Accept' => 'image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/png, */*',
   'Accept-Charset' => 'iso-8859-1,*,utf-8',
   'Accept-Language' => 'en-US',
  );

If you were only ever changing the 'User-Agent' line, you could just change
the C<$browser> object's default line from "libwww-perl/5.65" (or the like)
to whatever you like, using the LWP::UserAgent C<agent> method:

  $browser->agent('Mozilla/4.76 [en] (Win98; U)');



=for comment
 ##########################################################################



=head2 Enabling Cookies

A default LWP::UserAgent object acts like a browser with its cookie
support turned off. There are various ways of turning it on, by setting
its C<cookie_jar> attribute.
A "cookie jar" is an object representing a little database of all
the HTTP cookies that a browser knows about. It can correspond to a
file on disk or an in-memory object that starts out empty, and whose
collection of cookies will disappear once the program is finished running.

To give a browser an in-memory empty cookie jar, you set its C<cookie_jar>
attribute like so:

  use HTTP::CookieJar::LWP;
  $browser->cookie_jar( HTTP::CookieJar::LWP->new );

To save a cookie jar to disk, see L<< HTTP::CookieJar/dump_cookies >>.
To load cookies from disk into a jar, see L<<
HTTP::CookieJar/load_cookies >>.

=for comment
 ##########################################################################



=head2 Posting Form Data

Many HTML forms send data to their server using an HTTP POST request, which
you can send with this syntax:

  $response = $browser->post( $url,
    [
      formkey1 => value1,
      formkey2 => value2,
      ...
    ],
  );

Or if you need to send HTTP headers:

  $response = $browser->post( $url,
    [
      formkey1 => value1,
      formkey2 => value2,
      ...
    ],
    headerkey1 => value1,
    headerkey2 => value2,
  );

For example, the following program makes a search request to AltaVista
(by sending some form data via an HTTP POST request), and extracts from
the HTML the report of the number of matches:

  use strict;
  use warnings;
  use LWP 5.64;
  my $browser = LWP::UserAgent->new;

  my $word = 'tarragon';

  my $url = 'http://search.yahoo.com/yhs/search';
  my $response = $browser->post( $url,
    [ 'q' => $word,  # the Altavista query string
      'fr' => 'altavista', 'pg' => 'q', 'avkw' => 'tgz', 'kl' => 'XX',
    ]
  );
  die "$url error: ", $response->status_line
   unless $response->is_success;
  die "Weird content type at $url -- ", $response->content_type
   unless $response->content_is_html;

  if( $response->decoded_content =~ m{([0-9,]+)(?:<.*?>)? results for} ) {
    # The substring will be like "996,000</strong> results for"
    print "$word: $1\n";
  }
  else {
    print "Couldn't find the match-string in the response\n";
  }



=for comment
 ##########################################################################



=head2 Sending GET Form Data

Some HTML forms convey their form data not by sending the data
in an HTTP POST request, but by making a normal GET request with
the data stuck on the end of the URL.
For example, if you went to
C<www.imdb.com> and ran a search on "Blade Runner", the URL you'd see
in your browser window would be:

  http://www.imdb.com/find?s=all&q=Blade+Runner

To run the same search with LWP, you'd use this idiom, which involves
the URI class:

  use URI;
  my $url = URI->new( 'http://www.imdb.com/find' );
    # makes an object representing the URL

  $url->query_form(  # And here the form data pairs:
    'q' => 'Blade Runner',
    's' => 'all',
  );

  my $response = $browser->get($url);

See chapter 5 of I<Perl & LWP> for a longer discussion of HTML forms
and of form data, and chapters 6 through 9 for a longer discussion of
extracting data from HTML.



=head2 Absolutizing URLs

The URI class that we just mentioned above provides all sorts of methods
for accessing and modifying parts of URLs (such as asking what sort of URL
it is with C<< $url->scheme >>, and asking what host it refers to with C<<
$url->host >>, and so on, as described in L<the docs for the URI
class|URI>).
However, the methods of most immediate interest
are the C<query_form> method seen above, and now the C<new_abs> method
for taking a probably-relative URL string (like "../foo.html") and getting
back an absolute URL (like "http://www.perl.com/stuff/foo.html"), as
shown here:

  use URI;
  $abs = URI->new_abs($maybe_relative, $base);

For example, consider this program that matches URLs in the HTML
list of new modules in CPAN:

  use strict;
  use warnings;
  use LWP;
  my $browser = LWP::UserAgent->new;

  my $url = 'http://www.cpan.org/RECENT.html';
  my $response = $browser->get($url);
  die "Can't get $url -- ", $response->status_line
   unless $response->is_success;

  my $html = $response->decoded_content;
  while( $html =~ m/<A HREF=\"(.*?)\"/g ) {
    print "$1\n";
  }

When run, it emits output that starts out something like this:

  MIRRORING.FROM
  RECENT
  RECENT.html
  authors/00whois.html
  authors/01mailrc.txt.gz
  authors/id/A/AA/AASSAD/CHECKSUMS
  ...

However, if you actually want to have those be absolute URLs, you
can use the URI module's C<new_abs> method, by changing the C<while>
loop to this:

  while( $html =~ m/<A HREF=\"(.*?)\"/g ) {
    print URI->new_abs( $1, $response->base ) ,"\n";
  }

(The C<< $response->base >> method from L<HTTP::Response|HTTP::Response>
is for returning what URL should be used for resolving relative
URLs -- it's usually just the same as the URL that you requested.)

That program then emits nicely absolute URLs:

  http://www.cpan.org/MIRRORING.FROM
  http://www.cpan.org/RECENT
  http://www.cpan.org/RECENT.html
  http://www.cpan.org/authors/00whois.html
  http://www.cpan.org/authors/01mailrc.txt.gz
  http://www.cpan.org/authors/id/A/AA/AASSAD/CHECKSUMS
  ...

See chapter 4 of I<Perl & LWP> for a longer discussion of URI objects.
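
To see what C<new_abs> does without fetching anything, here is a small
offline sketch. The C<absolutize> helper and the base URL are made up
for this illustration (they are not part of URI or LWP); the only real
calls are C<< URI->new_abs >> and C<as_string>:

  use strict;
  use warnings;
  use URI;

  # Hypothetical helper: resolve $maybe_relative against $base and
  # return the absolute URL as a plain string.
  sub absolutize {
    my ($maybe_relative, $base) = @_;
    return URI->new_abs($maybe_relative, $base)->as_string;
  }

  # An assumed base URL, standing in for $response->base:
  my $base = 'http://www.cpan.org/authors/00whois.html';

  # A sibling file, a parent-relative path, a host-relative path,
  # and a URL that is already absolute:
  for my $link ('01mailrc.txt.gz', '../RECENT', '/MIRRORING.FROM',
                'http://www.perl.org/') {
    print absolutize($link, $base), "\n";
  }

A link that is already absolute should come back unchanged, which is
why it is reasonable to run every extracted href through C<new_abs>.
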

Of course, using a regexp to match hrefs is a bit simplistic, and for
more robust programs, you'll probably want to use an HTML-parsing module
like L<HTML::LinkExtor> or L<HTML::TokeParser> or even maybe
L<HTML::TreeBuilder>.




=for comment
 ##########################################################################

=head2 Other Browser Attributes

LWP::UserAgent objects have many attributes for controlling how they
work. Here are a few notable ones:

=over

=item *

C<< $browser->timeout(15); >>

This sets this browser object to give up on requests that don't answer
within 15 seconds.


=item *

C<< $browser->protocols_allowed( [ 'http', 'gopher'] ); >>

This sets this browser object to not speak any protocols other than HTTP
and gopher. If it tries accessing any other kind of URL (like an "ftp:"
or "mailto:" or "news:" URL), then it won't actually try connecting, but
instead will immediately return an error code 500, with a message like
"Access to 'ftp' URIs has been disabled".


=item *

C<< use LWP::ConnCache; $browser->conn_cache(LWP::ConnCache->new()); >>

This tells the browser object to try using the HTTP/1.1 "Keep-Alive"
feature, which speeds up requests by reusing the same socket connection
for multiple requests to the same server.


=item *

C<< $browser->agent( 'SomeName/1.23 (more info here maybe)' ) >>

This changes how the browser object will identify itself in
the default "User-Agent" line in its HTTP requests. By default,
it'll send "libwww-perl/I<versionnumber>", like
"libwww-perl/5.65".
You can change that to something more descriptive
like this:

  $browser->agent( 'SomeName/3.14 (contact@robotplexus.int)' );

Or if need be, you can go in disguise, like this:

  $browser->agent( 'Mozilla/4.0 (compatible; MSIE 5.12; Mac_PowerPC)' );


=item *

C<< push @{ $browser->requests_redirectable }, 'POST'; >>

This tells this browser to obey redirection responses to POST requests
(like most modern interactive browsers), even though the HTTP RFC says
that should not normally be done.


=back


For more options and information, see L<the full documentation for
LWP::UserAgent|LWP::UserAgent>.



=for comment
 ##########################################################################



=head2 Writing Polite Robots

If you want to make sure that your LWP-based program respects F<robots.txt>
files and doesn't make too many requests too fast, you can use the LWP::RobotUA
class instead of the LWP::UserAgent class.

The LWP::RobotUA class is just like LWP::UserAgent, and you can use it like so:

  use LWP::RobotUA;
  my $browser = LWP::RobotUA->new('YourSuperBot/1.34', 'you@yoursite.com');
    # Your bot's name and your email address

  my $response = $browser->get($url);

But LWP::RobotUA adds these features:


=over

=item *

If the F<robots.txt> on C<$url>'s server forbids you from accessing
C<$url>, then the C<$browser> object (assuming it's of class LWP::RobotUA)
won't actually request it, but instead will give you back (in C<$response>)
a 403 error with a message "Forbidden by robots.txt".
That is, if you have this line:

  die "$url -- ", $response->status_line, "\nAborted"
   unless $response->is_success;

then the program would die with an error message like this:

  http://whatever.site.int/pith/x.html -- 403 Forbidden by robots.txt
  Aborted at whateverprogram.pl line 1234

=item *

If this C<$browser> object sees that the last time it talked to
C<$url>'s server was too recent, then it will pause (via C<sleep>) to
avoid making too many requests too often. By default it pauses for
one minute -- but you can control that with the C<<
$browser->delay( I<minutes> ) >> attribute.

For example, this code:

  $browser->delay( 7/60 );

...means that this browser will pause when it needs to avoid talking to
any given server more than once every 7 seconds.

=back

For more options and information, see L<the full documentation for
LWP::RobotUA|LWP::RobotUA>.





=for comment
 ##########################################################################

=head2 Using Proxies

In some cases, you will want to (or will have to) use proxies for
accessing certain sites and/or using certain protocols. This is most
commonly the case when your LWP program is running (or could be running)
on a machine that is behind a firewall.

To make a browser object use proxies that are defined in the usual
environment variables (C<HTTP_PROXY>, etc.), just call the C<env_proxy>
method on a user-agent object before you go making any requests on it.
Specifically:

  use LWP::UserAgent;
  my $browser = LWP::UserAgent->new;

  # And before you go making any requests:
  $browser->env_proxy;

For more information on proxy parameters, see L<the LWP::UserAgent
documentation|LWP::UserAgent>, specifically the C<proxy>, C<env_proxy>,
and C<no_proxy> methods.
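
If the environment variables aren't set, the C<proxy> and C<no_proxy>
methods of LWP::UserAgent can set the same things by hand. Here is a
minimal sketch; the proxy address and the domain names are placeholders
invented for this example, so substitute your own:

  use strict;
  use warnings;
  use LWP::UserAgent;

  my $browser = LWP::UserAgent->new;

  # Send http: and ftp: requests through a (hypothetical) proxy server:
  $browser->proxy( ['http', 'ftp'], 'http://proxy.example.int:8080/' );

  # ...but connect directly to these (also hypothetical) domains:
  $browser->no_proxy( 'localhost', 'intranet.example.int' );

  # Calling proxy() with just a scheme reports the current setting:
  print "http requests go via ", $browser->proxy('http'), "\n";

As with C<env_proxy>, make these calls before you make any requests.
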



=for comment
 ##########################################################################

=head2 HTTP Authentication

Many web sites restrict access to documents by using "HTTP
Authentication". This isn't just any form of "enter your password"
restriction, but is a specific mechanism where the HTTP server sends the
browser an HTTP code that says "That document is part of a protected
'realm', and you can access it only if you re-request it and add some
special authorization headers to your request".

For example, the Unicode.org admins stop email-harvesting bots from
harvesting the contents of their mailing list archives, by protecting
them with HTTP Authentication, and then publicly stating the username
and password (at C<http://www.unicode.org/mail-arch/>) -- namely
username "unicode-ml" and password "unicode".

Consider this URL, which is part of the protected
area of the web site:

  http://www.unicode.org/mail-arch/unicode-ml/y2002-m08/0067.html

If you access that with a browser, you'll get a prompt like
"Enter username and password for 'Unicode-MailList-Archives' at server
'www.unicode.org'".

In LWP, if you just request that URL, like this:

  use LWP;
  my $browser = LWP::UserAgent->new;

  my $url =
   'http://www.unicode.org/mail-arch/unicode-ml/y2002-m08/0067.html';
  my $response = $browser->get($url);

  die "Error: ", $response->header('WWW-Authenticate') || 'Error accessing',
    #  (the 'WWW-Authenticate' header carries the realm name)
    "\n ", $response->status_line, "\n at $url\n Aborting"
   unless $response->is_success;

Then you'll get this error:

  Error: Basic realm="Unicode-MailList-Archives"
   401 Authorization Required
   at http://www.unicode.org/mail-arch/unicode-ml/y2002-m08/0067.html
   Aborting at auth1.pl line 9.  [or wherever]

...because the C<$browser> doesn't know the username and password
for that realm ("Unicode-MailList-Archives") at that host
("www.unicode.org"). The simplest way to let the browser know about this
is to use the C<credentials> method to let it know about a username and
password that it can try using for that realm at that host. The syntax is:

  $browser->credentials(
    'servername:portnumber',
    'realm-name',
    'username' => 'password'
  );

In most cases, the port number is 80, the default TCP/IP port for HTTP; and
you usually call the C<credentials> method before you make any requests.
For example:

  $browser->credentials(
    'reports.mybazouki.com:80',
    'web_server_usage_reports',
    'plinky' => 'banjo123'
  );

So if we add the following to the program above, right after the C<<
$browser = LWP::UserAgent->new; >> line...

  $browser->credentials(  # add this to our $browser's "key ring"
    'www.unicode.org:80',
    'Unicode-MailList-Archives',
    'unicode-ml' => 'unicode'
  );

...then when we run it, the request succeeds, instead of causing the
C<die> to be called.



=for comment
 ##########################################################################

=head2 Accessing HTTPS URLs

When you access an HTTPS URL, it'll work for you just like an HTTP URL
would -- if your LWP installation has HTTPS support (via an appropriate
Secure Sockets Layer library). For example:

  use LWP;
  my $url = 'https://www.paypal.com/';   # Yes, HTTPS!
  my $browser = LWP::UserAgent->new;
  my $response = $browser->get($url);
  die "Error at $url\n ", $response->status_line, "\n Aborting"
   unless $response->is_success;
  print "Whee, it worked! I got that ",
   $response->content_type, " document!\n";

If your LWP installation doesn't have HTTPS support set up, then the
response will be unsuccessful, and you'll get this error message:

  Error at https://www.paypal.com/
   501 Protocol scheme 'https' is not supported
   Aborting at paypal.pl line 7.   [or whatever program and line]

If your LWP installation I<does> have HTTPS support installed, then the
response should be successful, and you should be able to consult
C<$response> just like with any normal HTTP response.

For information about installing HTTPS support for your LWP
installation, see the helpful F<README.SSL> file that comes in the
libwww-perl distribution.


=for comment
 ##########################################################################



=head2 Getting Large Documents

When you're requesting a large (or at least potentially large) document,
a problem with the normal way of using the request methods (like C<<
$response = $browser->get($url) >>) is that the response object in
memory will have to hold the whole document -- I<in memory>. If the
response is a thirty megabyte file, this is likely to be quite an
imposition on this process's memory usage.

A notable alternative is to have LWP save the content to a file on disk,
instead of saving it up in memory. This is the syntax to use:

  $response = $ua->get($url,
                         ':content_file' => $filespec,
                      );

For example,

  $response = $ua->get('http://search.cpan.org/',
                         ':content_file' => '/tmp/sco.html'
                      );

When you use this C<:content_file> option, the C<$response> will have
all the normal header lines, but C<< $response->content >> will be
empty.
Errors writing to the content file (for example due to
permission denied or the filesystem being full) will be reported via
the C<Client-Aborted> or C<X-Died> response headers, and not the
C<is_success> method:

  # The header is absent when nothing went wrong, hence the || ''
  if ( ($response->header('Client-Aborted') || '') eq 'die' ) {
    # handle error ...
  }

Note that this ":content_file" option isn't supported under older
versions of LWP, so you should consider adding C<use LWP 5.66;> to check
the LWP version, if you think your program might run on systems with
older versions.

If you need to be compatible with older LWP versions, then use
this syntax, which does the same thing:

  use HTTP::Request::Common;
  $response = $ua->request( GET($url), $filespec );


=for comment
 ##########################################################################


=head1 SEE ALSO

Remember, this article is just the most rudimentary introduction to
LWP -- to learn more about LWP and LWP-related tasks, you really
must read from the following:

=over

=item *

L<LWP::Simple> -- simple functions for getting/heading/mirroring URLs

=item *

L<LWP> -- overview of the libwww-perl modules

=item *

L<LWP::UserAgent> -- the class for objects that represent "virtual browsers"

=item *

L<HTTP::Response> -- the class for objects that represent the response to
an LWP request, as in C<< $response = $browser->get(...) >>

=item *

L<HTTP::Message> and L<HTTP::Headers> -- classes that provide more methods
to HTTP::Response.

=item *

L<URI> -- class for objects that represent absolute or relative URLs

=item *

L<URI::Escape> -- functions for URL-escaping and URL-unescaping strings
(like turning "this & that" to and from "this%20%26%20that").

=item *

L<HTML::Entities> -- functions for HTML-escaping and HTML-unescaping strings
(like turning "C. & E.
BrontE<euml>" to and from "C. &amp; E. Bront&euml;")

=item *

L<HTML::TokeParser> and L<HTML::TreeBuilder> -- classes for parsing HTML

=item *

L<HTML::LinkExtor> -- class for finding links in HTML documents

=item *

The book I<Perl & LWP> by Sean M. Burke. O'Reilly & Associates,
2002. ISBN: 0-596-00178-9, L<http://oreilly.com/catalog/perllwp/>. The
whole book is also available free online:
L<http://lwp.interglacial.com>.

=back


=head1 COPYRIGHT

Copyright 2002, Sean M. Burke. You can redistribute this document and/or
modify it, but only under the same terms as Perl itself.

=head1 AUTHOR

Sean M. Burke C<sburke@cpan.org>

=for comment
 ##########################################################################

=cut

# End of Pod