2008-10-15 Hans de Graaff <hans@degraaff.org>

	* Checkbot 1.80 is released

2008-07-08 Hans de Graaff <hans@degraaff.org>

	* checkbot (handle_doc): Tighten up the check for a robots tag so
	that nofollow text later in the document won't be matched, thus
	skipping the whole document, bug 2005950.

2007-05-05 Brandon Bell <Brandon_Bell@bcit.ca>

	* checkbot: mms scheme can be ignored safely.

2007-04-30 Hans de Graaff <hans@degraaff.org>

	* checkbot (printAllServers): Clarify that 'Unique links' actually
	is 'Documents scanned'.

2007-02-26 Hans de Graaff <hans@degraaff.org>

	* checkbot (handle_doc): Handle the case where decoded_content is
	not available as per bug 1665075.

2007-02-26 Gerald Pfeifer <gerald@pfeifer.com>

	* checkbot (check_point): Simplify and add a comment.

2007-02-26 Hans de Graaff <hans@degraaff.org>

	* Makefile.PL: Require LWP 5.803 or better. decoded_content was
	added in 5.802 and 5.803 added some important bugfixes.

2007-02-03 Hans de Graaff <hans@degraaff.org>

	* Checkbot 1.79 is released

	* RELEASE-PROCESS: Add the release process documentation.

2007-01-27 Gerald Pfeifer <gerald@pfeifer.com>

	* checkbot (init_suppression): Check and provide error if
	suppression file is in fact a directory.

2006-12-28 Hans de Graaff <hans@degraaff.org>

	* checkbot: Add summary to tables to make files XHTML 1.1 compliant.

2006-11-16 Hans de Graaff <hans@degraaff.org>

	* checkbot (handle_doc): Parse the decoded content so that all
	character set issues are dealt with before parsing. This solves
	bug 1264729.

2006-11-14 Hans de Graaff <hans@degraaff.org>

	* checkbot (performRequest): Simplify the code dealing with
	problems of HEAD requests by retrying all 500 responses instead of
	special-casing particular failures that we happen to know
	about.
	This type of problem is all too common, and if there really
	is a problem GET will find it anyway.
	(add_error): Allow regular expressions in the suppression
	file. Based on patch from Eric Noack.

2006-11-14 Eric Noack <en@lightwerk.com>

	* checkbot (send_mail): Indicate how many errors are detected in
	the notification email's subject.
	(handle_doc): Use the URL with which the document was received for
	the problem reports and internal accounting, but keep on using the
	proper base URL as defined by the response object when retrieving
	links from the document. This fixes the case where a weird BASE
	URL in a document could make it unclear where the actual problem
	was.

2006-10-28 Hans de Graaff <hans@degraaff.org>

	* checkbot (performRequest): Handle case where an FTP server may
	not be able to handle a HEAD request. This may cause a lot of data
	to be transferred in those cases.

2006-05-03 Hans de Graaff <hans@degraaff.org>

	* Checkbot 1.78 is released

2005-12-18 Hans de Graaff <hans@degraaff.org>

	* checkbot (printServerProblems): Make pages XHTML compliant again.

2005-12-18 Jens Schweikhardt <schweikh@schweikhardt.net>

	* checkbot: Add classes and ids so that more styling options for
	CSS are available.
	* checkbot2.css: Example CSS file using the new classes and ids.

2005-11-11 Hans de Graaff <hans@degraaff.org>

	* checkbot: React in a more subtle way if the Time::Duration
	module is not found.

2005-09-22 Hans de Graaff <hans@degraaff.org>

	* Makefile.PL: Check for presence of Net::SSL and explain the
	effects if it is not present.

2005-08-20 Hans de Graaff <hans@degraaff.org>

	* checkbot (handle_doc): Ignore some 'links' found by LinkExtor
	which need not point to live resources. Fixed bugs #1264447 and
	#1107832.

	* test.html: Add test cases for it.

2005-08-06 Hans de Graaff <hans@degraaff.org>

	* checkbot (performRequest): Switch from HEAD to GET on a 400
	error, as the most likely cause is that the server has trouble
	with HEAD requests.

2005-08-05 Hans de Graaff <hans@degraaff.org>

	* checkbot (handle_doc): Also show how many new links are found on
	a page, not just the total number of links.
	(performRequest): Don't retry GET method on a 403 error.
	(handle_doc): Properly handle newlines in the matches for title
	and robots meta tag.

2005-07-28 Hans de Graaff <hans@degraaff.org>

	* Checkbot 1.77 is released.

	* checkbot: Fix use of $VERSION so that it compiles and can be
	used by MakeMaker at the same time.
	(handle_doc): Check for presence of robots meta tag and act on it.
	Based on a patch by Donald Willingham.

2005-07-25 Hans de Graaff <hans@degraaff.org>

	* Checkbot 1.76 is released.

2005-06-07 Hans de Graaff <hans@degraaff.org>

	* checkbot (printServerProblems): Include title of page.
	(handle_doc): Extract title for later printing.
	Add new hash url_title to store page titles.
	Based on a patch from John Bintz.

2005-04-23 Hans de Graaff <hans@degraaff.org>

	* checkbot: Add documentation on use of file:/// URLs.

2005-01-23 Hans de Graaff <hans@degraaff.org>

	* checkbot: Only send mail when Checkbot has detected any
	problems, based on suggestion from Thomas Kuerten.

	Print duration of run on final report, and refactor use of start
	time variable to facilitate this. Feature depends on availability
	of Time::Duration, but checkbot will work without it. Based on
	patch from Adam Griff.

2005-01-23 Adam Griff <griff@computer.org>

	* checkbot (create_page): Print out more options on results page.

2005-01-21 Hans de Graaff <hans@degraaff.org>

	* checkbot: Remove automatic version number based on CVS version
	now that commits will be more frequent than releases.

2004-11-12 Hans de Graaff <hans@degraaff.org>

	* checkbot (handle_url): Ignore javascript: URLs instead of
	generating a 904 error. It would be nice to handle these as well.

2004-05-26 Hans de Graaff <hans@degraaff.org>

	* Makefile.PL: Sync HTML::Parser requirement with required
	versions of libwww-perl.

2004-05-03 Hans de Graaff <hans@degraaff.org>

	* checkbot: Write better documentation for --file option.

2004-04-26 Hans de Graaff <hans@degraaff.org>

	* checkbot: Minor documentation changes thanks to Jens
	Schweikhardt.

2004-04-22 Hans de Graaff <hans@degraaff.org>

	* Checkbot 1.75 is released.

2004-04-21 Hans de Graaff <hans@degraaff.org>

	* checkbot (print_help): Use a here-doc for the help for easier
	maintenance.
	(init_modules): Add --noproxy option to set list of domains which
	will not be passed through the proxy.

2004-04-18 Hans de Graaff <hans@degraaff.org>

	* checkbot (handle_url): Create an error if an unknown scheme is
	encountered and only ignore known schemes like mailto:

2004-03-30 Hans de Graaff <hans@degraaff.org>

	* checkbot: Add explanation about error message which indicates
	lack of SSL support.

2004-03-28 Hans de Graaff <hans@degraaff.org>

	* checkbot: Add EXAMPLES section to the perldoc documentation with
	an example of the simplest invocation. Needs more examples...
	Update help text for --mailto to confirm that more than one
	address is possible.

	* checkbot: Add new --cookies option to accept cookies from
	servers. Based on patch from Roger Pilkey.

2004-02-09 Hans de Graaff <hans@degraaff.org>

	* Makefile.PL: Show correct text if LWP test fails.

2004-01-05 Hans de Graaff <hans@degraaff.org>

	* Makefile.PL: Now require LWP 5.76 to avoid problems with 500
	"Need a field name" HTTP errors being generated by LWP.

2003-12-29 Gerald Pfeifer <gerald@pfeifer.com>

	* checkbot: Improve description of --proxy.
	(print_help): Ditto.

2003-12-21 Hans de Graaff <hans@degraaff.org>

	* checkbot (performRequest): $url->authority may not be defined
	for the URL we are checking.

2003-12-17 Hans de Graaff <hans@degraaff.org>

	* Checkbot 1.74 is released

	* checkbot (add_error): Take into account that status message can
	be undefined.

2003-12-15 Hans de Graaff <hans@degraaff.org>

	* checkbot: Put Checkbot errors in a hash to have one set of
	descriptions around.
	(handle_doc): Use it.
	(checkbot_status_message): Use it to find the status message for a
	code from HTTP codes, Checkbot codes, or a generic status message.
	(printServerProblems): Use it.
	(handle_url): Move checks for --dontwarn and --suppression
	features from here ...
	(add_error): ... to here so that they apply to all errors.

2003-12-14 Hans de Graaff <hans@degraaff.org>

	* checkbot: Document that Checkbot defines its own response codes
	for common problems.
	No longer a need for the %warning hash.
	(add_error): New function to add a new error into the hashes.
	(handle_url): Use it.
	(handle_doc): Use it for what previously were warnings.
	(printServerWarnings): Obsolete as warnings have been changed to
	use the normal error handling routines.
	Marked --allow-simple-hosts option as deprecated, because this can
	now be handled in a more generic way by the --dontwarn mechanism.
	(print_help): Removed --allow-simple-hosts option from help.
	(add_to_queue): Move code to check for double slash in URL to ...
	(handle_doc): ... here as Checkbot error 903.

2003-11-29 Hans de Graaff <hans@degraaff.org>

	* checkbot (printServerProblems): Oops. Make sure all output is
	going to the right file, not stdout.
	Add new --suppress option which reads a file with response code /
	URL combinations to be suppressed in the output, based on patch by
	Rob Chekaluk.
	(init_suppression): Read suppression file and fill hash with
	results.
	(handle_url): Use it.
	(print_help): Document it.

2003-11-24 Hans de Graaff <hans@degraaff.org>

	* checkbot: Add example to --ignore argument.

2003-11-23 Hans de Graaff <hans@degraaff.org>

	* checkbot (init_modules): Delete commented-out code to enable
	HTTP 1.1 in LWP. HTTP 1.1 has been the default in LWP for a while
	and does not need special code to be enabled.

2003-11-21 Hans de Graaff <hans@degraaff.org>

	* checkbot (printServerProblems): Don't assume that status_message
	is defined for all possible codes, based on patch by Thomas
	Kuerten.

2003-10-18 Hans de Graaff <hans@degraaff.org>

	* Makefile.PL: Require LWP 5.70 because problems with HEAD of
	ftp:// links have been solved in this release.

2003-09-05 Hans de Graaff <hans@degraaff.org>

	* checkbot (printServerProblems): Put line breaks in HTML file in
	a more logical place.

2003-08-31 Hans de Graaff <hans@degraaff.org>

	* Checkbot 1.73 released

2003-08-30 Hans de Graaff <hans@degraaff.org>

	* checkbot (printServerProblems): Protect against undefined status.

2003-08-29 Hans de Graaff <hans@degraaff.org>

	* checkbot (handle_doc): Ignore URIs matching --ignore as they are
	being found.
	(handle_url): Remove check for --ignore option here.
	Update documentation for --ignore.
	(print_help): Idem.

2003-08-21 Hans de Graaff <hans@degraaff.org>

	* checkbot: Made --interval description a bit more clear.

2003-07-26 Hans de Graaff <hans@degraaff.org>

	* checkbot (init_modules): Uncomment proxy support, but it now
	applies to all requests, not just external ones.
	(print_help): Update --proxy help text.
	Update perldoc documentation.

2003-07-05 Hans de Graaff <hans@degraaff.org>

	* checkbot: Additional explanation for --exclude option.

2003-06-28 Bernd Petrovitsch <bernd@firmix.at>

	* checkbot.css: Additional cleaning up of the CSS file.

2003-06-26 Bernd Petrovitsch <bernd@firmix.at>

	* checkbot: Produce valid XHTML 1.1 pages.

	* checkbot.css: Clean up of the CSS file.

2003-05-04 Hans de Graaff <hans@degraaff.org>

	* Checkbot 1.72 released

	* checkbot: Applied spelling fixes from Jens Schweikhardt.
	(clean_up): Factored out of check_links so that it can also be
	called when we catch a signal.
	(got_signal): Catch signals like SIGINT and handle them, based on
	patch by Jens Schweikhardt.

2003-04-06 Hans de Graaff <hans@degraaff.org>

	* checkbot (handle_url): No longer ignore URLs with a query
	string. If checking these is not wanted then the --exclude option
	can be used, and an example for this is now included in the
	documentation.

2003-03-30 Hans de Graaff <hans@degraaff.org>

	* checkbot (printServerProblems): Add links to different error
	codes on a server page for quick navigation.

2003-02-22 Paul Merchant, Jr. <Paul.L.Merchant.Jr@Dartmouth.EDU>

	* checkbot: Initialize the statistics counters to avoid warnings.

2003-01-15 Hans de Graaff <hans@degraaff.org>

	* checkbot (output): Correct the check for --verbose; not
	specifying it now generates no output.

2003-01-06 Hans de Graaff <hans@degraaff.org>

	* checkbot (handle_doc): The host name check does not make much
	sense for news: scheme URLs.

2003-01-03 Hans de Graaff <hans@degraaff.org>

	* checkbot (init_globals): Only remove file from default --match
	argument when there is a path component in the start URL.
	Initialize problem counter to avoid warning about uninitialized
	value.

2002-12-29 Hans de Graaff <hans@degraaff.org>

	* Checkbot 1.71 released

	* checkbot (handle_url): Make sure we feed is_internal a string.
	(handle_url): Use existing variable instead of Referer header to
	store parent URL.

	* Checkbot 1.70 created for testing, but not released

	* checkbot (performRequest): Add HTTP 403 error to list of error
	codes to retry with a GET.
	(handle_url): Only follow redirections for internal links.

2002-12-28 Hans de Graaff <hans@degraaff.org>

	* checkbot: Removed reference to AnyDBM_File because it is not
	used anywhere.
	Rewrote global statistics gathering to be simpler and more
	accurate.
	Added --filter option which allows rewriting of URLs before they
	are checked, based on patch from Eli the Bearded <eli@netusa.net>.
	Simplified storage of URLs with problems.
	(get_headers): Removed.
	(performRequest): Included code from get_headers here.
	(count_problems): Updated for new storage of URLs.
	(printServerProblems): Idem.
	(handle_url): Idem.
	(handle_doc): Idem.
	(count_problems): Idem.
	(printServerProblems): Idem.
	(handle_doc): Add code to report all pages on which a problematic
	URL appears.
	(init_globals): Changed default --match argument to exclude final
	page name.

2002-12-27 Hans de Graaff <hans@degraaff.org>

	* checkbot (output): Moved printing, including indentation and
	verbose checking, to function 'output'.
	(handle_doc): No more distinction between internal and external
	links; we throw all links found into the queue.
	(handle_doc): Removed statistics for now, they are too buggy.
	(is_checked): New function that takes into account that we sometimes
	translate hostnames to IP addresses.
	(handle_doc): Use it.
	(check_internal): Removed dependency on statistics, use actual
	queue contents to determine when all links are checked.
	(handle_url): Only query server for file type on
	application/octet-stream documents.
	(is_internal): New function to determine if URL is internal.
	(handle_url): Rewritten to use new functions and to deal with
	external URLs being mixed in, and generally cleaned up.
	(handle_url): Moved --internal-only checks here.
	(check_external): Removed.
	(check_links): Renamed from check_internal.
	Added small blurb to documentation on distinction between internal
	and external links and the way checkbot checks these.

	* t/test.t: Added simple test case: can checkbot be run without
	arguments?

2002-12-25 Hans de Graaff <hans@degraaff.org>

	* Checkbot 1.69 released

2002-12-25 Hans de Graaff <hans@degraaff.org>

	* checkbot (get_headers): Make sure feedback on HEAD requests gets
	indented properly.

2002-12-23 Hans de Graaff <hans@degraaff.org>

	* checkbot (init_globals): Anchor automatic match argument based
	on start URLs at the beginning.

2002-12-16 Jens Schweikhardt <schweikh@schweikhardt.net>

	* checkbot (check_external): Changed printf to print so that
	actual information can be printed using --verbose.

2002-12-02 Hans de Graaff <hans@degraaff.org>

	* checkbot (get_headers): Also add 406 as an error which might
	indicate that the web server doesn't like us doing a HEAD, so GET
	instead.

2002-12-01 Hans de Graaff <hans@degraaff.org>

	* Makefile.PL: Updated based on libwww-perl Makefile.PL.

	* checkbot: Remove the preamble cruft and just assume perl will be
	/usr/bin/perl. Therefore also renamed checkbot.pl -> checkbot.
	Indicate that Checkbot is licensed under the same terms as Perl
	itself.

	* checkbot.pl (count_problems): Rewrote debugging code to handle
	requests without a header() method; even though this should not be
	possible, it does happen in the wild.
	(handle_doc): Perform fully-qualified hostname check for all URIs
	which support a hostname.

2002-11-30 Hans de Graaff <hans@degraaff.org>

	* checkbot.pl (add_checked): Use ->can construct to check if URL
	supports host method.

2002-10-27 Hans de Graaff <hans@degraaff.org>

	* checkbot.pl: Add hints for recursive or run-away checkbot
	processes.

2002-09-28 Hans de Graaff <hans@degraaff.org>

	* Checkbot 1.68 released

2002-08-05 Hans de Graaff <hans@degraaff.org>

	* checkbot.pl (handle_doc): Comment out warning about external
	URLs with non-checkable schemes to avoid lots of useless output.

2002-06-09 Jostle Lemcke <jostle@users.sourceforge.net>

	* checkbot.pl: Added --allow-simple-hosts option. This option
	turns off the warnings for unqualified host names.

2002-04-01 Hans de Graaff <hans@degraaff.org>

	* checkbot.pl (handle_doc): Ignore URLs found in <base>
	tags. Suggestion from Roman Maeder.

2002-03-31 Hans de Graaff <hans@degraaff.org>

	* checkbot.pl (print_help): Mention --style option in help message.
	(check_internal): Always close CURRENT filehandle, and add a warning
	for potential problems with this based on patch and report from
	Greg Larkin.

	* checkbot.pl: Added HINTS AND TIPS section to
	documentation. Added hint on using passive FTP based on feedback
	from Roman Maeder.

2002-03-31 Brent Verner <brent@rcfile.org>

	* checkbot.pl (handle_doc): Only match http and https, not stuff
	like httpa.

2002-03-31 Paco Hope <paco@paco.to>

	* checkbot.css: Contributed style sheet for Checkbot. Use with
	--style option.

2002-01-20 Roman Maeder <maeder@mathconsult.ch>

	* checkbot.pl (handle_url): Use select() to sleep instead of
	sleep() so that sleep interval can be fractional.

2001-12-16 Hans de Graaff <hans@degraaff.org>

	* Checkbot 1.67 released

2001-11-16 Hans de Graaff <hans@degraaff.org>

	* checkbot.pl: Add example for --match argument based on question
	by Michael Lambert.

2001-11-11 Hans de Graaff <hans@degraaff.org>

	* checkbot.pl (count_problems): Quote meta characters in server
	name and URL when matching them.
	(handle_doc): Fix two minor bugs related to the move to URI.

2001-11-11 Evaldas Imbrasas <evaldas@wolfram.com>

	* checkbot.pl: Add --language option to allow language
	negotiation.

	* checkbot.pl (check_options): Set default for --sleep option to 0.

	* checkbot.pl (check_internal): Only close <CURRENT> if it already
	exists.

2001-11-03 Hans de Graaff <hans@degraaff.org>

	* checkbot.pl (printServerProblems): There might not be a response
	message.
	(handle_url): Use status_line instead of code and message for
	HTTP::Response object.
	(handle_doc): Also check external gopher links.

2001-10-25 Hans de Graaff <hans@degraaff.org>

	* Checkbot 1.66 released

	* checkbot.pl (get_headers): URI doesn't know about netloc, but it
	does know about authority.
	(get_headers): $url is already absolute, no need for ->abs.

2001-10-18 Hans de Graaff <hans@degraaff.org>

	* Checkbot 1.65 released

2001-10-14 Hans de Graaff <hans@degraaff.org>

	* checkbot.pl (handle_doc): Print a notice when external
	non-HTTP/FTP URLs are dropped.

2001-09-29 Hans de Graaff <hans@degraaff.org>

	* checkbot.pl (init_modules and other places): Remove
	URI::URL::strict call and use of new URI::URL because it is
	obsolete; we should use the URI classes now.

2001-09-23 Hans de Graaff <hans@degraaff.org>

	* checkbot.pl (init_globals): Initialize last checkpoint time with
	0 instead of current time, so that we write out a set of pages
	right at the start. This will catch problems with permissions for
	these pages as early as possible.

2001-07-01 Hans de Graaff <hans@degraaff.org>

	* checkbot.pl (get_server_type): Take into account that we might
	not learn anything about the server

2001-05-06 Hans de Graaff <hans@degraaff.org>

	* checkbot.pl (get_headers): Factored out of check_external so
	that moving to using GET requests only will be easier later.

2001-04-30 Hans de Graaff <hans@degraaff.org>

	* checkbot.pl (send_mail): Really fix printing of starting URLs in
	email. All URLs are now printed in the subject and body of the
	message.

2001-04-15 Hans de Graaff <hans@degraaff.org>

	* Checkbot 1.64 released

2001-03-13 Hans de Graaff <hans@degraaff.org>

	* checkbot.pl (send_mail): Fix printing of starting URL in email.

2001-03-04 Nick Hibma <n_hibma@qubesoft.com>

	* checkbot.pl (printServerWarnings): Removed duplicate print statement.

2001-02-10 Boris Lantrewitz <lantrewi@do.isst.fhg.de>

	* checkbot.pl (init_globals): Allow more environment variables to
	be used to set the temporary directory.
	(send_mail): Avoid using printf to the handle for those systems
	where printf on a pipe is not implemented.

2001-01-14 Hans de Graaff <hans@degraaff.org>

	* Checkbot 1.63 released

2001-01-02 Hans de Graaff <hans@degraaff.org>

	* Makefile.PL (chk_version): Require LWP 5.50, which contains an
	important bugfix when dealing with relative redirects.

2001-01-01 Hans de Graaff <hans@degraaff.org>

	* checkbot.pl (init_globals): If no --match is given, construct
	one based on all the start URLs given. Suggested by Mathieu
	Guillaume.

2000-12-31 Hans de Graaff <hans@degraaff.org>

	* checkbot.pl (create_page): Remove the .bak file when the new
	file is written, unless --debug is in effect.

2000-12-31 OBARA Kiyotake <obara@vc-net.ne.jp>

	* checkbot.pl (print_server): Create correct URLs when --file
	argument contains directories as well as a filename.

2000-12-31 David Brownlee <abs@purplei.com>

	* checkbot.pl (create_page): Fix typo in die message.

2000-12-24 Hans de Graaff <hans@degraaff.org>

	* checkbot.pl: Added a small blurb in the documentation about the
	URLs Checkbot will find and check.

2000-12-24 Petter Reinholdtsen <pere@hungry.com>

	* checkbot.pl (handle_url): Deal with redirect responses without
	Location header.

2000-11-18 Roman Maeder <maeder@mathconsult.ch>

	* checkbot.pl (handle_url): Remove check which would not check
	files named the same as the main report file. If you don't want
	Checkbot to check its intermediate pages, use the --exclude
	option.

	* checkbot.pl (handle_url): Ask server for file type when
	requesting http:// URLs to be on the safe side, as using
	guess_media_type() is not always correct.

2000-10-28 Nick Hibma <n_hibma@qubesoft.com>

	* checkbot.pl (check_external): Only print when --verbose is true.
	(printServerProblems): Fix proper printing of <hr>.
	(handle_doc): Include proper URL for report for unqualified URLs.

2000-10-01 TAKAKU Masao <masao@ulis.ac.jp>

	* checkbot.pl (print_server): Make pages well-formed by inserting
	<html> and <body> tags.

2000-09-24 Hans de Graaff <hans@degraaff.org>

	* Checkbot 1.62 released

2000-09-16 Hans de Graaff <hans@degraaff.org>

	* checkbot.pl (send_mail): Only mention URL in the subject of the
	mail if one is given through the --url option.
	(check_external): The ALEPH web server is also broken with respect
	to HEAD requests.

2000-09-04 Hans de Graaff <hans@degraaff.org>

	* checkbot.pl (check_external): JavaWebServer is also broken with
	respect to HEAD requests.

2000-08-26 Hans de Graaff <hans@degraaff.org>

	* checkbot.pl (create_page): Add --style option which allows a
	link to a CSS file to be included in each Checkbot page.

2000-08-20 Nick Hibma <n_hibma@qubesoft.com>

	* checkbot.pl (check_external): Some servers don't set the Server:
	header. Check to see if the server field is set in a response to
	avoid warnings.

	* checkbot.pl (add_checked): Add --enable-virtual option to use
	hostname instead of IP address to distinguish servers. This allows
	checking of multiple virtual servers.

2000-08-13 Hans de Graaff <hans@degraaff.org>

	* Makefile.PL: Add a check for HTML::Parser. Require latest
	version, 3.10, because I'm not sure older versions work correctly.

2000-06-29 Hans de Graaff <hans@degraaff.org>

	* Checkbot 1.61 released

	* Makefile.PL (chk_version): Add version checked for in output.

2000-06-18 Larry Gilbert <larry@n2h2.com>

	* checkbot.pl (check_external): Use GET instead of HEAD for
	confused closed-source servers.

2000-06-18 Hans de Graaff <hans@degraaff.org>

	* Makefile.PL (chk_version): require URI 1.07 as it contains bug
	fixes for using Base URLs.

	* checkbot.pl: Change email and web address

2000-04-30 Hans de Graaff <graaff@xs4all.nl>

	* Checkbot 1.60 released

	* checkbot.pl (check_options): Add option --dontwarn to exclude
	certain types of warnings. Based on idea by David Hoekman.

2000-04-29 Mark Roedel <roedelm@letu.edu>

	* checkbot.pl (handle_url): Deal with "300 Multiple Choices"
	response which does not offer a URL to redirect to.

2000-04-09 David Hoekman <dhoekman@halcyon.com>

	* checkbot.pl (init_globals): Allow for TMPDIR with or without
	trailing /

2000-04-08 Hans de Graaff <graaff@xs4all.nl>

	* checkbot.pl: Updated contact information in file header.

2000-03-26 Hans de Graaff <graaff@xs4all.nl>

	* checkbot.pl (check_options): Add message about skipping of
	external links. Also removes warning about single use of variable.

2000-03-06 Brian McNett <webmaster@mycoinfo.com>

	* checkbot.pl: On a Mac, ask for command line options
	through the MacPerl mechanism.

2000-02-06 Hans de Graaff <graaff@xs4all.nl>

	* checkbot.pl (init_globals): Check whether URLs on the command
	line have a proper host. Thanks to Charles Williams for the
	report.

2000-01-30 Hans de Graaff <graaff@xs4all.nl>

	* Checkbot 1.59 released

	* checkbot.pl (handle_doc): Use eof instead of parse(undef) to end
	parsing.

2000-01-15 Hans de Graaff <graaff@xs4all.nl>

	* checkbot.pl (handle_doc): Show warnings about hostnames on
	the console only when --verbose is given.

2000-01-09 Hans de Graaff <graaff@xs4all.nl>

	* checkbot.pl: Added option --internal-only to skip checking of
	external links altogether. Idea by David Hoekman
	<dhoekman@halcyon.com>

2000-01-02 Hans de Graaff <graaff@xs4all.nl>

	* checkbot.pl (handle_doc): Use canonical URI from LinkExtor,
	which simplifies the rest of the logic and gets things working
	with the new version of LinkExtor.

2000-01-01 Stephane Bortzmeyer <bortzmeyer@pasteur.fr>

	* checkbot.pl (init_globals): Create Checkbot workdir in $TMPDIR
	if defined, /tmp otherwise.

1999-12-31 Hans de Graaff <graaff@xs4all.nl>

	* checkbot.pl (handle_doc): Change frag to fragment.

1999-11-07 Hans de Graaff <graaff@xs4all.nl>

	* checkbot.pl (handle_doc): Add warning for URLs for which LWP
	can't determine a hostname, and don't check them further.

1999-10-24 Hans de Graaff <graaff@xs4all.nl>

	* checkbot.pl (print_help): Added line on --interval option.

1999-10-23 Hans de Graaff <graaff@xs4all.nl>

	* checkbot.pl (init_globals): Fixed proper determination of server
	prefix if a filename is supplied, thanks to Michael Baumer.

1999-10-02 Hans de Graaff <graaff@xs4all.nl>

	* checkbot.pl (init_modules): Added use URI.

1999-08-21 Hans de Graaff <graaff@xs4all.nl>

	* Makefile.PL (chk_version): Added check for URI.

1999-07-17 Hans de Graaff <graaff@xs4all.nl>

	* README: Added blurb on the announcements mailing list.

1999-07-06 Hans de Graaff <graaff@xs4all.nl>

	* checkbot.pl (add_checked): Deal with the fact that a mailto: URL
	has no host component. Thanks to John Croft for the report.

1999-06-27 Hans de Graaff <graaff@xs4all.nl>

	* checkbot.pl (handle_url): Really fix relative redirection URLs
	using the URI class. Thanks to Thomas Zander for the report and a
	reproducible failing URL.

1999-05-03 Hans de Graaff <graaff@xs4all.nl>

	* checkbot.pl (printServerWarnings): Also change clustering of URLs.

1999-05-02 Hans de Graaff <graaff@xs4all.nl>

	* checkbot.pl (signature): Add quotes around the URL in the
	signature.
	(printServerProblems): Fixed clustering of URLs so that faulty
	links are listed under the URL that contains them, instead of the
	other way around. This ordering problem was introduced in 1.53.

1999-04-10 Hans de Graaff <graaff@xs4all.nl>

	* checkbot.pl (handle_url): Make sure a redirected URL is fully
	qualified (based on the original URL) to avoid dying on it
	later. Thanks to David Hoekman for the initial analysis.

1999-04-05 Hans de Graaff <graaff@xs4all.nl>

	* checkbot.pl (printAllServers): Taken out of create_page for
	clarity.
	(printServerWarnings): Keep warning headers from being printed for
	each warning.

1999-03-15 Hans de Graaff <graaff@xs4all.nl>

	* README: Explain which Perl modules are needed.

1999-02-20 Hans de Graaff <graaff@xs4all.nl>

	* checkbot.pl (printServerWarnings): Fix printing of warnings so
	that headers are only printed once.
	(print_server): Get correct IP address for web servers with
	non-standard port numbers.

1999-02-08 Hans de Graaff <graaff@xs4all.nl>

	* Makefile.PL (chk_version): Added location of Mail::Send.

1999-01-18 Hans de Graaff <graaff@xs4all.nl>

	* checkbot.pl (count_problems): Change counting of problems to
	deal with new structure.

1999-01-17 Hans de Graaff <graaff@xs4all.nl>

	* checkbot.pl (printServerProblems): Changed to accommodate new
	inventory of problem responses. This new method allows multiple
	bad links to one URL to be reported all at once. Also use
	standardized response descriptions based on a patch by Benjamin
	Franz <snowhare@nihongo.org>.

1999-01-10 Hans de Graaff <graaff@xs4all.nl>

	* checkbot.pl (byReferringPage): Added to allow sorting of
	problems by referer.
	(byProblem): Removed code to compare by exact message and
	referer.
	Removed the preamble to generate correct perl path because it is
	a bit too cumbersome during development.

1998-12-31 Hans de Graaff <graaff@xs4all.nl>

	* checkbot.pl (handle_url): Do a HEAD request when the guessed
	content-type matches application/octet-stream to get the real
	content-type from the server.

1998-12-27 Hans de Graaff <graaff@xs4all.nl>

	* checkbot.pl (handle_doc): Added warning for HTTP URLs without a
	fully-qualified hostname.

	* checkbot.pl (printServerWarnings): Added a mechanism to also
	display checkbot warnings, unrelated to the HTTP responses, on the
	results pages.

1998-10-24 Hans de Graaff <graaff@xs4all.nl>

	* checkbot.pl (setup): Explicitly set record separator $/.
	This appears to be needed for perl 5.005, and fixes a problem
	where no URLs would appear to match except the first few.

1998-10-10 Hans de Graaff <graaff@xs4all.nl>

	* checkbot.pl: Made POD conform better to the new scripts format.

1998-06-21 Hans de Graaff <graaff@xs4all.nl>

	* checkbot.pl (init_modules): HTML::Parse is no longer needed,
	removed.

Sat Sep 6 16:00:12 1997 Hans de Graaff <graaff@xs4all.nl>

	* checkbot 1.51 released

Sat Aug 30 18:05:39 1997 Hans de Graaff <graaff@xs4all.nl>

	* checkbot.pl (init_globals): Assume file: scheme when no scheme
	is present.

	* checkbot.pl: Small portability fixes for perl 5.004 and LWP 5.11.

Sun Aug 17 08:56:38 1997 Hans de Graaff <graaff@xs4all.nl>

	* README: Changed email addresses to point to new ISP.

Mon Apr 28 09:08:29 1997 Hans de Graaff <graaff@xs4all.nl>

	* checkbot.pl: Parsing VERSION is somewhat tricky. Fixed.

Sun Apr 27 21:02:58 1997 Hans de Graaff <Hans.deGraaff@twi72.twi.tudelft.nl>

	* checkbot.pl (check_external): Close EXTERNAL after use.

Sun Apr 20 10:24:09 1997 Hans de Graaff <Hans.deGraaff@twi72.twi.tudelft.nl>

	* checkbot.pl: Fixed a number of small bugs reported by Jost Krieger.
	Regular expressions can now be used with the options.
	Added --interval option to denote maximum interval between updates.

Sat Apr 5 17:03:46 1997 Hans de Graaff <Hans.deGraaff@twi72.twi.tudelft.nl>

	* checkbot.pl (init_globals): Added checks for URLs without a scheme.

Fri Mar 14 11:17:21 1997 Hans de Graaff <J.J.deGraaff@twi.tudelft.nl>

	* checkbot.pl (print_help): Fix typo.

Tue Jan 14 16:51:36 1997 Hans de Graaff <J.J.deGraaff@twi.tudelft.nl>

	* checkbot.pl (check_internal): Check whether there are really
	entries in the new queue when changing queues.

Sat Jan 4 14:26:04 1997 Hans de Graaff <Hans.deGraaff@twi72.twi.tudelft.nl>

	* checkbot.pl (print_help): --seconds should be --sleep in help.

Mon Dec 30 12:03:14 1996 Hans de Graaff <Hans.deGraaff@twi72.twi.tudelft.nl>

	* checkbot.pl (handle_url): If a URL is excluded, only use HEAD
	on it, not GET.
	Starting URLs can now be entered on the command line in addition
	to the --url option. --url takes precedence. --match is
	initialized with the first URL if not given as a separate option.

Mon Dec 23 20:21:32 1996 Hans de Graaff <Hans.deGraaff@twi72.twi.tudelft.nl>

	* checkbot.pl (print_server_problems): Each error message was
	evaluated as a regexp, potentially crashing checkbot on a bad
	regexp (e.g. including the string '++').

Mon Dec 23 15:15:05 1996 Hans de Graaff <J.J.deGraaff@twi.tudelft.nl>

	* checkbot.pl (ip_address): Deal with an IP address not being found.

Sun Dec 8 12:55:33 1996 Hans de Graaff <Hans.deGraaff@twi72.twi.tudelft.nl>

	* checkbot.pl (send_mail): --note didn't work; Checkbot would
	crash when no external links were found.

Wed Dec 4 12:43:14 1996 Hans de Graaff <J.J.deGraaff@twi.tudelft.nl>

	* checkbot.pl (add_checked): All checked URLs are indexed using
	their IP address to avoid checking pages multiple times for
	multiple CNAMEs.

Mon Nov 4 14:19:30 1996 Hans de Graaff <J.J.deGraaff@twi.tudelft.nl>

	* checkbot.pl (send_mail): Braino in URL fixed.

Sun Oct 27 20:16:38 1996 Hans de Graaff <Hans.deGraaff@twi72.twi.tudelft.nl>

	* checkbot.pl (init_globals): Don't let --match default to the
	--url until after we possibly change the URL (this currently
	happens for file:/ URLs).

Wed Oct 23 14:22:15 1996 Hans de Graaff <J.J.deGraaff@twi.tudelft.nl>

	* checkbot.pl (check_point): Oops, checking would occur every minute.

Mon Oct 21 13:41:48 1996 Hans de Graaff <J.J.deGraaff@twi.tudelft.nl>

	* checkbot.pl (print_help): Added version number to help info.

Wed Oct 16 21:05:58 1996 Hans de Graaff <Hans.deGraaff@twi72.twi.tudelft.nl>

	* checkbot.pl: Added --proxy option for checking external links
	through a proxy server.

Sat Sep 28 09:26:48 1996 Hans de Graaff <Hans.deGraaff@twi72.twi.tudelft.nl>

	* checkbot.pl (init_globals): Changed /var/tmp to /tmp.
	(check_point): Slower exponential rate, upper limit of 3 hours.

	* Makefile.PL: Added check for Mail::Send.

	* README: Added.

Thu Sep 26 17:01:36 1996 Hans de Graaff <J.J.deGraaff@twi.tudelft.nl>

	* checkbot.pl: Switched from short options to long options.
	I was already running out of meaningful options, so before adding
	additional stuff I wanted to move to long options first. You
	should be able to abbreviate most options to the previous values.
	The notable exception is -m, which has become --match.

Wed Sep 25 10:58:06 1996 Hans de Graaff <J.J.deGraaff@twi.tudelft.nl>

	* checkbot.pl:
	Renamed from checkbot.
	Added preamble to set proper path for perl (code from Gisle Aas).

	* Makefile.PL: First version, installs checkbot and checkbot.1.

	* checkbot: Changed $revision to $VERSION for MakeMaker.

Thu Sep 12 15:09:07 1996 Hans de Graaff <J.J.deGraaff@twi.tudelft.nl>

	* index.html: Updated required modules and location.

	* checkbot: Require LWP 5.02, because it fixes a few nasty bugs.

Thu Sep 5 16:00:42 1996 Hans de Graaff <J.J.deGraaff@twi.tudelft.nl>

	* index.html:
	Removed old and out-of-date documentation. Replaced by a link to
	the automatically generated HTML version of the POD documentation
	within Checkbot.

	* checkbot:
	Fixed documentation bugs.
	Really fix the case-insensitive comparison.

Sun Sep 1 20:31:46 1996 Hans de Graaff <Hans.deGraaff@twi72.twi.tudelft.nl>

	* checkbot (print_server_problems):
	Make comparison for error message case-insensitive.

Fri Aug 30 20:19:56 1996 Hans de Graaff <Hans.deGraaff@twi72.twi.tudelft.nl>

	* checkbot: Fixed several typos.

Wed Aug 7 10:06:29 1996 Hans de Graaff <J.J.deGraaff@twi.tudelft.nl>

	* checkbot (handle_doc):
	The new LinkExtractor is nice, but I shouldn't treat its output as
	a hash when it is an array, since that skips every other link.

Mon Aug 5 08:46:24 1996 Hans de Graaff <J.J.deGraaff@twi.tudelft.nl>

	* checkbot (print_server):
	Fixed silly bug in calculating the percentage of problems on each
	server.

Fri Aug 2 21:38:39 1996 Hans de Graaff <Hans.deGraaff@twi72.twi.tudelft.nl>

	* checkbot: Added several patches by Bruce Speyer:
	Added -N note option to go along with -M, and -z to suppress
	reporting errors on matching links.
	Added enough logic to catch gopher URLs if no gopher server is
	found. Further logic is needed to parse the returned gopher menu
	for a bad file or directory.

	* checkbot: Made a good start with POD documentation inside the
	checkbot file. Try 'perldoc checkbot'.

	* TODO: Added a number of suggestions by Luuk de Boer.

	* checkbot (send_mail): Include summary of links checked in message.

Fri Aug 2 13:01:02 1996 Hans de Graaff <J.J.deGraaff@twi.tudelft.nl>

	* checkbot:
	Added check for correct LWP version. We now need 5.01, due to bugs
	in the handling of the BASE attribute in previous versions.

Sat Jul 27 21:13:26 1996 Hans de Graaff <Hans.deGraaff@twi72.twi.tudelft.nl>

	* checkbot:
	Added several patches by Bruce Speyer:
	Optimized some static regular expressions.
	Fixed the timeout not being set, which made the -t option useless.

Mon Jul 22 22:28:34 1996 Hans de Graaff <Hans.deGraaff@twi72.twi.tudelft.nl>

	* checkbot (create_page):
	Fixed number of columns in summary output.

Sat Jul 20 11:49:23 1996 Hans de Graaff <Hans.deGraaff@twi72.twi.tudelft.nl>

	* checkbot (handle_doc): Changed to use the new HTML::LinkExtor,
	which will be present in LWP 5.01. Should be more efficient, and
	less prone to memory leaks.

Sat Jul 13 12:41:23 1996 Hans de Graaff <Hans.deGraaff@twi72.twi.tudelft.nl>

	* checkbot (create_page): Forgot to add the ratio on the page.
	(check_external): Fix problems with different `wc` output.

Sat Jun 22 11:30:12 1996 Hans de Graaff <Hans.deGraaff@twi72.twi.tudelft.nl>

	* checkbot: Use correct base URL as returned with the document.
	Only check document when we used 'GET' to receive it.
	Remove magic guessing with ending slash of starting URL.
	Deal with redirections by inserting redirected URLs into the queue
	again.

Thu Jun 20 15:58:20 1996 Hans de Graaff <J.J.deGraaff@twi.tudelft.nl>

	* checkbot: Major cleanup of initialization code. Also added todo
	counts to progression page, and proper todo handling for external
	links.

Sun Jun 16 21:16:28 1996 Hans de Graaff <Hans.deGraaff@twi72.twi.tudelft.nl>

	* checkbot: Added -M option: send mail when Checkbot is done.
	Fixed division by zero bug when external links == 0.

Tue Jun 4 12:46:39 1996 Hans de Graaff <J.J.deGraaff@twi.tudelft.nl>

	* checkbot: Better way to ignore fragments.

Sat Jun 1 15:14:52 1996 Hans de Graaff <Hans.deGraaff@twi72.twi.tudelft.nl>

	* checkbot: Don't print decimals with the percentages.
	Major update of counting, and printing counts. Cleaned up
	variables, corrected counting, made display more consistent and
	clear.

Wed May 29 21:18:26 1996 Hans de Graaff <Hans.deGraaff@twi72.twi.tudelft.nl>

	* checkbot: Small fixes to support lwp-win32 as well, thanks to
	Martin Cleaver.

Mon May 27 09:21:30 1996 Hans de Graaff <Hans.deGraaff@twi72.twi.tudelft.nl>

	* checkbot: Oops, a small error in a regexp caused the script to
	append a slash to almost all start URLs. Fixed.

	* checkbot (handle_doc): External links without full URLs were
	not always handled properly.

Sun May 26 10:04:39 1996 Hans de Graaff <Hans.deGraaff@twi72.twi.tudelft.nl>

	* checkbot: If the starting URL doesn't end in a slash, and
	doesn't have an extension, assume we need to add a slash.

	* index.html: Add version number to web page, and make sure it gets
	updated automatically.

Wed May 22 09:58:36 1996 Hans de Graaff <J.J.deGraaff@twi.tudelft.nl>

	* checkbot: Changed verbose output of links found on pages.

Tue May 14 16:43:38 1996 Hans de Graaff <J.J.deGraaff@twi.tudelft.nl>

	* TODO: Updated with respect to recent changes.

Mon May 13 15:06:05 1996 Hans de Graaff <J.J.deGraaff@twi.tudelft.nl>

	* checkbot: Added LWP version number to agent field, changed page
	update policy, and updated script to LWP5b13.

Sat May 4 21:38:56 1996 Hans de Graaff <Hans.deGraaff@twi72.twi.tudelft.nl>

	* checkbot: Changed checked array to an associative array.
	Will consume more memory, but drastically cut back on lookup time.

	Rewrote handle_url logic to be clearer. Also fixed bug where
	servers would be added to the list unjustly.

	Sleep was only done on problem links, not after each request.

	Also added checks for already checked links while scanning through
	the document, and only add those links not checked to the queue.

	Add percentage of problem links for each individual server.

Mon Apr 29 08:43:12 1996 Hans de Graaff <J.J.deGraaff@twi.tudelft.nl>

	* checkbot: Deal with unknown or non-determinable server types.

	Only add links to the external queue when we know we can check
	their protocol.

	Additional changes to layout and content of pages.

Sun Apr 28 21:16:51 1996 Hans de Graaff <Hans.deGraaff@twi72.twi.tudelft.nl>

	* checkbot: Rewrote report page.

Wed Apr 24 22:39:43 1996 Hans de Graaff <Hans.deGraaff@twi72.twi.tudelft.nl>

	* checkbot: Added a number of patches by Tim MacKenzie:
	Added -s option to set the seconds of sleep between requests.
	Remove work files when *not* debugging.
	Only compile -m and -x regular expressions once.
	Also check external ftp and nntp links (using HEAD only).
	Get rid of huge memory leak! (Also noted by Fabrice Gaillard.)

Fri Mar 29 10:58:24 1996 Hans de Graaff <J.J.deGraaff@twi.tudelft.nl>

	* checkbot:
	Got rid of warnings about some variables.
	Fixed problem with incorrect automatic -m argument when scanning
	local files.

Sun Mar 24 18:01:05 1996 Hans de Graaff <Hans.deGraaff@twi72.twi.tudelft.nl>

	* checkbot:
	Added code to support regular expressions with the -m and -x
	arguments. Thanks to Thomas Thiel for the patch and suggestions.

	No strict checking on schemes; this fixes a problem with unknown
	schemes stopping checkbot. Thanks to Pierre-Yves Foucou.

	* checkbot:
	Should create directory for temporary files, and remove it
	afterwards. Noted by Steve Fisk.

Sat Mar 16 13:40:48 1996 Hans de Graaff <Hans.deGraaff@twi72.twi.tudelft.nl>

	* checkbot:
	Made a number of changes from or based on patches by Thomas Thiel:

	Added missing t option in Getopts string.

	Made -m argument optional. If not given, the -u argument is also
	used as the start argument.

	Temporary files are now created in a separate directory. Its name
	contains the PID of Checkbot, to allow several concurrent
	Checkbots to run. Also remove temporary files, unless
	debugging.

	Implement file:// scheme to allow direct checking (without an HTTP
	server).

Fri Mar 15 11:06:13 1996 Hans de Graaff <J.J.deGraaff@twi.tudelft.nl>

	* checkbot:
	Fixed warnings (and in the process, a small bug as well).
	Added URL and proper name to help.

Sat Mar 2 11:51:45 1996 Hans de Graaff <Hans.deGraaff@twi72.twi.tudelft.nl>

	* checkbot:
	Added 'require 5.002' (because libwww-perl5b8 requires it).
	Added 'use strict', and fixed problems resulting from this. This
	can be seen as a first step towards fixing the huge memory
	consumption.
	Updated help.

Tue Feb 27 09:57:57 1996 Hans de Graaff <J.J.deGraaff@twi.tudelft.nl>

	* checkbot:
	Fixed bug which occurred when -x option was not present.
	Updated script to use libwww-perl5b8 function names. This is not
	backward compatible with versions prior to beta 8.

Mon Feb 26 12:46:08 1996 Hans de Graaff <J.J.deGraaff@twi.tudelft.nl>

	* checkbot:
	Fixed bug with Referer header for external URLs.
	Also make server pages auto-refresh.

Sat Feb 24 11:48:15 1996 Hans de Graaff <Hans.deGraaff@twi72.twi.tudelft.nl>

	* TODO: New file.

	* checkbot: Added single -x option as an additional exclude pattern.
	This overrules the -m match attribute.

Mon Dec 11 14:13:30 1995 Hans de Graaff <J.J.deGraaff@twi.tudelft.nl>

	* index.html:
	Added libwww-perl5 address, and added a usage section.

	* checkbot.pl:
	Removed this old perl4 version.

Fri Dec 8 13:41:43 1995 Hans de Graaff <J.J.deGraaff@twi.tudelft.nl>

	* checkbot:
	Major rewrite of most of the internal routines. The routines are
	much more structured now, and broken up into smaller routines.
	I also changed the way checked links are remembered. It should be
	much less efficient, CPU-wise, but more efficient memory-wise.

Fri Nov 24 16:45:18 1995 Hans de Graaff <J.J.deGraaff@twi.tudelft.nl>

	* checkbot:
	Fixed small problems, mostly with output.
	Fixed checking of external links.
	Changed sorting order.

	* checkbot:
	Perl5 version now works for the most part. Although Checkbot isn't
	fully finished, I feel confident enough to release it.

Fri Aug 25 11:23:36 1995 Hans de Graaff <graaff@is.twi.tudelft.nl>

	* Made a start with the perl5 version of checkbot. The modules in
	perl5 (e.g. LWP) look very promising, and should make checkbot
	quite a bit better.