Changelog for HTML-Tree

5.07 2017-08-31
    Release by Kent Fredric
    [FIXES]
    * Work around more @INC issues with Module::Build and sudo RT#122199

5.06 2017-04-28
    Release by Kent Fredric

    * Revert XML escaping changes from 5.04 due to large numbers of
      broken dependents
      - RT#121310 https://rt.cpan.org/Ticket/Display.html?id=121310
      - https://github.com/rjbs/MasonX-Resolver-WidgetFactory/issues/1
      - https://github.com/kentfredric/HTML-Tree/issues/1

5.05 2017-04-26

    [FIXES]
    * Revert Dist::Zilla Removal
      - https://github.com/jfearn/HTML-Tree/issues/7
      - Vendor note: It should be simpler to compare 5.03 and 5.05
        than to compare 5.04 and 5.05, or 5.03 and 5.04.
      - Fixes RT#121230: Undeclared dep on Test::Fatal
      - https://rt.cpan.org/Ticket/Display.html?id=121230
    * Proper fix for '.' in @INC
      - https://rt.cpan.org/Ticket/Display.html?id=120521

5.04 2017-04-17
    Release by Jeff Fearn

    [FIXES]
    * Remove Distzilla to fix RT #120521 #89820
    * Add POD to htmltree RT #116367
    * Speed up is_inside method RT #113415
      - From Todd Rinaldo https://github.com/madsen/HTML-Tree/pull/5
    * Fix extra spaces being added to comments RT #94311
      - From Tomaz Solc
    * Don't needlessly escape characters in element content RT #93431
      - From Tomaz Solc

5.03 2012-09-22
    Release by Christopher J. Madsen

    [THINGS THAT MAY BREAK YOUR CODE OR TESTS]
    * as_HTML no longer indents <textarea> (Tomohiro Hosaka) (RT #70385)

    [FIXES]
    * as_trimmed_text did not accept '0' for extra_chars

    [DOCUMENTATION]
    * Explain that as_text never adds whitespace (RT #66498)
    * Explain what extra_chars can contain for as_trimmed_text.


5.02 2012-06-27
    Release by Christopher J. Madsen

    [TESTS]
    * Do not attempt to check result of $! in construct_tree.t
      (The fix in 5.01 was not successful.)


5.01 2012-06-20
    Release by Christopher J. Madsen

    [TESTS]
    * Force C locale in construct_tree.t (in non-English locales,
      $! will produce messages in a different language) (RT #77823)
    * Add test for preserving whitespace while parsing.


5.00 2012-06-12
    Release by Christopher J. Madsen

    There are only some minor documentation changes since 4.903.
    This is a summary of the most significant changes since 4.2.

    [THINGS THAT MAY BREAK YOUR CODE OR TESTS]
    * Use weak references to avoid memory leaks
      See "Weak References" in HTML::Element for details.
    * new_from_file now dies if the file cannot be opened. $! records
      the specific problem. (Previously, you got a tree with a few
      implicit elements.)
    * Some methods normally returning a scalar could return the empty
      list in certain circumstances. This has been corrected. The
      affected methods are: address, deobjectify_text, detach, is_inside,
      & pindex.
    * deprecate the Version sub/method. Use the VERSION method instead.

    [ENHANCEMENTS]
    * add new_from_url (Using LWP) (David Pottage) (RT #68097)

    [DOCUMENTATION]
    * Explain that parse_file (and new_from_file) opens files in binary mode
      BUT THIS IS PLANNED TO CHANGE.


4.903 2012-06-08
    Trial Release by Christopher J. Madsen

    [DOCUMENTATION]
    * Explain that parse_file (and new_from_file) opens files in binary mode
      BUT THIS IS PLANNED TO CHANGE.

    [TESTS]
    * test error handling for new_from_file & new_from_url
    * remove use_ok from most tests
      (if the module won't load, the tests can't pass anyway)


4.902 2012-06-06
    Trial Release by Christopher J. Madsen

    [THINGS THAT MAY BREAK YOUR CODE OR TESTS]
    * new_from_url now dies if the request fails or the response is not HTML


4.901 2012-06-06
    Trial Release by Christopher J. Madsen

    [THINGS THAT MAY BREAK YOUR CODE OR TESTS]
    * new_from_file now dies if the file cannot be opened. $! records
      the specific problem.
      (Previously, you got a tree with a few
      implicit elements.)
    * Some methods normally returning a scalar could return the empty
      list in certain circumstances. This has been corrected. The
      affected methods are: address, deobjectify_text, detach, is_inside,
      & pindex.

    [FIXES]
    * new_from_url did not call eof after parsing

    [DOCUMENTATION]
    * Improve SEE ALSO for HTML::TreeBuilder
    * General documentation cleanup


4.900 2012-06-01
    Trial Release by Christopher J. Madsen

    [THINGS THAT MAY BREAK YOUR CODE OR TESTS]
    * Use weak references to avoid memory leaks
      See "Weak References" in HTML::Element for details.
    * deprecate the Version sub/method. Use the VERSION method instead.

    [ENHANCEMENTS]
    * add new_from_url (Using LWP) (David Pottage) (RT #68097)


4.2 2011-04-06
    Release by Jeff Fearn

    [FIXES]
    * Tied all $VERSION to HTML::Element to ensure latest package is used
      for all modules. RT #66110
    * Moved perlcritic tests to xt/author

    [DOCUMENTATION]
    * Added text and link to "Perl and LWP" book.
    * Fix Authors in all PM files.


4.1 2010-10-25
    Release by Jeff Fearn

    [FIXES]
    * '/' is a valid attribute (pull from tokuhirom) (RT #61809)
    * Change check for subclasses in as_HTML. (RT #61673)
    * Fix ProhibitThreeArgumentOpen being triggered. (RT #61857)


4.0 2010-09-20
    Release by Jeff Fearn

    [THINGS THAT MAY BREAK YOUR CODE OR TESTS]
    * Changes to entity encoding from ord values to XML entities may break
      tests expecting &#NNN; style (numeric) encoding.
    * Attribute names are now validated in as_XML and invalid names will
      cause an error.
    * HTML-Tree now requires at least Perl 5.8.0

    [FIXES]
    * Optionally empty tags with content now have close tag. (RT #49932 #41806)
    * Added attribute name validation. (RT #23439)
    * Added span to @TAGS in AsSubs.
      (RT #55848)
    * Changed tag encoding to human readable form, e.g. &gt;, and stopped
      re-encoding encoded tags (RT #55835)
    * Added no_expand_entities option to disable entity decoding when
      parsing source. (RT #24947)
    * Fix replace_with not setting parent for an array of content.
      (RT #28204 #45495)
    * Removed newline being appended to as_HTML output. (RT #41739)
    * Fix invalid parent for subclasses. (RT #36247)
    * Fixed #! line in tests (RT #41945)
    * Switched to Module::Build
    * Fixed Perl::Critic errors
    * Added lots of use strict and use warnings
    * Fix PERL_UNICODE breaking tests. (RT #28404)
    * Add check for class type to traverse. (RT #35948)
    * Move attribute name validation to as_XML. (RT #60619)
    * Fix critic test exploding if Test::Perl::Critic isn't installed.
    * Fix annoying message about x.yy_z not being numeric in t/building.t
    * Added extra_chars option to as_trimmed_text (RT #26436)
    * Added catch for broken table tags (RT #59980)
    * Replace parentheses for constants. (RT #58880)
    * Removed build deps Devel::Cover, Test::Pod::Coverage, Test::Perl::Critic.
      (RT #58878)
    * Added create_makefile_pl => 'traditional' to Build.PL (RT #58878)

    [ENHANCEMENTS]
    * (Ricardo Signes RT #26282) The secret hack to allow elements to be created
      from classes other than HTML::Element has been cleaned up and documented
      for the benefit of TreeBuilder subclasses.
      q.v., HTML::TreeBuilder->element_class
    * Added HTML::Element::encoded_content to control encoding of entities on
      output.

    [TESTS]
    * Added test for optionally empty tags, like A.
    * Added test for invalid attribute name.
    * Added more tests for entity parsing.
    * Add parent test from Christopher J. Madsen. (RT #28204)
    * Add subclass test.
      (RT #36247)

    [DOCUMENTATION]
    * Docs spelling patch from Ansgar Burchardt <ansgar@43-1.org> (RT #55836)
    * Added definition of white space to as_trimmed_text. (RT #26436)


3.23 2006-11-12
    Release by Pete Krawczyk <petek@cpan.org>

    [THINGS THAT MAY BREAK YOUR CODE OR TESTS]
    * Mark-Jason Dominus points out that the fix for as_html was not
      proper, and broken behavior should never be codified. Fixed
      as_html so an empty string doesn't encode entities, instead of
      blaming the behavior on HTML::Entities. (RT 18571)


3.22 2006-11-11
    Release by Pete Krawczyk <petek@cpan.org>

    [THINGS THAT MAY BREAK YOUR CODE OR TESTS]
    * HTML::Element::as_XML now only escapes five characters, instead
      of escaping everything but alphanumerics and spaces. This is
      more in line with the XML spec, and will no longer escape wide
      characters as two (or more) entities. Resolves RT 14260. Thanks
      to Carl Franks and somewhere [at] confuzzled.lu for assistance.

    [FIXES]
    * A string comparison was commented to use lc() on both sides, but
      didn't. This caused HTML::Element::look_down to not properly find
      elements in certain cases. Thanks to Andrew Suhachov. (RT 21114)

    [TESTS]
    * Added several new tests and enhanced others. Thanks to Rocco
      Caputo for t/attributes.t, and several others for providing
      test cases in their RT bugs.

    [DOCUMENTATION]
    * Fixed description of HTML::Element::all_attr_names. Thanks
      to dsteinbrunner [at] pobox.com for catching it.
    * Fixed example code in HTML::Element::push_content. Thanks
      to dsteinbrunner [at] pobox.com for catching it. (RT 21293)
    * Fixed description of HTML::Element::as_HTML. Thanks to
      Mark-Jason Dominus for catching it.
      (RT 18569)


3.21 2006-08-06
    Release by Pete Krawczyk <petek@cpan.org>

    [FIXES]
    * Updated HTML::Parser requirement to 3.46 to fix a bug in
      tag-rendering.t, noted in RT 20816 and 19796. Thanks to
      Gordon Lack and Ricardo Signes.
    * Fixed HTML::TreeBuilder to not remove &nbsp; where it shouldn't,
      using patch supplied in RT 17481. Thanks to Chris Madsen.

    [DOCUMENTATION]
    * HTML-Tree has a new maintainer: Pete Krawczyk <petek@cpan.org>


3.20 2006-06-04
    Release by Andy Lester

    No code changes. Just making sure all notes go to Andy Lester,
    not Sean Burke.


3.19_04 2006-02-01
    Trial Release by Andy Lester

    [FIXES]
    * Modified starttag() so that it could render a literal HTML::Element
      correctly. Added a test case for this in tag-rendering.t
      Thanks to Terrence Brannon.


3.19_03 2005-11-25
    Trial Release by Andy Lester

    [THINGS THAT MAY BREAK YOUR CODE]
    * The store_declarations() method has been restored, but defaults
      to true instead of false.


3.19_02 2005-11-24
    Trial Release by Andy Lester

    [THINGS THAT MAY BREAK YOUR CODE]
    * The store_declarations() method has been removed.
    * Non-closing HTML tags like <IMG> are now rendered as <IMG />.
    * All values in tags are now double-quoted. Previously, all-numeric
      values weren't quoted.

    [FIXES]
    * The DOCTYPE declaration now always gets put back at the top of
      the recreated document. Thanks, Terrence Brannon.
    * Non-closing HTML tags like <IMG> are now rendered as <IMG />.
      Thanks to Ian Malpass.
    * All values in tags are now double-quoted.

    [DOCUMENTATION]
    * Updated docs from Terrence Brannon.


3.19_01 2005-11-09
    Trial Release by Andy Lester

    -- No new functionality. New tests, though!
    Thanks to the Chicago Perl Mongers for their work.


3.18 2003-09-15
    Release by Sean M. Burke <sburke@cpan.org>

    -- bugfix to test, adding qr// to look_(down|up)

    Accepting Iain 'Spoon' Truskett's neat patch for qr// as lookdown
    operators (previously you had to do sub { $_[0]=~ m/.../}).

    Rewrote some tests, notably parsefile.t, which was pointlessly
    failing because of an incompatibility with an HTML::Parser version.

    Removed the disused ancient utils "dtd2pm" and "ent" from the dist.

    Added TODO file.


3.17 2003-01-18
    Release by Sean M. Burke <sburke@cpan.org>

    -- minor bugfix

    HTML::Element : Making as_HTML('',...) work just like
    as_HTML(undef,...). Also fixing as_XML's docs to remove mention of
    an unimplemented feature (specifying what characters to escape).


3.16 2002-11-06
    Release by Sean M. Burke <sburke@cpan.org>

    -- just fixing a doc typo.


3.15 2002-11-06
    Release by Sean M. Burke <sburke@cpan.org>

    -- a few new features.

    Added the aliases "descendents" and "find" to HTML::Element.

    Added a new method "simplify_pres" to HTML::Element.


3.14 2002-10-19
    Release by Sean M. Burke <sburke@cpan.org>

    -- minor bugfix

    Just fixes a few problems in HTML::Element with the number_lists
    method.


3.13 2002-08-16
    Release by Sean M. Burke <sburke@cpan.org>

    -- basically a bugfix version

    It turns out that 3.12 had a hideous HTML::TreeBuilder bug that
    made the whole thing damn near useless. Fixed.
    Many many thanks to Michael Koehne for catching this!

    Wrote t/parse.t, to catch this sort of thing from happening again.

    Fixed a bug that would treat <td> outside any table context
    as <tr><table><td> instead of <table><tr><td>


3.12 2002-07-30
    Release by Sean M. Burke <sburke@cpan.org>

    Added as_trimmed_text method to HTML::Element, as described
    (prophesied?) in the fantabulous new book /Perl & LWP/.

    Bugfix: fixed unshift_content when given a LoL. (_parent wasn't
    getting set right.)

    HTML::Element and HTML::TreeBuilder now enforce at least some
    minimal sanity on what can be in a tag name. (Notably, no spaces,
    slashes, or control characters.)

    Semi-bugfix: $element->replace_with(...) can now take LoLs in its
    replacement list.

    Bumped HTML::Element version up to 3.12 (right from 3.09)

    Semi-bugfix: as_XML now doesn't use named entities in its return
    value -- it always uses numeric entities.

    Added behavior: new_from_lol can now do clever things in list
    context.

    HTML::Tree -- added blurb for /Perl & LWP/

    HTML::TreeBuilder -- added blurb for /Perl & LWP/
    Also added a few tweaks to do better with XHTML parsing.
    Added guts() and disembowel() methods, for parsing document fragments.

    TODO: desperately need to add tests to t/


3.11 2001-03-14
    Release by Sean M. Burke <sburke@cpan.org>

    Bugfix: Klaus-Georg Adams <Klaus-Georg.Adams@sap.com> reported that
    the topmost frameset element in an HTML::TreeBuilder tree wasn't
    getting its _parent attribute set. Fixed.

    Minor bugfix: the root element of a new HTML::TreeBuilder tree was
    missing its initial "_implicit" attribute. Fixed.

    Two handy new methods in HTML::TreeBuilder:
     * HTML::TreeBuilder->new_from_content(...)
     * HTML::TreeBuilder->new_from_file($filename)
        a.k.a.: HTML::TreeBuilder->new_from_file($fh)


3.10 2001-03-10
    Release by Sean M. Burke <sburke@cpan.org>

    Now bundling three relevant The Perl Journal articles by me:
    HTML::Tree::AboutObjects, HTML::Tree::AboutTrees, and
    HTML::Tree::Scanning.

    Vadims_Beilins@swh-t.lv observes that $h->push_content(LoL)
    doesn't take care of _parent bookkeeping right. FIXED.
    John Woffindin <john@xoren.co.nz> notes a similar bug in clone();
    FIXED.

    Adding no_space_compacting feature to TreeBuilder, at suggestion of
    Victor Wagner <vitus@ice.ru>.

    Incorporating the clever suggestion (from Martin H. Sluka,
    <martin@sluka.de>) that $element->extract_links's returned LoL
    should contain a third item (for the attribute name) in the
    per-link listref. I also add a fourth item, the tagname of the
    element.

    New method, "elementify", in HTML::TreeBuilder.

    Various improvements and clarifications to the POD in
    HTML::TreeBuilder and HTML::Element.

    Some new methods in HTML::Element: "number_lists",
    "objectify_text", and "deobjectify_text".

    HTML::Element and HTML::TreeBuilder versions both bumped up from
    3.08 to 3.10, to keep pace with the HTML::Tree version.


3.09 2001-01-21
    Release by Sean M. Burke <sburke@cpan.org>

    Changed HTML/Element/traverse.pod to HTML/Element/traverse.pm

    Wrote overview file: HTML/Tree.pm


3.08 2000-11-03
    Release by Sean M. Burke <sburke@cpan.org>

    In Element and TreeBuilder: fixed handling of textarea content --
    Thanks to Ronald J Kimball <rjk@linguist.dartmouth.edu> for
    catching this.

    In Element: a few internal changes to make it subclassable by the
    forthcoming XML::Element et al.


3.07 2000-10-20
    Release by Sean M. Burke <sburke@cpan.org>

    In Element: made new_from_lol accept existing HTML::Element objects
    as part of the loltree. Thanks to Bob Glickstein
    <bobg@zanshin.com> for the suggestion.

    In Element: feeding an arrayref to push_content, unshift_content,
    or splice_content now implicitly calls new_from_lol.

    In Element: reversed the change in as_HTML/XML/Lisp_form that would
    skip dumping attributes with references for values. It reacted
    undesirably with objects that overload stringify; to wit, URI.pm
    objects.


3.06 2000-10-15
    Release by Sean M. Burke <sburke@cpan.org>

    In Element: methods added: $x->id, $x->idf, $x->as_XML,
    $x->as_Lisp_form

    In Element: internal optimization: as_HTML no longer uses the
    tag() accessor. Should cause no change in behavior.

    In Element: as_HTML (via starttag) no longer tries to dump
    attributes whose values are references, or whose names
    are null-string or "/". This should cause no change in
    behavior, as there's no normal way for any document to parse
    to a tree containing any such attributes.

    In Element: minor rewordings or typo-fixes in the POD.


3.05 2000-10-02
    Release by Sean M. Burke <sburke@cpan.org>

    In Element: fixed typo in docs for the content_refs_list method.
    Had:
      foreach my $item ($h->content_array_ref) {
    Corrected to:
      foreach my $item (@{ $h->content_array_ref }) {

    In Element: fixed bug in $h->left that made it useless in scalar
    context. Thanks to Toby Thurston <toby@wildfire.dircon.co.uk> for
    spotting this.

    In Element: added new method $h->tagname_map

    In TreeBuilder: Some minor corrections to the logic of handling TD
    and TH elements -- basically bug fixes, in response to an astute
    bug report from Toby Thurston <toby@wildfire.dircon.co.uk>.

    In TreeBuilder: Fixed lame bug that made strict-p mode nearly
    useless. It may now approach usability!

    This dist contains a simple utility called "htmltree" that parses
    given HTML documents, and dumps their parse tree. (It's not
    actually new in this version, but was never mentioned before.)

    In TreeBuilder, a change of interest only to advanced programmers
    familiar with TreeBuilder's source and perpetually undocumented
    features: there is no $HTML::TreeBuilder::Debug anymore.

    If you want to throw TreeBuilder into Debug mode, you have to do it
    at compile time -- by having a line like this BEFORE any line that
    says "use HTML::TreeBuilder":

      sub HTML::TreeBuilder::DEBUG () {3};

    where "3" is whatever debug level (0 for no debug output) that you
    want TreeBuilder to be in. All the lines in TreeBuilder that used
    to say

      print "...stuff..." if $Debug > 1;

    now say

      print "...stuff..." if DEBUG > 1;

    where DEBUG is the constant-sub whose default value set at compile
    time is 0. The point of this is that the typical
    compilation-instance of TreeBuilder will run with DEBUG = 0, and
    having that set at compile time means that all the "print ... if
    DEBUG" can be optimized away at compile time, so they don't appear
    in the code tree for TreeBuilder. This leads to a typical ~10%
    speedup in TreeBuilder code, since it's no longer having to
    constantly interrogate $Debug.

    Note that if you really do NEED the debug level to vary at runtime,
    say:
      sub HTML::TreeBuilder::DEBUG () { $HTML::TreeBuilder::DEBUG };
    and then change that variable's value as need be. Do this only if
    necessary, tho.

    BTW, useful line to have in your ~/.cshrc:
      alias deparse 'perl -MO=Deparse \!*'
    I found it useful for deparsing TreeBuilder.pm to make sure that
    the DEBUG-conditional statements really were optimized away
    as I intended.


3.04 2000-09-04
    Release by Sean M. Burke <sburke@cpan.org>

    In TreeBuilder: added p_strict, an option to somewhat change
    behavior of implicating "</p>"s.
    Added store_comments, store_declarations, store_pis, to control
    treatment of comments, declarations, and PIs when parsing.

    In Element: documented the pseudo-elements (~comment, ~declaration,
    ~pi, and ~literal). Corrected as_HTML dumping of ~pi elements.

    Removed formfeeds from source of Element and TreeBuilder --
    different editors (and Perl) treat them differently as far as
    incrementing the line counter; so Perl might report an error on
    line 314, but preceding formfeeds might make your editor think that
    that line is actually 316 or something, resulting in confusion all
    around. Ahwell.


3.03 2000-08-26
    Release by Sean M. Burke <sburke@cpan.org>

    Introduced an optimization in TreeBuilder's logic for checking that
    body-worthy elements are actually inserted under body. Should
    speed things up a bit -- it saves two method calls per typical
    start-tag. Hopefully no change in behavior.

    Whoops -- 3.01's change in the return values of TreeBuilder's
    (internal) end(...) method ended up breaking the processing of list
    elements. Fixed. Thanks to Claus Schotten for spotting this.

    Whoops 2 -- Margarit A. Nickolov spotted that TreeBuilder
    documented an implicit_body_p_tag method, but the module didn't
    define it. I must have deleted it some time or other. Restored.


3.02 2000-08-20
    Release by Sean M. Burke <sburke@cpan.org>

    Fixed a silly typo in Element that made delete_ignorable_whitespace
    useless.

    Made Element's $tree->dump take an optional output-filehandle
    argument.

    Added (restored?) "use integer" to TreeBuilder.


3.01 2000-08-20
    Release by Sean M. Burke <sburke@cpan.org>

    Now depends on HTML::Tagset for data tables of HTML elements and
    their characteristics.

    Version numbers for HTML::TreeBuilder and HTML::Element, as well as
    for the package, moved forward to 3.01.

    Minor changes to HTML::TreeBuilder's docs.

    HTML::TreeBuilder now knows not to amp-decode text children of
    CDATA-parent elements. Also exceptionally stores comments under
    CDATA-parent elements.

    TreeBuilder should now correctly parse documents with frameset
    elements. Tricky bunch of hacks.

    TreeBuilder now ignores those pointless "x-html" tags that a
    certain standards-flouting monopolistic American software/OS
    company's mailer wraps its HTML in.

    Introduced "tweaks" in HTML::TreeBuilder -- an experimental
    (and quite undocumented) feature to allow specifying callbacks
    to be called when specific elements are closed; makes possible
    rendering (or otherwise scanning and/or manipulating) documents
    as they are being parsed. Inspired by Michel Rodriguez's clever
    XML::Twig module. Until I document this, email me if you're
    interested.

    HTML::Element's as_HTML now knows not to amp-escape children of
    CDATA-parent elements. Thanks to folks who kept reminding me about this.

    HTML::Element's as_HTML can now take an optional parameter
    specifying which non-empty elements will get end-tags omitted.

    HTML::Element's traverse's docs moved into separate POD,
    HTML::Element::traverse.

    Added HTML::Element methods all_attr_names and
    all_external_attr_names. Fixed bug in all_external_attr.

    Added HTML::Element method delete_ignorable_whitespace.
    (Actually just moved from HTML::TreeBuilder, where it was
    undocumented, and called tighten_up.)

    Adding a bit of sanity checking to Element's look_down, look_up.

    Added some formfeeds to the source of Element and TreeBuilder,
    to make hardcopy a bit more readable.


0.68 2000-06-28
    Release by Sean M. Burke <sburke@cpan.org>

    Fixed doc typo for HTML::Element's lineage_tag_names method.

    Fixed lame bug in HTML::Element's all_external_attr that made it
    quite useless. Thanks to Rich Wales <richw@webcom.com> for the bug
    report and patch.

    Changed as_text to no longer DEcode entities, as it formerly did,
    and was documented to.
    Since entities are already decoded by the time
    text is stored in the tree, another decoding step is wrong. Neither
    I nor Gisle Aas can remember what that was doing there in the
    first place.

    Changed as_text to not traverse under 'style' and 'script'
    elements. Rewrote as_text's traverser to be iterative.

    Added a bit of text to HTML::AsSubs to recommend using XML::Generator.


0.67 2000-06-12
    Release by Sean M. Burke <sburke@cpan.org>

    Just changes to HTML::Element...

    Introduced look_up and look_down. Thanks to the folks on the
    libwww list for helping me find the right form for that idea.
    Deprecated find_by_attribute.

    Doc typo fixed: at one point in the discussion of "consolidating
    text", I said push_content('Skronk') when I meant
    unshift_content('Skronk'). Thanks to Richard Y. Kim (ryk@coho.net)
    for pointing this out.

    Added left() and right() methods.

    Made address([address]) accept relative addresses (".3.0.1")

    Added content_array_ref and content_refs_list.

    Added a bit more clarification to bits of the Element docs here and there.

    Made find_by_tag_name work iteratively now, for speed.


0.66 2000-05-18
    Release by Sean M. Burke <sburke@cpan.org>

    Noting my new email address.

    Fixed bug in HTML::Element::detach_content -- it would return
    empty-list, instead of returning the nodes detached.

    Fixed bug in HTML::Element::replace_with_content -- it would
    accidentally completely kill the parent's content list!
    Thanks to Reinier Post and others for spotting this error.

    Fixed bug in HTML::Element::replace_with -- it put replacers
    in the content list of the new parent, !but! forgot to update
    each replacer's _parent attribute.
    Thanks to Matt Sisk for spotting this error.


0.65 2000-03-26
    Release by Sean M. Burke <sburke@netadventure.net>

    Important additions to HTML::Element :

    Totally reimplemented the traverse() method, and added features,
    now providing a somewhat-new interface. It's still
    backwards-compatible both syntactically and semantically.

    Added methods: content_list, detach_content, replace_linkage,
    normalize_content, preinsert, postinsert, and has_insane_linkage.

    $h->attr('foo', undef) now actually deletes the attribute
    'foo' from $h, instead of setting it to undef. Hopefully
    this won't break any existing code!

    Rearranged the order of some sections in the Element docs
    for purely pedagogical reasons.

    Bugfix: $tree->clone failed to delete the internal
    _head and _body attributes of the clone (used by TreeBuilder),
    so $tree->clone->delete ended up deleting most/all of the original!
    Fixed. Added caveats to the docs warning against cloning
    TreeBuilder objects that are in mid-parse (not that I think most
    users are exactly rushing to do this).
    Thanks to Bob Glickstein for finding and reporting this bug.

    Added some regression/sanity tests in t/

    A bit more sanity checking in TreeBuilder: checks for _head and
    _body before including it.

    Modded TreeBuilder's calls to traverse() to use the new [sub{...},0]
    calling syntax, for sake of efficiency.

    Added some undocumented and experimental code in Element and
    TreeBuilder for using HTML::Element objects to represent
    comments, PIs, declarations, and "literals".


0.64 2000-03-08
    Release by Sean M. Burke <sburke@netadventure.net>

    Bugfix: $element->replace_with_content() would cause
    a fatal error if any of $element's content nodes were
    text segments. Fixed.


0.63 2000-03-08
    Release by Sean M. Burke <sburke@netadventure.net>

    Fixed a typo in the SYNOPSIS of TreeBuilder.pm: I had "->destroy" for
    "->delete"!

    Added $element->clone and HTML::Element->clone_list(nodes) methods,
    as Marek Rouchal very helpfully suggested.

    $tree->as_HTML can now indent, hopefully properly. The logic to do
    so is pretty frightening, and regrettably doesn't wrap, and it's
    not obvious how to make it capable of doing so.

    $tree->as_text can now take a 'skip_dels' parameter.

    Added $h->same_as($j) method.

    Added $h->all_attr method.

    Added $h->new_from_lol constructor method.


0.62 1999-12-18
    Release by Sean M. Burke <sburke@netadventure.net>

    Incremented HTML::AsSubs version to 1.13, and HTML::Parse version
    to 2.7, to avoid version confusion with the old (<0.60) HTML-Tree
    dist.

    Re-simplified the options to HTML::Element::traverse, removing the
    verbose_for_text option. (The behavior that it turned on, is now
    always on; this should not cause any problems with any existing
    code.)

    Fixed HTML::Element::delete_content, and made an
    HTML::TreeBuilder::delete to override it for TreeBuilder nodes,
    which have their own special attributes.

    HTML::Element::find_by_attribute, find_by_attribute, and get_attr_i
    now behave differently in scalar context, if you're the sort that
    likes context on method calls. HTML::Element::descendant is now
    optimized in scalar context.

    Fixed up some of the reporting of lineages in some $Debug-triggered
    messages.

    Fixed minor bug in updating pos when a text node under HTML
    implicates BODY (and maybe P).

    You should not use release 0.61


0.61 1999-12-15
    Release by Sean M. Burke <sburke@netadventure.net>

    Versions in this dist:
      HTML::Parse: 2.6
      HTML::TreeBuilder: 2.91
      HTML::Element: 1.44
      HTML::AsSubs: 1.12

    No longer including the Formatter modules.

    Lots of new methods and changes in HTML::Element; reorganized docs.

    Added new HTML tags to HTML::Element's and HTML::TreeBuilder's
    internal tables.

    Reworked the logic in HTML::TreeBuilder. Previous versions dealt
    badly with tables, and attempts to enforce content-model rules
    occasionally went quite awry. This new version is much less
    aggressive about content-model rules, and works on the principle
    that if the HTML source is cock-eyed, there's limits to what can be
    done to keep the syntax tree from being cock-eyed.

    HTML::TreeBuilder now also tries to ignore ignorable whitespace.
    The resulting parse trees often have half (or fewer) the number of
    nodes, without all the ignorable " " nodes like before.


0.53 1999-12-15
    Release by Gisle Aas <gisle@aas.no>

    Make it compatible with HTML-Parser-3.00


0.52 1999-11-10
    Release by Gisle Aas <gisle@aas.no>

    Fix SYNOPSIS for HTML::FormatText as suggested by
    Michael G Schwern <schwern@pobox.com>

    Updated my email address.


0.51 1998-07-07
    Release by Gisle Aas <aas@sn.no>

    Avoid new warnings introduced by perl5.004_70


0.50 1998-04-01
    Release by Gisle Aas <aas@sn.no>

    The HTML::* modules that dealt with HTML syntax trees
    were unbundled from libwww-perl-5.22.