<?xml version="1.0" encoding="ISO-8859-1" ?>
<!DOCTYPE package SYSTEM "http://pear.php.net/dtd/package-1.0">
<package version="1.0">
 <name>XML_HTMLSax</name>
 <summary>A SAX-based parser for HTML and other badly formed XML documents</summary>
 <description>XML_HTMLSax is a SAX-based XML parser for badly formed XML documents, such as HTML.
 The original code base was developed by Alexander Zhukov and published at http://sourceforge.net/projects/phpshelve/. Alexander kindly gave permission to modify the code and license for inclusion in PEAR.

 PEAR::XML_HTMLSax provides an API very similar to the native PHP Expat extension, allowing handlers using one to be easily adapted to the other. The key difference is that HTMLSax will not break on badly formed XML, allowing it to be used for parsing HTML documents. Otherwise, HTMLSax supports all the handlers available from Expat except namespace and external entity handlers. It also provides methods for handling XML escapes as well as JSP/ASP opening and closing tags.

 Version 2 has had its internals completely overhauled to use a Lexer, delivering performance *approaching* that of the native XML extension, as well as a radically improved, modular design that makes adding further functionality easy.

 The public API has remained the same as in older versions, except for the set_option() method, whose available options have been renamed. Additional options are now also available, which allow HTMLSax to behave almost exactly like the native Expat extension. For example, if the contents of XML elements contain linefeeds, tabs and XML entities, HTMLSax can be instructed to trigger additional data handler calls.

 A big thanks to Jeff Moore (lead developer of WACT: http://wact.sourceforge.net), who is largely responsible for the new design, as well as for the input from other members at Sitepoint's Advanced PHP forums: http://www.sitepointforums.com/showthread.php?threadid=121246.

 Thanks also to Marcus Baker (lead developer of SimpleTest: http://www.lastcraft.com/simple_test.php) for sorting out the unit tests.</description>
 <maintainers>
  <maintainer>
   <user>hfuecks</user>
   <name>Harry Fuecks</name>
   <email>hfuecks@phppatterns.com</email>
   <role>lead</role>
  </maintainer>
 </maintainers>
 <release>
  <version>2.1.2</version>
  <date>2003-12-05</date>
  <license>PHP</license>
  <state>stable</state>
  <notes>* Bug fixed (thanks Jeff) where badly formed attributes resulted in an infinite loop
* Added additional boolean argument to open and close handler calls to spot empty tags like br/ - should not break existing APIs
* Added XML_OPTION_FULL_ESCAPES which (when = 1) passes through the complete content in an XML escape, allowing comment / cdata reconstruction</notes>
  <deps>
   <dep type="php" rel="ge" version="4.0.5"/>
  </deps>
  <provides type="class" name="XML_HTMLSax_StateParser" />
  <provides type="class" name="XML_HTMLSax_StateParser_Lt430" extends="XML_HTMLSax_StateParser" />
  <provides type="class" name="XML_HTMLSax_StateParser_Gtet430" extends="XML_HTMLSax_StateParser" />
  <provides type="class" name="XML_HTMLSax_NullHandler" />
  <provides type="class" name="XML_HTMLSax" extends="Pear" />
  <provides type="function" name="XML_HTMLSax_StateParser::unscanCharacter" />
  <provides type="function" name="XML_HTMLSax_StateParser::ignoreCharacter" />
  <provides type="function" name="XML_HTMLSax_StateParser::scanCharacter" />
  <provides type="function" name="XML_HTMLSax_StateParser::scanUntilString" />
  <provides type="function" name="XML_HTMLSax_StateParser::scanUntilCharacters" />
  <provides type="function" name="XML_HTMLSax_StateParser::ignoreWhitespace" />
  <provides type="function" name="XML_HTMLSax_StateParser::parse" />
  <provides type="function" name="XML_HTMLSax_StateParser_Lt430::scanUntilCharacters" />
  <provides type="function" name="XML_HTMLSax_StateParser_Lt430::ignoreWhitespace" />
  <provides type="function" name="XML_HTMLSax_StateParser_Lt430::parse" />
  <provides type="function" name="XML_HTMLSax_StateParser_Gtet430::scanUntilCharacters" />
  <provides type="function" name="XML_HTMLSax_StateParser_Gtet430::ignoreWhitespace" />
  <provides type="function" name="XML_HTMLSax_StateParser_Gtet430::parse" />
  <provides type="function" name="XML_HTMLSax_NullHandler::DoNothing" />
  <provides type="function" name="XML_HTMLSax::set_object" />
  <provides type="function" name="XML_HTMLSax::set_option" />
  <provides type="function" name="XML_HTMLSax::set_data_handler" />
  <provides type="function" name="XML_HTMLSax::set_element_handler" />
  <provides type="function" name="XML_HTMLSax::set_pi_handler" />
  <provides type="function" name="XML_HTMLSax::set_escape_handler" />
  <provides type="function" name="XML_HTMLSax::set_jasp_handler" />
  <provides type="function" name="XML_HTMLSax::get_current_position" />
  <provides type="function" name="XML_HTMLSax::get_length" />
  <provides type="function" name="XML_HTMLSax::parse" />
  <provides type="class" name="XML_HTMLSax_StartingState" />
  <provides type="class" name="XML_HTMLSax_TagState" />
  <provides type="class" name="XML_HTMLSax_ClosingTagState" />
  <provides type="class" name="XML_HTMLSax_OpeningTagState" />
  <provides type="class" name="XML_HTMLSax_EscapeState" />
  <provides type="class" name="XML_HTMLSax_JaspState" />
  <provides type="class" name="XML_HTMLSax_PiState" />
  <provides type="function" name="XML_HTMLSax_StartingState::parse" />
  <provides type="function" name="XML_HTMLSax_TagState::parse" />
  <provides type="function" name="XML_HTMLSax_ClosingTagState::parse" />
  <provides type="function" name="XML_HTMLSax_OpeningTagState::parseAttributes" />
  <provides type="function" name="XML_HTMLSax_OpeningTagState::parse" />
  <provides type="function" name="XML_HTMLSax_EscapeState::parse" />
  <provides type="function" name="XML_HTMLSax_JaspState::parse" />
  <provides type="function" name="XML_HTMLSax_PiState::parse" />
  <provides type="class" name="XML_HTMLSax_Trim" />
  <provides type="class" name="XML_HTMLSax_CaseFolding" />
  <provides type="class" name="XML_HTMLSax_Linefeed" />
  <provides type="class" name="XML_HTMLSax_Tab" />
  <provides type="class" name="XML_HTMLSax_Entities_Parsed" />
  <provides type="class" name="XML_HTMLSax_Entities_Unparsed" />
  <provides type="function" name="XML_HTMLSax_Trim::trimData" />
  <provides type="function" name="XML_HTMLSax_CaseFolding::foldOpen" />
  <provides type="function" name="XML_HTMLSax_CaseFolding::foldClose" />
  <provides type="function" name="XML_HTMLSax_Linefeed::breakData" />
  <provides type="function" name="XML_HTMLSax_Tab::breakData" />
  <provides type="function" name="XML_HTMLSax_Entities_Parsed::breakData" />
  <provides type="function" name="XML_HTMLSax_Entities_Unparsed::breakData" />
  <provides type="function" name="html_entity_decode" />
  <filelist>
   <file role="php" baseinstalldir="XML" md5sum="4646f0e3b0b6cb1af1f8d2f0eb558fcc" name="XML_HTMLSax.php"/>
   <file role="php" baseinstalldir="XML" md5sum="04bd2e034cfa78902c883103549d952b" name="HTMLSax/XML_HTMLSax_States.php"/>
   <file role="php" baseinstalldir="XML" md5sum="3bf6c70e6e4a3692f0833cdb4e6c077b" name="HTMLSax/XML_HTMLSax_Decorators.php"/>
   <file role="doc" baseinstalldir="XML" md5sum="fa5e91af821291a1bd3f90ce2c8557a4" name="docs/Readme"/>
   <file role="doc" baseinstalldir="XML" md5sum="212961f0b0437c92ce65128bf1e33740" name="docs/examples/SimpleExample.php"/>
   <file role="doc" baseinstalldir="XML" md5sum="ee798189d1ff9b1f614ab4f13c916cc4" name="docs/examples/HTMLtoXHTML.php"/>
   <file role="doc" baseinstalldir="XML" md5sum="23f122dad1412ef196b880c9b646662c" name="docs/examples/ExpatvsHtmlSax.php"/>
   <file role="doc" baseinstalldir="XML" md5sum="6d8e0358d7581138624843192f29b1fc" name="docs/examples/example.html"/>
   <file role="doc" baseinstalldir="XML" md5sum="90aaba50fabb9de12d0b4664f008d2dd" name="docs/tests/index.php"/>
   <file role="doc" baseinstalldir="XML" md5sum="341998c9086a1196e2e8fcbbe2d0c9f1" name="docs/tests/unit_tests.php"/>
   <file role="doc" baseinstalldir="XML" md5sum="db45e0a797ffece8914464c8e24b75a3" name="docs/tests/xml_htmlsax_test.php"/>
  </filelist>
 </release>
 <changelog>
  <release>
   <version>2.1.1</version>
   <date>2003-10-08</date>
   <license>PHP</license>
   <state>stable</state>
   <notes>* Reporting of byte index with get_current_position() more accurate on opening tags (thanks to Alexander Orlov at x-code.com)
* All parser options now available to PHP versions lt 4.3.x, using implementation of html_entity_decode in PHP
</notes>
  </release>
  <release>
   <version>2.1.0</version>
   <date>2003-09-10</date>
   <license>PHP</license>
   <state>stable</state>
   <notes>* Well (unit) tested with SimpleTest
</notes>
  </release>
  <release>
   <version>2.0.2</version>
   <date>2003-08-11</date>
   <license>PHP</license>
   <state>alpha</state>
   <notes>* API is backwards compatible apart from the renaming of parser options
* Performance dramatically increased. Not much slower than Expat
* Better handling of XML comments and CDATA
* Option to trigger additional data handler calls for linefeeds and tabs
* Option to trigger additional data handler calls for XML entities and parse them if required.
* Added public get_current_position() and get_length() methods
</notes>
  </release>
  <release>
   <version>1.1</version>
   <date>2003-06-26</date>
   <license>PHP</license>
   <state>stable</state>
   <notes>* Bug fixes to Attribute_Parser to cope with newline, tag, forward slash and whitespace issues.
</notes>
  </release>
  <release>
   <version>1.0</version>
   <date>2003-06-08</date>
   <state>stable</state>
   <notes>* Modifications to file structure to place Attributes_Parser.php
  and State_Machine.php in subdirectory HTMLSax
* XML_HTMLSax.php includes Attributes_Parser.php and State_Machine.php
  using require_once()
</notes>
  </release>
  <release>
   <version>0.9.0rc2</version>
   <date>2003-05-18</date>
   <state>beta</state>
   <notes>* First release under PEAR
* Changed package name to XML_HTMLSax
* Added patch from John Luxford to parse single quoted attributes
* Modified State_Machine to be a simple variable store
</notes>
  </release>
  <release>
   <version>0.9.0rc1</version>
   <date>2003-05-09</date>
   <state>beta</state>
   <notes>A summary of the main differences between this version
 of HTML_Sax and HTMLSax2002082201 is as follows:
 * Instead of extending HTMLSax with your own "handlers" class,
   you now use the set_object() method to pass an instance of the
   class to HTMLSax.
 * Class method callbacks are specified using the following methods:
   * set_element_handler('startHandler','endHandler') for &lt;tag&gt; and &lt;/tag&gt;
   * set_data_handler('dataHandler') for contents of an element
   * set_pi_handler('piHandler') for &lt;?php ?&gt;, &lt;?xml ?&gt; etc.
   * set_escape_handler() for anything beginning with &lt;!
   * set_jasp_handler() - set listener for &lt;% %&gt; tags
 * Attributes which have no value are created and set to true
 * Comments are handled and may contain entities: &lt; &gt;
 * The callback handlers will all be passed an instance of HTMLSax
   in the same way as the native PHP XML Expat extension
 * Setting of parser options is handled specifically by the set_option()
   method. Available options are:
   * skipWhiteSpace: instruct the parser to ignore whitespace characters
   * trimDataNodes: trim whitespace inside character data
   * breakOnNewLine: newline characters found in character data are treated
     as new events triggering another data callback
   * caseFolding: converts element names to uppercase
</notes>
  </release>
 </changelog>
</package>