Changelog for parsifal
======================

1.1.0, 19.9.2008
  Features added:
  - XMLFLAG_USE_SIMPLEPULL for progressive parsing - can be used to
    implement a pull parser on top of parsifal.
  - Xmlreader.c, which implements/demonstrates simple pull parsing; see
    samples/pull/README
  - XMLFLAG_SPLIT_LARGE_CONTENT for controlling large binary content etc.
  - libparsifal-config contributed by Tom Epperly (babel project)
  - pns.h (public namespace) by Benjamin Allen (babel project) for
    specifying a prefix for the public functions in libparsifal
    (for avoiding dll hell etc.). See pns.h for details.
  - encodingAliasHandler for defining, for example, that windows-1251 means
    use ISO-8859-1, or for complete override of encoding
  - Xmlplint:
    Implemented a better uri resolver. Useful for complex/compound dtds.
    See xmlplint/uriresolver.h for details.
    Removed all unsafe strcpy/strcat calls and other unsafe stuff
    Added -F for setting any parser flag
  - xmlhash.c has been rewritten to be a generic container. This improves
    portability and slightly improves performance too. See xmlhash.h
  - Added win32/mingw/dll Makefile and binaries, see also
    samples/pull/buildmimgw.bat
  Bugs fixed:
  - Configure ignores CFLAGS, added --disable-gccflags
  - EndDocument return value
  - ReadCh incremented character position when encountering illegal char
  Portability improvements:
  - Removed ub in qsort/bsearch callback parameters
  - Valgrind warnings in XMLStringbuf_ToString
  - lots of minor bug fixes and portability improvements

1.0.0, 2.11.2005
  1.0 is here! Fixed GCC4 issues and linux/unix configure script to use
  iconv by default if present in the system.

0.9.9, 2.10.2005
  Getting close to the 1.0 release:
  - Added Validation for TokenizedType ::= 'ID' | 'IDREF' | 'IDREFS' |
    'ENTITY' | 'ENTITIES'| 'NMTOKEN' | 'NMTOKENS' attributes.
    Checking the existence of unparsed entities and NOTATIONS has been
    left out for now, but otherwise XML names, entity names and NMTOKEN(S)
    are checked for validity along with ID/IDREF(S) rules
  - XMLIsNameChar and XMLIsNameStartChar added to the public API
  - Now tests for valid reader->buf (!NULL) in XMLParser_GetCurrentColumn
    and in XMLParser_GetContextBytes (so calling these after the parsing
    has finished is possible - although most likely these are usually
    called during parsing - more specifically in the error handler)
  - Added meaningful return values for xmlplint process (see xmlplint page)

0.9.3, 21.08.2005
  - Added XMLFLAG_VALIDATION_WARNINGS flag/feature (false by default).
    Now Parsifal can collect all validation errors (they can be treated as
    warnings) and doesn't abort on the first error encountered.
  - Added xmlplint command line tool that has, for example, limited xml
    catalogs support + needed features for easy integration with text
    editors. See docs/xmlplint for more info. Xmlplint includes a lot
    of useful code to be used when working with Parsifal, for example
    libcurl "pull interface" curlread.c
  - Altered localName behaviour for startElementHandler (should have done
    this earlier). Now when an element is in the default namespace, localName
    will be correctly set and isn't an empty string anymore. Thanks to Hans
    Dykstra for the fix/for the reminder about the existence of this old
    issue - if someone needs an ability to determine whether an element is in
    the default namespace or is in a namespace defined by a namespace prefix
    he/she can easily test for ':'. Need for this SHOULD be very rare -
    semantically these two cases are equivalent and that's it.
  - Fixed elementDecl bug: occurred when XMLFLAG_REPORT_DTD_EXT (validating
    mode) and elementDeclHandler were set + document contained both internal
    and external DTD.
    Cause: ParseDTD didn't reset RT->cpNames and
    RT->cpNodesPool to NULL when freeing/destroying them.
  - Fixed resolveEntityHandler bug: if resolveEntityHandler was skipping DTD
    (by returning XML_OK but w/o setting any reader data for entity, for
    example when using XMLParser_SetExternalSubset for DTD loading) this led
    to a segfault (bogus reader after ResolveExternalDTD). Cause:
    ResolveExternalDTD didn't restore the main reader.
  - Fixed xmltest for better validating mode support; only test types valid
    and invalid are validated (others are parsed with
    XMLFLAG_VALIDATION_WARNINGS=True; warnings are not displayed). Xmltest is
    still the old hackish version though.

0.9.2, 17.4.2005
  - Added element hint to certain validation errors. For example if you leave
    XHTML head element empty you get the following error: "Content model for
    'head' doesn't allow it to end here. Try: script, style, meta, link..."
    Also gives a hint for mismatching enumeration attribute values.
  - Fixed bug for ignorableWhitespace and entity references - references
    triggered isWS=0 so this was a major bug. Now ParseEntityRef
    also keeps isWS=1 flag properly for for example.
  - XMLParser_GetCurrentColumn/ErrorColumn fixed to return correct UTF-8
    character count value and not byte offset.
  - XMLParser_GetContextBytes function added. Returns column byte offset info
    that was previously returned by GetCurrentColumn + is used to get a
    pointer to the current context line/buffer. Helper.c includes a routine
    that returns formatted context; for example, when a document contains an
    invalid token, < unescaped in an attribute value, GetFormattedContext
    returns:
      <e a="va<l"></e>
      --------^
    Context is available during parsing - including DTD parsing/errors. This
    is a very nice feature when tracking down well-formedness/validation
    errors, especially when streaming data from network.
  - Now doesn't scope xml:id attribute - xml:id isn't available via
    XMLParser_GetPrefixMapping anymore.
  - Added some new parsifal_tests testcases.
  - Tweaked some example files
  Fixes for dtdvalid.c:
  - Included correction for #REQUIRED enumeration attributes checking bug.
  - Now checks that doctype name matches root element name WHEN validation
    filter hasn't been specified - which is the default case.
  - Other minor fixes for out of memory conditions.

0.9.1, 14.2.2005
  - Implemented validation for EnumeratedType and NotationType attributes.
    Still no support for TokenizedType attributes like IDs and no checking
    for existence of NOTATIONs etc. (maybe they're on the wrong side of
    80/20 - they should be implemented outside the core?).
  - Fixed memory leak issues when parsing certain malformed DTDs in
    validating mode, where endDTD for DTDValidator wasn't called and
    thus dtd->ElementTable and dtd->cpNodesPool weren't set. There
    was also an issue with checking the need for freeing the validator
    instance for reuse: whether dtd->ElementDecls existed wasn't good for
    that check (for the reason mentioned above); testing for existence of
    dtd->cpNodesPool is a better way. Also set dtd->ElementTable and
    dtd->cpNodesPool to NULL in ParseValidateDTD() - otherwise they would
    contain an invalid value in DTDValidate_StartElement WHEN REUSING the
    validator AND the current doc doesn't have a DTD.
  - Preliminary conformance testing in validating mode implemented for
    xmltest. Xmltest still uses clumsy html output - should switch to xml
    sometime for better post-processing options. Xmltest also needs a lot
    of fixing in the other areas and canonxml.c needs some fixing too.
    However, currently tests are passed W/O ANY MEMORY LEAKS in validating
    mode too and although there are things to be done in validating mode,
    testing tells that parsifal validation is already VERY STABLE.
  - Better base directory handling added for nsvalid.c and winurl.c.
    Base directory handling is left out of the core parser - this is a
    conscious choice, as is leaving checking for legal systemID chars
    to a higher level, i.e. to the http library etc.
  - Added samples/misc/helper.c that will be a place for some helper
    routines like UTF8BufToLatin, GetBaseDir etc.
  - Also tweaked nsvalid.c error reporting.
  - Docs said (in validation section): "Of course you can use LPXMLPARSER
    UserData too but you must get it via LPXMLDTDVALIDATOR parser
    parameter". And before that: "UserData parameter for your LPXMLPARSER
    will be LPXMLDTDVALIDATOR" - duh!

0.9.0, 31.1.2005
  - Preliminary DTD validation support. Quite comprehensive actually,
    missing support for validation of TokenizedType and EnumeratedType
    attributes but in other respects a very useful feature, especially in
    SAX parsing; simplifies state handling code a lot.
  - Fixed some code uselessly included when DTD_SUPPORT was not defined:
    in ParseAttributes (defaulted attributes handling) and TrieTok.
  - Fixed "exotic" bug which occurred when for example
      <!ENTITY % pe SYSTEM "out.pe">
      <!ENTITY ent "%pe;">
    AND out.pe would contain an encoding declaration; parsing of encodingDecl
    used the same buffer (RT->charsBuf) that entity parsing was using - fixed
    by creating own XMLSTRINGBUF in ParseXmlDecl as a safety measure.
  - Fixed memory leak which was introduced in 0.8.3 (occurred when
    DTD_SUPPORT was off and <!DOCTYPE was present)
  - Fixed memory leak when startDocument returned XML_ABORT: iconv wasn't
    released in that case.
  - Fixed bistream BISFIXBUF to set initial bufsize to blocksize*2 (this
    results in fewer reallocs and less fiddling with the output buffer when
    parsing internal entities for example)
  - Added XMLParser_SetExternalSubset that provides similar features
    as SAX getExternalSubset
  - Now doesn't report error in non-validating mode when DOCTYPE and root
    element names don't match. In fact we don't currently test this at all
    to allow "selective validation".
  - Fixed xmlcfg.h to use stdint.h for UINT32 if platform can't be
    otherwise determined
  - Fixed bug in XMLVector_Remove(!)
  - Moved some infrequently needed dtd specific definitions into xmldtd.h

0.8.3, 11.8.2004
  This release introduces many improvements to the parsing algorithms;
  new Trie algorithm based routines speed up the DTD tokenizer and improve
  overall performance. Other optimizations have also been done, bringing
  parser performance very close to the performance of the fastest
  XML 1.0 parsers available while still retaining a lightweight
  implementation and without compromising XML conformance. Portability
  has been improved (see News) and of course a few bugs have been fixed:
  - Parser reported inaccurate ErrorLine in some cases where DTD token
    (name) ended with LF
  - Xmltest VC project files were screwed up because they were
    accidentally run thru CRLF to LF conversion.

0.8.2, 1.7.2004
  This release fixes some bugs that occurred when parsing deeply nested
  parameter entities. Also improved the tokenizer a bit. These changes
  resolved some XML 1.0 conformance issues as well.

0.8.1, 13.6.2004
  Two bugfixes:
  - GetSystemID/GetPublicID returned wrong values (and sometimes even
    garbage). This is now fixed and tested properly.
  - Was unable to parse documents starting with xml-stylesheet PI
    (w/o xml declaration)
  Also updated XMLCONF testsuite to current version
  and added some other tricky regression tests that process for example
  docbook.dtd etc. Examples were also revisited.

0.8.0, 1.6.2004
  This release adds DTD processing support; parameter entities, attribute
  defaulting and DTD declaration events such as elementDeclHandler,
  attributeDeclHandler etc. are now available.
  - Added GetCurrentEntity function. See manual
  - Fixed GetSystemID/GetPublicID to return accurate info. See manual
  - Fixed inaccuracy in column position reporting on certain error conditions
  - Fixed bug that made the parser try to free an invalid pointer when an
    error occurred parsing attributes - this led to a crash in some cases.

0.7.5, 23.3.2004
  Dinand Vanvelzen pointed out a bug in XMLStringbuf_Init when parameter
  initSize was 0; _Init was calling malloc with 0 byte allocation request!
  This in fact succeeded when m$ RTL was used and same with linux C runtime
  but Borland RTL malloc implementation failed because of this! Also fixed
  some other minor things like XMLNormalizeBuf which now trims the buffer
  too - one could ask why it didn't do this before... ;-)
  Dinand also pointed out an old BYTE #define problem in windows which I've
  ignored in the past. (win API uses BYTE typedef which conflicts with
  parsifal definition). Actually I should replace all occurrences of BYTE
  with char but I keep it for backwards compatibility! Sorry Dinand - putting
  parsifal.h include AFTER windows.h for example works for me. Other option
  is to use:
    #include "libparsifal/parsifal.h"
    #undef BYTE
    #include "windows.h"

0.7.4, 15.2.2004
  Support for linking with GNU libiconv; now it's possible to parse documents
  in various encodings such as UTF-16, UTF-32, EUC-JP, SHIFT_JIS,
  ISO-8859-{1,2,3,4,5,7,9,10,13,14,15,16} etc.
  Internal encoding routines and
  encoding detection have also changed considerably and are now more mature,
  implementing XML spec appendix F "Autodetection of Character Encodings".

  - Memory corruption bug has been fixed (occurred when attribute
    count grew bigger than 16 or tagstack grew bigger than 16) - added
    XMLPool routines that make memory handling more sophisticated and safe.
  - Added BIS_ERR_INPUT/XMLP_ERR_IO for easier handling of input source
    callback errors. Input source errors should be distinguished from EOF
    condition, especially when external entities are parsed; an entity can
    appear well-formed and ok even when in fact there is a stream error when
    only EOF is checked.
  - Now reports line and column position for all illegal characters too
  - Many other minor fixes and code clean-up

0.7.3, 23.10.2003
  Fixed stupid pointer relocating/reallocation bug in ReadCh()'s CR to LF
  conversion routine. Thanks to Andrew Gray for tracing the bug for me -
  actually I found this bug myself today too not knowing about Andrew...
  What a scary synchronized world ;)
  Added mingw/dev-cpp (free IDE for mingw) static library project.
  See win32/mingw/dev-cpp/static directory for more info.

0.7.2, 30.9.2003
  Parsing engine has been somewhat rewritten. New parser makes
  more accurate position info and error info possible; see doc for info
  about XMLParser_GetCurrentColumn etc. Parser is now much simpler
  and there is some performance benefit too.
  API hasn't changed much, only XMLFLAG_CONVERT_EOL had to go.
  See also APIchanges.
  Fixed bugs:
  - ISO-8859-1 encoded document with long whitespace section made
    parsing fail because of faulty CRLF conversion routine.

0.7.1, 24.7.2003
  Fixed bugs:
  - EOL conversion bug when using ISO-8859-1 encoding (was converting
    CRLF to LFLF in some cases).
  - Duplicate attribute checking when XMLFLAG_NAMESPACE_PREFIXES is off
  - Several error condition memory leaks plugged in attribute/namespacedecl
    containing internal entities.
  - Eliminated some GCC warnings

  Also added some new conformance testcases and fixed some sample code.

0.7.0, 3.7.2003
  Added internal and external general entity support. Optimizations (about
  50% performance increase). Optimizations in memory management: XMLVector
  improved and optimized. Major code clean up (got rid of stupid and
  confusing LPXMLPARSER object-global temp variables). Major well-formedness
  / XML spec tests and corrections - sources now include OASIS XML testsuite
  parser. API clean up. See APIChanges for more details. Better samples
  included. Some C++ tests done too.

0.6.8, 10.3.2003
  Various bugs fixed: internal DTD subset parsing bug fixed (
  TOK_END_DTD string ']>' length was 1 in BufferedIStream_Read 3. param!!!).
  There's still an issue about entity declarations
  containing ]> in quotes which causes Parsifal to reject document (solving
  this would require some sort of simple subset validator). See ISSUES.
  Fixed "1 tag document error" bug (this is for all authors of documents
  that contain only 1 element!)
  Bug fixed in BufferedIStream_Peek (assumed memcmp returns 1 or 0 when
  in fact it can return any nonzero value, including values < -1, which
  meant false "fatal errors" from BufferedIStream_Peek). Thanks to Keijiro
  Takahashi for pointing this out.
  Added isrcmem.h helper macros and declarations for parsing memory buffers
  (adding more built-in inputsources to Parsifal isn't currently in my TODO
  list). Also added new sample program xmlcopy (a great piece of s**tware
  I tell you!)

  I've run many more OASIS conformance tests and it seems that Parsifal
  is quite loyal to XML spec at least when it comes to well-formedness.
  Namespaces seem to be ok too (although there's no "mandatory"
  http://www.w3.org/XML/1998/namespace uris set for xmlns declarations
  for example). Still no new test results included in this release.
  Did a little benchmarking too. See html docs.

0.6.7, 1.3.2003
  Added Whitespace property; gives more control over how Parsifal
  treats whitespace in element content and in attributes.
  Whitespace handling info also added to Manual.html.
  Fixed attribute parsing bug: name=XXX'value'
  where junk in XXX was possible! Strange I never bumped into this
  until running some OASIS tests...
  Now tests illegal entities like  also and
  reports illegal characters as ERR_XMLP_ILLEGAL_CHAR (new PARSER_ERRCODE)
  not ERR_XMLP_ENCODING (also reports char value itself inside single
  quotes in ErrorString). Fixed also BUFFEREDISTREAM encoding bug.

0.6.6, 23.2.2003
  Critical bug fix for unicode (UTF-8) validator. Now tested with
  OASIS XML Conformance Subcommittee Japanese UTF-8 test files.
  Planning to release "test packs" separately for wfparser -
  current distribution still includes basic well-formedness
  tests. Solved some ANSI C header issues as well.

0.6.5, 15.2.2003
  Multi platform release; makefiles for building shared library
  for unix and stuff. Some bug fixes.

0.6.4, 15.11.2002
  Initial release as "xmlproc"

  Name changed to "parsifal" 18.11.2002. Thanks to Mr. Lars Marius Garshol
  who kindly pointed out the existence of his parser xmlproc! No harm
  done? (He has a very nice site at http://www.garshol.priv.no which
  includes for example very thorough info about free XML tools).