Changelog for parsifal
======================

1.1.0, 19.9.2008
    Features added:
    - XMLFLAG_USE_SIMPLEPULL for progressive parsing - can be used to
      implement a pull parser on top of parsifal.
    - Xmlreader.c, which implements/demonstrates simple pull parsing; see
      samples/pull/README
    - XMLFLAG_SPLIT_LARGE_CONTENT for controlling large binary content etc.
    - libparsifal-config contributed by Tom Epperly (babel project)
    - pns.h (public namespace) by Benjamin Allen (babel project) for
      specifying a prefix for the public functions in libparsifal
      (for avoiding dll hell etc.). See pns.h for details.
    - encodingAliasHandler for defining encoding aliases (for example that
      windows-1251 means use ISO-8859-1) or for a complete override of the
      encoding
    - Xmlplint:
      Implemented better uri resolver. Useful for complex/compound dtds.
      See xmlplint/uriresolver.h for details.
      Removed all unsafe strcpy/strcat calls and other unsafe stuff.
      Added -F for setting any parser flag.
    - xmlhash.c has been rewritten to be a generic container. This improves
      portability and slightly improves performance too. See xmlhash.h
    - Added win32/mingw/dll Makefile and binaries, see also
      samples/pull/buildmimgw.bat
    Bugs fixed:
    - Configure ignores CFLAGS, added --disable-gccflags
    - EndDocument return value
    - ReadCh incremented character position when encountering illegal char
    Portability improvements:
    - Removed undefined behaviour in qsort/bsearch callback parameters
    - Valgrind warnings in XMLStringbuf_ToString
    - Lots of minor bug fixes and portability improvements

1.0.0, 2.11.2005
    1.0 is here! Fixed GCC4 issues and linux/unix configure script to use
    iconv by default if present in the system.

0.9.9, 2.10.2005
    Getting close to the 1.0 release:
    - Added validation for TokenizedType ::= 'ID' | 'IDREF' | 'IDREFS' |
      'ENTITY' | 'ENTITIES' | 'NMTOKEN' | 'NMTOKENS' attributes. Checking
      the existence of unparsed entities and NOTATIONs has been left out
      for now, but otherwise XML names, entity names and NMTOKEN(S) are
      checked for validity along with ID/IDREF(S) rules.
    - XMLIsNameChar and XMLIsNameStartChar added to the public API
    - Now tests for valid reader->buf (!NULL) in XMLParser_GetCurrentColumn
      and in XMLParser_GetContextBytes (so calling these after the parsing
      has finished is possible - although most likely these are usually
      called during parsing - more specifically in the error handler)
    - Added meaningful return values for xmlplint process (see xmlplint page)

0.9.3, 21.08.2005
    - Added XMLFLAG_VALIDATION_WARNINGS flag/feature (false by default).
      Now Parsifal can collect all validation errors (they can be treated as
      warnings) and doesn't abort on the first error encountered.
    - Added xmlplint command line tool that has for example limited xml
      catalogs support + needed features for easy integration with text
      editors. See docs/xmlplint for more info. Xmlplint includes a lot
      of useful code to be used when working with Parsifal, for example
      libcurl "pull interface" curlread.c
    - Altered localName behaviour for startElementHandler (should have done
      this earlier). Now when element is in the default namespace localName
      will be correctly set and isn't an empty string anymore. Thanks to Hans
      Dykstra for the fix/for the reminder about the existence of this old
      issue - if someone needs an ability to determine whether element is in
      the default namespace or is in a namespace defined by namespace prefix
      he/she can easily test for ':'. Need for this SHOULD be very rare -
      semantically these two cases are equivalent and that's it.
    - Fixed elementDecl bug: occurred when XMLFLAG_REPORT_DTD_EXT (validating
      mode) and elementDeclHandler were set + document contained both internal
      and external DTD. Cause: ParseDTD didn't reset RT->cpNames and
      RT->cpNodesPool to NULL when freeing/destroying them.
    - Fixed resolveEntityHandler bug: if resolveEntityHandler was skipping DTD
      - by returning XML_OK but w/o setting any reader data for entity (for
      example when using XMLParser_SetExternalSubset for DTD loading) this led
      to segfault (bogus reader after ResolveExternalDTD). Cause:
      ResolveExternalDTD didn't restore the main reader.
    - Fixed xmltest for better validating mode support; only test types valid
      and invalid are validated (others are parsed with
      XMLFLAG_VALIDATION_WARNINGS=True; warnings are not displayed). Xmltest
      is still the old hackish version though.
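
    The ':' test mentioned in the localName entry above can be sketched as a
    tiny helper. This is only an illustration: qname_has_prefix is a
    hypothetical name, not part of the Parsifal API, and it assumes the test
    is applied to the element's qualified name, which the entry does not
    spell out.

```c
#include <string.h>

/* Hypothetical helper (not a Parsifal function): returns 1 when the
   qualified name carries a namespace prefix such as "xhtml:p", and 0 for
   an unprefixed name such as "p" (default namespace or no namespace). */
int qname_has_prefix(const char *qname)
{
    return strchr(qname, ':') != NULL;
}
```

    As the entry says, both cases are semantically equivalent, so a check
    like this should rarely be needed.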

0.9.2, 17.4.2005
    - Added element hint to certain validation errors. For example if you leave
      XHTML head element empty you get the following error: "Content model for
      'head' doesn't allow it to end here. Try: script, style, meta, link..."
      Also gives a hint for mismatching enumeration attribute values.
    - Fixed bug for ignorableWhitespace and entity references - references
      triggered isWS=0 so this was a major bug. Now ParseEntityRef
      also keeps isWS=1 flag properly for &#10; for example.
    - XMLParser_GetCurrentColumn/ErrorColumn fixed to return correct UTF-8
      character count value and not byte offset.
    - XMLParser_GetContextBytes function added. Returns column byte offset info
      that was previously returned by GetCurrentColumn + is used to get
      pointer to current context line/buffer. Helper.c includes a routine
      that returns formatted context. For example when document contains an
      invalid token, < unescaped in attribute value, GetFormattedContext
      returns:
      <e a="va<l"></e>
      --------^
      Context is available during parsing - including DTD parsing/errors. This
      is a very nice feature when tracking down well-formedness/validation
      errors especially when streaming data from network.
    - Now doesn't scope xml:id attribute - xml:id isn't available via
      XMLParser_GetPrefixMapping anymore.
    - Added some new parsifal_tests testcases.
    - Tweaked some example files
    Fixes for dtdvalid.c:
    - Included correction for #REQUIRED enumeration attributes checking bug.
    - Now checks that doctype name matches root element name WHEN validation
      filter hasn't been specified - which is the default case.
    - Other minor fixes for out of memory conditions.
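
    The difference between a column byte offset and a UTF-8 character count
    (see the GetCurrentColumn/GetContextBytes entries above) can be shown
    with a small standalone routine. This is a sketch, not Parsifal's
    implementation: utf8_column_from_offset is a hypothetical helper and it
    assumes the buffer holds valid UTF-8.

```c
#include <stddef.h>

/* Hypothetical helper (not a Parsifal function): counts the UTF-8
   characters found in the first nbytes bytes of buf. Continuation bytes
   have the bit pattern 10xxxxxx and are skipped, so every multi-byte
   sequence is counted exactly once. */
size_t utf8_column_from_offset(const unsigned char *buf, size_t nbytes)
{
    size_t i, count = 0;

    for (i = 0; i < nbytes; i++) {
        if ((buf[i] & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

    For a line starting with "a", a two-byte character and then "<", the
    byte offset of "<" is 3 while the character count before it is 2.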

0.9.1, 14.2.2005
    - Implemented validation for EnumeratedType and NotationType attributes.
      Still no support for TokenizedType attributes like IDs and no checking
      for existence of NOTATIONs etc. (maybe they're on the wrong side of
      80/20 - they should be implemented outside the core?).
    - Fixed memory leak issues when parsing certain malformed DTDs in
      validating mode; where endDTD for DTDValidator wasn't called and
      thus dtd->ElementTable and dtd->cpNodesPool weren't set. There
      was also an issue with checking the need for freeing the validator
      instance for reuse: whether dtd->ElementDecls existed wasn't good for
      that check (for the reason mentioned above), testing for existence of
      dtd->cpNodesPool is a better way. Also set dtd->ElementTable and
      dtd->cpNodesPool to NULL in ParseValidateDTD() - otherwise they would
      contain an invalid value in DTDValidate_StartElement WHEN REUSING
      validator AND current doc doesn't have DTD.
    - Preliminary conformance testing in validating mode implemented for
      xmltest. Xmltest still uses clumsy html output - should switch to xml
      sometime for better post-processing options. Xmltest also needs a lot
      of fixing in the other areas and canonxml.c needs some fixing too.
      However, currently tests are passed W/O ANY MEMORY LEAKS in validating
      mode too and although there are things to be done in validating mode,
      testing tells that parsifal validation is already VERY STABLE.
    - Better base directory handling added for nsvalid.c and winurl.c.
      Base directory handling is left out of the core parser - this is a
      conscious choice as well as leaving checking for legal systemID chars
      for higher level i.e. for http library etc.
    - Added samples/misc/helper.c that will be a place for some helper
      routines like UTF8BufToLatin, GetBaseDir etc.
    - Also tweaked nsvalid.c error reporting.
    - Docs said (in validation section): "Of course you can use LPXMLPARSER
      UserData too but you must get it via LPXMLDTDVALIDATOR parser
      parameter". And before that: "UserData parameter for your LPXMLPARSER
      will be LPXMLDTDVALIDATOR" - duh!

0.9.0, 31.1.2005
    - Preliminary DTD validation support. Quite comprehensive actually,
      missing support for validation of TokenizedType and EnumeratedType
      attributes but in other respects a very useful feature especially in
      SAX parsing; simplifies state handling code a lot.
    - Fixed some code uselessly included when DTD_SUPPORT was not defined:
      in ParseAttributes (defaulted attributes handling) and TrieTok.
    - Fixed "exotic" bug which occurred when for example
      <!ENTITY % pe SYSTEM "out.pe">
      <!ENTITY ent "%pe;">
      AND out.pe would contain encoding declaration; parsing of encodingDecl
      used same buffer (RT->charsBuf) that entity parsing was using - fixed
      by creating own XMLSTRINGBUF in ParseXmlDecl as a safety measure.
    - Fixed memory leak which was introduced in 0.8.3 (occurred when
      DTD_SUPPORT was off and <!DOCTYPE was present)
    - Fixed memory leak when startDocument returned XML_ABORT: iconv wasn't
      released in that case.
    - Fixed bistream BISFIXBUF to set initial bufsize to blocksize*2 (this
      results in fewer reallocs and fiddling of output buffer when parsing
      internal entities for example)
    - Added XMLParser_SetExternalSubset that provides similar features
      as SAX getExternalSubset
    - Now doesn't report error in non-validating mode when DOCTYPE and root
      element names don't match. In fact we don't currently test this at all
      to allow "selective validation".
    - Fixed xmlcfg.h to use stdint.h for UINT32 if platform can't be
      otherwise determined
    - Fixed bug in XMLVector_Remove(!)
    - Moved some infrequently needed dtd specific definitions into xmldtd.h

0.8.3, 11.8.2004
    This release introduces many improvements to the parsing algorithms;
    new Trie algorithm based routines speed up the DTD tokenizer and improve
    overall performance. Other optimizations have also been done bringing
    parser performance very close to the performance of the fastest
    XML 1.0 parsers available while still retaining lightweight
    implementation and without compromising XML conformance. Portability
    has been improved (see News) and of course a few bugs have been fixed:
    - Parser reported inaccurate ErrorLine in some cases where DTD token
      (name) ended with LF
    - Xmltest VC project files were screwed up because they were
      accidentally run thru CRLF to LF conversion.

0.8.2, 1.7.2004
    This release fixes some bugs that occurred when parsing deeply nested
    parameter entities. Also improved tokenizer a bit. These changes fixed
    some XML 1.0 conformance issues as well.

0.8.1, 13.6.2004
    Two bugfixes:
    - GetSystemID/GetPublicID returned wrong values (and sometimes even
      garbage). This is now fixed and tested properly.
    - Was unable to parse documents starting with xml-stylesheet PI
      (w/o xml declaration).
    Also updated XMLCONF testsuite to current version and added some other
    tricky regression tests that process for example docbook.dtd etc.
    Examples were also revisited.

0.8.0, 1.6.2004
    This release adds DTD processing support; parameter entities, attribute
    defaulting and DTD declaration events such as elementDeclHandler,
    attributeDeclHandler etc. are now available.
    - Added GetCurrentEntity function. See manual
    - Fixed GetSystemID/GetPublicID to return accurate info. See manual
    - Fixed inaccuracy in column position reporting on certain error conditions
    - Fixed bug that made parser try to free invalid pointer when error
      occurred parsing attributes - this led to a crash in some cases.

0.7.5, 23.3.2004
    Dinand Vanvelzen pointed out a bug in XMLStringbuf_Init when parameter
    initSize was 0; _Init was calling malloc with 0 byte allocation request!
    This in fact succeeded when m$ RTL was used and same with linux C runtime
    but Borland RTL malloc implementation failed because of this! Also fixed
    some other minor things like XMLNormalizeBuf which now trims the buffer too
    - one could ask why it didn't do this before... ;-)
    Dinand also pointed out an old BYTE #define problem in windows which I've
    ignored in the past. (win API uses BYTE typedef which conflicts with
    parsifal definition). Actually I should replace all occurrences of BYTE
    with char but I keep it for backwards compatibility! Sorry Dinand - putting
    parsifal.h include AFTER windows.h for example works for me. Other option
    is to use:
    #include "libparsifal/parsifal.h"
    #undef BYTE
    #include "windows.h"
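
    The malloc(0) problem behind the XMLStringbuf_Init fix is a general C
    portability trap: the standard lets malloc(0) return either NULL or a
    unique pointer, so code that treats NULL as "out of memory" breaks on
    some runtimes. One common guard is sketched below; alloc_nonzero is a
    hypothetical name and this is not necessarily how Parsifal fixed it.

```c
#include <stdlib.h>

/* Hypothetical helper (not a Parsifal function): never passes a zero size
   to malloc, so a NULL return always means a real allocation failure. */
void *alloc_nonzero(size_t size)
{
    return malloc(size ? size : 1);
}
```

    The caller still frees the result with free() as usual.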

0.7.4, 15.2.2004
    Support for linking with GNU libiconv; now it's possible to parse documents
    in various encodings such as UTF-16, UTF-32, EUC-JP, SHIFT_JIS,
    ISO-8859-{1,2,3,4,5,7,9,10,13,14,15,16} etc. Internal encoding routines and
    encoding detection have also changed considerably and are now more mature,
    implementing XML spec appendix F "Autodetection of Character Encodings".

    - Memory corruption bug has been fixed (occurred when attribute
      count grew bigger than 16 or tagstack grew bigger than 16) - added
      XMLPool routines that make memory handling more sophisticated and safe.
    - Added BIS_ERR_INPUT/XMLP_ERR_IO for easier handling of input source
      callback errors. Input source errors should be distinguished from EOF
      condition especially when external entities are parsed; entity can appear
      well-formed and ok even when in fact there is a stream error when only
      EOF is checked.
    - Now reports line and column position for all illegal characters too
    - Many other minor fixes and code clean-up

0.7.3, 23.10.2003
    Fixed stupid pointer relocating/reallocation bug in ReadCh()'s CR to LF
    conversion routine. Thanks to Andrew Gray for tracing the bug for me -
    actually I found this bug myself today too not knowing about Andrew...
    What a scary synchronized world ;)
    Added mingw/dev-cpp (free IDE for mingw) static library project.
    See win32/mingw/dev-cpp/static directory for more info.

0.7.2, 30.9.2003
    Parsing engine has been somewhat rewritten. New parser makes
    more accurate position info and error info possible; see doc for info
    about XMLParser_GetCurrentColumn etc. Parser is now much simpler
    and there is some performance benefit too.
    API hasn't changed much, only XMLFLAG_CONVERT_EOL had to go.
    See also APIchanges.
    Fixed bugs:
    - ISO-8859-1 encoded document with long whitespace section made
      parsing fail because of faulty CRLF conversion routine.

0.7.1, 24.7.2003
    Fixed bugs:
    - EOL conversion bug when using ISO-8859-1 encoding (was converting
      CRLF to LFLF in some cases).
    - Duplicate attribute checking when XMLFLAG_NAMESPACE_PREFIXES is off
    - Several error condition memory leaks plugged in attribute/namespacedecl
      containing internal entities.
    - Eliminated some GCC warnings

    Also added some new conformance testcases and fixed some sample code.
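
    The "CRLF to LFLF" failure mode above is easy to hit when CR and LF are
    converted independently. A minimal sketch of the end-of-line rule from
    the XML 1.0 spec (CRLF becomes one LF, a lone CR becomes LF) follows.
    normalize_eol is a hypothetical name, not Parsifal code, and it assumes
    the whole input sits in one buffer; a streaming parser also has to
    handle a CR that ends one chunk while the LF starts the next.

```c
#include <stddef.h>

/* Hypothetical helper (not a Parsifal function): normalizes line ends in
   place and returns the new length. CRLF collapses to a single LF and a
   lone CR is turned into LF. */
size_t normalize_eol(char *buf, size_t len)
{
    size_t r, w = 0;

    for (r = 0; r < len; r++) {
        if (buf[r] == '\r') {
            buf[w++] = '\n';
            /* Swallow the LF of a CRLF pair so it is not emitted twice. */
            if (r + 1 < len && buf[r + 1] == '\n')
                r++;
        } else {
            buf[w++] = buf[r];
        }
    }
    return w;
}
```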

0.7.0, 3.7.2003
    Added internal and external general entity support. Optimizations (about
    50% performance increase). Optimizations in memory management: XMLVector
    improved and optimized. Major code clean up (got rid of stupid and
    confusing LPXMLPARSER object-global temp variables). Major well-formedness
    / XML spec tests and corrections - sources now include OASIS XML testsuite
    parser. API clean up. See APIChanges for more details. Better samples
    included. Some C++ tests done too.

0.6.8, 10.3.2003
    Various bugs fixed: internal DTD subset parsing bug fixed (TOK_END_DTD
    string ']>' length was 1 in the 3rd parameter of BufferedIStream_Read!!!).
    There's still an issue about entity declarations
    containing ]> in quotes which causes Parsifal to reject document (solving
    this would require some sort of simple subset validator). See ISSUES.
    Fixed "1 tag document error" bug (this is for all authors of documents
    that contain only 1 element!)
    Bug fixed in BufferedIStream_Peek (assumed memcmp returns 1 or 0 when
    in fact it can return any negative or positive value, which means false
    "fatal errors" from BufferedIStream_Peek). Thanks to Keijiro Takahashi
    for pointing this out.
    Added isrcmem.h helper macros and declarations for parsing memory buffers
    (adding more built-in inputsources to Parsifal isn't currently in my TODO
    list). Also added new sample program xmlcopy (a great piece of s**tware
    I tell you!)
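
    The memcmp pitfall mentioned above applies to any C code: memcmp only
    promises a value below, equal to or above zero, so the result must be
    compared against 0 and never against 1 or -1. A minimal sketch of a safe
    token check follows; bytes_match is a hypothetical name, not the actual
    BufferedIStream_Peek code.

```c
#include <string.h>

/* Hypothetical helper (not a Parsifal function): returns 1 when the first
   len bytes of buf equal tok, otherwise 0. Only "== 0" is tested because
   the magnitude and sign of a nonzero memcmp result are unspecified. */
int bytes_match(const char *buf, const char *tok, size_t len)
{
    return memcmp(buf, tok, len) == 0;
}
```

    Note that len has to cover the whole token: checking ']>' with a length
    of 1 would match any input starting with ']'.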

    I've run many more OASIS conformance tests and it seems that Parsifal
    is quite loyal to XML spec at least when it comes to well-formedness.
    Namespaces seem to be ok too (although there's no "mandatory"
    http://www.w3.org/XML/1998/namespace uris set for xmlns declarations
    for example). Still no new test results included in this release.
    Done a little benchmarking too. See html docs.

0.6.7, 1.3.2003
    Added Whitespace property; gives more control over how Parsifal
    treats whitespace in element content and in attributes.
    Whitespace handling info also added to Manual.html.
    Fixed attribute parsing bug: name=XXX'value'
    where junk in XXX was possible! Strange I never bumped into this
    until running some OASIS tests...
    Now tests illegal entities like &#1; also and
    reports illegal characters as ERR_XMLP_ILLEGAL_CHAR (new PARSER_ERRCODE)
    not ERR_XMLP_ENCODING (also reports char value itself inside single
    quotes in ErrorString). Fixed also BUFFEREDISTREAM encoding bug.

0.6.6, 23.2.2003
    Critical bug fix for unicode (UTF-8) validator. Now tested with
    OASIS XML Conformance Subcommittee japanese UTF-8 test files.
    Planning to release "test packs" separately for wfparser -
    current distribution is still including basic well formedness
    tests. Solved some ANSI C header issues as well.

0.6.5, 15.2.2003
    Multi platform release; makefiles for building shared library
    for unix and stuff. Some bug fixes.

0.6.4, 15.11.2002
    Initial release as "xmlproc"

    Name changed to "parsifal" 18.11.2002. Thanks to Mr. Lars Marius Garshol
    who kindly pointed out the existence of his parser xmlproc! No harm
    done? (He has a very nice site at http://www.garshol.priv.no which
    includes for example very thorough info about free XML tools).