1 (**
2 ===============================================================================================
3 Name : LibXmlParser
4 ===============================================================================================
5 Project : All Projects
6 ===============================================================================================
7 Subject : Progressive XML Parser for all types of XML Files
8 ===============================================================================================
9 Author : Stefan Heymann
10 Eschenweg 3
11 72076 T�bingen
12 GERMANY
13
14 E-Mail: stefan@destructor.de
15 URL: www.destructor.de
16 ===============================================================================================
17 Source, Legals ("Licence")
18 --------------------------
19 The official site to get this parser is http://www.destructor.de/
20
21 Usage and Distribution of this Source Code is ruled by the
22 "Destructor.de Source code Licence" (DSL) which comes with this file or
23 can be downloaded at http://www.destructor.de/
24
25 IN SHORT: Usage and distribution of this source code is free.
26 You use it completely on your own risk.
27
28 Postcardware
29 ------------
30 If you like this code, please send a postcard of your city to my above address.
31 ===============================================================================================
32 !!! All parts of this code which are not finished or not conforming exactly to
33 the XmlSpec are marked with three exclamation marks
34
35 -!- Parts where the parser may be able to detect errors in the document's syntax are
36 marked with the dash-exlamation mark-dash sequence.
37 ===============================================================================================
38 Terminology:
39 ------------
40 - Start: Start of a buffer part
41 - Final: End (last character) of a buffer part
42 - DTD: Document Type Definition
43 - DTDc: Document Type Declaration
44 - XMLSpec: The current W3C XML Recommendation (version 1.0 as of 1998-02-10), Chapter No.
45 - Cur*: Fields concerning the "Current" part passed back by the "Scan" method
46 ===============================================================================================
47 Scanning the XML document
48 -------------------------
49 - Create TXmlParser Instance MyXml := TXmlParser.Create;
50 - Load XML Document MyXml.LoadFromFile (Filename);
51 - Start Scanning MyXml.StartScan;
52 - Scan Loop WHILE MyXml.Scan DO
53 - Test for Part Type CASE MyXml.CurPartType OF
54 - Handle Parts ... : ;;;
55 - Handle Parts ... : ;;;
56 - Handle Parts ... : ;;;
57 END;
58 - Destroy MyXml.Free;
59 ===============================================================================================
60 Loading the XML document
61 ------------------------
62 You can load the XML document from a file with the "LoadFromFile" method.
63 It is beyond the scope of this parser to perform HTTP or FTP accesses. If you want your
64 application to handle such requests (URLs), you can load the XML via HTTP or FTP or whatever
65 protocol and hand over the data buffer using the "LoadFromBuffer" or "SetBuffer" method.
66 "LoadFromBuffer" loads the internal buffer of TXmlParser with the given null-terminated
67 string, thereby creating a copy of that buffer.
68 "SetBuffer" just takes the pointer to another buffer, which means that the given
69 buffer pointer must be valid while the document is accessed via TXmlParser.
70 ===============================================================================================
71 Encodings:
72 ----------
73 This XML parser kind of "understands" the following encodings:
74 - UTF-8
75 - ISO-8859-1
76 - Windows-1252
77
78 Any flavor of multi-byte characters (and this includes UTF-16) is not supported. Sorry.
79
80 Every string which has to be passed to the application passes the virtual method
81 "TranslateEncoding" which translates the string from the current encoding (stored in
82 "CurEncoding") into the encoding the application wishes to receive.
83 The "TranslateEncoding" method that is built into TXmlParser assumes that the application
84 wants to receive Windows ANSI (Windows-1252, about the same as ISO-8859-1) and is able
85 to convert UTF-8 and ISO-8859-1 encodings.
86 For other source and target encodings, you will have to override "TranslateEncoding".
87 ===============================================================================================
88 Buffer Handling
89 ---------------
90 - The document must be loaded completely into a piece of RAM
91 - All character positions are referenced by PChar pointers
92 - The TXmlParser instance can either "own" the buffer itself (then, FBufferSize is > 0)
93 or reference the buffer of another instance or object (then, FBuffersize is 0 and
94 FBuffer is not NIL)
95 - The Property DocBuffer passes back a pointer to the first byte of the document. If there
96 is no document stored (FBuffer is NIL), the DocBuffer returns a pointer to a NULL character.
97 ===============================================================================================
98 Whitespace Handling
99 -------------------
100 The TXmlParser property "PackSpaces" determines how Whitespace is returned in Text Content:
101 While PackSpaces is true, all leading and trailing whitespace characters are trimmed of, all
102 Whitespace is converted to Space #x20 characters and contiguous Whitespace characters are
103 compressed to one.
104 If the "Scan" method reports a ptContent part, the application can get the original text
105 with all whitespace characters by extracting the characters from "CurStart" to "CurFinal".
106 If the application detects an xml:space attribute, it can set "PackSpaces" accordingly or
107 use CurStart/CurFinal.
108 Please note that TXmlParser does _not_ normalize Line Breaks to single LineFeed characters
109 as the XmlSpec requires (XmlSpec 2.11).
110 The xml:space attribute is not handled by TXmlParser. This is on behalf of the application.
111 ===============================================================================================
112 Non-XML-Conforming
113 ------------------
114 TXmlParser does not conform 100 % exactly to the XmlSpec:
115 - UTF-16 is not supported (XmlSpec 2.2)
116 (Workaround: Convert UTF-16 to UTF-8 and hand the buffer over to TXmlParser)
117 - As the parser only works with single byte strings, all Unicode characters > 255
118 can currently not be handled correctly.
119 - Line breaks are not normalized to single Linefeed #x0A characters (XmlSpec 2.11)
120 (Workaround: The Application can access the text contents on its own [CurStart, CurFinal],
121 thereby applying every normalization it wishes to)
122 - The attribute value normalization does not work exactly as defined in the
123 Second Edition of the XML 1.0 specification.
124 - See also the code parts marked with three consecutive exclamation marks. These are
125 parts which are not finished in the current code release.
126
127 This list may be incomplete, so it may grow if I get to know any other points.
128 As work on the parser proceeds, this list may also shrink.
129 ===============================================================================================
130 Things Todo
131 -----------
132 - Introduce a new event/callback which is called when there is an unresolvable
133 entity or character reference
134 - Support Unicode
135 - Use Streams instead of reading the whole XML into memory
136 ===============================================================================================
137 Change History, Version numbers
138 -------------------------------
139 The Date is given in ISO Year-Month-Day (YYYY-MM-DD) order.
140 Versions are counted from 1.0.0 beginning with the version from 2000-03-16.
141 Unreleased versions don't get a version number.
142
143 Date Author Version Changes
144 -----------------------------------------------------------------------------------------------
145 2000-03-16 HeySt 1.0.0 Start
146 2000-03-28 HeySt 1.0.1 Initial Publishing of TXmlParser on the destructor.de Web Site
147 2000-03-30 HeySt 1.0.2 TXmlParser.AnalyzeCData: Call "TranslateEncoding" for CurContent
148 2000-03-31 HeySt 1.0.3 Deleted the StrPosE function (was not needed anyway)
149 2000-04-04 HeySt 1.0.4 TDtdElementRec modified: Start/Final for all Elements;
150 Should be backwards compatible.
151 AnalyzeDtdc: Set CurPartType to ptDtdc
152 2000-04-23 HeySt 1.0.5 New class TObjectList. Eliminated reference to the Delphi 5
153 "Contnrs" unit so LibXmlParser is Delphi 4 compatible.
154 2000-07-03 HeySt 1.0.6 TNvpNode: Added Constructor
155 2000-07-11 HeySt 1.0.7 Removed "Windows" from USES clause
156 Added three-exclamation-mark comments for Utf8ToAnsi/AnsiToUtf8
157 Added three-exclamation-mark comments for CHR function calls
158 2000-07-23 HeySt 1.0.8 TXmlParser.Clear: CurAttr.Clear; EntityStack.Clear;
159 (This was not a bug; just defensive programming)
160 2000-07-29 HeySt 1.0.9 TNvpList: Added methods: Node(Index), Value(Index), Name(Index);
161 2000-10-07 HeySt Introduced Conditional Defines
162 Uses Contnrs unit and its TObjectList class again for
163 Delphi 5 and newer versions
164 2001-01-30 HeySt Introduced Version Numbering
165 Made LoadFromFile and LoadFromBuffer BOOLEAN functions
166 Introduced FileMode parameter for LoadFromFile
167 BugFix: TAttrList.Analyze: Must add CWhitespace to ExtractName call
168 Comments worked over
169 2001-02-28 HeySt 1.0.10 Completely worked over and tested the UTF-8 functions
170 Fixed a bug in TXmlParser.Scan which caused it to start over when it
171 was called after the end of scanning, resulting in an endless loop
172 TEntityStack is now a TObjectList instead of TList
173 2001-07-03 HeySt 1.0.11 Updated Compiler Version IFDEFs for Kylix
174 2001-07-11 HeySt 1.0.12 New TCustomXmlScanner component (taken over from LibXmlComps.pas)
175 2001-07-14 HeySt 1.0.13 Bugfix TCustomXmlScanner.FOnTranslateEncoding
176 2001-10-22 HeySt Don't clear CurName anymore when the parser finds a CDATA section.
177 2001-12-03 HeySt 1.0.14 TObjectList.Clear: Make call to INHERITED method (fixes a memory leak)
178 2001-12-05 HeySt 1.0.15 TObjectList.Clear: removed call to INHERITED method
179 TObjectList.Destroy: Inserted SetCapacity call.
180 Reduces need for frequent re-allocation of pointer buffer
181 Dedicated to my father, Theodor Heymann
182 2002-06-26 HeySt 1.0.16 TXmlParser.Scan: Fixed a bug with PIs whose name is beginning
183 with 'xml'. Thanks to Uwe Kamm for submitting this bug.
184 The CurEncoding property is now always in uppercase letters (the XML
185 spec wants it to be treated case independently so when it's uppercase
186 comparisons are faster)
187 2002-03-04 HeySt 1.0.17 Included an IFDEF for Delphi 7 (VER150) and Kylix
188 There is a new symbol HAS_CONTNRS_UNIT which is used now to
189 distinguish between IDEs which come with the Contnrs unit and
190 those that don't.
191 *)
192
193 UNIT libxmlparser;
194
195 {$I jedi-sdl.inc}
196
197 INTERFACE
198
199 USES
200 SysUtils, Classes,
201 (*$IFDEF HAS_CONTNRS_UNIT *) // The Contnrs Unit was introduced in Delphi 5
202 Contnrs,
203 (*$ENDIF*)
204 Math;
205
206 CONST
207 CVersion = '1.0.17'; // This variable will be updated for every release
208 // (I hope, I won't forget to do it everytime ...)
209
210 TYPE
211 TPartType = // --- Document Part Types
212 (ptNone, // Nothing
213 ptXmlProlog, // XML Prolog XmlSpec 2.8 / 4.3.1
214 ptComment, // Comment XmlSpec 2.5
215 ptPI, // Processing Instruction XmlSpec 2.6
216 ptDtdc, // Document Type Declaration XmlSpec 2.8
217 ptStartTag, // Start Tag XmlSpec 3.1
218 ptEmptyTag, // Empty-Element Tag XmlSpec 3.1
219 ptEndTag, // End Tag XmlSpec 3.1
220 ptContent, // Text Content between Tags
221 ptCData); // CDATA Section XmlSpec 2.7
222
223 TDtdElemType = // --- DTD Elements
224 (deElement, // !ELEMENT declaration
225 deAttList, // !ATTLIST declaration
226 deEntity, // !ENTITY declaration
227 deNotation, // !NOTATION declaration
228 dePI, // PI in DTD
229 deComment, // Comment in DTD
230 deError); // Error found in the DTD
231
232 TYPE
233 TAttrList = CLASS;
234 TEntityStack = CLASS;
235 TNvpList = CLASS;
236 TElemDef = CLASS;
237 TElemList = CLASS;
238 TEntityDef = CLASS;
239 TNotationDef = CLASS;
240
241 TDtdElementRec = RECORD // --- This Record is returned by the DTD parser callback function
242 Start, Final : PChar; // Start/End of the Element's Declaration
243 CASE ElementType : TDtdElemType OF // Type of the Element
244 deElement, // <!ELEMENT>
245 deAttList : (ElemDef : TElemDef); // <!ATTLIST>
246 deEntity : (EntityDef : TEntityDef); // <!ENTITY>
247 deNotation : (NotationDef : TNotationDef); // <!NOTATION>
248 dePI : (Target : PChar; // <?PI ?>
249 Content : PChar;
250 AttrList : TAttrList);
251 deError : (Pos : PChar); // Error
252 // deComment : ((No additional fields here)); // <!-- Comment -->
253 END;
254
255 TXmlParser = CLASS // --- Internal Properties and Methods
256 PROTECTED
257 FBuffer : PChar; // NIL if there is no buffer available
258 FBufferSize : INTEGER; // 0 if the buffer is not owned by the Document instance
259 FSource : STRING; // Name of Source of document. Filename for Documents loaded with LoadFromFile
260
261 FXmlVersion : STRING; // XML version from Document header. Default is '1.0'
262 FEncoding : STRING; // Encoding from Document header. Default is 'UTF-8'
263 FStandalone : BOOLEAN; // Standalone declaration from Document header. Default is 'yes'
264 FRootName : STRING; // Name of the Root Element (= DTD name)
265 FDtdcFinal : PChar; // Pointer to the '>' character terminating the DTD declaration
266
267 FNormalize : BOOLEAN; // If true: Pack Whitespace and don't return empty contents
268 EntityStack : TEntityStack; // Entity Stack for Parameter and General Entities
269 FCurEncoding : STRING; // Current Encoding during parsing (always uppercase)
270
271 PROCEDURE AnalyzeProlog; // Analyze XML Prolog or Text Declaration
272 PROCEDURE AnalyzeComment (Start : PChar; VAR Final : PChar); // Analyze Comments
273 PROCEDURE AnalyzePI (Start : PChar; VAR Final : PChar); // Analyze Processing Instructions (PI)
274 PROCEDURE AnalyzeDtdc; // Analyze Document Type Declaration
275 PROCEDURE AnalyzeDtdElements (Start : PChar; VAR Final : PChar); // Analyze DTD declarations
276 PROCEDURE AnalyzeTag; // Analyze Start/End/Empty-Element Tags
277 PROCEDURE AnalyzeCData; // Analyze CDATA Sections
278 PROCEDURE AnalyzeText (VAR IsDone : BOOLEAN); // Analyze Text Content between Tags
279 PROCEDURE AnalyzeElementDecl (Start : PChar; VAR Final : PChar);
280 PROCEDURE AnalyzeAttListDecl (Start : PChar; VAR Final : PChar);
281 PROCEDURE AnalyzeEntityDecl (Start : PChar; VAR Final : PChar);
282 PROCEDURE AnalyzeNotationDecl (Start : PChar; VAR Final : PChar);
283
284 PROCEDURE PushPE (VAR Start : PChar);
285 PROCEDURE ReplaceCharacterEntities (VAR Str : STRING);
286 PROCEDURE ReplaceParameterEntities (VAR Str : STRING);
287 PROCEDURE ReplaceGeneralEntities (VAR Str : STRING);
288
289 FUNCTION GetDocBuffer : PChar; // Returns FBuffer or a pointer to a NUL char if Buffer is empty
290
291 PUBLIC // --- Document Properties
292 PROPERTY XmlVersion : STRING READ FXmlVersion; // XML version from the Document Prolog
293 PROPERTY Encoding : STRING READ FEncoding; // Document Encoding from Prolog
294 PROPERTY Standalone : BOOLEAN READ FStandalone; // Standalone Declaration from Prolog
295 PROPERTY RootName : STRING READ FRootName; // Name of the Root Element
296 PROPERTY Normalize : BOOLEAN READ FNormalize WRITE FNormalize; // True if Content is to be normalized
297 PROPERTY Source : STRING READ FSource; // Name of Document Source (Filename)
298 PROPERTY DocBuffer : PChar READ GetDocBuffer; // Returns document buffer
299 PUBLIC // --- DTD Objects
300 Elements : TElemList; // Elements: List of TElemDef (contains Attribute Definitions)
301 Entities : TNvpList; // General Entities: List of TEntityDef
302 ParEntities : TNvpList; // Parameter Entities: List of TEntityDef
303 Notations : TNvpList; // Notations: List of TNotationDef
304 PUBLIC
305 CONSTRUCTOR Create;
306 DESTRUCTOR Destroy; OVERRIDE;
307
308 // --- Document Handling
309 FUNCTION LoadFromFile (Filename : STRING;
310 FileMode : INTEGER = fmOpenRead OR fmShareDenyNone) : BOOLEAN;
311 // Loads Document from given file
312 FUNCTION LoadFromBuffer (Buffer : PChar) : BOOLEAN; // Loads Document from another buffer
313 PROCEDURE SetBuffer (Buffer : PChar); // References another buffer
314 PROCEDURE Clear; // Clear Document
315
316 PUBLIC
317 // --- Scanning through the document
318 CurPartType : TPartType; // Current Type
319 CurName : STRING; // Current Name
320 CurContent : STRING; // Current Normalized Content
321 CurStart : PChar; // Current First character
322 CurFinal : PChar; // Current Last character
323 CurAttr : TAttrList; // Current Attribute List
324 PROPERTY CurEncoding : STRING READ FCurEncoding; // Current Encoding
325 PROCEDURE StartScan;
326 FUNCTION Scan : BOOLEAN;
327
328 // --- Events / Callbacks
329 FUNCTION LoadExternalEntity (SystemId, PublicId,
330 Notation : STRING) : TXmlParser; VIRTUAL;
331 FUNCTION TranslateEncoding (CONST Source : STRING) : STRING; VIRTUAL;
332 PROCEDURE DtdElementFound (DtdElementRec : TDtdElementRec); VIRTUAL;
333 END;
334
335 TValueType = // --- Attribute Value Type
336 (vtNormal, // Normal specified Attribute
337 vtImplied, // #IMPLIED attribute value
338 vtFixed, // #FIXED attribute value
339 vtDefault); // Attribute value from default value in !ATTLIST declaration
340
341 TAttrDefault = // --- Attribute Default Type
342 (adDefault, // Normal default value
343 adRequired, // #REQUIRED attribute
344 adImplied, // #IMPLIED attribute
345 adFixed); // #FIXED attribute
346
347 TAttrType = // --- Type of attribute
348 (atUnknown, // Unknown type
349 atCData, // Character data only
350 atID, // ID
351 atIdRef, // ID Reference
352 atIdRefs, // Several ID References, separated by Whitespace
353 atEntity, // Name of an unparsed Entity
354 atEntities, // Several unparsed Entity names, separated by Whitespace
355 atNmToken, // Name Token
356 atNmTokens, // Several Name Tokens, separated by Whitespace
357 atNotation, // A selection of Notation names (Unparsed Entity)
358 atEnumeration); // Enumeration
359
360 TElemType = // --- Element content type
361 (etEmpty, // Element is always empty
362 etAny, // Element can have any mixture of PCDATA and any elements
363 etChildren, // Element must contain only elements
364 etMixed); // Mixed PCDATA and elements
365
366 (*$IFDEF HAS_CONTNRS_UNIT *)
367 TObjectList = Contnrs.TObjectList; // Re-Export this identifier
368 (*$ELSE *)
369 TObjectList = CLASS (TList)
370 DESTRUCTOR Destroy; OVERRIDE;
371 PROCEDURE Delete (Index : INTEGER);
372 PROCEDURE Clear; OVERRIDE;
373 END;
374 (*$ENDIF *)
375
376 TNvpNode = CLASS // Name-Value Pair Node
377 Name : STRING;
378 Value : STRING;
379 CONSTRUCTOR Create (TheName : STRING = ''; TheValue : STRING = '');
380 END;
381
382 TNvpList = CLASS (TObjectList) // Name-Value Pair List
383 PROCEDURE Add (Node : TNvpNode);
384 FUNCTION Node (Name : STRING) : TNvpNode; OVERLOAD;
385 FUNCTION Node (Index : INTEGER) : TNvpNode; OVERLOAD;
386 FUNCTION Value (Name : STRING) : STRING; OVERLOAD;
387 FUNCTION Value (Index : INTEGER) : STRING; OVERLOAD;
388 FUNCTION Name (Index : INTEGER) : STRING;
389 END;
390
391 TAttr = CLASS (TNvpNode) // Attribute of a Start-Tag or Empty-Element-Tag
392 ValueType : TValueType;
393 AttrType : TAttrType;
394 END;
395
396 TAttrList = CLASS (TNvpList) // List of Attributes
397 PROCEDURE Analyze (Start : PChar; VAR Final : PChar);
398 END;
399
400 TEntityStack = CLASS (TObjectList) // Stack where current position is stored before parsing entities
401 PROTECTED
402 Owner : TXmlParser;
403 PUBLIC
404 CONSTRUCTOR Create (TheOwner : TXmlParser);
405 PROCEDURE Push (LastPos : PChar); OVERLOAD;
406 PROCEDURE Push (Instance : TObject; LastPos : PChar); OVERLOAD;
407 FUNCTION Pop : PChar; // Returns next char or NIL if EOF is reached. Frees Instance.
408 END;
409
410 TAttrDef = CLASS (TNvpNode) // Represents a <!ATTLIST Definition. "Value" is the default value
411 TypeDef : STRING; // Type definition from the DTD
412 Notations : STRING; // Notation List, separated by pipe symbols '|'
413 AttrType : TAttrType; // Attribute Type
414 DefaultType : TAttrDefault; // Default Type
415 END;
416
417 TElemDef = CLASS (TNvpList) // Represents a <!ELEMENT Definition. Is a list of TAttrDef-Nodes
418 Name : STRING; // Element name
419 ElemType : TElemType; // Element type
420 Definition : STRING; // Element definition from DTD
421 END;
422
423 TElemList = CLASS (TObjectList) // List of TElemDef nodes
424 FUNCTION Node (Name : STRING) : TElemDef;
425 PROCEDURE Add (Node : TElemDef);
426 END;
427
428 TEntityDef = CLASS (TNvpNode) // Represents a <!ENTITY Definition.
429 SystemId : STRING;
430 PublicId : STRING;
431 NotationName : STRING;
432 END;
433
434 TNotationDef = CLASS (TNvpNode) // Represents a <!NOTATION Definition. Value is the System ID
435 PublicId : STRING;
436 END;
437
438 TCharset = SET OF CHAR;
439
440
441 CONST
442 CWhitespace = [#32, #9, #13, #10]; // Whitespace characters (XmlSpec 2.3)
443 CLetter = [#$41..#$5A, #$61..#$7A, #$C0..#$D6, #$D8..#$F6, #$F8..#$FF];
444 CDigit = [#$30..#$39];
445 CNameChar = CLetter + CDigit + ['.', '-', '_', ':', #$B7];
446 CNameStart = CLetter + ['_', ':'];
447 CQuoteChar = ['"', ''''];
448 CPubidChar = [#32, ^M, ^J, #9, 'a'..'z', 'A'..'Z', '0'..'9',
449 '-', '''', '(', ')', '+', ',', '.', '/', ':',
450 '=', '?', ';', '!', '*', '#', '@', '$', '_', '%'];
451
452 CDStart = '<![CDATA[';
453 CDEnd = ']]>';
454
455 // --- Name Constants for the above enumeration types
456 CPartType_Name : ARRAY [TPartType] OF STRING =
457 ('', 'XML Prolog', 'Comment', 'PI',
458 'DTD Declaration', 'Start Tag', 'Empty Tag', 'End Tag',
459 'Text', 'CDATA');
460 CValueType_Name : ARRAY [TValueType] OF STRING = ('Normal', 'Implied', 'Fixed', 'Default');
461 CAttrDefault_Name : ARRAY [TAttrDefault] OF STRING = ('Default', 'Required', 'Implied', 'Fixed');
462 CElemType_Name : ARRAY [TElemType] OF STRING = ('Empty', 'Any', 'Childs only', 'Mixed');
463 CAttrType_Name : ARRAY [TAttrType] OF STRING = ('Unknown', 'CDATA',
464 'ID', 'IDREF', 'IDREFS',
465 'ENTITY', 'ENTITIES',
466 'NMTOKEN', 'NMTOKENS',
467 'Notation', 'Enumeration');
468
469 FUNCTION ConvertWs (Source: STRING; PackWs: BOOLEAN) : STRING; // Convert WS to spaces #x20
470 PROCEDURE SetStringSF (VAR S : STRING; BufferStart, BufferFinal : PChar); // SetString by Start/Final of buffer
471 FUNCTION StrSFPas (Start, Finish : PChar) : STRING; // Convert buffer part to Pascal string
472 FUNCTION TrimWs (Source : STRING) : STRING; // Trim Whitespace
473
474 FUNCTION AnsiToUtf8 (Source : ANSISTRING) : STRING; // Convert Win-1252 to UTF-8
475 FUNCTION Utf8ToAnsi (Source : STRING; UnknownChar : CHAR = '�') : ANSISTRING; // Convert UTF-8 to Win-1252
476
477
478 (*
479 ===============================================================================================
480 TCustomXmlScanner event based component wrapper for TXmlParser
481 ===============================================================================================
482 *)
483
484 TYPE
485 TCustomXmlScanner = CLASS;
486 TXmlPrologEvent = PROCEDURE (Sender : TObject; XmlVersion, Encoding: STRING; Standalone : BOOLEAN) OF OBJECT;
487 TCommentEvent = PROCEDURE (Sender : TObject; Comment : STRING) OF OBJECT;
488 TPIEvent = PROCEDURE (Sender : TObject; Target, Content: STRING; Attributes : TAttrList) OF OBJECT;
489 TDtdEvent = PROCEDURE (Sender : TObject; RootElementName : STRING) OF OBJECT;
490 TStartTagEvent = PROCEDURE (Sender : TObject; TagName : STRING; Attributes : TAttrList) OF OBJECT;
491 TEndTagEvent = PROCEDURE (Sender : TObject; TagName : STRING) OF OBJECT;
492 TContentEvent = PROCEDURE (Sender : TObject; Content : STRING) OF OBJECT;
493 TElementEvent = PROCEDURE (Sender : TObject; ElemDef : TElemDef) OF OBJECT;
494 TEntityEvent = PROCEDURE (Sender : TObject; EntityDef : TEntityDef) OF OBJECT;
495 TNotationEvent = PROCEDURE (Sender : TObject; NotationDef : TNotationDef) OF OBJECT;
496 TErrorEvent = PROCEDURE (Sender : TObject; ErrorPos : PChar) OF OBJECT;
497 TExternalEvent = PROCEDURE (Sender : TObject; SystemId, PublicId, NotationId : STRING;
498 VAR Result : TXmlParser) OF OBJECT;
499 TEncodingEvent = FUNCTION (Sender : TObject; CurrentEncoding, Source : STRING) : STRING OF OBJECT;
500
501
502 TCustomXmlScanner = CLASS (TComponent)
503 PROTECTED
504 FXmlParser : TXmlParser;
505 FOnXmlProlog : TXmlPrologEvent;
506 FOnComment : TCommentEvent;
507 FOnPI : TPIEvent;
508 FOnDtdRead : TDtdEvent;
509 FOnStartTag : TStartTagEvent;
510 FOnEmptyTag : TStartTagEvent;
511 FOnEndTag : TEndTagEvent;
512 FOnContent : TContentEvent;
513 FOnCData : TContentEvent;
514 FOnElement : TElementEvent;
515 FOnAttList : TElementEvent;
516 FOnEntity : TEntityEvent;
517 FOnNotation : TNotationEvent;
518 FOnDtdError : TErrorEvent;
519 FOnLoadExternal : TExternalEvent;
520 FOnTranslateEncoding : TEncodingEvent;
521 FStopParser : BOOLEAN;
522 FUNCTION GetNormalize : BOOLEAN;
523 PROCEDURE SetNormalize (Value : BOOLEAN);
524
525 PROCEDURE WhenXmlProlog(XmlVersion, Encoding: STRING; Standalone : BOOLEAN); VIRTUAL;
526 PROCEDURE WhenComment (Comment : STRING); VIRTUAL;
527 PROCEDURE WhenPI (Target, Content: STRING; Attributes : TAttrList); VIRTUAL;
528 PROCEDURE WhenDtdRead (RootElementName : STRING); VIRTUAL;
529 PROCEDURE WhenStartTag (TagName : STRING; Attributes : TAttrList); VIRTUAL;
530 PROCEDURE WhenEmptyTag (TagName : STRING; Attributes : TAttrList); VIRTUAL;
531 PROCEDURE WhenEndTag (TagName : STRING); VIRTUAL;
532 PROCEDURE WhenContent (Content : STRING); VIRTUAL;
533 PROCEDURE WhenCData (Content : STRING); VIRTUAL;
534 PROCEDURE WhenElement (ElemDef : TElemDef); VIRTUAL;
535 PROCEDURE WhenAttList (ElemDef : TElemDef); VIRTUAL;
536 PROCEDURE WhenEntity (EntityDef : TEntityDef); VIRTUAL;
537 PROCEDURE WhenNotation (NotationDef : TNotationDef); VIRTUAL;
538 PROCEDURE WhenDtdError (ErrorPos : PChar); VIRTUAL;
539
540 PUBLIC
541 CONSTRUCTOR Create (AOwner: TComponent); OVERRIDE;
542 DESTRUCTOR Destroy; OVERRIDE;
543
544 PROCEDURE LoadFromFile (Filename : TFilename); // Load XML Document from file
545 PROCEDURE LoadFromBuffer (Buffer : PChar); // Load XML Document from buffer
546 PROCEDURE SetBuffer (Buffer : PChar); // Refer to Buffer
547 FUNCTION GetFilename : TFilename;
548
549 PROCEDURE Execute; // Perform scanning
550
551 PROTECTED
552 PROPERTY XmlParser : TXmlParser READ FXmlParser;
553 PROPERTY StopParser : BOOLEAN READ FStopParser WRITE FStopParser;
554 PROPERTY Filename : TFilename READ GetFilename WRITE LoadFromFile;
555 PROPERTY Normalize : BOOLEAN READ GetNormalize WRITE SetNormalize;
556 PROPERTY OnXmlProlog : TXmlPrologEvent READ FOnXmlProlog WRITE FOnXmlProlog;
557 PROPERTY OnComment : TCommentEvent READ FOnComment WRITE FOnComment;
558 PROPERTY OnPI : TPIEvent READ FOnPI WRITE FOnPI;
559 PROPERTY OnDtdRead : TDtdEvent READ FOnDtdRead WRITE FOnDtdRead;
560 PROPERTY OnStartTag : TStartTagEvent READ FOnStartTag WRITE FOnStartTag;
561 PROPERTY OnEmptyTag : TStartTagEvent READ FOnEmptyTag WRITE FOnEmptyTag;
562 PROPERTY OnEndTag : TEndTagEvent READ FOnEndTag WRITE FOnEndTag;
563 PROPERTY OnContent : TContentEvent READ FOnContent WRITE FOnContent;
564 PROPERTY OnCData : TContentEvent READ FOnCData WRITE FOnCData;
565 PROPERTY OnElement : TElementEvent READ FOnElement WRITE FOnElement;
566 PROPERTY OnAttList : TElementEvent READ FOnAttList WRITE FOnAttList;
567 PROPERTY OnEntity : TEntityEvent READ FOnEntity WRITE FOnEntity;
568 PROPERTY OnNotation : TNotationEvent READ FOnNotation WRITE FOnNotation;
569 PROPERTY OnDtdError : TErrorEvent READ FOnDtdError WRITE FOnDtdError;
570 PROPERTY OnLoadExternal : TExternalEvent READ FOnLoadExternal WRITE FOnLoadExternal;
571 PROPERTY OnTranslateEncoding : TEncodingEvent READ FOnTranslateEncoding WRITE FOnTranslateEncoding;
572 END;
573
574 (*
575 ===============================================================================================
576 IMPLEMENTATION
577 ===============================================================================================
578 *)
579
580 IMPLEMENTATION
581
582
583 (*
584 ===============================================================================================
585 Unicode and UTF-8 stuff
586 ===============================================================================================
587 *)
588
589 CONST
590 // --- Character Translation Table for Unicode <-> Win-1252
591 WIN1252_UNICODE : ARRAY [$00..$FF] OF WORD = (
592 $0000, $0001, $0002, $0003, $0004, $0005, $0006, $0007, $0008, $0009,
593 $000A, $000B, $000C, $000D, $000E, $000F, $0010, $0011, $0012, $0013,
594 $0014, $0015, $0016, $0017, $0018, $0019, $001A, $001B, $001C, $001D,
595 $001E, $001F, $0020, $0021, $0022, $0023, $0024, $0025, $0026, $0027,
596 $0028, $0029, $002A, $002B, $002C, $002D, $002E, $002F, $0030, $0031,
597 $0032, $0033, $0034, $0035, $0036, $0037, $0038, $0039, $003A, $003B,
598 $003C, $003D, $003E, $003F, $0040, $0041, $0042, $0043, $0044, $0045,
599 $0046, $0047, $0048, $0049, $004A, $004B, $004C, $004D, $004E, $004F,
600 $0050, $0051, $0052, $0053, $0054, $0055, $0056, $0057, $0058, $0059,
601 $005A, $005B, $005C, $005D, $005E, $005F, $0060, $0061, $0062, $0063,
602 $0064, $0065, $0066, $0067, $0068, $0069, $006A, $006B, $006C, $006D,
603 $006E, $006F, $0070, $0071, $0072, $0073, $0074, $0075, $0076, $0077,
604 $0078, $0079, $007A, $007B, $007C, $007D, $007E, $007F,
605
606 $20AC, $0081, $201A, $0192, $201E, $2026, $2020, $2021, $02C6, $2030,
607 $0160, $2039, $0152, $008D, $017D, $008F, $0090, $2018, $2019, $201C,
608 $201D, $2022, $2013, $2014, $02DC, $2122, $0161, $203A, $0153, $009D,
609 $017E, $0178, $00A0, $00A1, $00A2, $00A3, $00A4, $00A5, $00A6, $00A7,
610 $00A8, $00A9, $00AA, $00AB, $00AC, $00AD, $00AE, $00AF, $00B0, $00B1,
611 $00B2, $00B3, $00B4, $00B5, $00B6, $00B7, $00B8, $00B9, $00BA, $00BB,
612 $00BC, $00BD, $00BE, $00BF, $00C0, $00C1, $00C2, $00C3, $00C4, $00C5,
613 $00C6, $00C7, $00C8, $00C9, $00CA, $00CB, $00CC, $00CD, $00CE, $00CF,
614 $00D0, $00D1, $00D2, $00D3, $00D4, $00D5, $00D6, $00D7, $00D8, $00D9,
615 $00DA, $00DB, $00DC, $00DD, $00DE, $00DF, $00E0, $00E1, $00E2, $00E3,
616 $00E4, $00E5, $00E6, $00E7, $00E8, $00E9, $00EA, $00EB, $00EC, $00ED,
617 $00EE, $00EF, $00F0, $00F1, $00F2, $00F3, $00F4, $00F5, $00F6, $00F7,
618 $00F8, $00F9, $00FA, $00FB, $00FC, $00FD, $00FE, $00FF);
619
620 (* UTF-8 (somewhat simplified)
621 -----
622 Character Range Byte sequence
623 --------------- -------------------------- (x=Bits from original character)
624 $0000..$007F 0xxxxxxx
625 $0080..$07FF 110xxxxx 10xxxxxx
626 $8000..$FFFF 1110xxxx 10xxxxxx 10xxxxxx
627
628 Example
629 --------
630 Transforming the Unicode character U+00E4 LATIN SMALL LETTER A WITH DIAERESIS ("�"):
631
632 ISO-8859-1, Decimal 228
633 Win1252, Hex $E4
634 ANSI Bin 1110 0100
635 abcd efgh
636
637 UTF-8 Binary 1100xxab 10cdefgh
638 Binary 11000011 10100100
639 Hex $C3 $A4
640 Decimal 195 164
641 ANSI � � *)
642
643
644 FUNCTION AnsiToUtf8 (Source : ANSISTRING) : STRING;
645 (* Converts the given Windows ANSI (Win1252) String to UTF-8. *)
646 VAR
647 I : INTEGER; // Loop counter
648 U : WORD; // Current Unicode value
649 Len : INTEGER; // Current real length of "Result" string
650 BEGIN
651 SetLength (Result, Length (Source) * 3); // Worst case
652 Len := 0;
653 FOR I := 1 TO Length (Source) DO BEGIN
654 U := WIN1252_UNICODE [ORD (Source [I])];
655 CASE U OF
656 $0000..$007F : BEGIN
657 INC (Len);
658 Result [Len] := CHR (U);
659 END;
660 $0080..$07FF : BEGIN
661 INC (Len);
662 Result [Len] := CHR ($C0 OR (U SHR 6));
663 INC (Len);
664 Result [Len] := CHR ($80 OR (U AND $3F));
665 END;
666 $0800..$FFFF : BEGIN
667 INC (Len);
668 Result [Len] := CHR ($E0 OR (U SHR 12));
669 INC (Len);
670 Result [Len] := CHR ($80 OR ((U SHR 6) AND $3F));
671 INC (Len);
672 Result [Len] := CHR ($80 OR (U AND $3F));
673 END;
674 END;
675 END;
676 SetLength (Result, Len);
677 END;
678
679
680 FUNCTION Utf8ToAnsi (Source : STRING; UnknownChar : CHAR = '�') : ANSISTRING;
681 (* Converts the given UTF-8 String to Windows ANSI (Win-1252).
682 If a character can not be converted, the "UnknownChar" is inserted. *)
683 VAR
684 SourceLen : INTEGER; // Length of Source string
685 I, K : INTEGER;
686 A : BYTE; // Current ANSI character value
687 U : WORD;
688 Ch : CHAR; // Dest char
689 Len : INTEGER; // Current real length of "Result" string
690 BEGIN
691 SourceLen := Length (Source);
692 SetLength (Result, SourceLen); // Enough room to live
693 Len := 0;
694 I := 1;
695 WHILE I <= SourceLen DO BEGIN
696 A := ORD (Source [I]);
697 IF A < $80 THEN BEGIN // Range $0000..$007F
698 INC (Len);
699 Result [Len] := Source [I];
700 INC (I);
701 END
702 ELSE BEGIN // Determine U, Inc I
703 IF (A AND $E0 = $C0) AND (I < SourceLen) THEN BEGIN // Range $0080..$07FF
704 U := (WORD (A AND $1F) SHL 6) OR (ORD (Source [I+1]) AND $3F);
705 INC (I, 2);
706 END
707 ELSE IF (A AND $F0 = $E0) AND (I < SourceLen-1) THEN BEGIN // Range $0800..$FFFF
708 U := (WORD (A AND $0F) SHL 12) OR
709 (WORD (ORD (Source [I+1]) AND $3F) SHL 6) OR
710 ( ORD (Source [I+2]) AND $3F);
711 INC (I, 3);
712 END
713 ELSE BEGIN // Unknown/unsupported
714 INC (I);
715 FOR K := 7 DOWNTO 0 DO
716 IF A AND (1 SHL K) = 0 THEN BEGIN
717 INC (I, (A SHR (K+1))-1);
718 BREAK;
719 END;
720 U := WIN1252_UNICODE [ORD (UnknownChar)];
721 END;
722 Ch := UnknownChar; // Retrieve ANSI char
723 FOR A := $00 TO $FF DO
724 IF WIN1252_UNICODE [A] = U THEN BEGIN
725 Ch := CHR (A);
726 BREAK;
727 END;
728 INC (Len);
729 Result [Len] := Ch;
730 END;
731 END;
732 SetLength (Result, Len);
733 END;
734
735
736 (*
737 ===============================================================================================
738 "Special" Helper Functions
739
740 Don't ask me why. But including these functions makes the parser *DRAMATICALLY* faster
741 on my K6-233 machine. You can test it yourself just by commenting them out.
742 They do exactly the same as the Assembler routines defined in SysUtils.
743 (This is where you can see how great the Delphi compiler really is. The compiled code is
744 faster than hand-coded assembler!)
745 ===============================================================================================
746 --> Just move this line below the StrScan function --> *)
747
748
749 FUNCTION StrPos (CONST Str, SearchStr : PChar) : PChar;
750 // Same functionality as SysUtils.StrPos
751 VAR
752 First : CHAR;
753 Len : INTEGER;
754 BEGIN
755 First := SearchStr^;
756 Len := StrLen (SearchStr);
757 Result := Str;
758 REPEAT
759 IF Result^ = First THEN
760 IF StrLComp (Result, SearchStr, Len) = 0 THEN BREAK;
761 IF Result^ = #0 THEN BEGIN
762 Result := NIL;
763 BREAK;
764 END;
765 INC (Result);
766 UNTIL FALSE;
767 END;
768
769
StrScannull770 FUNCTION StrScan (CONST Start : PChar; CONST Ch : CHAR) : PChar;
771 // Same functionality as SysUtils.StrScan
772 BEGIN
773 Result := Start;
774 WHILE Result^ <> Ch DO BEGIN
775 IF Result^ = #0 THEN BEGIN
776 Result := NIL;
777 EXIT;
778 END;
779 INC (Result);
780 END;
781 END;
782
783
784 (*
785 ===============================================================================================
786 Helper Functions
787 ===============================================================================================
788 *)
789
DelCharsnull790 FUNCTION DelChars (Source : STRING; CharsToDelete : TCharset) : STRING;
791 // Delete all "CharsToDelete" from the string
792 VAR
793 I : INTEGER;
794 BEGIN
795 Result := Source;
796 FOR I := Length (Result) DOWNTO 1 DO
797 IF Result [I] IN CharsToDelete THEN
798 Delete (Result, I, 1);
799 END;
800
801
TrimWsnull802 FUNCTION TrimWs (Source : STRING) : STRING;
803 // Trimms off Whitespace characters from both ends of the string
804 VAR
805 I : INTEGER;
806 BEGIN
807 // --- Trim Left
808 I := 1;
809 WHILE (I <= Length (Source)) AND (Source [I] IN CWhitespace) DO
810 INC (I);
811 Result := Copy (Source, I, MaxInt);
812
813 // --- Trim Right
814 I := Length (Result);
815 WHILE (I > 1) AND (Result [I] IN CWhitespace) DO
816 DEC (I);
817 Delete (Result, I+1, Length (Result)-I);
818 END;
819
820
ConvertWsnull821 FUNCTION ConvertWs (Source: STRING; PackWs: BOOLEAN) : STRING;
822 // Converts all Whitespace characters to the Space #x20 character
823 // If "PackWs" is true, contiguous Whitespace characters are packed to one
824 VAR
825 I : INTEGER;
826 BEGIN
827 Result := Source;
828 FOR I := Length (Result) DOWNTO 1 DO
829 IF (Result [I] IN CWhitespace) THEN
830 IF PackWs AND (I > 1) AND (Result [I-1] IN CWhitespace)
831 THEN Delete (Result, I, 1)
832 ELSE Result [I] := #32;
833 END;
834
835
836 PROCEDURE SetStringSF (VAR S : STRING; BufferStart, BufferFinal : PChar);
837 BEGIN
838 SetString (S, BufferStart, BufferFinal-BufferStart+1);
839 END;
840
841
StrLPasnull842 FUNCTION StrLPas (Start : PChar; Len : INTEGER) : STRING;
843 BEGIN
844 SetString (Result, Start, Len);
845 END;
846
847
StrSFPasnull848 FUNCTION StrSFPas (Start, Finish : PChar) : STRING;
849 BEGIN
850 SetString (Result, Start, Finish-Start+1);
851 END;
852
853
StrScanEnull854 FUNCTION StrScanE (CONST Source : PChar; CONST CharToScanFor : CHAR) : PChar;
855 // If "CharToScanFor" is not found, StrScanE returns the last char of the
856 // buffer instead of NIL
857 BEGIN
858 Result := StrScan (Source, CharToScanFor);
859 IF Result = NIL THEN
860 Result := StrEnd (Source)-1;
861 END;
862
863
864 PROCEDURE ExtractName (Start : PChar; Terminators : TCharset; VAR Final : PChar);
865 (* Extracts the complete Name beginning at "Start".
866 It is assumed that the name is contained in Markup, so the '>' character is
867 always a Termination.
868 Start: IN Pointer to first char of name. Is always considered to be valid
869 Terminators: IN Characters which terminate the name
870 Final: OUT Pointer to last char of name *)
871 BEGIN
872 Final := Start+1;
873 Include (Terminators, #0);
874 Include (Terminators, '>');
875 WHILE NOT (Final^ IN Terminators) DO
876 INC (Final);
877 DEC (Final);
878 END;
879
880
881 PROCEDURE ExtractQuote (Start : PChar; VAR Content : STRING; VAR Final : PChar);
882 (* Extract a string which is contained in single or double Quotes.
883 Start: IN Pointer to opening quote
884 Content: OUT The quoted string
885 Final: OUT Pointer to closing quote *)
886 BEGIN
887 Final := StrScan (Start+1, Start^);
888 IF Final = NIL THEN BEGIN
889 Final := StrEnd (Start+1)-1;
890 SetString (Content, Start+1, Final-Start);
891 END
892 ELSE
893 SetString (Content, Start+1, Final-1-Start);
894 END;
895
896
897 (*
898 ===============================================================================================
899 TEntityStackNode
900 This Node is pushed to the "Entity Stack" whenever the parser parses entity replacement text.
901 The "Instance" field holds the Instance pointer of an External Entity buffer. When it is
902 popped, the Instance is freed.
903 The "Encoding" field holds the name of the Encoding. External Parsed Entities may have
904 another encoding as the document entity (XmlSpec 4.3.3). So when there is an "<?xml" PI
905 found in the stream (= Text Declaration at the beginning of external parsed entities), the
906 Encoding found there is used for the External Entity (is assigned to TXmlParser.CurEncoding)
907 Default Encoding is for the Document Entity is UTF-8. It is assumed that External Entities
908 have the same Encoding as the Document Entity, unless they carry a Text Declaration.
909 ===============================================================================================
910 *)
911
912 TYPE
913 TEntityStackNode = CLASS
914 Instance : TObject;
915 Encoding : STRING;
916 LastPos : PChar;
917 END;
918
919 (*
920 ===============================================================================================
921 TEntityStack
922 For nesting of Entities.
923 When there is an entity reference found in the data stream, the corresponding entity
924 definition is searched and the current position is pushed to this stack.
925 From then on, the program scans the entitiy replacement text as if it were normal content.
926 When the parser reaches the end of an entity, the current position is popped off the
927 stack again.
928 ===============================================================================================
929 *)
930
931 CONSTRUCTOR TEntityStack.Create (TheOwner : TXmlParser);
932 BEGIN
933 INHERITED Create;
934 Owner := TheOwner;
935 END;
936
937
938 PROCEDURE TEntityStack.Push (LastPos : PChar);
939 BEGIN
940 Push (NIL, LastPos);
941 END;
942
943
944 PROCEDURE TEntityStack.Push (Instance : TObject; LastPos : PChar);
945 VAR
946 ESN : TEntityStackNode;
947 BEGIN
948 ESN := TEntityStackNode.Create;
949 ESN.Instance := Instance;
950 ESN.Encoding := Owner.FCurEncoding; // Save current Encoding
951 ESN.LastPos := LastPos;
952 Add (ESN);
953 END;
954
955
TEntityStack.Popnull956 FUNCTION TEntityStack.Pop : PChar;
957 VAR
958 ESN : TEntityStackNode;
959 BEGIN
960 IF Count > 0 THEN BEGIN
961 ESN := TEntityStackNode (Items [Count-1]);
962 Result := ESN.LastPos;
963 IF ESN.Instance <> NIL THEN
964 ESN.Instance.Free;
965 IF ESN.Encoding <> '' THEN
966 Owner.FCurEncoding := ESN.Encoding; // Restore current Encoding
967 Delete (Count-1);
968 END
969 ELSE
970 Result := NIL;
971 END;
972
973
974 (*
975 ===============================================================================================
976 TExternalID
977 -----------
978 XmlSpec 4.2.2: ExternalID ::= 'SYSTEM' S SystemLiteral |
979 'PUBLIC' S PubidLiteral S SystemLiteral
980 XmlSpec 4.7: PublicID ::= 'PUBLIC' S PubidLiteral
981 SystemLiteral and PubidLiteral are quoted
982 ===============================================================================================
983 *)
984
985 TYPE
986 TExternalID = CLASS
987 PublicId : STRING;
988 SystemId : STRING;
989 Final : PChar;
990 CONSTRUCTOR Create (Start : PChar);
991 END;
992
993 CONSTRUCTOR TExternalID.Create (Start : PChar);
994 BEGIN
995 INHERITED Create;
996 Final := Start;
997 IF StrLComp (Start, 'SYSTEM', 6) = 0 THEN BEGIN
998 WHILE NOT (Final^ IN (CQuoteChar + [#0, '>', '['])) DO INC (Final);
999 IF NOT (Final^ IN CQuoteChar) THEN EXIT;
1000 ExtractQuote (Final, SystemID, Final);
1001 END
1002 ELSE IF StrLComp (Start, 'PUBLIC', 6) = 0 THEN BEGIN
1003 WHILE NOT (Final^ IN (CQuoteChar + [#0, '>', '['])) DO INC (Final);
1004 IF NOT (Final^ IN CQuoteChar) THEN EXIT;
1005 ExtractQuote (Final, PublicID, Final);
1006 INC (Final);
1007 WHILE NOT (Final^ IN (CQuoteChar + [#0, '>', '['])) DO INC (Final);
1008 IF NOT (Final^ IN CQuoteChar) THEN EXIT;
1009 ExtractQuote (Final, SystemID, Final);
1010 END;
1011 END;
1012
1013
1014 (*
1015 ===============================================================================================
1016 TXmlParser
1017 ===============================================================================================
1018 *)
1019
1020 CONSTRUCTOR TXmlParser.Create;
1021 BEGIN
1022 INHERITED Create;
1023 FBuffer := NIL;
1024 FBufferSize := 0;
1025 Elements := TElemList.Create;
1026 Entities := TNvpList.Create;
1027 ParEntities := TNvpList.Create;
1028 Notations := TNvpList.Create;
1029 CurAttr := TAttrList.Create;
1030 EntityStack := TEntityStack.Create (Self);
1031 Clear;
1032 END;
1033
1034
1035 DESTRUCTOR TXmlParser.Destroy;
1036 BEGIN
1037 Clear;
1038 Elements.Free;
1039 Entities.Free;
1040 ParEntities.Free;
1041 Notations.Free;
1042 CurAttr.Free;
1043 EntityStack.Free;
1044 INHERITED Destroy;
1045 END;
1046
1047
1048 PROCEDURE TXmlParser.Clear;
1049 // Free Buffer and clear all object attributes
1050 BEGIN
1051 IF (FBufferSize > 0) AND (FBuffer <> NIL) THEN
1052 FreeMem (FBuffer);
1053 FBuffer := NIL;
1054 FBufferSize := 0;
1055 FSource := '';
1056 FXmlVersion := '';
1057 FEncoding := '';
1058 FStandalone := FALSE;
1059 FRootName := '';
1060 FDtdcFinal := NIL;
1061 FNormalize := TRUE;
1062 Elements.Clear;
1063 Entities.Clear;
1064 ParEntities.Clear;
1065 Notations.Clear;
1066 CurAttr.Clear;
1067 EntityStack.Clear;
1068 END;
1069
1070
TXmlParser.LoadFromFilenull1071 FUNCTION TXmlParser.LoadFromFile (Filename : STRING; FileMode : INTEGER = fmOpenRead OR fmShareDenyNone) : BOOLEAN;
1072 // Loads Document from given file
1073 // Returns TRUE if successful
1074 VAR
1075 f : FILE;
1076 ReadIn : INTEGER;
1077 OldFileMode : INTEGER;
1078 BEGIN
1079 Result := FALSE;
1080 Clear;
1081
1082 // --- Open File
1083 OldFileMode := SYSTEM.FileMode;
1084 TRY
1085 SYSTEM.FileMode := FileMode;
1086 TRY
1087 AssignFile (f, Filename);
1088 Reset (f, 1);
1089 EXCEPT
1090 EXIT;
1091 END;
1092
1093 TRY
1094 // --- Allocate Memory
1095 TRY
1096 FBufferSize := Filesize (f) + 1;
1097 GetMem (FBuffer, FBufferSize);
1098 EXCEPT
1099 Clear;
1100 EXIT;
1101 END;
1102
1103 // --- Read File
1104 TRY
1105 BlockRead (f, FBuffer^, FBufferSize, ReadIn);
1106 (FBuffer+ReadIn)^ := #0; // NULL termination
1107 EXCEPT
1108 Clear;
1109 EXIT;
1110 END;
1111 FINALLY
1112 CloseFile (f);
1113 END;
1114
1115 FSource := Filename;
1116 Result := TRUE;
1117
1118 FINALLY
1119 SYSTEM.FileMode := OldFileMode;
1120 END;
1121 END;
1122
1123
TXmlParser.LoadFromBuffernull1124 FUNCTION TXmlParser.LoadFromBuffer (Buffer : PChar) : BOOLEAN;
1125 // Loads Document from another buffer
1126 // Returns TRUE if successful
1127 // The "Source" property becomes '<MEM>' if successful
1128 BEGIN
1129 Result := FALSE;
1130 Clear;
1131 FBufferSize := StrLen (Buffer) + 1;
1132 TRY
1133 GetMem (FBuffer, FBufferSize);
1134 EXCEPT
1135 Clear;
1136 EXIT;
1137 END;
1138 StrCopy (FBuffer, Buffer);
1139 FSource := '<MEM>';
1140 Result := TRUE;
1141 END;
1142
1143
1144 PROCEDURE TXmlParser.SetBuffer (Buffer : PChar); // References another buffer
1145 BEGIN
1146 Clear;
1147 FBuffer := Buffer;
1148 FBufferSize := 0;
1149 FSource := '<REFERENCE>';
1150 END;
1151
1152
1153 //-----------------------------------------------------------------------------------------------
1154 // Scanning through the document
1155 //-----------------------------------------------------------------------------------------------
1156
1157 PROCEDURE TXmlParser.StartScan;
1158 BEGIN
1159 CurPartType := ptNone;
1160 CurName := '';
1161 CurContent := '';
1162 CurStart := NIL;
1163 CurFinal := NIL;
1164 CurAttr.Clear;
1165 EntityStack.Clear;
1166 END;
1167
1168
TXmlParser.Scannull1169 FUNCTION TXmlParser.Scan : BOOLEAN;
1170 // Scans the next Part
1171 // Returns TRUE if a part could be found, FALSE if there is no part any more
1172 //
1173 // "IsDone" can be set to FALSE by AnalyzeText in order to go to the next part
1174 // if there is no Content due to normalization
1175 VAR
1176 IsDone : BOOLEAN;
1177 BEGIN
1178 REPEAT
1179 IsDone := TRUE;
1180
1181 // --- Start of next Part
1182 IF CurStart = NIL
1183 THEN CurStart := DocBuffer
1184 ELSE CurStart := CurFinal+1;
1185 CurFinal := CurStart;
1186
1187 // --- End of Document of Pop off a new part from the Entity stack?
1188 IF CurStart^ = #0 THEN
1189 CurStart := EntityStack.Pop;
1190
1191 // --- No Document or End Of Document: Terminate Scan
1192 IF (CurStart = NIL) OR (CurStart^ = #0) THEN BEGIN
1193 CurStart := StrEnd (DocBuffer);
1194 CurFinal := CurStart-1;
1195 EntityStack.Clear;
1196 Result := FALSE;
1197 EXIT;
1198 END;
1199
1200 IF (StrLComp (CurStart, '<?xml', 5) = 0) AND
1201 ((CurStart+5)^ IN CWhitespace) THEN AnalyzeProlog // XML Declaration, Text Declaration
1202 ELSE IF StrLComp (CurStart, '<?', 2) = 0 THEN AnalyzePI (CurStart, CurFinal) // PI
1203 ELSE IF StrLComp (CurStart, '<!--', 4) = 0 THEN AnalyzeComment (CurStart, CurFinal) // Comment
1204 ELSE IF StrLComp (CurStart, '<!DOCTYPE', 9) = 0 THEN AnalyzeDtdc // DTDc
1205 ELSE IF StrLComp (CurStart, CDStart, Length (CDStart)) = 0 THEN AnalyzeCdata // CDATA Section
1206 ELSE IF StrLComp (CurStart, '<', 1) = 0 THEN AnalyzeTag // Start-Tag, End-Tag, Empty-Element-Tag
1207 ELSE AnalyzeText (IsDone); // Text Content
1208 UNTIL IsDone;
1209 Result := TRUE;
1210 END;
1211
1212
1213 PROCEDURE TXmlParser.AnalyzeProlog;
1214 // Analyze XML Prolog or Text Declaration
1215 VAR
1216 F : PChar;
1217 BEGIN
1218 CurAttr.Analyze (CurStart+5, F);
1219 IF EntityStack.Count = 0 THEN BEGIN
1220 FXmlVersion := CurAttr.Value ('version');
1221 FEncoding := CurAttr.Value ('encoding');
1222 FStandalone := CurAttr.Value ('standalone') = 'yes';
1223 END;
1224 CurFinal := StrPos (F, '?>');
1225 IF CurFinal <> NIL
1226 THEN INC (CurFinal)
1227 ELSE CurFinal := StrEnd (CurStart)-1;
1228 FCurEncoding := AnsiUpperCase (CurAttr.Value ('encoding'));
1229 IF FCurEncoding = '' THEN
1230 FCurEncoding := 'UTF-8'; // Default XML Encoding is UTF-8
1231 CurPartType := ptXmlProlog;
1232 CurName := '';
1233 CurContent := '';
1234 END;
1235
1236
1237 PROCEDURE TXmlParser.AnalyzeComment (Start : PChar; VAR Final : PChar);
1238 // Analyze Comments
1239 BEGIN
1240 Final := StrPos (Start+4, '-->');
1241 IF Final = NIL
1242 THEN Final := StrEnd (Start)-1
1243 ELSE INC (Final, 2);
1244 CurPartType := ptComment;
1245 END;
1246
1247
1248 PROCEDURE TXmlParser.AnalyzePI (Start : PChar; VAR Final : PChar);
1249 // Analyze Processing Instructions (PI)
1250 // This is also called for Character
1251 VAR
1252 F : PChar;
1253 BEGIN
1254 CurPartType := ptPI;
1255 Final := StrPos (Start+2, '?>');
1256 IF Final = NIL
1257 THEN Final := StrEnd (Start)-1
1258 ELSE INC (Final);
1259 ExtractName (Start+2, CWhitespace + ['?', '>'], F);
1260 SetStringSF (CurName, Start+2, F);
1261 SetStringSF (CurContent, F+1, Final-2);
1262 CurAttr.Analyze (F+1, F);
1263 END;
1264
1265
1266 PROCEDURE TXmlParser.AnalyzeDtdc;
1267 (* Analyze Document Type Declaration
1268 doctypedecl ::= '<!DOCTYPE' S Name (S ExternalID)? S? ('[' (markupdecl | PEReference | S)* ']' S?)? '>'
1269 markupdecl ::= elementdecl | AttlistDecl | EntityDecl | NotationDecl | PI | Comment
1270 PEReference ::= '%' Name ';'
1271
1272 elementdecl ::= '<!ELEMENT' S Name S contentspec S? '>'
1273 AttlistDecl ::= '<!ATTLIST' S Name AttDef* S? '>'
1274 EntityDecl ::= '<!ENTITY' S Name S EntityDef S? '>' |
1275 '<!ENTITY' S '%' S Name S PEDef S? '>'
1276 NotationDecl ::= '<!NOTATION' S Name S (ExternalID | PublicID) S? '>'
1277 PI ::= '<?' PITarget (S (Char* - (Char* '?>' Char* )))? '?>'
1278 Comment ::= '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->' *)
1279 TYPE
1280 TPhase = (phName, phDtd, phInternal, phFinishing);
1281 VAR
1282 Phase : TPhase;
1283 F : PChar;
1284 ExternalID : TExternalID;
1285 ExternalDTD : TXmlParser;
1286 DER : TDtdElementRec;
1287 BEGIN
1288 DER.Start := CurStart;
1289 EntityStack.Clear; // Clear stack for Parameter Entities
1290 CurPartType := ptDtdc;
1291
1292 // --- Don't read DTDc twice
1293 IF FDtdcFinal <> NIL THEN BEGIN
1294 CurFinal := FDtdcFinal;
1295 EXIT;
1296 END;
1297
1298 // --- Scan DTDc
1299 CurFinal := CurStart + 9; // First char after '<!DOCTYPE'
1300 Phase := phName;
1301 REPEAT
1302 CASE CurFinal^ OF
1303 '%' : BEGIN
1304 PushPE (CurFinal);
1305 CONTINUE;
1306 END;
1307 #0 : IF EntityStack.Count = 0 THEN
1308 BREAK
1309 ELSE BEGIN
1310 CurFinal := EntityStack.Pop;
1311 CONTINUE;
1312 END;
1313 '[' : BEGIN
1314 Phase := phInternal;
1315 AnalyzeDtdElements (CurFinal+1, CurFinal);
1316 CONTINUE;
1317 END;
1318 ']' : Phase := phFinishing;
1319 '>' : BREAK;
1320 ELSE IF NOT (CurFinal^ IN CWhitespace) THEN BEGIN
1321 CASE Phase OF
1322 phName : IF (CurFinal^ IN CNameStart) THEN BEGIN
1323 ExtractName (CurFinal, CWhitespace + ['[', '>'], F);
1324 SetStringSF (FRootName, CurFinal, F);
1325 CurFinal := F;
1326 Phase := phDtd;
1327 END;
1328 phDtd : IF (StrLComp (CurFinal, 'SYSTEM', 6) = 0) OR
1329 (StrLComp (CurFinal, 'PUBLIC', 6) = 0) THEN BEGIN
1330 ExternalID := TExternalID.Create (CurFinal);
1331 ExternalDTD := LoadExternalEntity (ExternalId.SystemId, ExternalID.PublicId, '');
1332 F := StrPos (ExternalDtd.DocBuffer, '<!');
1333 IF F <> NIL THEN
1334 AnalyzeDtdElements (F, F);
1335 ExternalDTD.Free;
1336 CurFinal := ExternalID.Final;
1337 ExternalID.Free;
1338 END;
1339 ELSE BEGIN
1340 DER.ElementType := deError;
1341 DER.Pos := CurFinal;
1342 DER.Final := CurFinal;
1343 DtdElementFound (DER);
1344 END;
1345 END;
1346
1347 END;
1348 END;
1349 INC (CurFinal);
1350 UNTIL FALSE;
1351
1352 CurPartType := ptDtdc;
1353 CurName := '';
1354 CurContent := '';
1355
1356 // It is an error in the document if "EntityStack" is not empty now
1357 IF EntityStack.Count > 0 THEN BEGIN
1358 DER.ElementType := deError;
1359 DER.Final := CurFinal;
1360 DER.Pos := CurFinal;
1361 DtdElementFound (DER);
1362 END;
1363
1364 EntityStack.Clear; // Clear stack for General Entities
1365 FDtdcFinal := CurFinal;
1366 END;
1367
1368
1369 PROCEDURE TXmlParser.AnalyzeDtdElements (Start : PChar; VAR Final : PChar);
1370 // Analyze the "Elements" of a DTD contained in the external or
1371 // internal DTD subset.
1372 VAR
1373 DER : TDtdElementRec;
1374 BEGIN
1375 Final := Start;
1376 REPEAT
1377 CASE Final^ OF
1378 '%' : BEGIN
1379 PushPE (Final);
1380 CONTINUE;
1381 END;
1382 #0 : IF EntityStack.Count = 0 THEN
1383 BREAK
1384 ELSE BEGIN
1385 CurFinal := EntityStack.Pop;
1386 CONTINUE;
1387 END;
1388 ']',
1389 '>' : BREAK;
1390 '<' : IF StrLComp (Final, '<!ELEMENT', 9) = 0 THEN AnalyzeElementDecl (Final, Final)
1391 ELSE IF StrLComp (Final, '<!ATTLIST', 9) = 0 THEN AnalyzeAttListDecl (Final, Final)
1392 ELSE IF StrLComp (Final, '<!ENTITY', 8) = 0 THEN AnalyzeEntityDecl (Final, Final)
1393 ELSE IF StrLComp (Final, '<!NOTATION', 10) = 0 THEN AnalyzeNotationDecl (Final, Final)
1394 ELSE IF StrLComp (Final, '<?', 2) = 0 THEN BEGIN // PI in DTD
1395 DER.ElementType := dePI;
1396 DER.Start := Final;
1397 AnalyzePI (Final, Final);
1398 DER.Target := PChar (CurName);
1399 DER.Content := PChar (CurContent);
1400 DER.AttrList := CurAttr;
1401 DER.Final := Final;
1402 DtdElementFound (DER);
1403 END
1404 ELSE IF StrLComp (Final, '<!--', 4) = 0 THEN BEGIN // Comment in DTD
1405 DER.ElementType := deComment;
1406 DER.Start := Final;
1407 AnalyzeComment (Final, Final);
1408 DER.Final := Final;
1409 DtdElementFound (DER);
1410 END
1411 ELSE BEGIN
1412 DER.ElementType := deError;
1413 DER.Start := Final;
1414 DER.Pos := Final;
1415 DER.Final := Final;
1416 DtdElementFound (DER);
1417 END;
1418
1419 END;
1420 INC (Final);
1421 UNTIL FALSE;
1422 END;
1423
1424
1425 PROCEDURE TXmlParser.AnalyzeTag;
1426 // Analyze Tags
1427 VAR
1428 S, F : PChar;
1429 Attr : TAttr;
1430 ElemDef : TElemDef;
1431 AttrDef : TAttrDef;
1432 I : INTEGER;
1433 BEGIN
1434 CurPartType := ptStartTag;
1435 S := CurStart+1;
1436 IF S^ = '/' THEN BEGIN
1437 CurPartType := ptEndTag;
1438 INC (S);
1439 END;
1440 ExtractName (S, CWhitespace + ['/'], F);
1441 SetStringSF (CurName, S, F);
1442 CurAttr.Analyze (F+1, CurFinal);
1443 IF CurFinal^ = '/' THEN BEGIN
1444 CurPartType := ptEmptyTag;
1445 END;
1446 CurFinal := StrScanE (CurFinal, '>');
1447
1448 // --- Set Default Attribute values for nonexistent attributes
1449 IF (CurPartType = ptStartTag) OR (CurPartType = ptEmptyTag) THEN BEGIN
1450 ElemDef := Elements.Node (CurName);
1451 IF ElemDef <> NIL THEN BEGIN
1452 FOR I := 0 TO ElemDef.Count-1 DO BEGIN
1453 AttrDef := TAttrDef (ElemDef [I]);
1454 Attr := TAttr (CurAttr.Node (AttrDef.Name));
1455 IF (Attr = NIL) AND (AttrDef.Value <> '') THEN BEGIN
1456 Attr := TAttr.Create (AttrDef.Name, AttrDef.Value);
1457 Attr.ValueType := vtDefault;
1458 CurAttr.Add (Attr);
1459 END;
1460 IF Attr <> NIL THEN BEGIN
1461 CASE AttrDef.DefaultType OF
1462 adDefault : ;
1463 adRequired : ; // -!- It is an error in the document if "Attr.Value" is an empty string
1464 adImplied : Attr.ValueType := vtImplied;
1465 adFixed : BEGIN
1466 Attr.ValueType := vtFixed;
1467 Attr.Value := AttrDef.Value;
1468 END;
1469 END;
1470 Attr.AttrType := AttrDef.AttrType;
1471 END;
1472 END;
1473 END;
1474
1475 // --- Normalize Attribute Values. XmlSpec:
1476 // - a character reference is processed by appending the referenced character to the attribute value
1477 // - an entity reference is processed by recursively processing the replacement text of the entity
1478 // - a whitespace character (#x20, #xD, #xA, #x9) is processed by appending #x20 to the normalized value,
1479 // except that only a single #x20 is appended for a "#xD#xA" sequence that is part of an external
1480 // parsed entity or the literal entity value of an internal parsed entity
1481 // - other characters are processed by appending them to the normalized value
1482 // If the declared value is not CDATA, then the XML processor must further process the
1483 // normalized attribute value by discarding any leading and trailing space (#x20) characters,
1484 // and by replacing sequences of space (#x20) characters by a single space (#x20) character.
1485 // All attributes for which no declaration has been read should be treated by a
1486 // non-validating parser as if declared CDATA.
1487 // !!! The XML 1.0 SE specification is somewhat different here
1488 // This code does not conform exactly to this specification
1489 FOR I := 0 TO CurAttr.Count-1 DO
1490 WITH TAttr (CurAttr [I]) DO BEGIN
1491 ReplaceGeneralEntities (Value);
1492 ReplaceCharacterEntities (Value);
1493 IF (AttrType <> atCData) AND (AttrType <> atUnknown)
1494 THEN Value := TranslateEncoding (TrimWs (ConvertWs (Value, TRUE)))
1495 ELSE Value := TranslateEncoding (ConvertWs (Value, FALSE));
1496 END;
1497 END;
1498 END;
1499
1500
1501 PROCEDURE TXmlParser.AnalyzeCData;
1502 // Analyze CDATA Sections
1503 BEGIN
1504 CurPartType := ptCData;
1505 CurFinal := StrPos (CurStart, CDEnd);
1506 IF CurFinal = NIL THEN BEGIN
1507 CurFinal := StrEnd (CurStart)-1;
1508 CurContent := TranslateEncoding (StrPas (CurStart+Length (CDStart)));
1509 END
1510 ELSE BEGIN
1511 SetStringSF (CurContent, CurStart+Length (CDStart), CurFinal-1);
1512 INC (CurFinal, Length (CDEnd)-1);
1513 CurContent := TranslateEncoding (CurContent);
1514 END;
1515 END;
1516
1517
1518 PROCEDURE TXmlParser.AnalyzeText (VAR IsDone : BOOLEAN);
1519 (* Analyzes Text Content between Tags. CurFinal will point to the last content character.
1520 Content ends at a '<' character or at the end of the document.
1521 Entity References and Character Entity references are resolved.
1522 If PackSpaces is TRUE, contiguous Whitespace Characters will be compressed to
1523 one Space #x20 character, Whitespace at the beginning and end of content will
1524 be trimmed off and content which is or becomes empty is not returned to
1525 the application (in this case, "IsDone" is set to FALSE which causes the
1526 Scan method to proceed directly to the next part. *)
1527
1528 PROCEDURE ProcessEntity;
1529 (* Is called if there is an ampsersand '&' character found in the document.
1530 IN "CurFinal" points to the ampersand
1531 OUT "CurFinal" points to the first character after the semi-colon ';' *)
1532 VAR
1533 P : PChar;
1534 Name : STRING;
1535 EntityDef : TEntityDef;
1536 ExternalEntity : TXmlParser;
1537 BEGIN
1538 P := StrScan (CurFinal , ';');
1539 IF P <> NIL THEN BEGIN
1540 SetStringSF (Name, CurFinal+1, P-1);
1541
1542 // Is it a Character Entity?
1543 IF (CurFinal+1)^ = '#' THEN BEGIN
1544 IF UpCase ((CurFinal+2)^) = 'X' // !!! Can't use "CHR" for Unicode characters > 255:
1545 THEN CurContent := CurContent + CHR (StrToIntDef ('$'+Copy (Name, 3, MaxInt), 32))
1546 ELSE CurContent := CurContent + CHR (StrToIntDef (Copy (Name, 2, MaxInt), 32));
1547 CurFinal := P+1;
1548 EXIT;
1549 END
1550
1551 // Is it a Predefined Entity?
1552 ELSE IF Name = 'lt' THEN BEGIN CurContent := CurContent + '<'; CurFinal := P+1; EXIT; END
1553 ELSE IF Name = 'gt' THEN BEGIN CurContent := CurContent + '>'; CurFinal := P+1; EXIT; END
1554 ELSE IF Name = 'amp' THEN BEGIN CurContent := CurContent + '&'; CurFinal := P+1; EXIT; END
1555 ELSE IF Name = 'apos' THEN BEGIN CurContent := CurContent + ''''; CurFinal := P+1; EXIT; END
1556 ELSE IF Name = 'quot' THEN BEGIN CurContent := CurContent + '"'; CurFinal := P+1; EXIT; END;
1557
1558 // Replace with Entity from DTD
1559 EntityDef := TEntityDef (Entities.Node (Name));
1560 IF EntityDef <> NIL THEN BEGIN
1561 IF EntityDef.Value <> '' THEN BEGIN
1562 EntityStack.Push (P+1);
1563 CurFinal := PChar (EntityDef.Value);
1564 END
1565 ELSE BEGIN
1566 ExternalEntity := LoadExternalEntity (EntityDef.SystemId, EntityDef.PublicId, EntityDef.NotationName);
1567 EntityStack.Push (ExternalEntity, P+1);
1568 CurFinal := ExternalEntity.DocBuffer;
1569 END;
1570 END
1571 ELSE BEGIN
1572 CurContent := CurContent + Name;
1573 CurFinal := P+1;
1574 END;
1575 END
1576 ELSE BEGIN
1577 INC (CurFinal);
1578 END;
1579 END;
1580
1581 VAR
1582 C : INTEGER;
1583 BEGIN
1584 CurFinal := CurStart;
1585 CurPartType := ptContent;
1586 CurContent := '';
1587 C := 0;
1588 REPEAT
1589 CASE CurFinal^ OF
1590 '&' : BEGIN
1591 CurContent := CurContent + TranslateEncoding (StrLPas (CurFinal-C, C));
1592 C := 0;
1593 ProcessEntity;
1594 CONTINUE;
1595 END;
1596 #0 : BEGIN
1597 IF EntityStack.Count = 0 THEN
1598 BREAK
1599 ELSE BEGIN
1600 CurContent := CurContent + TranslateEncoding (StrLPas (CurFinal-C, C));
1601 C := 0;
1602 CurFinal := EntityStack.Pop;
1603 CONTINUE;
1604 END;
1605 END;
1606 '<' : BREAK;
1607 ELSE INC (C);
1608 END;
1609 INC (CurFinal);
1610 UNTIL FALSE;
1611 CurContent := CurContent + TranslateEncoding (StrLPas (CurFinal-C, C));
1612 DEC (CurFinal);
1613
1614 IF FNormalize THEN BEGIN
1615 CurContent := ConvertWs (TrimWs (CurContent), TRUE);
1616 IsDone := CurContent <> ''; // IsDone will only get FALSE if PackSpaces is TRUE
1617 END;
1618 END;
1619
1620
1621 PROCEDURE TXmlParser.AnalyzeElementDecl (Start : PChar; VAR Final : PChar);
1622 (* Parse <!ELEMENT declaration starting at "Start"
1623 Final must point to the terminating '>' character
1624 XmlSpec 3.2:
1625 elementdecl ::= '<!ELEMENT' S Name S contentspec S? '>'
1626 contentspec ::= 'EMPTY' | 'ANY' | Mixed | children
1627 Mixed ::= '(' S? '#PCDATA' (S? '|' S? Name)* S? ')*' |
1628 '(' S? '#PCDATA' S? ')'
1629 children ::= (choice | seq) ('?' | '*' | '+')?
1630 choice ::= '(' S? cp ( S? '|' S? cp )* S? ')'
1631 cp ::= (Name | choice | seq) ('?' | '*' | '+')?
1632 seq ::= '(' S? cp ( S? ',' S? cp )* S? ')'
1633
1634 More simply:
1635 contentspec ::= EMPTY
1636 ANY
1637 '(#PCDATA)'
1638 '(#PCDATA | A | B)*'
1639 '(A, B, C)'
1640 '(A | B | C)'
1641 '(A?, B*, C+),
1642 '(A, (B | C | D)* )' *)
1643 VAR
1644 Element : TElemDef;
1645 Elem2 : TElemDef;
1646 F : PChar;
1647 DER : TDtdElementRec;
1648 BEGIN
1649 Element := TElemDef.Create;
1650 Final := Start + 9;
1651 DER.Start := Start;
1652 REPEAT
1653 IF Final^ = '>' THEN BREAK;
1654 IF (Final^ IN CNameStart) AND (Element.Name = '') THEN BEGIN
1655 ExtractName (Final, CWhitespace, F);
1656 SetStringSF (Element.Name, Final, F);
1657 Final := F;
1658 F := StrScan (Final+1, '>');
1659 IF F = NIL THEN BEGIN
1660 Element.Definition := STRING (Final);
1661 Final := StrEnd (Final);
1662 BREAK;
1663 END
1664 ELSE BEGIN
1665 SetStringSF (Element.Definition, Final+1, F-1);
1666 Final := F;
1667 BREAK;
1668 END;
1669 END;
1670 INC (Final);
1671 UNTIL FALSE;
1672 Element.Definition := DelChars (Element.Definition, CWhitespace);
1673 ReplaceParameterEntities (Element.Definition);
1674 IF Element.Definition = 'EMPTY' THEN Element.ElemType := etEmpty
1675 ELSE IF Element.Definition = 'ANY' THEN Element.ElemType := etAny
1676 ELSE IF Copy (Element.Definition, 1, 8) = '(#PCDATA' THEN Element.ElemType := etMixed
1677 ELSE IF Copy (Element.Definition, 1, 1) = '(' THEN Element.ElemType := etChildren
1678 ELSE Element.ElemType := etAny;
1679
1680 Elem2 := Elements.Node (Element.Name);
1681 IF Elem2 <> NIL THEN
1682 Elements.Delete (Elements.IndexOf (Elem2));
1683 Elements.Add (Element);
1684 Final := StrScanE (Final, '>');
1685 DER.ElementType := deElement;
1686 DER.ElemDef := Element;
1687 DER.Final := Final;
1688 DtdElementFound (DER);
1689 END;
1690
1691
1692 PROCEDURE TXmlParser.AnalyzeAttListDecl (Start : PChar; VAR Final : PChar);
1693 (* Parse <!ATTLIST declaration starting at "Start"
1694 Final must point to the terminating '>' character
1695 XmlSpec 3.3:
1696 AttlistDecl ::= '<!ATTLIST' S Name AttDef* S? '>'
1697 AttDef ::= S Name S AttType S DefaultDecl
1698 AttType ::= StringType | TokenizedType | EnumeratedType
1699 StringType ::= 'CDATA'
1700 TokenizedType ::= 'ID' | 'IDREF' | 'IDREFS' | 'ENTITY' | 'ENTITIES' | 'NMTOKEN' | 'NMTOKENS'
1701 EnumeratedType ::= NotationType | Enumeration
1702 NotationType ::= 'NOTATION' S '(' S? Name (S? '|' S? Name)* S? ')'
1703 Enumeration ::= '(' S? Nmtoken (S? '|' S? Nmtoken)* S? ')'
1704 DefaultDecl ::= '#REQUIRED' | '#IMPLIED' | (('#FIXED' S)? AttValue)
1705 AttValue ::= '"' ([^<&"] | Reference)* '"' | "'" ([^<&'] | Reference)* "'"
1706 Examples:
1707 <!ATTLIST address
1708 A1 CDATA "Default"
1709 A2 ID #REQUIRED
1710 A3 IDREF #IMPLIED
1711 A4 IDREFS #IMPLIED
1712 A5 ENTITY #FIXED "&at;ü"
1713 A6 ENTITIES #REQUIRED
1714 A7 NOTATION (WMF | DXF) "WMF"
1715 A8 (A | B | C) #REQUIRED> *)
1716 TYPE
1717 TPhase = (phElementName, phName, phType, phNotationContent, phDefault);
1718 VAR
1719 Phase : TPhase;
1720 F : PChar;
1721 ElementName : STRING;
1722 ElemDef : TElemDef;
1723 AttrDef : TAttrDef;
1724 AttrDef2 : TAttrDef;
1725 Strg : STRING;
1726 DER : TDtdElementRec;
1727 BEGIN
1728 Final := Start + 9; // The character after <!ATTLIST
1729 Phase := phElementName;
1730 DER.Start := Start;
1731 AttrDef := NIL;
1732 ElemDef := NIL;
1733 REPEAT
1734 IF NOT (Final^ IN CWhitespace) THEN
1735 CASE Final^ OF
1736 '%' : BEGIN
1737 PushPE (Final);
1738 CONTINUE;
1739 END;
1740 #0 : IF EntityStack.Count = 0 THEN
1741 BREAK
1742 ELSE BEGIN
1743 Final := EntityStack.Pop;
1744 CONTINUE;
1745 END;
1746 '>' : BREAK;
1747 ELSE CASE Phase OF
1748 phElementName : BEGIN
1749 ExtractName (Final, CWhitespace + CQuoteChar + ['#'], F);
1750 SetStringSF (ElementName, Final, F);
1751 Final := F;
1752 ElemDef := Elements.Node (ElementName);
1753 IF ElemDef = NIL THEN BEGIN
1754 ElemDef := TElemDef.Create;
1755 ElemDef.Name := ElementName;
1756 ElemDef.Definition := 'ANY';
1757 ElemDef.ElemType := etAny;
1758 Elements.Add (ElemDef);
1759 END;
1760 Phase := phName;
1761 END;
1762 phName : BEGIN
1763 AttrDef := TAttrDef.Create;
1764 ExtractName (Final, CWhitespace + CQuoteChar + ['#'], F);
1765 SetStringSF (AttrDef.Name, Final, F);
1766 Final := F;
1767 AttrDef2 := TAttrDef (ElemDef.Node (AttrDef.Name));
1768 IF AttrDef2 <> NIL THEN
1769 ElemDef.Delete (ElemDef.IndexOf (AttrDef2));
1770 ElemDef.Add (AttrDef);
1771 Phase := phType;
1772 END;
1773 phType : BEGIN
1774 IF Final^ = '(' THEN BEGIN
1775 F := StrScan (Final+1, ')');
1776 IF F <> NIL
1777 THEN SetStringSF (AttrDef.TypeDef, Final+1, F-1)
1778 ELSE AttrDef.TypeDef := STRING (Final+1);
1779 AttrDef.TypeDef := DelChars (AttrDef.TypeDef, CWhitespace);
1780 AttrDef.AttrType := atEnumeration;
1781 ReplaceParameterEntities (AttrDef.TypeDef);
1782 ReplaceCharacterEntities (AttrDef.TypeDef);
1783 Phase := phDefault;
1784 END
1785 ELSE IF StrLComp (Final, 'NOTATION', 8) = 0 THEN BEGIN
1786 INC (Final, 8);
1787 AttrDef.AttrType := atNotation;
1788 Phase := phNotationContent;
1789 END
1790 ELSE BEGIN
1791 ExtractName (Final, CWhitespace+CQuoteChar+['#'], F);
1792 SetStringSF (AttrDef.TypeDef, Final, F);
1793 IF AttrDef.TypeDef = 'CDATA' THEN AttrDef.AttrType := atCData
1794 ELSE IF AttrDef.TypeDef = 'ID' THEN AttrDef.AttrType := atId
1795 ELSE IF AttrDef.TypeDef = 'IDREF' THEN AttrDef.AttrType := atIdRef
1796 ELSE IF AttrDef.TypeDef = 'IDREFS' THEN AttrDef.AttrType := atIdRefs
1797 ELSE IF AttrDef.TypeDef = 'ENTITY' THEN AttrDef.AttrType := atEntity
1798 ELSE IF AttrDef.TypeDef = 'ENTITIES' THEN AttrDef.AttrType := atEntities
1799 ELSE IF AttrDef.TypeDef = 'NMTOKEN' THEN AttrDef.AttrType := atNmToken
1800 ELSE IF AttrDef.TypeDef = 'NMTOKENS' THEN AttrDef.AttrType := atNmTokens;
1801 Phase := phDefault;
1802 END
1803 END;
1804 phNotationContent : BEGIN
1805 F := StrScan (Final, ')');
1806 IF F <> NIL THEN
1807 SetStringSF (AttrDef.Notations, Final+1, F-1)
1808 ELSE BEGIN
1809 AttrDef.Notations := STRING (Final+1);
1810 Final := StrEnd (Final);
1811 END;
1812 ReplaceParameterEntities (AttrDef.Notations);
1813 AttrDef.Notations := DelChars (AttrDef.Notations, CWhitespace);
1814 Phase := phDefault;
1815 END;
1816 phDefault : BEGIN
1817 IF Final^ = '#' THEN BEGIN
1818 ExtractName (Final, CWhiteSpace + CQuoteChar, F);
1819 SetStringSF (Strg, Final, F);
1820 Final := F;
1821 ReplaceParameterEntities (Strg);
1822 IF Strg = '#REQUIRED' THEN BEGIN AttrDef.DefaultType := adRequired; Phase := phName; END
1823 ELSE IF Strg = '#IMPLIED' THEN BEGIN AttrDef.DefaultType := adImplied; Phase := phName; END
1824 ELSE IF Strg = '#FIXED' THEN AttrDef.DefaultType := adFixed;
1825 END
1826 ELSE IF (Final^ IN CQuoteChar) THEN BEGIN
1827 ExtractQuote (Final, AttrDef.Value, Final);
1828 ReplaceParameterEntities (AttrDef.Value);
1829 ReplaceCharacterEntities (AttrDef.Value);
1830 Phase := phName;
1831 END;
1832 IF Phase = phName THEN BEGIN
1833 AttrDef := NIL;
1834 END;
1835 END;
1836
1837 END;
1838 END;
1839 INC (Final);
1840 UNTIL FALSE;
1841
1842 Final := StrScan (Final, '>');
1843
1844 DER.ElementType := deAttList;
1845 DER.ElemDef := ElemDef;
1846 DER.Final := Final;
1847 DtdElementFound (DER);
1848 END;
1849
1850
1851 PROCEDURE TXmlParser.AnalyzeEntityDecl (Start : PChar; VAR Final : PChar);
1852 (* Parse <!ENTITY declaration starting at "Start"
1853 Final must point to the terminating '>' character
1854 XmlSpec 4.2:
1855 EntityDecl ::= '<!ENTITY' S Name S EntityDef S? '>' |
1856 '<!ENTITY' S '%' S Name S PEDef S? '>'
1857 EntityDef ::= EntityValue | (ExternalID NDataDecl?)
1858 PEDef ::= EntityValue | ExternalID
1859 NDataDecl ::= S 'NDATA' S Name
1860 EntityValue ::= '"' ([^%&"] | PEReference | EntityRef | CharRef)* '"' |
1861 "'" ([^%&'] | PEReference | EntityRef | CharRef)* "'"
1862 PEReference ::= '%' Name ';'
1863
1864 Examples
1865 <!ENTITY test1 "Stefan Heymann"> <!-- Internal, general, parsed -->
1866 <!ENTITY test2 SYSTEM "ent2.xml"> <!-- External, general, parsed -->
1867 <!ENTITY test2 SYSTEM "ent3.gif" NDATA gif> <!-- External, general, unparsed -->
1868 <!ENTITY % test3 "<!ELEMENT q ANY>"> <!-- Internal, parameter -->
1869 <!ENTITY % test6 SYSTEM "ent6.xml"> <!-- External, parameter -->
1870 <!ENTITY test4 "&test1; ist lieb"> <!-- IGP, Replacement text <> literal value -->
1871 <!ENTITY test5 "<p>Dies ist ein Test-Absatz</p>"> <!-- IGP, See XmlSpec 2.4 -->
1872 *)
1873 TYPE
1874 TPhase = (phName, phContent, phNData, phNotationName, phFinalGT);
1875 VAR
1876 Phase : TPhase;
1877 IsParamEntity : BOOLEAN;
1878 F : PChar;
1879 ExternalID : TExternalID;
1880 EntityDef : TEntityDef;
1881 EntityDef2 : TEntityDef;
1882 DER : TDtdElementRec;
1883 BEGIN
1884 Final := Start + 8; // First char after <!ENTITY
1885 DER.Start := Start;
1886 Phase := phName;
1887 IsParamEntity := FALSE;
1888 EntityDef := TEntityDef.Create;
1889 REPEAT
1890 IF NOT (Final^ IN CWhitespace) THEN
1891 CASE Final^ OF
1892 '%' : IsParamEntity := TRUE;
1893 '>' : BREAK;
1894 ELSE CASE Phase OF
1895 phName : IF Final^ IN CNameStart THEN BEGIN
1896 ExtractName (Final, CWhitespace + CQuoteChar, F);
1897 SetStringSF (EntityDef.Name, Final, F);
1898 Final := F;
1899 Phase := phContent;
1900 END;
1901 phContent : IF Final^ IN CQuoteChar THEN BEGIN
1902 ExtractQuote (Final, EntityDef.Value, Final);
1903 Phase := phFinalGT;
1904 END
1905 ELSE IF (StrLComp (Final, 'SYSTEM', 6) = 0) OR
1906 (StrLComp (Final, 'PUBLIC', 6) = 0) THEN BEGIN
1907 ExternalID := TExternalID.Create (Final);
1908 EntityDef.SystemId := ExternalID.SystemId;
1909 EntityDef.PublicId := ExternalID.PublicId;
1910 Final := ExternalID.Final;
1911 Phase := phNData;
1912 ExternalID.Free;
1913 END;
1914 phNData : IF StrLComp (Final, 'NDATA', 5) = 0 THEN BEGIN
1915 INC (Final, 4);
1916 Phase := phNotationName;
1917 END;
1918 phNotationName : IF Final^ IN CNameStart THEN BEGIN
1919 ExtractName (Final, CWhitespace + ['>'], F);
1920 SetStringSF (EntityDef.NotationName, Final, F);
1921 Final := F;
1922 Phase := phFinalGT;
1923 END;
1924 phFinalGT : ; // -!- There is an error in the document if this branch is called
1925 END;
1926 END;
1927 INC (Final);
1928 UNTIL FALSE;
1929 IF IsParamEntity THEN BEGIN
1930 EntityDef2 := TEntityDef (ParEntities.Node (EntityDef.Name));
1931 IF EntityDef2 <> NIL THEN
1932 ParEntities.Delete (ParEntities.IndexOf (EntityDef2));
1933 ParEntities.Add (EntityDef);
1934 ReplaceCharacterEntities (EntityDef.Value);
1935 END
1936 ELSE BEGIN
1937 EntityDef2 := TEntityDef (Entities.Node (EntityDef.Name));
1938 IF EntityDef2 <> NIL THEN
1939 Entities.Delete (Entities.IndexOf (EntityDef2));
1940 Entities.Add (EntityDef);
1941 ReplaceParameterEntities (EntityDef.Value); // Create replacement texts (see XmlSpec 4.5)
1942 ReplaceCharacterEntities (EntityDef.Value);
1943 END;
1944 Final := StrScanE (Final, '>');
1945
1946 DER.ElementType := deEntity;
1947 DER.EntityDef := EntityDef;
1948 DER.Final := Final;
1949 DtdElementFound (DER);
1950 END;
1951
1952
1953 PROCEDURE TXmlParser.AnalyzeNotationDecl (Start : PChar; VAR Final : PChar);
1954 // Parse <!NOTATION declaration starting at "Start"
1955 // Final must point to the terminating '>' character
1956 // XmlSpec 4.7: NotationDecl ::= '<!NOTATION' S Name S (ExternalID | PublicID) S? '>'
1957 TYPE
1958 TPhase = (phName, phExtId, phEnd);
1959 VAR
1960 ExternalID : TExternalID;
1961 Phase : TPhase;
1962 F : PChar;
1963 NotationDef : TNotationDef;
1964 DER : TDtdElementRec;
1965 BEGIN
1966 Final := Start + 10; // Character after <!NOTATION
1967 DER.Start := Start;
1968 Phase := phName;
1969 NotationDef := TNotationDef.Create;
1970 REPEAT
1971 IF NOT (Final^ IN CWhitespace) THEN
1972 CASE Final^ OF
1973 '>',
1974 #0 : BREAK;
1975 ELSE CASE Phase OF
1976 phName : BEGIN
1977 ExtractName (Final, CWhitespace + ['>'], F);
1978 SetStringSF (NotationDef.Name, Final, F);
1979 Final := F;
1980 Phase := phExtId;
1981 END;
1982 phExtId : BEGIN
1983 ExternalID := TExternalID.Create (Final);
1984 NotationDef.Value := ExternalID.SystemId;
1985 NotationDef.PublicId := ExternalID.PublicId;
1986 Final := ExternalId.Final;
1987 ExternalId.Free;
1988 Phase := phEnd;
1989 END;
1990 phEnd : ; // -!- There is an error in the document if this branch is called
1991 END;
1992 END;
1993 INC (Final);
1994 UNTIL FALSE;
1995 Notations.Add (NotationDef);
1996 Final := StrScanE (Final, '>');
1997
1998 DER.ElementType := deNotation;
1999 DER.NotationDef := NotationDef;
2000 DER.Final := Final;
2001 DtdElementFound (DER);
2002 END;
2003
2004
2005 PROCEDURE TXmlParser.PushPE (VAR Start : PChar);
2006 (* If there is a parameter entity reference found in the data stream,
2007 the current position will be pushed to the entity stack.
2008 Start: IN Pointer to the '%' character starting the PE reference
2009 OUT Pointer to first character of PE replacement text *)
2010 VAR
2011 P : PChar;
2012 EntityDef : TEntityDef;
2013 BEGIN
2014 P := StrScan (Start, ';');
2015 IF P <> NIL THEN BEGIN
2016 EntityDef := TEntityDef (ParEntities.Node (StrSFPas (Start+1, P-1)));
2017 IF EntityDef <> NIL THEN BEGIN
2018 EntityStack.Push (P+1);
2019 Start := PChar (EntityDef.Value);
2020 END
2021 ELSE
2022 Start := P+1;
2023 END;
2024 END;
2025
2026
2027 PROCEDURE TXmlParser.ReplaceCharacterEntities (VAR Str : STRING);
2028 // Replaces all Character Entity References in the String
2029 VAR
2030 Start : INTEGER;
2031 PAmp : PChar;
2032 PSemi : PChar;
2033 PosAmp : INTEGER;
2034 Len : INTEGER; // Length of Entity Reference
2035 BEGIN
2036 IF Str = '' THEN EXIT;
2037 Start := 1;
2038 REPEAT
2039 PAmp := StrPos (PChar (Str) + Start-1, '&#');
2040 IF PAmp = NIL THEN BREAK;
2041 PSemi := StrScan (PAmp+2, ';');
2042 IF PSemi = NIL THEN BREAK;
2043 PosAmp := PAmp - PChar (Str) + 1;
2044 Len := PSemi-PAmp+1;
2045 IF CompareText (Str [PosAmp+2], 'x') = 0 // !!! Can't use "CHR" for Unicode characters > 255
2046 THEN Str [PosAmp] := CHR (StrToIntDef ('$'+Copy (Str, PosAmp+3, Len-4), 0))
2047 ELSE Str [PosAmp] := CHR (StrToIntDef (Copy (Str, PosAmp+2, Len-3), 32));
2048 Delete (Str, PosAmp+1, Len-1);
2049 Start := PosAmp + 1;
2050 UNTIL FALSE;
2051 END;
2052
2053
2054 PROCEDURE TXmlParser.ReplaceParameterEntities (VAR Str : STRING);
2055 // Recursively replaces all Parameter Entity References in the String
2056 PROCEDURE ReplaceEntities (VAR Str : STRING);
2057 VAR
2058 Start : INTEGER;
2059 PAmp : PChar;
2060 PSemi : PChar;
2061 PosAmp : INTEGER;
2062 Len : INTEGER;
2063 Entity : TEntityDef;
2064 Repl : STRING; // Replacement
2065 BEGIN
2066 IF Str = '' THEN EXIT;
2067 Start := 1;
2068 REPEAT
2069 PAmp := StrPos (PChar (Str)+Start-1, '%');
2070 IF PAmp = NIL THEN BREAK;
2071 PSemi := StrScan (PAmp+2, ';');
2072 IF PSemi = NIL THEN BREAK;
2073 PosAmp := PAmp - PChar (Str) + 1;
2074 Len := PSemi-PAmp+1;
2075 Entity := TEntityDef (ParEntities.Node (Copy (Str, PosAmp+1, Len-2)));
2076 IF Entity <> NIL THEN BEGIN
2077 Repl := Entity.Value;
2078 ReplaceEntities (Repl); // Recursion
2079 END
2080 ELSE
2081 Repl := Copy (Str, PosAmp, Len);
2082 Delete (Str, PosAmp, Len);
2083 Insert (Repl, Str, PosAmp);
2084 Start := PosAmp + Length (Repl);
2085 UNTIL FALSE;
2086 END;
2087 BEGIN
2088 ReplaceEntities (Str);
2089 END;
2090
2091
2092 PROCEDURE TXmlParser.ReplaceGeneralEntities (VAR Str : STRING);
2093 // Recursively replaces General Entity References in the String
2094 PROCEDURE ReplaceEntities (VAR Str : STRING);
2095 VAR
2096 Start : INTEGER;
2097 PAmp : PChar;
2098 PSemi : PChar;
2099 PosAmp : INTEGER;
2100 Len : INTEGER;
2101 EntityDef : TEntityDef;
2102 EntName : STRING;
2103 Repl : STRING; // Replacement
2104 ExternalEntity : TXmlParser;
2105 BEGIN
2106 IF Str = '' THEN EXIT;
2107 Start := 1;
2108 REPEAT
2109 PAmp := StrPos (PChar (Str)+Start-1, '&');
2110 IF PAmp = NIL THEN BREAK;
2111 PSemi := StrScan (PAmp+2, ';');
2112 IF PSemi = NIL THEN BREAK;
2113 PosAmp := PAmp - PChar (Str) + 1;
2114 Len := PSemi-PAmp+1;
2115 EntName := Copy (Str, PosAmp+1, Len-2);
2116 IF EntName = 'lt' THEN Repl := '<'
2117 ELSE IF EntName = 'gt' THEN Repl := '>'
2118 ELSE IF EntName = 'amp' THEN Repl := '&'
2119 ELSE IF EntName = 'apos' THEN Repl := ''''
2120 ELSE IF EntName = 'quot' THEN Repl := '"'
2121 ELSE BEGIN
2122 EntityDef := TEntityDef (Entities.Node (EntName));
2123 IF EntityDef <> NIL THEN BEGIN
2124 IF EntityDef.Value <> '' THEN // Internal Entity
2125 Repl := EntityDef.Value
2126 ELSE BEGIN // External Entity
2127 ExternalEntity := LoadExternalEntity (EntityDef.SystemId, EntityDef.PublicId, EntityDef.NotationName);
2128 Repl := StrPas (ExternalEntity.DocBuffer); // !!! What if it contains a Text Declaration?
2129 ExternalEntity.Free;
2130 END;
2131 ReplaceEntities (Repl); // Recursion
2132 END
2133 ELSE
2134 Repl := Copy (Str, PosAmp, Len);
2135 END;
2136 Delete (Str, PosAmp, Len);
2137 Insert (Repl, Str, PosAmp);
2138 Start := PosAmp + Length (Repl);
2139 UNTIL FALSE;
2140 END;
2141 BEGIN
2142 ReplaceEntities (Str);
2143 END;
2144
2145
2146 FUNCTION TXmlParser.LoadExternalEntity (SystemId, PublicId, Notation : STRING) : TXmlParser;
2147 // This will be called whenever there is a Parsed External Entity or
2148 // the DTD External Subset to be parsed.
2149 // It has to create a TXmlParser instance and load the desired Entity.
2150 // This instance of LoadExternalEntity assumes that "SystemId" is a valid
2151 // file name (relative to the Document source) and loads this file using
2152 // the LoadFromFile method.
2153 VAR
2154 Filename : STRING;
2155 BEGIN
2156 // --- Convert System ID to complete filename
2157 Filename := StringReplace (SystemId, '/', '\', [rfReplaceAll]);
2158 IF Copy (FSource, 1, 1) <> '<' THEN
2159 IF (Copy (Filename, 1, 2) = '\\') OR (Copy (Filename, 2, 1) = ':') THEN
2160 // Already has an absolute Path
2161 ELSE BEGIN
2162 Filename := ExtractFilePath (FSource) + Filename;
2163 END;
2164
2165 // --- Load the File
2166 Result := TXmlParser.Create;
2167 Result.LoadFromFile (Filename);
2168 END;
2169
2170
2171 FUNCTION TXmlParser.TranslateEncoding (CONST Source : STRING) : STRING;
2172 // The member variable "CurEncoding" always holds the name of the current
2173 // encoding, e.g. 'UTF-8' or 'ISO-8859-1'.
2174 // This virtual method "TranslateEncoding" is responsible for translating
2175 // the content passed in the "Source" parameter to the Encoding which
2176 // is expected by the application.
2177 // This instance of "TranlateEncoding" assumes that the Application expects
2178 // Windows ANSI (Win1252) strings. It is able to transform UTF-8 or ISO-8859-1
2179 // encodings.
2180 // If you want your application to understand or create other encodings, you
2181 // override this function.
2182 BEGIN
2183 IF CurEncoding = 'UTF-8'
2184 THEN Result := Utf8ToAnsi (Source)
2185 ELSE Result := Source;
2186 END;
2187
2188
2189 PROCEDURE TXmlParser.DtdElementFound (DtdElementRec : TDtdElementRec);
2190 // This method is called for every element which is found in the DTD
2191 // declaration. The variant record TDtdElementRec is passed which
2192 // holds informations about the element.
2193 // You can override this function to handle DTD declarations.
2194 // Note that when you parse the same Document instance a second time,
2195 // the DTD will not get parsed again.
2196 BEGIN
2197 END;
2198
2199
2200 FUNCTION TXmlParser.GetDocBuffer: PChar;
2201 // Returns FBuffer or a pointer to a NUL char if Buffer is empty
2202 BEGIN
2203 IF FBuffer = NIL
2204 THEN Result := #0
2205 ELSE Result := FBuffer;
2206 END;
2207
2208
2209 (*$IFNDEF HAS_CONTNRS_UNIT
2210 ===============================================================================================
2211 TObjectList
2212 ===============================================================================================
2213 *)
2214
2215 DESTRUCTOR TObjectList.Destroy;
2216 BEGIN
2217 Clear;
2218 SetCapacity(0);
2219 INHERITED Destroy;
2220 END;
2221
2222
2223 PROCEDURE TObjectList.Delete (Index : INTEGER);
2224 BEGIN
2225 IF (Index < 0) OR (Index >= Count) THEN EXIT;
2226 TObject (Items [Index]).Free;
2227 INHERITED Delete (Index);
2228 END;
2229
2230
2231 PROCEDURE TObjectList.Clear;
2232 BEGIN
2233 WHILE Count > 0 DO
2234 Delete (Count-1);
2235 END;
2236
2237 (*$ENDIF *)
2238
2239 (*
2240 ===============================================================================================
2241 TNvpNode
2242 --------
2243 Node base class for the TNvpList
2244 ===============================================================================================
2245 *)
2246
2247 CONSTRUCTOR TNvpNode.Create (TheName, TheValue : STRING);
2248 BEGIN
2249 INHERITED Create;
2250 Name := TheName;
2251 Value := TheValue;
2252 END;
2253
2254
2255 (*
2256 ===============================================================================================
2257 TNvpList
2258 --------
2259 A generic List of Name-Value Pairs, based on the TObjectList introduced in Delphi 5
2260 ===============================================================================================
2261 *)
2262
2263 PROCEDURE TNvpList.Add (Node : TNvpNode);
2264 VAR
2265 I : INTEGER;
2266 BEGIN
2267 FOR I := Count-1 DOWNTO 0 DO
2268 IF Node.Name > TNvpNode (Items [I]).Name THEN BEGIN
2269 Insert (I+1, Node);
2270 EXIT;
2271 END;
2272 Insert (0, Node);
2273 END;
2274
2275
2276
2277 FUNCTION TNvpList.Node (Name : STRING) : TNvpNode;
2278 // Binary search for Node
2279 VAR
2280 L, H : INTEGER; // Low, High Limit
2281 T, C : INTEGER; // Test Index, Comparison result
2282 Last : INTEGER; // Last Test Index
2283 BEGIN
2284 IF Count=0 THEN BEGIN
2285 Result := NIL;
2286 EXIT;
2287 END;
2288
2289 L := 0;
2290 H := Count;
2291 Last := -1;
2292 REPEAT
2293 T := (L+H) DIV 2;
2294 IF T=Last THEN BREAK;
2295 Result := TNvpNode (Items [T]);
2296 C := CompareStr (Result.Name, Name);
2297 IF C = 0 THEN EXIT
2298 ELSE IF C < 0 THEN L := T
2299 ELSE H := T;
2300 Last := T;
2301 UNTIL FALSE;
2302 Result := NIL;
2303 END;
2304
2305
2306 FUNCTION TNvpList.Node (Index : INTEGER) : TNvpNode;
2307 BEGIN
2308 IF (Index < 0) OR (Index >= Count)
2309 THEN Result := NIL
2310 ELSE Result := TNvpNode (Items [Index]);
2311 END;
2312
2313
2314 FUNCTION TNvpList.Value (Name : STRING) : STRING;
2315 VAR
2316 Nvp : TNvpNode;
2317 BEGIN
2318 Nvp := TNvpNode (Node (Name));
2319 IF Nvp <> NIL
2320 THEN Result := Nvp.Value
2321 ELSE Result := '';
2322 END;
2323
2324
2325 FUNCTION TNvpList.Value (Index : INTEGER) : STRING;
2326 BEGIN
2327 IF (Index < 0) OR (Index >= Count)
2328 THEN Result := ''
2329 ELSE Result := TNvpNode (Items [Index]).Value;
2330 END;
2331
2332
2333 FUNCTION TNvpList.Name (Index : INTEGER) : STRING;
2334 BEGIN
2335 IF (Index < 0) OR (Index >= Count)
2336 THEN Result := ''
2337 ELSE Result := TNvpNode (Items [Index]).Name;
2338 END;
2339
2340
2341 (*
2342 ===============================================================================================
2343 TAttrList
2344 List of Attributes. The "Analyze" method extracts the Attributes from the given Buffer.
2345 Is used for extraction of Attributes in Start-Tags, Empty-Element Tags and the "pseudo"
2346 attributes in XML Prologs, Text Declarations and PIs.
2347 ===============================================================================================
2348 *)
2349
2350 PROCEDURE TAttrList.Analyze (Start : PChar; VAR Final : PChar);
2351 // Analyze the Buffer for Attribute=Name pairs.
2352 // Terminates when there is a character which is not IN CNameStart
2353 // (e.g. '?>' or '>' or '/>')
2354 TYPE
2355 TPhase = (phName, phEq, phValue);
2356 VAR
2357 Phase : TPhase;
2358 F : PChar;
2359 Name : STRING;
2360 Value : STRING;
2361 Attr : TAttr;
2362 BEGIN
2363 Clear;
2364 Phase := phName;
2365 Final := Start;
2366 REPEAT
2367 IF (Final^ = #0) OR (Final^ = '>') THEN BREAK;
2368 IF NOT (Final^ IN CWhitespace) THEN
2369 CASE Phase OF
2370 phName : BEGIN
2371 IF NOT (Final^ IN CNameStart) THEN EXIT;
2372 ExtractName (Final, CWhitespace + ['=', '/'], F);
2373 SetStringSF (Name, Final, F);
2374 Final := F;
2375 Phase := phEq;
2376 END;
2377 phEq : BEGIN
2378 IF Final^ = '=' THEN
2379 Phase := phValue
2380 END;
2381 phValue : BEGIN
2382 IF Final^ IN CQuoteChar THEN BEGIN
2383 ExtractQuote (Final, Value, F);
2384 Attr := TAttr.Create;
2385 Attr.Name := Name;
2386 Attr.Value := Value;
2387 Attr.ValueType := vtNormal;
2388 Add (Attr);
2389 Final := F;
2390 Phase := phName;
2391 END;
2392 END;
2393 END;
2394 INC (Final);
2395 UNTIL FALSE;
2396 END;
2397
2398
2399 (*
2400 ===============================================================================================
2401 TElemList
2402 List of TElemDef nodes.
2403 ===============================================================================================
2404 *)
2405
2406 FUNCTION TElemList.Node (Name : STRING) : TElemDef;
2407 // Binary search for the Node with the given Name
2408 VAR
2409 L, H : INTEGER; // Low, High Limit
2410 T, C : INTEGER; // Test Index, Comparison result
2411 Last : INTEGER; // Last Test Index
2412 BEGIN
2413 IF Count=0 THEN BEGIN
2414 Result := NIL;
2415 EXIT;
2416 END;
2417
2418 L := 0;
2419 H := Count;
2420 Last := -1;
2421 REPEAT
2422 T := (L+H) DIV 2;
2423 IF T=Last THEN BREAK;
2424 Result := TElemDef (Items [T]);
2425 C := CompareStr (Result.Name, Name);
2426 IF C = 0 THEN EXIT
2427 ELSE IF C < 0 THEN L := T
2428 ELSE H := T;
2429 Last := T;
2430 UNTIL FALSE;
2431 Result := NIL;
2432 END;
2433
2434
2435 PROCEDURE TElemList.Add (Node : TElemDef);
2436 VAR
2437 I : INTEGER;
2438 BEGIN
2439 FOR I := Count-1 DOWNTO 0 DO
2440 IF Node.Name > TElemDef (Items [I]).Name THEN BEGIN
2441 Insert (I+1, Node);
2442 EXIT;
2443 END;
2444 Insert (0, Node);
2445 END;
2446
2447
2448 (*
2449 ===============================================================================================
2450 TScannerXmlParser
2451 A TXmlParser descendant for the TCustomXmlScanner component
2452 ===============================================================================================
2453 *)
2454
2455 TYPE
2456 TScannerXmlParser = CLASS (TXmlParser)
2457 Scanner : TCustomXmlScanner;
2458 CONSTRUCTOR Create (TheScanner : TCustomXmlScanner);
2459 FUNCTION LoadExternalEntity (SystemId, PublicId,
2460 Notation : STRING) : TXmlParser; OVERRIDE;
2461 FUNCTION TranslateEncoding (CONST Source : STRING) : STRING; OVERRIDE;
2462 PROCEDURE DtdElementFound (DtdElementRec : TDtdElementRec); OVERRIDE;
2463 END;
2464
2465 CONSTRUCTOR TScannerXmlParser.Create (TheScanner : TCustomXmlScanner);
2466 BEGIN
2467 INHERITED Create;
2468 Scanner := TheScanner;
2469 END;
2470
2471
2472 FUNCTION TScannerXmlParser.LoadExternalEntity (SystemId, PublicId, Notation : STRING) : TXmlParser;
2473 BEGIN
2474 IF Assigned (Scanner.FOnLoadExternal)
2475 THEN Scanner.FOnLoadExternal (Scanner, SystemId, PublicId, Notation, Result)
2476 ELSE Result := INHERITED LoadExternalEntity (SystemId, PublicId, Notation);
2477 END;
2478
2479
2480 FUNCTION TScannerXmlParser.TranslateEncoding (CONST Source : STRING) : STRING;
2481 BEGIN
2482 IF Assigned (Scanner.FOnTranslateEncoding)
2483 THEN Result := Scanner.FOnTranslateEncoding (Scanner, CurEncoding, Source)
2484 ELSE Result := INHERITED TranslateEncoding (Source);
2485 END;
2486
2487
2488 PROCEDURE TScannerXmlParser.DtdElementFound (DtdElementRec : TDtdElementRec);
2489 BEGIN
2490 WITH DtdElementRec DO
2491 CASE ElementType OF
2492 deElement : Scanner.WhenElement (ElemDef);
2493 deAttList : Scanner.WhenAttList (ElemDef);
2494 deEntity : Scanner.WhenEntity (EntityDef);
2495 deNotation : Scanner.WhenNotation (NotationDef);
2496 dePI : Scanner.WhenPI (STRING (Target), STRING (Content), AttrList);
2497 deComment : Scanner.WhenComment (StrSFPas (Start, Final));
2498 deError : Scanner.WhenDtdError (Pos);
2499 END;
2500 END;
2501
2502
2503 (*
2504 ===============================================================================================
2505 TCustomXmlScanner
2506 ===============================================================================================
2507 *)
2508
2509 CONSTRUCTOR TCustomXmlScanner.Create (AOwner: TComponent);
2510 BEGIN
2511 INHERITED;
2512 FXmlParser := TScannerXmlParser.Create (Self);
2513 END;
2514
2515
2516 DESTRUCTOR TCustomXmlScanner.Destroy;
2517 BEGIN
2518 FXmlParser.Free;
2519 INHERITED;
2520 END;
2521
2522
2523 PROCEDURE TCustomXmlScanner.LoadFromFile (Filename : TFilename);
2524 // Load XML Document from file
2525 BEGIN
2526 FXmlParser.LoadFromFile (Filename);
2527 END;
2528
2529
2530 PROCEDURE TCustomXmlScanner.LoadFromBuffer (Buffer : PChar);
2531 // Load XML Document from buffer
2532 BEGIN
2533 FXmlParser.LoadFromBuffer (Buffer);
2534 END;
2535
2536
2537 PROCEDURE TCustomXmlScanner.SetBuffer (Buffer : PChar);
2538 // Refer to Buffer
2539 BEGIN
2540 FXmlParser.SetBuffer (Buffer);
2541 END;
2542
2543
2544 FUNCTION TCustomXmlScanner.GetFilename : TFilename;
2545 BEGIN
2546 Result := FXmlParser.Source;
2547 END;
2548
2549
2550 FUNCTION TCustomXmlScanner.GetNormalize : BOOLEAN;
2551 BEGIN
2552 Result := FXmlParser.Normalize;
2553 END;
2554
2555
2556 PROCEDURE TCustomXmlScanner.SetNormalize (Value : BOOLEAN);
2557 BEGIN
2558 FXmlParser.Normalize := Value;
2559 END;
2560
2561
2562 PROCEDURE TCustomXmlScanner.WhenXmlProlog(XmlVersion, Encoding: STRING; Standalone : BOOLEAN);
2563 // Is called when the parser has parsed the <? xml ?> declaration of the prolog
2564 BEGIN
2565 IF Assigned (FOnXmlProlog) THEN FOnXmlProlog (Self, XmlVersion, Encoding, Standalone);
2566 END;
2567
2568
2569 PROCEDURE TCustomXmlScanner.WhenComment (Comment : STRING);
2570 // Is called when the parser has parsed a <!-- comment -->
2571 BEGIN
2572 IF Assigned (FOnComment) THEN FOnComment (Self, Comment);
2573 END;
2574
2575
2576 PROCEDURE TCustomXmlScanner.WhenPI (Target, Content: STRING; Attributes : TAttrList);
2577 // Is called when the parser has parsed a <?processing instruction ?>
2578 BEGIN
2579 IF Assigned (FOnPI) THEN FOnPI (Self, Target, Content, Attributes);
2580 END;
2581
2582
2583 PROCEDURE TCustomXmlScanner.WhenDtdRead (RootElementName : STRING);
2584 // Is called when the parser has completely parsed the DTD
2585 BEGIN
2586 IF Assigned (FOnDtdRead) THEN FOnDtdRead (Self, RootElementName);
2587 END;
2588
2589
2590 PROCEDURE TCustomXmlScanner.WhenStartTag (TagName : STRING; Attributes : TAttrList);
2591 // Is called when the parser has parsed a start tag like <p>
2592 BEGIN
2593 IF Assigned (FOnStartTag) THEN FOnStartTag (Self, TagName, Attributes);
2594 END;
2595
2596
2597 PROCEDURE TCustomXmlScanner.WhenEmptyTag (TagName : STRING; Attributes : TAttrList);
2598 // Is called when the parser has parsed an Empty Element Tag like <br/>
2599 BEGIN
2600 IF Assigned (FOnEmptyTag) THEN FOnEmptyTag (Self, TagName, Attributes);
2601 END;
2602
2603
2604 PROCEDURE TCustomXmlScanner.WhenEndTag (TagName : STRING);
2605 // Is called when the parser has parsed an End Tag like </p>
2606 BEGIN
2607 IF Assigned (FOnEndTag) THEN FOnEndTag (Self, TagName);
2608 END;
2609
2610
2611 PROCEDURE TCustomXmlScanner.WhenContent (Content : STRING);
2612 // Is called when the parser has parsed an element's text content
2613 BEGIN
2614 IF Assigned (FOnContent) THEN FOnContent (Self, Content);
2615 END;
2616
2617
2618 PROCEDURE TCustomXmlScanner.WhenCData (Content : STRING);
2619 // Is called when the parser has parsed a CDATA section
2620 BEGIN
2621 IF Assigned (FOnCData) THEN FOnCData (Self, Content);
2622 END;
2623
2624
2625 PROCEDURE TCustomXmlScanner.WhenElement (ElemDef : TElemDef);
2626 // Is called when the parser has parsed an <!ELEMENT> definition
2627 // inside the DTD
2628 BEGIN
2629 IF Assigned (FOnElement) THEN FOnElement (Self, ElemDef);
2630 END;
2631
2632
2633 PROCEDURE TCustomXmlScanner.WhenAttList (ElemDef : TElemDef);
2634 // Is called when the parser has parsed an <!ATTLIST> definition
2635 // inside the DTD
2636 BEGIN
2637 IF Assigned (FOnAttList) THEN FOnAttList (Self, ElemDef);
2638 END;
2639
2640
2641 PROCEDURE TCustomXmlScanner.WhenEntity (EntityDef : TEntityDef);
2642 // Is called when the parser has parsed an <!ENTITY> definition
2643 // inside the DTD
2644 BEGIN
2645 IF Assigned (FOnEntity) THEN FOnEntity (Self, EntityDef);
2646 END;
2647
2648
2649 PROCEDURE TCustomXmlScanner.WhenNotation (NotationDef : TNotationDef);
2650 // Is called when the parser has parsed a <!NOTATION> definition
2651 // inside the DTD
2652 BEGIN
2653 IF Assigned (FOnNotation) THEN FOnNotation (Self, NotationDef);
2654 END;
2655
2656
2657 PROCEDURE TCustomXmlScanner.WhenDtdError (ErrorPos : PChar);
2658 // Is called when the parser has found an Error in the DTD
2659 BEGIN
2660 IF Assigned (FOnDtdError) THEN FOnDtdError (Self, ErrorPos);
2661 END;
2662
2663
2664 PROCEDURE TCustomXmlScanner.Execute;
2665 // Perform scanning
2666 // Scanning is done synchronously, i.e. you can expect events to be triggered
2667 // in the order of the XML data stream. Execute will finish when the whole XML
2668 // document has been scanned or when the StopParser property has been set to TRUE.
2669 BEGIN
2670 FStopParser := FALSE;
2671 FXmlParser.StartScan;
2672 WHILE FXmlParser.Scan AND (NOT FStopParser) DO
2673 CASE FXmlParser.CurPartType OF
2674 ptNone : ;
2675 ptXmlProlog : WhenXmlProlog (FXmlParser.XmlVersion, FXmlParser.Encoding, FXmlParser.Standalone);
2676 ptComment : WhenComment (StrSFPas (FXmlParser.CurStart, FXmlParser.CurFinal));
2677 ptPI : WhenPI (FXmlParser.CurName, FXmlParser.CurContent, FXmlParser.CurAttr);
2678 ptDtdc : WhenDtdRead (FXmlParser.RootName);
2679 ptStartTag : WhenStartTag (FXmlParser.CurName, FXmlParser.CurAttr);
2680 ptEmptyTag : WhenEmptyTag (FXmlParser.CurName, FXmlParser.CurAttr);
2681 ptEndTag : WhenEndTag (FXmlParser.CurName);
2682 ptContent : WhenContent (FXmlParser.CurContent);
2683 ptCData : WhenCData (FXmlParser.CurContent);
2684 END;
2685 END;
2686
2687
2688 END.
2689