1<<if: ZXIDBOOK>>
2<<else: >>ZXID Low Level ("Raw") API
3##########################
<<author: Sampo Kellomäki (sampo@iki.fi)>>
5<<cvsid: $Id: zxid-raw.pd,v 1.5 2010-01-08 02:10:09 sampo Exp $>>
6<<class: article!a4paper!!ZXID-RAW 01>>
7<<define: ZXDOC=ZXID Raw API>>
8
9<<abstract:
10
11ZXID.org Identity Management toolkit implements standalone SAML 2.0 and
12Liberty ID-WSF 2.0 stacks. This document describes the low level API.
13
14>>
15
16<<maketoc: 1>>
17
181 Introduction
19==============
20
21Here we describe the general philosophy of the ZXID low level
22APIs. Some function level documentation is available from
23<<link:../ref/html/index.html: Function reference>>.
24
Before you barge head first into the raw API, you should
check whether the +easy+ and simple API in <<link:zxid-simple.html: zxid_simple()>>
meets your needs. Or you may be able to use
<<link:../mod_auth_saml/mod_auth_saml.html: mod_auth_saml>>
and not have to program at all.
30
31Happy hacking!
32
331.1 Other documents
34-------------------
35
36<<doc-inc.pd>>
37<<htmlpreamble: <title>ZXID Low Level ("Raw") API</title><link type="text/css" rel=stylesheet href="zx.css"><body><h1>ZXID Low Level ("Raw") API</h1> >>
38
3912 Full Native C API
40====================
41<<fi: >>
42
43The generated aspects of the native C API are in c/*-data.h, for example
44
45  c/zx-sa-data.h
46
Studying this file is very instructive.<<footnote: emacs tip: run
`make tags' and then try hitting M-. while the cursor is over a struct
or function name in c/zx-sa-data.h - this makes navigation painless.>>
50
5112.1 C Data Structures
52----------------------
53
From the .sg input a header (NN-data.h) is generated. This header contains
structs that represent the data of the elements. Each element and attribute
generates its own node. Even trivial nodes like strings have to be
kept this way because the nodes form the basis for remembering the ordering
of data. This ordering is needed for exclusive XML canonicalization,
and thus for signature verification.<<footnote: It's unfortunate that
the XML standards do not make this any easier. Without the order
maintenance requirement, it would be possible to represent trivial
child elements directly as struct fields. An approach that tried to do
just this is available from CVS tag GEN_LALR (ca. 29.5.2006).>>
64
Any missing data is represented by a NULL pointer.
66
Any repeating data is kept as a linked list, in reverse order of being
seen in the data stream.<<footnote: Reverse order is just an
optimization - or an artifact of simply adding the latest element to the
head of the list. If this bothers you, it's easy enough to reverse the
list afterwards. A linked list is simple and works well for data whose
order does not matter much (we use a separate pointer for remembering
the canonicalization order) and where random access is not needed, or
where cardinality is low enough that simple pointer chasing is efficient
enough.>>
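
As a minimal sketch of the point made in the footnote, assuming a
hypothetical ~toy_node~ type (it is not one of the generated ZXID
structs), the following shows why adding each new element to the head
of the list yields reverse order, and how the list can be reversed in
place afterwards.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy node; the real generated structs differ. */
struct toy_node {
  struct toy_node* n;   /* next pointer */
  int val;
};

/* Add x to the head of the list: O(1), but yields reverse order. */
struct toy_node* toy_push(struct toy_node* head, struct toy_node* x) {
  assert(x != NULL);
  x->n = head;
  return x;
}

/* Reverse the list in place and return the new head. */
struct toy_node* toy_reverse(struct toy_node* head) {
  struct toy_node* prev = NULL;
  while (head) {
    struct toy_node* next = head->n;
    head->n = prev;
    prev = head;
    head = next;
  }
  return prev;
}
```

Pushing 1, 2, 3 in that order leaves 3 at the head; one call to
toy_reverse() restores the order in which the items were seen.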
76
77<<ignore: *** Problem here: how to preserve ordering of elements. We need to
78   * do SO canonicalization as there are new elements, yet we would like
79   * to maintain WO as much as possible, especially for elements for which
80   * we do not have schema ("any" elements). Always reverse any elem list?
81>>
82
Simple elements and all attributes are represented by a simple string node
(even if they are booleans or integers).
85
86*Example*
87
Consider the following XML
89
90  <ds:Signature>
91     <ds:SignedInfo>
92       <ds:CanonicalizationMethod
93           Algorithm="http://w3.org/xml-exc-c14n#"/>
94       <ds:SignatureMethod
95           Algorithm="http://w3.org/xmldsig#rsa-sha1"/>
96       <ds:Reference
97           URI="#RrcrNwFIw6n">
98         <ds:Transforms>
99           <ds:Transform
100               Algorithm="http://w3.org/xml-exc-c14n#"/>
101           <ds:Transform
102               Algorithm="http://w3.org/xmldsig#env-sig"/></>
103         <ds:DigestMethod
104             Algorithm="http://w3.org/xmldsig#sha1"/>
105         <ds:DigestValue>lNIzVMrp8CwTE=</></></>
106     <ds:SignatureValue>GeMp7LS...vnjn8=</></>
107
108Decoding would produce the data structure in Fig-<<see: fig:decode-data>>. You
109should also look at c/zx-sa-data.h to see the structs involved in this
110example.
111
112<<dot: decode-data: Typical data structure produced by decode.
113
114// This graph crashes dot 1.12, but works in dot 2.8, seems to crash 2.20.2
115
116size="11.0,6.0"
117margin=0
118rankdir=LR
119
120{ rank=same; siginfo; sigval; }
121{ rank=same; canonmeth; sigmeth; ref; }
122//{ rank=same; canonmeth; sigmeth; ref; digmeth; digval; }
123//{ rank=same; xforms; xform_env; xform_c14n; }
124//{ rank=same; xform_env; xform_c14n; digmeth; digval; }
125{ rank=same; xforms; digmeth; digval; }
126{ rank=same; xform_c14n; xform_env; }
127
128sig [shape=record,label="zx_ds_Signature_s|{|{<f_kids>gg.kids|<f_siginfo>SignedInfo|<f_sigval>SignatureValue|KeyInfo (0)|Object (0)|Id (0)}}"];
129siginfo [shape=record,label="zx_ds_SignedInfo_s|{|{<f_kids>gg.kids|<f_wo>gg.g.wo|<f_canonmeth>CanonicalizationMethod|<f_sigmeth>SignatureMethod|<f_ref>Reference|Id (0)}}"];
130
131canonmeth [shape=record,label="zx_ds_CanonicalizationMethod_s|{|{<f_wo>gg.g.wo|Algorithm\n\"http://w3.org/xml-exc-c14n#\"}}"];
132
133sigmeth [shape=record,label="zx_ds_SignatureMethod_s|{|{<f_wo>gg.g.wo|Algorithm\n\"http://w3.org/xmldsig#rsa-sha1\"}}"];
134
135ref [shape=record,label="zx_ds_Reference_s|{|{<f_kids>gg.kids|gg.g.wo (0)|<f_xforms>Transforms|<f_digmeth>DigestMethod|<f_digval>DigestValue|Id (0)|Type (0)|URI\n\"#RrcrNwFIw6n\"}}"];
136
137xforms [shape=record,label="zx_ds_Transforms_s|{|{<f_kids>gg.kids|<f_wo>gg.g.wo|gg.g.n (0)|<f_xform>Transform}}"];
138
139xform_c14n [shape=record,label="zx_ds_Transform_s|{|{<f_wo>gg.g.wo|gg.g.n (0)|XPath (0)|<f_c14n_algo>Algorithm\n\"http://w3.org/xml-exc-c14n#\"}}"];
140
141xform_env [shape=record,label="zx_ds_Transform_s|{|{gg.g.wo (0)|<f_n>gg.g.n|XPath (0)|Algorithm\n\"http://w3.org/xmldsig#env-sig\"}}"];
142
143xforms:f_xform -> xform_env
144xform_env:f_n -> xform_c14n
145
146digmeth [shape=record,label="zx_ds_DigestMethod_s|{|{<f_wo>gg.g.wo|Algorithm\n\"http://w3.org/xmldsig#sha1\"}}"];
147digval [shape=record,label="zx_elem_s|{|{gg.g.wo (0)|content\n\"lNIzVMrp8CwTE=\"}}"];
148
149sigval [shape=record,label="zx_ds_SignatureValue_s|{|{gg.g.wo (0)|gg.content\n\"GeMp7LS...vnjn8=\"|Id (0)}}"];
150
151sig:f_siginfo -> siginfo
152sig:f_sigval  -> sigval
153
154siginfo:f_canonmeth -> canonmeth
155siginfo:f_sigmeth -> sigmeth
156siginfo:f_ref -> ref
157
158ref:f_xforms -> xforms
159ref:f_digmeth -> digmeth
160ref:f_digval -> digval
161
162sig:f_kids ->siginfo [weight=0,arrowhead=empty,color=red]
163
164siginfo:f_wo ->sigval [weight=0,arrowhead=empty,color=red]
165siginfo:f_kids -> canonmeth [weight=0,arrowhead=empty,color=red]
166canonmeth:f_wo -> sigmeth [weight=0,arrowhead=empty,color=red]
167sigmeth:f_wo -> ref [weight=0,arrowhead=empty,color=red]
168
169ref:f_kids -> xforms [weight=0,arrowhead=empty,color=red]
170xforms:f_wo -> digmeth [weight=0,arrowhead=empty,color=red]
171digmeth:f_wo -> digval [weight=0,arrowhead=empty,color=red]
172
173xforms:f_kids -> xform_c14n [weight=0,arrowhead=empty,color=red]
174xform_c14n:f_wo -> xform_env [weight=0,arrowhead=empty,color=red]
175
176>>
177
178There are two pointer systems at play here. The black solid arrows
179depict the logical structure of the XML document. For each child
180element there is a struct field that simply points to the child. If
181there are multiple occurrences of the child, as in
182~sig->SignedInfo->Reference->Transforms->Transform~, the children are
kept in a linked list connected by gg.g.n (next) fields.<<footnote:
This linked list may be in inverted order depending on the phase of
the moon and position of the trams in Helsinki. Until the implementation
matures, it's better not to depend on the ordering.>>
187
The +wire order+ structure, depicted by red hollow arrows, is
rooted at gg.kids and, as drawn in the figure, chained through the
gg.g.wo fields. For example
~sig->SignedInfo->Reference->Transforms~ keeps its kids, the
~zx_ds_Transform~ objects, in the original order, hanging from gg.kids
and linked by the wo field. As can be seen, the order kept by the wo
fields can differ from the one kept by the <<tt: n>> (next)
fields.<<footnote: Sometime before R1.0 the scheme changed to have only
gg.g.n pointers, which even the wire order lists now use. Thus the
wo pointers no longer exist.>>
197
198What's more, the kids list can contain dissimilar objects, witness
199~sig->SignedInfo->Reference->gg.kids~. The wire order representation
200is only captured when decoding the document and is mainly useful for
201correctly canonicalizing the document for signature verification. If
202you are building a data structure in your own program, you typically
203will not set the gg.kids and gg.g.wo fields.
204
205In the diagram, the objects of type ~zx_str~ were collapsed to
206double quoted strings. Superfluous gg.kids, gg.g.wo, and gg.g.n fields
207were omitted: they exist in all structures, but are not shown when
208they are ~NULL~. The ~NULL~ is depicted as zero (0).<<footnote: All
209this gg.g business is just C's way of referencing the fields of a
210common base type of element objects.>>
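
To make the gg.g notation concrete, here is a minimal sketch of the
layering. All ~toy_~ names, the token value, and the exact set of fields
are assumptions made for illustration; the real generated structs in
c/zx-sa-data.h carry more fields. The point is only that every element
struct starts with the same base, so generic code can chase the
<<tt: n>> pointers without knowing the concrete type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical common base of every node. */
struct toy_node_s {
  struct toy_node_s* n;     /* next sibling in a list */
  int tok;                  /* token identifying the concrete type */
};

/* Hypothetical common part of every element. */
struct toy_elem_s {
  struct toy_node_s g;      /* base: must be the first member */
  struct toy_elem_s* kids;  /* root of the children list */
};

#define TOY_TRANSFORM_TOK 1 /* hypothetical token number */

struct toy_Transform_s {
  struct toy_elem_s gg;     /* common element part: must be first */
  const char* Algorithm;    /* simplified; the text uses a string node */
};

/* Count list members by chasing the n (next) pointers. Works for any
 * struct that starts with the common base. */
int toy_list_len(struct toy_node_s* x) {
  int len = 0;
  for (; x; x = x->n)
    ++len;
  return len;
}

/* Downcast after checking the token. The cast is valid because the
 * base is the initial member of the initial member. */
struct toy_Transform_s* toy_as_transform(struct toy_node_s* x) {
  assert(x && x->tok == TOY_TRANSFORM_TOK);
  return (struct toy_Transform_s*)x;
}
```

With this layout ~x->gg.g.n~ is simply the next pointer of the base
embedded at the start of ~x~.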
211
212
213<<notacountry: so wo>>
214
21512.1.1 Handling XML Namespaces
216~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
217
An annoying feature of XML documents is that they have variable
namespace prefixes. The namespace prefix for the unqualified elements
is taken to be the one specified in the target() directive of the .sg
input. The name of an element in C code is formed by prefixing the
element name with the namespace prefix and an underscore.

Attributes will have a namespace prefix only if one was expressly
specified in the .sg input.
226
227When decoding, the actual namespace prefixes are recorded. The wire
228order encoder knows to use these recorded prefixes so that accurate
229canonicalization for XMLDSIG can be produced.
230
If the message on the wire uses wrong namespaces, the wrong ones are
remembered so that canonicalization for signature validation will work
regardless. The ability to accept wrong namespaces only works as
234long as there is no ambiguity as to which tag was meant - there are
235some tags that need namespace information to distinguish. If you hit
236one of these then either you get lucky and the one that is arbitrarily
237picked by the decoder happens to be the correct one, or you are stuck
238with no easy way to make it right. Of course the XML document was
239wrong to start with so theoretically this is not a concern. Generally
240the more schemata that are simultaneously generated to one package, the
241greater the risk of collisions between tags.
242
243The schema order encoder always uses the prefixes defined
244using target() directives in .sg files. The runtime notion of
245namespaces is handled by ~ns_tab~ field of the decoding and encoding
246context.  It is initialized to contain all namespaces known by virtue
247of .sg declarations.  The runtime assigned prefixes are held in a
248linked list hanging from <<tt: n>> (next) field of ~struct
249zx_ns_s~. (*** more work needed here)
250
The code generation creates a file, such as c/zx-ns.c, which contains
the initialization for the table. The main program should set the ~ns_tab~
field of the context as follows:

  int main(void) {
    struct zx_ctx* ctx;
    ...                        /* ctx must point to a valid context by now */
    ctx->ns_tab = zx_ns_tab;   /* Here zx_ is the prefix chosen in code generation */
  }
260
261Consider the following evil contortion
262
263  <e:E xmlns:e="uri">
264    <h:H xmlns:h="uri"/>
265    <b:B xmlns:b="uri">
266      <e:C xmlns:e="uri"/>
267      <e:D xmlns:e="iru">
268        <e:F xmlns:e="uri"/></></></>
269
Assuming the ~ns_tab~ assigns prefix <<tt: y>> to the namespace
URI, we would have the following data structure as a result of a decode
272
273<<dot: ns-data,,: Decode of XML and resulting namespace structures.
274margin=0
275//rankdir=LR
276
277{ rank=same; ns_tab; e; h; b; }
278{ rank=same; H; B; }
279{ rank=same; C; D; }
280
281ns_tab [shape=record,label="{ns_tab|{y|uri|<uri_n>}|{z|iru|<iru_n>}}"]
282
283e [shape=record,label="e|uri|<n>"]
284h [shape=record,label="h|uri|<n>"]
285b [shape=record,label="b|uri|0"]
286i [shape=record,label="e|iru|0"]
287
288ns_tab:uri_n -> e
289ns_tab:iru_n -> i
290e:n -> h
291h:n -> b
292
293E -> H [style=bold]
294E -> B [style=bold]
295B -> C [style=bold]
296B -> D [style=bold]
297D -> F [style=bold]
298
299E -> e [color=red,arrowhead=empty]
300H -> h [color=red,arrowhead=empty]
301B -> b [color=red,arrowhead=empty]
302C -> e [color=red,arrowhead=empty]
303D -> i [color=red,arrowhead=empty]
304F -> e [color=red,arrowhead=empty]
305>>
306
The red hollow arrows indicate how the elements reference the
namespaces. Since none of the elements used the prefix originally
specified in the schema grammar target() directive, we ended up
allocating "alias" nodes for the uri. However, since E and C use the
same prefix, they share the alias node. Things get interesting with D:
it redefines the prefix e to mean a different namespace URI, "iru", which
happens to be an alias of prefix z.
314
Later, when the wire order canonical encode is done, the red hollow arrows
are chased to determine the namespaces. However, we need to keep a
separate "seen" stack to track whether a parent has already declared the
prefix and URI. E would declare xmlns:e="uri", but C would not because
it had already been "seen". However, F would have to declare it again
because the xmlns:e="iru" in D masks the declaration. The ~zx_ctx~
structure is used to track the namespaces and "seen" status
throughout the decoders and encoders.
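
The decision described above can be sketched in a few lines. This is
an illustration under stated assumptions, not the ZXID implementation:
it swaps in a fixed size array for the linked lists hanging from
~zx_ctx~, and all ~toy_~ names are hypothetical. A declaration is
needed unless the innermost binding of the prefix already carries the
same URI.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX_SEEN 32

struct toy_binding {
  const char* prefix;
  const char* uri;
};

struct toy_seen {
  struct toy_binding stack[TOY_MAX_SEEN];
  int depth;   /* number of bindings currently in scope */
};

/* Return 1 and push the binding if xmlns:prefix="uri" must be emitted
 * at the current element; return 0 if an enclosing element already
 * declared exactly this meaning for the prefix. */
int toy_declare_if_needed(struct toy_seen* s,
                          const char* prefix, const char* uri) {
  int i;
  for (i = s->depth - 1; i >= 0; --i) {
    if (!strcmp(s->stack[i].prefix, prefix)) {
      if (!strcmp(s->stack[i].uri, uri))
        return 0;   /* innermost binding already has this meaning */
      break;        /* prefix is masked by another URI: redeclare */
    }
  }
  assert(s->depth < TOY_MAX_SEEN);
  s->stack[s->depth].prefix = prefix;
  s->stack[s->depth].uri = uri;
  ++s->depth;
  return 1;
}

/* Leaving an element: drop the bindings pushed since depth was saved. */
void toy_pop_to(struct toy_seen* s, int depth) {
  assert(depth >= 0 && depth <= s->depth);
  s->depth = depth;
}
```

Replaying the example, E declares e, C does not, D declares e again
with "iru", and F is forced to declare e="uri" once more because the
innermost binding of e is the one made at D.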
323
<<dot: seen-data,,: Seen data structure (blue dotted and green dashed arrows) at the end of decoding F. S=seen, SN=seen_n.
325margin=0
326//rankdir=LR
327
328{ rank=same; ns_tab; ee; e; h; b; }
329{ rank=same; H; B; }
330{ rank=same; C; D; }
331
332ns_tab [shape=record,label="{ns_tab|{P|URI|S|SN|N}|{y|uri|0|0|<uri_n>}|{z|iru|0|0|<iru_n>}}"]
333
334e [shape=record,label="e|uri|0|0|<n>"]
335ee [shape=record,label="e|uri|<s>|0|<n>"]
336h [shape=record,label="h|uri|0|<sn>|<n>"]
337b [shape=record,label="b|uri|0|<sn>|0"]
338i [shape=record,label="e|iru|<s>|0|0"]
339
340ctx [shape=record,label="{ctx|{|{<ns>ns_tab|<sn>seen_n}}}"]
341
342ns_tab:uri_n -> ee
343ns_tab:iru_n -> i
344ee:n -> e
345e:n -> h
346h:n -> b
347
348E -> H [style=bold]
349E -> B [style=bold]
350B -> C [style=bold]
351B -> D [style=bold]
352D -> F [style=bold]
353
354E -> e [color=red,arrowhead=empty]
355H -> h [color=red,arrowhead=empty]
356B -> b [color=red,arrowhead=empty]
357C -> e [color=red,arrowhead=empty]
358D -> i [color=red,arrowhead=empty]
359F -> ee [color=red,arrowhead=empty]
360
361ns_tab -> ctx:ns [arrowhead=none,arrowtail=normal]
362b -> ctx:sn [color=blue,style=dotted,arrowhead=none,arrowtail=normal]
363b:sn -> h [color=blue,style=dotted]
364h:sn -> ee [color=blue,style=dotted]
365ee:s -> i [color=green,style=dashed]
366i:s -> e [color=green,style=dashed]
367>>
368
369Here we can see how the ~seen_n~ list, represented by the blue dotted
370arrows, was built: at the head of the list, ~ctx->seen_n~, is the last
371seen prefix, namely b (because, although the meaning of e at F was
372different, e as a prefix had already been seen earlier at E), followed
373by other prefixes in inverse order of first occurrence.<<footnote: This
374is a mere artifact of implementation: it's cheapest to add to the head
375of the list. This may change in future.>> The green dashed arrows from
376e:uri to e:iru and then on to second e:uri reflect the fact that e:uri
377(second) was put to the list first (when we were at E), but later, at
378D, a different meaning, iru, was given to prefix e. Finally at F we
379give again a different meaning for e, thus pushing to the "seen stack"
another node. Although e at E and at F have the same namespace URI, "uri", we are
not able to use the same node because we need to keep the stack order.
Thus we are forced to allocate two identical nodes.
383
38412.1.2 Handling any and anyAttribute
385~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
386
387Since our aim is to be lax in what we accept, every element can handle
388unexpected additional attributes as well as unexpected elements. Thus
389whether the schema specifies any or anyAttribute or not, we handle
390everything as if they were there. However, when attributes and
391elements are received outside of their expected context, they are
392simply treated as strings with string names. This is true even for
393those attributes and elements that would be recognizable in their
394proper context.
395
396The any extension points, as well as some bookkeeping data
397are hidden inside ~ZX_ELEM_EXT~ macro. If you tinker with
398this macro, be sure you know what you are doing. If you want
399to add your own specific fields to all structs, redefining
400~ZX_ELEM_EXT~ may be appropriate, but if you want to add more
fields only to some specific structures, you can define
a macro of the form
403
404  TPF_EEE_EXT
405
406and put in it whatever fields you want. These fields will be
407initialized to zero when the structure is created, but are not touched
408in any other way by the generated code. In particular, if some of your
fields are pointers, it will be your responsibility to free them. The
standard free functions will not know to free them. See the data
structure walking functions below for one way to accomplish this.
412
41312.1.3 Root data structure
414~~~~~~~~~~~~~~~~~~~~~~~~~~
415
416The root data structure
417
418  struct zx_root_s;
419
420is a special structure that has a field for every top level
421recognizable element.
422
42312.1.4 Per element data structures
424~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
425
426*** TBW
427
42812.1.5 Memory Allocation
429~~~~~~~~~~~~~~~~~~~~~~~~
430
After decoding, all string data points directly into the input buffer,
i.e. strings are NOT copied. Be sure not to free the input buffer
until you are done processing the data structure. If you need to take
a copy of the strings, you will need to walk the data structure as a
post processing step and make your copies. This can be done using
436
437  void TPF_dup_strs_len_NS_EEE(struct zx_dec_ctx* c, struct TPF_NS_EEE_s* x);
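
The consequence of not copying can be illustrated with a small sketch.
The ~toy_str~ type below is a hypothetical length plus pointer pair,
assumed for illustration only; it is not claimed to match the layout
of the real string node. The same holds for the function names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical string node: a view, not an owned copy. */
struct toy_str {
  int len;
  const char* s;   /* NOT nul terminated; may point into the input */
};

/* Reference the span [start, end) of the input buffer without copying. */
struct toy_str toy_ref(const char* start, const char* end) {
  struct toy_str ss;
  assert(start <= end);
  ss.len = (int)(end - start);
  ss.s = start;
  return ss;
}

/* Replace the view by a private copy so that it survives freeing the
 * input buffer. Returns 0 on success, -1 if allocation fails. */
int toy_dup(struct toy_str* ss) {
  char* copy = malloc((size_t)ss->len + 1);
  if (!copy)
    return -1;
  memcpy(copy, ss->s, (size_t)ss->len);
  copy[ss->len] = 0;   /* terminate the copy for convenience */
  ss->s = copy;
  return 0;
}
```

Until toy_dup() has been called, overwriting or freeing the input
buffer invalidates the string. Afterwards the caller owns the copy.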
438
The structures are allocated via the ZX_ZALLOC() macro, which
by default calls the zx_zalloc() function, which in turn
uses the system malloc(3). However, you can redefine the
macro to use whatever other allocation scheme you desire.
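
As one possible alternative scheme, the sketch below routes a zeroing
allocation macro to a bump arena that is released in one shot. The
macro name ~TOY_ZALLOC~, its argument list, and the arena functions
are all assumptions made for this example; check the actual definition
of ZX_ZALLOC() in the headers before redefining it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_ALIGN 16   /* assumed sufficient for any struct on common ABIs */

/* Hypothetical bump arena: allocation only moves a cursor forward. */
struct toy_arena {
  unsigned char* base;
  size_t size;
  size_t used;
};

/* Back the arena with one malloc(3) block. Returns 0 on success. */
int toy_arena_init(struct toy_arena* a, size_t size) {
  a->base = malloc(size);
  a->size = a->base ? size : 0;
  a->used = 0;
  return a->base ? 0 : -1;
}

/* Return len zeroed bytes from the arena, or NULL if it is exhausted. */
void* toy_arena_zalloc(struct toy_arena* a, size_t len) {
  size_t need = (len + TOY_ALIGN - 1) & ~(size_t)(TOY_ALIGN - 1);
  void* p;
  assert(a && a->used <= a->size);
  if (need < len || need > a->size - a->used)
    return NULL;
  p = a->base + a->used;
  a->used += need;
  memset(p, 0, len);
  return p;
}

/* Hypothetical allocation macro in the spirit of ZX_ZALLOC(). */
#define TOY_ZALLOC(arena, typ) ((typ*)toy_arena_zalloc((arena), sizeof(typ)))
```

Everything allocated from the arena is released together by a single
free(3) of ~base~, which suits code that never frees individual nodes.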
443
444The generated libraries never free(3) memory. In many programming
445patterns, this is actually desirable: for example a CGI program can
446count on dying - the process exit(2) will free all the memory.
447
448If you need to free(3) the data structure, you will need to walk it
449using
450
451  void TPF_free_len_NS_EEE(struct zx_dec_ctx* c,
452                           struct TPF_NS_EEE_s* x,
453                           int free_strings);
454  void zx_free_any(struct zx_dec_ctx* c,
455                   struct zx_note_s* n,
456                   int free_strs);
457
458The zx_free_any() works by having a gigantic switch statement that calls
459the appropriate specific free function.
460
461You can deep clone the data structure with
462
463  void TPF_deep_clone_NS_EEE(struct zx_dec_ctx* c,
464                             struct TPF_NS_EEE_s* x,
465                             int dup_strings);
466  struct zx_note_s* zx_clone_any(struct zx_dec_ctx* c,
467                                 struct zx_note_s* n,
468                                 int dup_strs);
469
The zx_clone_any() works by having a gigantic switch statement that calls
the appropriate specific clone function.
472
47312.2 Decoder as Recursive Descent Parser
474----------------------------------------
475
476The entry point to the decoder is
477
478  struct zx_root_s* zx_DEC_root(struct zx_dec_ctx* c,
479                                struct zx_ns_s* dummy,
480                                int n_decode);
481
The decoding context holds a pointer to the raw data and must be
483initialized prior to calling the decoder. The third argument specifies
484how many recognized elements are decoded before returning. Usually you
485would specify 1 to consume one top level element from the
486stream.<<footnote: The second argument, the dummy namespace, is
487meaningless for root node, but makes sense for element decoders. For
488root you can simply supply 0 (NULL).>>
489
490The returned data structure, ~struct zx_root_s~, contains
491one pointer for each type of top level element that can
492be recognized. The ~tok~ field of the returned value
493identifies the last top level element recognized and can
494be used to dispatch to correct request handler:
495
496  zx_prepare_dec_ctx(c, TPF_ns_tab, start_ptr, end_ptr);
497  struct TPF_root_s* x = TPF_DEC_root(c, 0, 1);
498  switch (x->gg.g.tok) {
499  case TPF_NS_EEE_ELEM: return process_EEE_req(x->NN_EEE);
500  }
501
502When processing responses, it is generally already known
503which type of response you are expecting, so you can simply
504check for NULLness of the respective pointer in the returned
505data structure.
506
Internally zx_DEC_root() works much the same way: it scans
the beginning of an element from the stream, looks up the token
number corresponding to the element name, and switches on
that, calling element specific decoder functions (see next
section) to do the detailed processing.
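
The scan, look up, and switch pattern can be sketched as follows. This
is a toy under stated assumptions, not the generated decoder: the
token names, the two element table, and the ~toy_dec_ctx~ fields are
hypothetical, and the dispatcher returns a label where the real code
would call an element specific decoder.

```c
#include <assert.h>
#include <string.h>

enum toy_tok { TOY_TOK_UNKNOWN = 0, TOY_TOK_Envelope, TOY_TOK_Signature };

struct toy_dec_ctx {
  const char* p;     /* current position in the input */
  const char* lim;   /* one past the end of the input */
};

/* Map an element name (not nul terminated) to a token number. */
enum toy_tok toy_elem2tok(const char* name, size_t len) {
  if (len == 8 && !memcmp(name, "Envelope", 8))  return TOY_TOK_Envelope;
  if (len == 9 && !memcmp(name, "Signature", 9)) return TOY_TOK_Signature;
  return TOY_TOK_UNKNOWN;
}

/* Scan "<prefix:Name" or "<Name" and return the token for Name. The
 * position is left just after the name. */
enum toy_tok toy_scan_start_tag(struct toy_dec_ctx* c) {
  const char* name;
  assert(c->p <= c->lim);
  if (c->p == c->lim || *c->p != '<')
    return TOY_TOK_UNKNOWN;
  name = ++c->p;
  for (; c->p < c->lim && !strchr(" \t\r\n/>", *c->p); ++c->p)
    if (*c->p == ':')
      name = c->p + 1;   /* skip the namespace prefix */
  return toy_elem2tok(name, (size_t)(c->p - name));
}

/* Dispatch on the token, as a root decoder would. */
const char* toy_dec_root(struct toy_dec_ctx* c) {
  switch (toy_scan_start_tag(c)) {
  case TOY_TOK_Envelope:  return "Envelope";
  case TOY_TOK_Signature: return "Signature";
  default:                return "unknown";
  }
}
```

Note that after the scan the opening <, the prefix, and the element
name have all been consumed before any element specific code runs.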
512
513In the above code fragment, you should note the call to
514zx_prepare_dec_ctx() which initializes the decoder machinery.
It takes the +ns_tab+ argument, which specifies which namespaces
516will be recognized. This table MUST match the TPF_DEC_root()
517function you call (i.e. both must have been generated as
518part of the same xsd2sg.pl invocation). The other arguments
519are the start of the buffer to decode and pointer one past
520the end of the buffer to decode.
521
52212.2.1 Element Decoders
523~~~~~~~~~~~~~~~~~~~~~~~
524
For each recognizable element there is a function of the form

  struct TPF_NS_EEE_s* TPF_DEC_NS_EEE(struct zx_dec_ctx* c);

where TPF is the type prefix, NS is the namespace prefix, and
EEE is the element name. For example:
531
532  struct zx_se_Envelope_s* zx_DEC_se_Envelope(struct zx_ctx* c);
533
534These functions work much the same way as the root decoder. You
535should consult dec-templ.c for the skeleton of the decoder. Generally
536you should not be calling element specific decoders: they
537exist so that zx_DEC_root() can call them. They have somewhat
538nonintuitive requirements, for example the opening <, the
539namespace prefix, and the element name must have already been
540scanned from the input stream by the time you call element
541specific decoder.
542
54312.2.2 Decoder Extension Points
544~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
545
The generated code is instrumented with the following macros
547
548ZX_ATTR_DEC_EXT(ss):: Extension point called just after decoding known attribute
549ZX_XMLNS_DEC_EXT(ss):: Extension point called just after decoding xmlns attribute
550ZX_UNKNOWN_ATTR_DEC_EXT(ss):: Extension point called just after decoding unknown attr
551ZX_START_DEC_EXT(x):: Extension point called just after decoding element name
552    and allocating struct, but before decoding any of the attributes.
553ZX_END_DEC_EXT(x):: Extension point called just after decoding the entire element.
554ZX_START_BODY_DEC_EXT(x):: Extension point called just after decoding element tag, including attributes, but before decoding the body of the element.
555ZX_PI_DEC_EXT(pi):: Extension point called just after decoding processing instruction
556ZX_COMMENT_DEC_EXT(comment):: Extension point called just after decoding comment
557ZX_CONTENT_DEC(ss):: Extension point called just after decoding string content
558ZX_UNKNOWN_ELEM_DEC_EXT(elem):: Extension point called just after decoding unknown element
559
The following macros are available to the extension points
561
562TPF:: Type prefix (as specified by  -p during code generation)
563EL_NAME:: Namespaceful element name (NS_EEE)
564EL_STRUCT:: Name of the struct that describes the element
565EL_NS:: Namespace prefix of the element (as seen in input schema)
566EL_TAG:: Name of the element without any namespace qualification.
567
56812.3 Exclusive Canonical Encoder (Serializer)
569---------------------------------------------
570
571The encoder receives a C data structure and generates a gigantic
572string containing an XML document corresponding to the data structure
573and the input schemata. The XML document conforms to the rules of
574exclusive XML canonicalization and hence is useful as input to XMLDSIG.
575
576One encoder is generated for each root node specified at the code
577generation. Often these encoders share code for interior nodes.
578
The encoders allow two pass rendering. You can first use the length
computation method to calculate the amount of storage needed and
then call one of the rendering functions to actually render. Or
if you simply have a large enough buffer, you can just render directly.
583
The encoders take as an argument the next free position in the buffer
and return a char pointer one past the last byte used. Thus
you can discover the length after rendering by subtracting the
pointers. This is guaranteed to result in the same length as returned
by the length computation method.<<footnote: This is a useful
sanity check. If the two ever disagree, please report a bug.>>
You can also call the next encoder with the return value
of the previous encoder to render back-to-back elements.
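
The two pass convention can be sketched with a toy element type. The
~toy_~ names are hypothetical and the element is deliberately trivial
(a tag and unescaped string content, no attributes or namespaces), so
this only illustrates the calling pattern: size first, render second,
chain encoders through the returned pointer, and cross check the two
passes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical element with string content only. */
struct toy_elem {
  const char* tag;
  const char* content;
};

/* First pass: number of bytes that <tag>content</tag> will take. */
int toy_LEN(const struct toy_elem* x) {
  size_t tl = strlen(x->tag);
  return (int)(1 + tl + 1 + strlen(x->content) + 2 + tl + 1);
}

/* Second pass: render at p and return a pointer one past the last
 * byte used. No nul terminator is written. */
char* toy_ENC(const struct toy_elem* x, char* p) {
  size_t tl = strlen(x->tag), cl = strlen(x->content);
  *p++ = '<';
  memcpy(p, x->tag, tl);
  p += tl;
  *p++ = '>';
  memcpy(p, x->content, cl);
  p += cl;
  *p++ = '<';
  *p++ = '/';
  memcpy(p, x->tag, tl);
  p += tl;
  *p++ = '>';
  return p;
}

/* Render two elements back to back into an exactly sized buffer.
 * Returns a malloc(3)'d nul terminated string, or NULL on failure. */
char* toy_render_pair(const struct toy_elem* a, const struct toy_elem* b) {
  int len = toy_LEN(a) + toy_LEN(b);
  char* buf = malloc((size_t)len + 1);
  char* p;
  if (!buf)
    return NULL;
  p = toy_ENC(a, buf);
  p = toy_ENC(b, p);          /* continue where the first one stopped */
  assert(p - buf == len);     /* the two passes must agree */
  *p = 0;
  return buf;
}
```

The assert is the sanity check mentioned above: pointer subtraction
after rendering must equal the precomputed length.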
592
593The XML namespace and XML attribute handling of the encoders
594is novel in that the specified sort is done already at code
595generation time, i.e. the renderers are already in the order
596that the sort mandates.
597
598For attributes we know the sort order directly from the schema
599because [XML-C14N], sec 2.2, p.7, specifies that they
600sort first by namespace URI and then by name, both of which
601we know from the schema.
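
For reference, the ordering rule itself can be written as a qsort(3)
comparator. This runtime sketch is only an illustration of the rule;
as stated above, the generated encoders have the order baked in at
code generation time. The ~toy_attr~ struct is hypothetical, and byte
wise strcmp() is assumed to be adequate, which holds for ASCII and
UTF-8 names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical attribute record. */
struct toy_attr {
  const char* ns_uri;   /* "" for an unqualified attribute */
  const char* name;     /* local name */
  const char* val;
};

/* Namespace URI is the primary key, local name the secondary key.
 * An empty URI sorts first under strcmp(). */
int toy_attr_cmp(const void* a, const void* b) {
  const struct toy_attr* x = a;
  const struct toy_attr* y = b;
  int r = strcmp(x->ns_uri, y->ns_uri);
  return r ? r : strcmp(x->name, y->name);
}

/* Sort n attributes into canonical order. */
void toy_sort_attrs(struct toy_attr* attrs, size_t n) {
  assert(attrs || n == 0);
  if (n > 1)
    qsort(attrs, n, sizeof(attrs[0]), toy_attr_cmp);
}
```

Unqualified attributes therefore come out first, followed by the
qualified ones grouped by namespace URI.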
602
603For ~xmlns~ specifications the situation is similarly easy in the
604schema order encoder case because we know the namespace prefixes
605already at code generation time. However, for the wire order encoder
606we actually need a runtime sort because we can not control which
607namespace prefixes get used. However, for both cases we can make a
608pretty good guess about which namespaces might need to be declared at
609any given element: the element's own namespace and namespaces of each
610of its attributes. That's all, and it's all known at code generation
611time. At runtime we only need to check if the namespace has already
612been seen at outer layer.
613
61412.3.1 Length computation
615~~~~~~~~~~~~~~~~~~~~~~~~~
616
617Compute length of an element (and its subelements). The XML attributes
618and elements are processed in schema order.
619
620  int TPF_LEN_SO_NS_EEE(struct zx_ctx* c,
621                        struct TPF_NS_EEE_s* x);
622
623For example:
624
625  int zx_LEN_SO_se_Envelope(struct zx_ctx* c,
626                            struct zx_se_Envelope_s* x);
627
628Compute length of an element (and its subelements). The XML namespaces
629and elements are processed in wire order.
630
631  int TPF_LEN_WO_NS_EEE(struct zx_ctx* c,
632                        struct TPF_NS_EEE_s* x);
633
634For example:
635
636  int zx_LEN_WO_se_Envelope(struct zx_ctx* c,
637                            struct zx_se_Envelope_s* x);
638
63912.3.2 Encoding in schema order
640~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
641
Render an element into a string. The XML elements are processed in
643schema order. The xmlns declarations and XML attributes are always
644sorted per [XML-EXC-C14N] rules.<<footnote: The sort is actually done
645already at code generation time by xsd2sg.pl.>> This is what you
646generally want for rendering new data structure to a string. The wo
647pointers are not used.
648
649  char* TPF_ENC_SO_NS_EEE(struct zx_ctx* c,
650                          struct TPF_NS_EEE_s* x,
651                          char* p);
652
653For example:
654
655  char* zx_ENC_SO_se_Envelope(struct zx_ctx* c,
656                              struct zx_se_Envelope_s* x,
657                              char* p);
658
Since it is a very common requirement to allocate a correctly
sized buffer and then render an element, a helper function
is provided to do this in one step.
662
663  struct zx_str* zx_EASY_ENC_SO_se_Envelope(struct zx_ctx* c,
664                                    struct zx_se_Envelope_s* x);
665
The returned string is allocated from the allocation arena described
by ~zx_ctx~.
668
66912.3.3 Encoding in wire order
670~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
671
Render an element into a string. The XML elements are
processed in wire order by chasing wo pointers. This is what you want
674for validating signatures on other people's XML documents. If the wire
675representation was schema invalid, e.g. elements were in wrong order,
676the wire representation is still respected, except for xmlns
677declarations and XML attributes, which are always sorted, per exc-c14n
678rules. For each element a function is generated as follows
679
680  char* TPF_ENC_WO_NS_EEE(struct zx_ctx* c,
681                          struct TPF_NS_EEE_s* x,
682                          char* p);
683
684For example
685
686  char* zx_ENC_WO_se_Envelope(struct zx_ctx* c,
687                              struct zx_se_Envelope_s* x,
688                              char* p);
689
690A helper function is also available
691
692  struct zx_str* zx_EASY_ENC_WO_se_Envelope(struct zx_ctx* c,
693                                    struct zx_se_Envelope_s* x);
694
69512.4 Signatures (XMLDSIG)
696-------------------------
697
69812.4.1 Signature Generation
699~~~~~~~~~~~~~~~~~~~~~~~~~~~
700
701*** TBW
702
70312.4.2 Signature Validation
704~~~~~~~~~~~~~~~~~~~~~~~~~~~
705
For signature validation you need to walk the decoded data structure
to locate the signature as well as the references and pass them to
zxsig_validate(). The validation involves wire order exclusive
canonical encoding of the referenced XML blobs, computation of SHA1 or
MD5 checksums over them, and finally computation of a SHA1 checksum
over the <SignedInfo> element and validation of the actual
<SignatureValue> against that. This last step involves public key
decryption using the signer's certificate.
714
715A nasty problem in exclusive canonicalization is that the namespaces
716that are needed in the blob may actually appear in the containing XML
717structures, thus in order to know the correct meaning of a namespace
718prefix, we need to perform the +seen+ computation for all elements
719outside and above the blob of interest.<<footnote: This is yet another
720indication of how botched the XML namespace concept is. Or this could
721have been fixed in the exclusive canonicalization spec by not using
722namespace prefixes at all.>>
723
To verify a signature, you have to do a certain amount of preparatory work
to locate the signature and the data that was signed. Generally what
should be signed will be evident from protocol specifications or from
the security requirements of your application environment. Conversely,
if there is a signature, but it does not reference the appropriate
elements, it's worthless and you might as well reject the document
without even verifying the signature.
731
732*Example*
733
    struct zxsig_ref refs[1];  /* one <Reference>, one signed blob */

    /* cf, ent, r, and res are assumed to be declared elsewhere;
     * r is the root of the already decoded message. */
    cf = zxid_new_conf("/var/zxid/");
    ent = zxid_get_ent_from_file(cf, "YV7HPtu3bfqW3I4W_DZr-_DKMP4.");

    /* Pair the <Reference> with the element it is supposed to cover. */
    refs[0].ref = r->Envelope->Body->ArtifactResolve
                   ->Signature->SignedInfo->Reference;
    refs[0].blob = (struct zx_elem_s*)r->Envelope->Body->ArtifactResolve;
    res = zxsig_validate(cf->ctx, ent->sign_cert,
                         r->Envelope->Body->ArtifactResolve->Signature,
                         1, refs);
    if (res == ZXSIG_OK) {
      D("sig vfy ok %d", res);
    } else {
      ERR("sig vfy failed due to(%d)", res);
    }
749
750This code illustrates
751
1. You have to determine who signed and provide the entity
   object that corresponds to the signer. Often you
   would determine the entity from the <Issuer> element somewhere
   inside the message.
756
757   The entity is used for retrieving the signing certificate.
758   Another alternative is that the signature itself contains
759   a <KeyInfo> element and you extract the certificate from
760   there. You would still need to have a way to know if you
761   trust the certificate.
762
7632. You have to prepare the refs array. It contains pairs of
764   <SignedInfo><Reference> specifications combined with the
765   actual elements that are signed. Generally the URI
766   XML attribute of the <Reference> element points to the
767   data that was signed. However, it is application dependent
768   what type of ID XML attribute the URI actually references
769   or the URI could even reference something outside the
   document. It would be far too unreliable for
   zxsig_validate() to attempt to guess how to locate the
   signed data: therefore we push the responsibility to
   you. Your code will have to walk the data to locate
   all referenced bits and pieces.
775
776   In the above example, locating the one signed bit was
777   very easy: the specification says where it is (and this
778   location is fixed so there really is no need to check
779   the URI either).
780
   You pass the length of the refs array and the array
   itself as the last two arguments to zxsig_validate().
783
7843. You need to locate the <Signature> element in the document
   and pass it as an argument to zxsig_validate(). Usually
786   a protocol specification will say where the <Signature>
787   element is to be found, so locating it is not difficult.
788
4. The return value will indicate validation status. ZXSIG_OK,
   which has the numerical value 0, indicates success. All
   nonzero values indicate various kinds of failure.
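
When the location of the signed data is not fixed by the
specification, step 2 typically means matching the URI of each
<Reference> against the ID attributes of candidate elements. The
following is a minimal sketch of that matching only, handling just
same-document references of the form "#id". The toy_signed_elem struct
and toy_find_referenced() are hypothetical names for this example and
are not part of ZXID; the real decoded structs are different.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy type; the real decoded structs differ. */
struct toy_signed_elem {
  const char* id;    /* value of the ID attribute, or NULL if absent */
  const char* name;  /* element name, for illustration only */
};

/* Return the candidate whose ID matches a same-document reference
 * URI of the form "#id". Returns NULL for a missing or empty URI,
 * for an external URI, and for an ID no candidate carries. */
const struct toy_signed_elem*
toy_find_referenced(const char* uri,
                    const struct toy_signed_elem* cand, size_t n_cand)
{
  size_t i;
  assert(cand != NULL || n_cand == 0);
  if (!uri || uri[0] != '#' || !uri[1])
    return NULL;  /* not a same-document fragment reference */
  for (i = 0; i < n_cand; ++i)
    if (cand[i].id && !strcmp(cand[i].id, uri + 1))
      return &cand[i];
  return NULL;
}
```

A caller would treat a NULL result as grounds to reject the document,
since a signature that does not cover the expected element is of no
use.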
792
79312.4.3 Certificate Validation and Trust Model
794~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
795
Trust models for TLS and signature validation are separate. The TLS layer
is handled mainly by libcurl or, in the case of ClientTLS, by the https web
server (which is not part of zxid).

In signature validation the primary trust mechanism is that the entity's
metadata specifies the signing certificate and there is no
Certification Authority check at all.<<footnote: If you develop a CA
check, please submit patches to the ZXID project.>>
This model works well if you control the admission
to your CoT. However, ZXID ships by default with the
automatic CoT feature turned on, thus anyone can get
added to the CoT and therefore a signature made with any
certificate they declare is "valid". This is hardly
acceptable for anything involving money.
810
81112.5 Data Accessor Functions
812----------------------------
813
814Simple read access to data should, in C, be done by
815simply referencing the fields of the struct, e.g.
816
817  if (!r->EntitiesDescriptor->EntityDescriptor)
818      goto bad_md;
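
Because optional elements may be absent, every pointer on the path is
best checked before it is dereferenced. The following sketch shows the
pattern with toy structs whose layout merely imitates the example
above. The toy_ types, the entityID field, and
toy_first_entity_id() are hypothetical and are not the generated
ZXID structs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy types that imitate nested decoded elements. */
struct toy_entity   { const char* entityID; };
struct toy_entities { struct toy_entity* EntityDescriptor; };
struct toy_root     { struct toy_entities* EntitiesDescriptor; };

/* Fetch the entityID of the first EntityDescriptor. Returns 1 and
 * sets *id_out on success. Returns 0, leaving *id_out untouched,
 * if any element on the path is absent. */
int toy_first_entity_id(const struct toy_root* r, const char** id_out)
{
  assert(id_out != NULL);
  if (!r || !r->EntitiesDescriptor
      || !r->EntitiesDescriptor->EntityDescriptor)
    return 0;
  *id_out = r->EntitiesDescriptor->EntityDescriptor->entityID;
  return 1;
}
```

Checking each level in turn keeps a document that lacks an interior
element from causing a NULL dereference.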
819
820*** TBW
821
82212.6 Memory Allocation and Free
823-------------------------------
824
825*** TBW
826
82712.7 Walking the data structure
828-------------------------------
829
830*** TBW
831
83212.9 Thread Safety
833------------------
834
All generated libraries are designed to be thread safe, provided
that the underlying libc APIs, such as malloc(3), are thread safe.
837
838
83915 Creating New Interfaces Using ZXID Methodology
840=================================================
841
842The ZXID code generation methodology can be used to create
843interfaces to any XML document or protocol that can be
844described as a Schema Grammar (which includes any document
845that can be expressed as XML Schema - XSD). The general
846steps are
847
8481. Convert .xsd file to .sg, or write the .sg directly. For conversion,
849   you would typically use a command like
850
851     ~/pd/xsd2sg.pl <foo.xsd >foo.sg
852
2. Tweak and rationalize the resulting .sg file. In an ideal world
   any construct expressible as .xsd should be nicely representable,
   but in practice some work better than others, thus you can create
   a much nicer interface if you invest in some manual tweaking.
857
858   Note that the tweaked .sg still is able to represent the
859   same document as the original .xsd described, though
860   often the tweaking causes some relaxation.
861
862   Most common tweaks
863
864   a. If the .xsd is written so that the targeted namespace is
865      also the default namespace, you should introduce
866      a namespace prefix because this is needed during
867      code generation to keep different C identifiers
868      from clashing with each other. Ideally you
869      should coordinate the namespace prefixes globally
870      so that even two different projects will not clash.
871
   b. Where the choice construct is used, indicated
      by the pipe symbol (|) in the .sg file, you
      should refactor these into sequences of
      zero-or-one occurrence (?) instances of the alternatives
      of the choice. This is needed because, for the foreseeable
      future, xsd2sg.pl has a limitation in its code generation
      feature. If the choice has maxOccurs="unbounded"
      you should use (*) instead.
880
   c. xml:lang and other similar attributes may need to
      be factored open to be just of type %xs:string. This
      is a bug in xsd2sg.pl.
884
3. "Connect" the schema to the bigger framework. Usually this
   means adding your schema grammar to the ZX_SG variable
   in zxid/Makefile and supplying additional -r flags
   in the ZX_ROOT variable. This allows your new schema to
   be visible at top level.

   If your schema is meant to extend leaves or interior nodes of
   the parse tree, such as SOAP Body, you would edit
   the SOAP schema to accept your
   new protocol elements in the Body. Similarly you could arrange
   that the generic SOAP header accepts your specific header
   schemata, or that the SAML attribute definitions accept your
   kind of attributes - whatever makes sense in your context.
898
   An alternative to this is to create an entirely new
   monolithic encoder and decoder, i.e. instead of extending
   the existing ZXID project to accommodate your new
   protocol, you just start a new project that uses the
   same methodology. You should see how the SAML protocol
   part is separated from the SAML metadata parsing and
   from the WSF parsing in the existing project.
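
The relaxation mentioned under step 2 has a consequence for
application code: once a choice is refactored into a sequence of
zero-or-one alternatives, the grammar also accepts documents with none
or several of the alternatives present. The following sketch shows how
an application might reimpose the original rule. It assumes that each
optional alternative ends up as a pointer that is NULL when absent.
The toy_ structs and functions are hypothetical and are not generated
output.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical outcome of refactoring a choice (A | B) into a
 * sequence of optional members (A? B?). */
struct toy_a { int dummy; };
struct toy_b { int dummy; };
struct toy_choice {
  struct toy_a* A;  /* NULL if alternative A is absent */
  struct toy_b* B;  /* NULL if alternative B is absent */
};

/* Count how many of the alternatives are present. */
int toy_choice_count(const struct toy_choice* c)
{
  assert(c != NULL);
  return (c->A != NULL) + (c->B != NULL);
}

/* The relaxed grammar accepts zero or both alternatives, so the
 * application checks the original "exactly one" rule itself. */
int toy_choice_is_valid(const struct toy_choice* c)
{
  return toy_choice_count(c) == 1;
}
```

For a choice refactored with (*), the corresponding check would
instead have to allow any number of occurrences.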
906
90717 Code Generation Tools
908========================
909
The main workhorse of code generation is xsd2sg.pl, which serves multiple
purposes

1. Build hashes of all declarations in .sg input. Each hash element consists
   of an array of elements and attributes, as well as groups and attribute groups.
   The type of an array element is determined from its prefix, per .sg rules.
9162. Expand groups and attribute groups
9173. Evaluate each element wrt its type and generate
918   a. C data structures
919   b. Decoder grammar
920   c. Token descriptions for perfect hash and lexical analyzer
921   d. Encoder C code
922
923The code to build hashes is interwoven in the code that generates .xsd
924from .sg. The rest of the generation happens in a function called
925generate().
926
927Typical command line (to generate SAML 2.0 protocol engine)
928
929  ~/plaindoc/xsd2sg.pl -d -gen saml2 -p zx_ \
930       -r saml:Assertion -r se:Envelope \
931       -S \
932       sg/saml-schema-assertion-2.0.sg \
933       sg/saml-schema-protocol-2.0.sg \
934       sg/xmldsig-core.sg \
935       sg/xenc-schema.sg \
936       sg/soap11.sg \
937       >/dev/null
938
939<<ignore: ~/plaindoc/xsd2sg.pl -d -gen saml2 -p zx_ -r saml:Assertion -r se:Envelope -S sg/saml-schema-assertion-2.0.sg sg/saml-schema-protocol-2.0.sg sg/xmldsig-core.sg sg/xenc-schema.sg sg/soap11.sg >/dev/null >>
940
941To generate SAML 2.0 Metadata engine you would issue
942
943  ~/plaindoc/xsd2sg.pl -d -gen saml2md -p zx_ \
944       -r md:EntityDescriptor -r md:EntitiesDescriptor \
945       -S \
946       sg/saml-schema-assertion-2.0.sg \
947       sg/saml-schema-metadata-2.0.sg \
948       sg/xmldsig-core.sg \
949       sg/xenc-schema.sg \
950       >/dev/null
951
952<<ignore: ~/plaindoc/xsd2sg.pl -d -gen saml2md -p zx_ -r md:EntityDescriptor -r md:EntitiesDescriptor -S sg/saml-schema-assertion-2.0.sg sg/saml-schema-metadata-2.0.sg sg/xmldsig-core.sg sg/xenc-schema.sg >/dev/null >>
953
95417.1 Special Support for Specific Programming Languages
955-------------------------------------------------------
956
957While C code generation is the main output, and this can always be
958converted to other languages using SWIG, sometimes a more natural
959language interface can be built by directly generating it.
960
We plan to enhance the code generation to do something like this. At
least direct hash-of-hashes-of-arrays-of-hashes type data-structure
generation for the benefit of some scripting languages is planned.
964
965<<if: ZXIDBOOK>>
966<<else: >>
967
96818 ZXID SP
969==========
970
971*** warning: not checked lately, may be wrong!
972
973<<table: ZXID SP URLs
974URL          Description
975============ =======================================================
976/zxid        Same as o=M. Main convenience entry point
977/zxid?o=M    SSO with CDC; or management if already logged in
978/zxid?o=C    Common Domain Cookie (CDC) reader, usually under common domain host name.
979/zxid?o=E    SSO after CDC read; or management if already logged in.
980/zxid?o=P    HTTP POST end point. Used for forms and last part of POST profile SSO.
981/zxid?o=Q    HTTP binding (POST or redirect) request end point (e.g. SLO, MNI).
982/zxid?o=S    SOAP end point (HTTP POST)
983/zxid?o=B    Get SP metadata (or combined SP and IdP metadata if proxying).
984>>
985
98696 License
987==========
988
989Copyright (c) 2006-2009 Symlabs (symlabs@symlabs.com), All Rights Reserved.
Author: Sampo Kellomäki (sampo@iki.fi)
991
992Licensed under the Apache License, Version 2.0 (the "License");
993you may not use this file except in compliance with the License.
994You may obtain a copy of the License at
995http://www.apache.org/licenses/LICENSE-2.0
996
997Unless required by applicable law or agreed to in writing, software
998distributed under the License is distributed on an "AS IS" BASIS,
999WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
1000See the License for the specific language governing permissions and
1001limitations under the License.
1002
100396.2 Specification IPR
1004----------------------
1005
ZXID is based on open SAML and Liberty specifications. The parties
that have developed these specifications, including Symlabs, have made
a Royalty Free (RF) licensing commitment. Please ask OASIS and Liberty
Alliance for the specifics of their IPR policies and IPR disclosures.
1010
Some protocols, such as WS-Trust and WS-Federation, enjoy Microsoft's
pledge<<footnote: If you have a reference to where this pledge can be
found, please let me know so it can be included here.>> that they will
not sue you even if you implement these specifications. You should
evaluate for yourself whether this is good enough for your situation.
1016
1017<<zxid-ref.pd>>
1018
1019<<doc-end.pd>>
1020<<notapath: TCP/IP a.k.a xBSD/Unix n/a Perl/mod_perl PHP/mod_php Java/Tomcat>>
1021<<EOF: >>
1022<<fi: >>