<<if: ZXIDBOOK>>
<<else: >>ZXID Low Level ("Raw") API
##########################
<<author: Sampo Kellomäki (sampo@iki.fi)>>
<<cvsid: $Id: zxid-raw.pd,v 1.5 2010-01-08 02:10:09 sampo Exp $>>
<<class: article!a4paper!!ZXID-RAW 01>>
<<define: ZXDOC=ZXID Raw API>>

<<abstract:

ZXID.org Identity Management toolkit implements standalone SAML 2.0 and
Liberty ID-WSF 2.0 stacks. This document describes the low level API.

>>

<<maketoc: 1>>

1 Introduction
==============

Here we describe the general philosophy of the ZXID low level
APIs. Some function level documentation is available from
<<link:../ref/html/index.html: Function reference>>.

Before you barge in head first and use the raw API, you should
check whether the +easy+ and simple API in <<link:zxid-simple.html: zxid_simple()>>
meets your needs. Or you may be able to use
<<link:../mod_auth_saml/mod_auth_saml.html: mod_auth_saml>>
and not have to program at all.

Happy hacking!

1.1 Other documents
-------------------

<<doc-inc.pd>>
<<htmlpreamble: <title>ZXID Low Level ("Raw") API</title><link type="text/css" rel=stylesheet href="zx.css"><body><h1>ZXID Low Level ("Raw") API</h1> >>

12 Full Native C API
====================
<<fi: >>

The generated aspects of the native C API are in c/*-data.h, for example

  c/zx-sa-data.h

Studying this file is very instructive.<<footnote: emacs tip: run
`make tags' and then try hitting M-. while the cursor is over a struct
or function name in c/zx-sa-data.h - this makes navigation painless.>>

12.1 C Data Structures
----------------------

A header (NN-data.h) is generated from the .sg file. This header contains structs that
represent the data of the elements. Each element and attribute
generates its own node. Even trivial nodes like strings have to be
kept this way because the nodes form the basis for remembering the ordering
of data. This ordering is needed for exclusive XML canonicalization,
and thus for signature verification.<<footnote: It's unfortunate that
the XML standards do not make this any easier. Without the order
maintenance requirement, it would be possible to represent trivial
child elements directly as struct fields. An approach that tried to do
just this is available from CVS tag GEN_LALR (ca. 29.5.2006).>>

Any missing data is represented by a NULL pointer.

Any repeating data is kept as a linked list, in reverse order of being
seen in the data stream.<<footnote: Reverse order is just an
optimization - or an artifact of simply adding the latest element to the
head of the list. If this bothers you, it's easy enough to reverse the
list afterwards. A linked list is simple and works well for data whose
order does not matter much (we use a separate pointer for remembering
the canonicalization order) and where random access is not needed, or
cardinality is low enough that simple pointer chasing is efficient
enough.>>

<<ignore: *** Problem here: how to preserve ordering of elements. We need to
 * do SO canonicalization as there are new elements, yet we would like
 * to maintain WO as much as possible, especially for elements for which
 * we do not have schema ("any" elements). Always reverse any elem list?
>>

Simple elements and all attributes are represented by a simple string node
(even if they are booleans or integers).
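
The reverse ordering of repeating data described above follows directly from prepending each newly decoded node to the head of its list. Below is a minimal sketch of that effect and of the after-the-fact reversal that the footnote mentions. The ~struct node~ type, its n field, and the push() and reverse() helpers are hypothetical stand-ins for a generated struct and its gg.g.n next pointer. They are not part of the generated API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a generated element struct: `n` plays the
 * role of the gg.g.n next pointer, `seq` records the wire position. */
struct node {
  struct node* n;
  int seq;
};

/* Prepend x to the list in O(1). Doing this for every decoded node is
 * why repeating data ends up in reverse wire order. */
static struct node* push(struct node* head, struct node* x) {
  assert(x != NULL);
  x->n = head;
  return x;
}

/* Reverse a singly linked list in place, restoring wire order. */
static struct node* reverse(struct node* head) {
  struct node* prev = NULL;
  while (head) {
    struct node* next = head->n;
    head->n = prev;
    prev = head;
    head = next;
  }
  return prev;
}
```

If nodes are seen in the order 1, 2, 3, prepending produces the list 3, 2, 1, and one call to reverse() restores 1, 2, 3. Because reverse() relinks the existing nodes, it allocates nothing.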

*Example*

Consider the following XML:

  <ds:Signature>
    <ds:SignedInfo>
      <ds:CanonicalizationMethod
          Algorithm="http://w3.org/xml-exc-c14n#"/>
      <ds:SignatureMethod
          Algorithm="http://w3.org/xmldsig#rsa-sha1"/>
      <ds:Reference
          URI="#RrcrNwFIw6n">
        <ds:Transforms>
          <ds:Transform
              Algorithm="http://w3.org/xml-exc-c14n#"/>
          <ds:Transform
              Algorithm="http://w3.org/xmldsig#env-sig"/></ds:Transforms>
        <ds:DigestMethod
            Algorithm="http://w3.org/xmldsig#sha1"/>
        <ds:DigestValue>lNIzVMrp8CwTE=</ds:DigestValue></ds:Reference></ds:SignedInfo>
    <ds:SignatureValue>GeMp7LS...vnjn8=</ds:SignatureValue></ds:Signature>

Decoding would produce the data structure in Fig-<<see: fig:decode-data>>. You
should also look at c/zx-sa-data.h to see the structs involved in this
example.

<<dot: decode-data: Typical data structure produced by decode.

// This graph crashes dot 1.12, but works in dot 2.8, seems to crash 2.20.2

size="11.0,6.0"
margin=0
rankdir=LR

{ rank=same; siginfo; sigval; }
{ rank=same; canonmeth; sigmeth; ref; }
//{ rank=same; canonmeth; sigmeth; ref; digmeth; digval; }
//{ rank=same; xforms; xform_env; xform_c14n; }
//{ rank=same; xform_env; xform_c14n; digmeth; digval; }
{ rank=same; xforms; digmeth; digval; }
{ rank=same; xform_c14n; xform_env; }

sig [shape=record,label="zx_ds_Signature_s|{|{<f_kids>gg.kids|<f_siginfo>SignedInfo|<f_sigval>SignatureValue|KeyInfo (0)|Object (0)|Id (0)}}"];
siginfo [shape=record,label="zx_ds_SignedInfo_s|{|{<f_kids>gg.kids|<f_wo>gg.g.wo|<f_canonmeth>CanonicalizationMethod|<f_sigmeth>SignatureMethod|<f_ref>Reference|Id (0)}}"];

canonmeth [shape=record,label="zx_ds_CanonicalizationMethod_s|{|{<f_wo>gg.g.wo|Algorithm\n\"http://w3.org/xml-exc-c14n#\"}}"];

sigmeth [shape=record,label="zx_ds_SignatureMethod_s|{|{<f_wo>gg.g.wo|Algorithm\n\"http://w3.org/xmldsig#rsa-sha1\"}}"];

ref [shape=record,label="zx_ds_Reference_s|{|{<f_kids>gg.kids|gg.g.wo (0)|<f_xforms>Transforms|<f_digmeth>DigestMethod|<f_digval>DigestValue|Id (0)|Type (0)|URI\n\"#RrcrNwFIw6n\"}}"];

xforms [shape=record,label="zx_ds_Transforms_s|{|{<f_kids>gg.kids|<f_wo>gg.g.wo|gg.g.n (0)|<f_xform>Transform}}"];

xform_c14n [shape=record,label="zx_ds_Transform_s|{|{<f_wo>gg.g.wo|gg.g.n (0)|XPath (0)|<f_c14n_algo>Algorithm\n\"http://w3.org/xml-exc-c14n#\"}}"];

xform_env [shape=record,label="zx_ds_Transform_s|{|{gg.g.wo (0)|<f_n>gg.g.n|XPath (0)|Algorithm\n\"http://w3.org/xmldsig#env-sig\"}}"];

xforms:f_xform -> xform_env
xform_env:f_n -> xform_c14n

digmeth [shape=record,label="zx_ds_DigestMethod_s|{|{<f_wo>gg.g.wo|Algorithm\n\"http://w3.org/xmldsig#sha1\"}}"];
digval [shape=record,label="zx_elem_s|{|{gg.g.wo (0)|content\n\"lNIzVMrp8CwTE=\"}}"];

sigval [shape=record,label="zx_ds_SignatureValue_s|{|{gg.g.wo (0)|gg.content\n\"GeMp7LS...vnjn8=\"|Id (0)}}"];

sig:f_siginfo -> siginfo
sig:f_sigval -> sigval

siginfo:f_canonmeth -> canonmeth
siginfo:f_sigmeth -> sigmeth
siginfo:f_ref -> ref

ref:f_xforms -> xforms
ref:f_digmeth -> digmeth
ref:f_digval -> digval

sig:f_kids ->siginfo [weight=0,arrowhead=empty,color=red]

siginfo:f_wo ->sigval [weight=0,arrowhead=empty,color=red]
siginfo:f_kids -> canonmeth [weight=0,arrowhead=empty,color=red]
canonmeth:f_wo -> sigmeth [weight=0,arrowhead=empty,color=red]
sigmeth:f_wo -> ref [weight=0,arrowhead=empty,color=red]

ref:f_kids -> xforms [weight=0,arrowhead=empty,color=red]
xforms:f_wo -> digmeth [weight=0,arrowhead=empty,color=red]
digmeth:f_wo -> digval [weight=0,arrowhead=empty,color=red]

xforms:f_kids -> xform_c14n [weight=0,arrowhead=empty,color=red]
xform_c14n:f_wo -> xform_env [weight=0,arrowhead=empty,color=red]

>>

There are two pointer systems at play here. The black solid arrows
depict the logical structure of the XML document.
For each child element there is a struct field that simply points to the child. If
there are multiple occurrences of the child, as in
~sig->SignedInfo->Reference->Transforms->Transform~, the children are
kept in a linked list connected by gg.g.n (next) fields.<<footnote:
This linked list may be in inverted order depending on the phase of
the moon and the position of the trams in Helsinki. Until the implementation
matures, it's better not to depend on the ordering.>>

The +wire order+ structure, depicted by red hollow arrows, is
maintained using gg.kids as the root and gg.g.n as the next pointer. For example,
~sig->SignedInfo->Reference->Transforms~ keeps its kids, the
~zx_ds_Transform~ objects, in the original order hanging from the kids
and linked with the gg.g.n field. As can be seen, the order kept with gg.g.n
fields can be different from the one kept using <<tt: n>> (next)
fields.<<footnote: Sometime before R1.0 the scheme changed so that there are only
gg.g.n pointers and even wire order lists use them. Thus
wo pointers no longer exist.>>

What's more, the kids list can contain dissimilar objects: witness
~sig->SignedInfo->Reference->gg.kids~. The wire order representation
is only captured when decoding the document and is mainly useful for
correctly canonicalizing the document for signature verification. If
you are building a data structure in your own program, you typically
will not set the gg.kids and gg.g.wo fields.

In the diagram, the objects of type ~zx_str~ were collapsed to
double quoted strings. Superfluous gg.kids, gg.g.wo, and gg.g.n fields
were omitted: they exist in all structures, but are not shown when
they are ~NULL~. A ~NULL~ is depicted as zero (0).<<footnote: All
this gg.g business is just C's way of referencing the fields of a
common base type of element objects.>>


<<notacountry: so wo>>

12.1.1 Handling XML Namespaces
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

An annoying feature of XML documents is that they have variable
namespace prefixes. The namespace prefix for unqualified elements
is taken to be the one specified in the target() directive of the .sg
input. The name of an element in C code is formed by prefixing the element
name with the namespace prefix and an underscore.

Attributes have a namespace prefix only if one was expressly
specified in the .sg input.

When decoding, the actual namespace prefixes are recorded. The wire
order encoder knows to use these recorded prefixes so that accurate
canonicalization for XMLDSIG can be produced.

If the message on the wire uses wrong namespaces, the wrong ones are
remembered so that canonicalization for signature validation will work
regardless. The ability to accept wrong namespaces only works as
long as there is no ambiguity as to which tag was meant: some
tags can only be told apart by their namespace. If you hit
one of these then either you get lucky and the one that is arbitrarily
picked by the decoder happens to be the correct one, or you are stuck
with no easy way to make it right. Of course, the XML document was
wrong to start with, so in theory this is not a concern. Generally,
the more schemata that are generated simultaneously into one package, the
greater the risk of collisions between tags.

The schema order encoder always uses the prefixes defined
using target() directives in .sg files. The runtime notion of
namespaces is handled by the ~ns_tab~ field of the decoding and encoding
context. It is initialized to contain all namespaces known by virtue
of .sg declarations. The prefixes assigned at runtime are held in a
linked list hanging from the <<tt: n>> (next) field of ~struct
zx_ns_s~. (*** more work needed here)

The code generation creates a file, such as c/zx-ns.c, which contains
the initialization for the table. The main program should point the ~ns_tab~
field of the context at this table as follows:

  int main(void) {
    struct zx_ctx* ctx;
    ...
    ctx->ns_tab = zx_ns_tab;  /* Here zx_ is the prefix chosen in code generation */
  }

Consider the following evil contortion:

  <e:E xmlns:e="uri">
    <h:H xmlns:h="uri"/>
    <b:B xmlns:b="uri">
      <e:C xmlns:e="uri"/>
      <e:D xmlns:e="iru">
        <e:F xmlns:e="uri"/></e:D></b:B></e:E>

Assuming the ~ns_tab~ assigns the prefix <<tt: y>> to the namespace
URI "uri", a decode would produce the following data structure:

<<dot: ns-data,,: Decode of XML and resulting namespace structures.
margin=0
//rankdir=LR

{ rank=same; ns_tab; e; h; b; }
{ rank=same; H; B; }
{ rank=same; C; D; }

ns_tab [shape=record,label="{ns_tab|{y|uri|<uri_n>}|{z|iru|<iru_n>}}"]

e [shape=record,label="e|uri|<n>"]
h [shape=record,label="h|uri|<n>"]
b [shape=record,label="b|uri|0"]
i [shape=record,label="e|iru|0"]

ns_tab:uri_n -> e
ns_tab:iru_n -> i
e:n -> h
h:n -> b

E -> H [style=bold]
E -> B [style=bold]
B -> C [style=bold]
B -> D [style=bold]
D -> F [style=bold]

E -> e [color=red,arrowhead=empty]
H -> h [color=red,arrowhead=empty]
B -> b [color=red,arrowhead=empty]
C -> e [color=red,arrowhead=empty]
D -> i [color=red,arrowhead=empty]
F -> e [color=red,arrowhead=empty]
>>

The red hollow arrows indicate how the elements reference the
namespaces. Since none of the elements used the prefix originally
specified in the schema grammar target() directive, we ended up
allocating "alias" nodes for the uri. However, since E and C use the
same prefix, they share the alias node.
Things get interesting with D: it redefines the prefix e to mean a different
namespace URI, "iru", which happens to be an alias of prefix z.

Later, when the wire order canonical encode is done, the red hollow arrows
are chased to determine the namespaces. However, we need to keep a
separate "seen" stack to track whether a parent has already declared the
prefix and URI. E would declare xmlns:e="uri", but C would not because
it had already been "seen". However, F would have to declare it again
because the xmlns:e="iru" in D masks the declaration. The ~zx_ctx~
structure is used to track the namespaces and "seen" status
throughout the decoders and encoders.

<<dot: seen-data,,: Seen data structure (blue dotted and green dashed arrows) at the end of decoding F. S=seen, SN=seen_n.
margin=0
//rankdir=LR

{ rank=same; ns_tab; ee; e; h; b; }
{ rank=same; H; B; }
{ rank=same; C; D; }

ns_tab [shape=record,label="{ns_tab|{P|URI|S|SN|N}|{y|uri|0|0|<uri_n>}|{z|iru|0|0|<iru_n>}}"]

e [shape=record,label="e|uri|0|0|<n>"]
ee [shape=record,label="e|uri|<s>|0|<n>"]
h [shape=record,label="h|uri|0|<sn>|<n>"]
b [shape=record,label="b|uri|0|<sn>|0"]
i [shape=record,label="e|iru|<s>|0|0"]

ctx [shape=record,label="{ctx|{|{<ns>ns_tab|<sn>seen_n}}}"]

ns_tab:uri_n -> ee
ns_tab:iru_n -> i
ee:n -> e
e:n -> h
h:n -> b

E -> H [style=bold]
E -> B [style=bold]
B -> C [style=bold]
B -> D [style=bold]
D -> F [style=bold]

E -> e [color=red,arrowhead=empty]
H -> h [color=red,arrowhead=empty]
B -> b [color=red,arrowhead=empty]
C -> e [color=red,arrowhead=empty]
D -> i [color=red,arrowhead=empty]
F -> ee [color=red,arrowhead=empty]

ns_tab -> ctx:ns [arrowhead=none,arrowtail=normal]
b -> ctx:sn [color=blue,style=dotted,arrowhead=none,arrowtail=normal]
b:sn -> h [color=blue,style=dotted]
h:sn -> ee [color=blue,style=dotted]
ee:s -> i [color=green,style=dashed]
i:s -> e [color=green,style=dashed]
>>

Here we can see how the ~seen_n~ list, represented by the blue dotted
arrows, was built. At the head of the list, ~ctx->seen_n~, is the last
seen prefix, namely b (because, although the meaning of e at F was
different, e as a prefix had already been seen earlier at E). It is followed
by the other prefixes in inverse order of first occurrence.<<footnote: This
is a mere artifact of implementation: it's cheapest to add to the head
of the list. This may change in future.>> The green dashed arrows from
e:uri to e:iru and then on to the second e:uri reflect the fact that e:uri
(second) was put on the list first (when we were at E), but later, at
D, a different meaning, iru, was given to prefix e. Finally, at F we
again give a different meaning to e, thus pushing another node onto the
"seen stack". Although e at E and at F both have the namespace URI "uri", we
cannot use the same node because we need to preserve the stack order.
Thus we are forced to allocate two identical nodes.

12.1.2 Handling any and anyAttribute
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Since our aim is to be lax in what we accept, every element can handle
unexpected additional attributes as well as unexpected elements. Thus,
whether or not the schema specifies any or anyAttribute, we handle
everything as if they were there. However, when attributes and
elements are received outside of their expected context, they are
simply treated as strings with string names. This is true even for
attributes and elements that would be recognizable in their
proper context.

The any extension points, as well as some bookkeeping data,
are hidden inside the ~ZX_ELEM_EXT~ macro. If you tinker with
this macro, be sure you know what you are doing.
If you want
to add your own specific fields to all structs, redefining
~ZX_ELEM_EXT~ may be appropriate. If you want to add more
fields only to some specific structures, you can instead define
a macro of the form

  TPF_EEE_EXT

and put whatever fields you want in it. These fields will be
initialized to zero when the structure is created, but are not touched
in any other way by the generated code. In particular, if some of your
fields are pointers, it will be your responsibility to free them. The
standard free functions will not know to free them. See the data
structure walking functions below for one way to accomplish this.

12.1.3 Root data structure
~~~~~~~~~~~~~~~~~~~~~~~~~~

The root data structure

  struct zx_root_s;

is a special structure that has a field for every recognizable top level
element.

12.1.4 Per element data structures
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

*** TBW

12.1.5 Memory Allocation
~~~~~~~~~~~~~~~~~~~~~~~~

After decoding, all string data points directly into the input buffer,
i.e. strings are NOT copied. Be sure not to free the input buffer
until you are done processing the data structure. If you need to take
a copy of the strings, you will need to walk the data structure as a
post processing step and make your copies. This can be done using

  void TPF_dup_strs_len_NS_EEE(struct zx_dec_ctx* c, struct TPF_NS_EEE_s* x);

The structures are allocated via the ZX_ZALLOC() macro, which
by default calls the zx_zalloc() function, which in turn
uses the system malloc(3). However, you can redefine the
macro to use any other allocation scheme you desire.

The generated libraries never free(3) memory. In many programming
patterns this is actually desirable. For example, a CGI program can
count on dying: the process exit(2) will free all the memory.
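
Since the generated code allocates only through ZX_ZALLOC() and never frees, a bump ("arena") allocator that is released all at once is a natural fit. The sketch below is illustrative only. ~struct arena~, arena_init(), arena_zalloc(), arena_release() and the MY_ZALLOC() macro are hypothetical names, and the exact parameters ZX_ZALLOC() takes should be checked in the ZXID headers before you redefine it. The alignment expression requires C11.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical bump allocator: one malloc'd block, carved up
 * sequentially, freed all at once. */
struct arena {
  char*  base;
  size_t used;
  size_t cap;
};

static int arena_init(struct arena* a, size_t cap) {
  a->base = malloc(cap);
  a->used = 0;
  a->cap = a->base ? cap : 0;
  return a->base != NULL;
}

/* Zero-filled allocation, which is the guarantee ZX_ZALLOC() must keep.
 * Returns NULL when the arena is exhausted. */
static void* arena_zalloc(struct arena* a, size_t size) {
  const size_t align = _Alignof(max_align_t);   /* C11 */
  size_t off;
  assert(a != NULL && a->base != NULL);
  off = (a->used + align - 1) & ~(align - 1);   /* round up to alignment */
  if (off > a->cap || size > a->cap - off)
    return NULL;
  a->used = off + size;
  return memset(a->base + off, 0, size);
}

/* One call frees every object ever allocated from the arena. */
static void arena_release(struct arena* a) {
  free(a->base);
  a->base = NULL;
  a->used = a->cap = 0;
}

/* Hypothetical macro in the spirit of redefining ZX_ZALLOC(). */
#define MY_ZALLOC(a, type) ((type*)arena_zalloc((a), sizeof(type)))
```

A CGI-style program could create one arena at startup, route all allocations through it, and then either call arena_release() at the end or simply let exit(2) reclaim it. This keeps the "never free individually" pattern described above.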

If you need to free(3) the data structure, you will need to walk it
using

  void TPF_free_len_NS_EEE(struct zx_dec_ctx* c,
                           struct TPF_NS_EEE_s* x,
                           int free_strings);
  void zx_free_any(struct zx_dec_ctx* c,
                   struct zx_node_s* n,
                   int free_strs);

The zx_free_any() function works by having a gigantic switch statement that calls
the appropriate specific free function.

You can deep clone the data structure with

  void TPF_deep_clone_NS_EEE(struct zx_dec_ctx* c,
                             struct TPF_NS_EEE_s* x,
                             int dup_strings);
  struct zx_node_s* zx_clone_any(struct zx_dec_ctx* c,
                                 struct zx_node_s* n,
                                 int dup_strs);

The zx_clone_any() function works by having a gigantic switch statement that calls
the appropriate specific clone function.

12.2 Decoder as Recursive Descent Parser
----------------------------------------

The entry point to the decoder is

  struct zx_root_s* zx_DEC_root(struct zx_dec_ctx* c,
                                struct zx_ns_s* dummy,
                                int n_decode);

The decoding context holds a pointer to the raw data and must be
initialized prior to calling the decoder. The third argument specifies
how many recognized elements are decoded before returning. Usually you
would specify 1 to consume one top level element from the
stream.<<footnote: The second argument, the dummy namespace, is
meaningless for the root node, but makes sense for element decoders. For
the root you can simply supply 0 (NULL).>>

The returned data structure, ~struct zx_root_s~, contains
one pointer for each type of top level element that can
be recognized. The ~tok~ field of the returned value
identifies the last top level element recognized and can
be used to dispatch to the correct request handler:

  zx_prepare_dec_ctx(c, TPF_ns_tab, start_ptr, end_ptr);
  struct TPF_root_s* x = TPF_DEC_root(c, 0, 1);
  switch (x->gg.g.tok) {
  case TPF_NS_EEE_ELEM: return process_EEE_req(x->NS_EEE);
  /* ... one case per request element you handle ... */
  }

When processing responses, you generally already know
which type of response to expect, so you can simply
check whether the respective pointer in the returned
data structure is NULL.

Internally, zx_DEC_root() works much the same way: it scans
the beginning of an element from the stream, looks up the token
number corresponding to the element name, and switches on
that, calling element specific decoder functions (see next
section) to do the detailed processing.

In the above code fragment, note the call to
zx_prepare_dec_ctx(), which initializes the decoder machinery.
It takes an +ns_tab+ argument, which specifies which namespaces
will be recognized. This table MUST match the TPF_DEC_root()
function you call (i.e. both must have been generated as
part of the same xsd2sg.pl invocation). The other arguments
are a pointer to the start of the buffer to decode and a pointer one past
the end of that buffer.

12.2.1 Element Decoders
~~~~~~~~~~~~~~~~~~~~~~~

For each recognizable element there is a function of the form

  struct TPF_NS_EEE_s* TPF_DEC_NS_EEE(struct zx_dec_ctx* c);

where TPF is the prefix, NS is the namespace prefix, and
EEE is the element name. For example:

  struct zx_se_Envelope_s* zx_DEC_se_Envelope(struct zx_ctx* c);

These functions work much the same way as the root decoder. You
should consult dec-templ.c for the skeleton of the decoder. Generally,
you should not call element specific decoders yourself: they
exist so that zx_DEC_root() can call them. They have somewhat
nonintuitive requirements. For example, the opening <, the
namespace prefix, and the element name must already have been
scanned from the input stream by the time you call an element
specific decoder.

12.2.2 Decoder Extension Points
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The generated code is instrumented with the following macros

ZX_ATTR_DEC_EXT(ss):: Extension point called just after decoding a known attribute
ZX_XMLNS_DEC_EXT(ss):: Extension point called just after decoding an xmlns attribute
ZX_UNKNOWN_ATTR_DEC_EXT(ss):: Extension point called just after decoding an unknown attribute
ZX_START_DEC_EXT(x):: Extension point called just after decoding the element name
  and allocating the struct, but before decoding any of the attributes.
ZX_END_DEC_EXT(x):: Extension point called just after decoding the entire element.
ZX_START_BODY_DEC_EXT(x):: Extension point called just after decoding the element tag, including attributes, but before decoding the body of the element.
ZX_PI_DEC_EXT(pi):: Extension point called just after decoding a processing instruction
ZX_COMMENT_DEC_EXT(comment):: Extension point called just after decoding a comment
ZX_CONTENT_DEC(ss):: Extension point called just after decoding string content
ZX_UNKNOWN_ELEM_DEC_EXT(elem):: Extension point called just after decoding an unknown element

The following macros are available to the extension points

TPF:: Type prefix (as specified by -p during code generation)
EL_NAME:: Namespaceful element name (NS_EEE)
EL_STRUCT:: Name of the struct that describes the element
EL_NS:: Namespace prefix of the element (as seen in input schema)
EL_TAG:: Name of the element without any namespace qualification.
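
The extension points are plain preprocessor hooks, so you customize them by defining a macro before the generated decoder is compiled. The sketch below imitates that pattern with a made-up miniature decoder. ~struct elem~, dec_elem() and the elems_started counter are hypothetical. The assumption that the generated code falls back to an empty default through an #ifndef guard is also unconfirmed: dec-templ.c is the place to check how the real skeleton is arranged.

```c
#include <assert.h>
#include <stdlib.h>

/* --- user code: hook defined before the "generated" code below --- */
static int elems_started;   /* how many elements reached the start hook */
#define ZX_START_DEC_EXT(x) (elems_started++, (x)->flags |= 1)

/* --- imitation of generated code (assumed #ifndef defaults) --- */
#ifndef ZX_START_DEC_EXT
#define ZX_START_DEC_EXT(x)   /* default: expands to nothing */
#endif
#ifndef ZX_END_DEC_EXT
#define ZX_END_DEC_EXT(x)     /* default: expands to nothing */
#endif

struct elem {
  const char* name;
  int flags;                 /* a user field the hook can set */
};

/* Hypothetical element decoder showing where the hooks fire. */
static struct elem* dec_elem(const char* name) {
  struct elem* x;
  assert(name != NULL);
  x = calloc(1, sizeof *x);
  if (!x)
    return NULL;
  x->name = name;
  ZX_START_DEC_EXT(x);   /* struct allocated, attributes not yet decoded */
  /* ... attribute and body decoding would happen here ... */
  ZX_END_DEC_EXT(x);     /* entire element decoded; default hook is a no-op */
  return x;
}
```

Because the hooks are macros rather than callbacks, an unused hook costs nothing at runtime. The trade-off is that changing one requires recompiling the generated code.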

12.3 Exclusive Canonical Encoder (Serializer)
---------------------------------------------

The encoder receives a C data structure and generates a gigantic
string containing an XML document corresponding to the data structure
and the input schemata. The XML document conforms to the rules of
exclusive XML canonicalization and hence is useful as input to XMLDSIG.

One encoder is generated for each root node specified at code
generation time. Often these encoders share code for interior nodes.

The encoders allow two-pass rendering. You can first use the length
computation method to calculate the amount of storage needed and
then call one of the rendering functions to actually render. Or,
if you simply have a large enough buffer, you can just render directly.

The encoders take as an argument the next free position in the buffer
and return a char pointer one past the last byte used. Thus
you can discover the length after rendering by subtracting the
pointers. This is guaranteed to yield the same length as the one returned
by the length computation method.<<footnote: This is a useful
sanity check. If the two ever disagree, please report a bug.>>
You can also call the next encoder with the return value
of the previous encoder to render back-to-back elements.

The XML namespace and XML attribute handling of the encoders
is novel in that the specified sort is already done at code
generation time, i.e. the renderers are already in the order
that the sort mandates.

For attributes we know the sort order directly from the schema
because [XML-C14N], sec 2.2, p.7, specifies that they are
sorted first by namespace URI and then by name, both of which
we know from the schema.

For ~xmlns~ specifications the situation is similarly easy in the
schema order encoder case because we know the namespace prefixes
already at code generation time. However, the wire order encoder
needs a runtime sort because we cannot control which
namespace prefixes get used. Still, in both cases we can make a
pretty good guess about which namespaces might need to be declared at
any given element: the element's own namespace and the namespaces of each
of its attributes. That's all, and it's all known at code generation
time. At runtime we only need to check whether the namespace has already
been seen at an outer layer.

12.3.1 Length computation
~~~~~~~~~~~~~~~~~~~~~~~~~

Compute the length of an element (and its subelements). The XML attributes
and elements are processed in schema order.

  int TPF_LEN_SO_NS_EEE(struct zx_ctx* c,
                        struct TPF_NS_EEE_s* x);

For example:

  int zx_LEN_SO_se_Envelope(struct zx_ctx* c,
                            struct zx_se_Envelope_s* x);

Compute the length of an element (and its subelements). The XML namespaces
and elements are processed in wire order.

  int TPF_LEN_WO_NS_EEE(struct zx_ctx* c,
                        struct TPF_NS_EEE_s* x);

For example:

  int zx_LEN_WO_se_Envelope(struct zx_ctx* c,
                            struct zx_se_Envelope_s* x);

12.3.2 Encoding in schema order
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Render an element into a string. The XML elements are processed in
schema order. The xmlns declarations and XML attributes are always
sorted per [XML-EXC-C14N] rules.<<footnote: The sort is actually done
already at code generation time by xsd2sg.pl.>> This is what you
generally want for rendering a new data structure to a string. The wo
pointers are not used.

  char* TPF_ENC_SO_NS_EEE(struct zx_ctx* c,
                          struct TPF_NS_EEE_s* x,
                          char* p);

For example:

  char* zx_ENC_SO_se_Envelope(struct zx_ctx* c,
                              struct zx_se_Envelope_s* x,
                              char* p);

Since it is very common to allocate a correctly
sized buffer and then render an element, a helper function
is provided to do this in one step.

  struct zx_str* zx_EASY_ENC_SO_se_Envelope(struct zx_ctx* c,
                                            struct zx_se_Envelope_s* x);

The returned string is allocated from the allocation arena described
by ~zx_ctx~.

12.3.3 Encoding in wire order
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Render an element into a string. The XML elements are
processed in wire order by chasing wo pointers. This is what you want
for validating signatures on other people's XML documents. If the wire
representation was schema invalid, e.g. elements were in the wrong order,
the wire representation is still respected, except for xmlns
declarations and XML attributes, which are always sorted per exc-c14n
rules. For each element a function is generated as follows:

  char* TPF_ENC_WO_NS_EEE(struct zx_ctx* c,
                          struct TPF_NS_EEE_s* x,
                          char* p);

For example:

  char* zx_ENC_WO_se_Envelope(struct zx_ctx* c,
                              struct zx_se_Envelope_s* x,
                              char* p);

A helper function is also available:

  struct zx_str* zx_EASY_ENC_WO_se_Envelope(struct zx_ctx* c,
                                            struct zx_se_Envelope_s* x);

12.4 Signatures (XMLDSIG)
-------------------------

12.4.1 Signature Generation
~~~~~~~~~~~~~~~~~~~~~~~~~~~

*** TBW

12.4.2 Signature Validation
~~~~~~~~~~~~~~~~~~~~~~~~~~~

For signature validation you need to walk the decoded data structure
to locate the signature as well as the references, and pass them to
zxsig_validate(). The validation involves wire order exclusive
canonical encoding of the referenced XML blobs, computation of SHA1 or
MD5 digests over them, and finally computation of a SHA1 digest
over the <SignedInfo> element and validation of the actual
<SignatureValue> against that digest. The validation involves public key
decryption using the signer's certificate.
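
Each <Reference> check ultimately compares the digest computed over the canonicalized blob with the base64-encoded <DigestValue>. The fragment below sketches only that comparison step. It assumes the digest bytes have already been computed (for example by a SHA1 routine, not shown). b64_decode() and digest_matches() are hypothetical helpers written for illustration. They are not taken from zxsig_validate() or any other ZXID function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Map one base64 character to its 6-bit value, or -1 if invalid. */
static int b64_val(char c) {
  if (c >= 'A' && c <= 'Z') return c - 'A';
  if (c >= 'a' && c <= 'z') return c - 'a' + 26;
  if (c >= '0' && c <= '9') return c - '0' + 52;
  if (c == '+') return 62;
  if (c == '/') return 63;
  return -1;
}

/* Decode NUL-terminated base64 text into out (capacity cap).
 * Whitespace is skipped and '=' ends the data. Returns the number of
 * bytes written, or -1 on invalid input or overflow. */
static int b64_decode(const char* in, unsigned char* out, size_t cap) {
  unsigned acc = 0;   /* bits not yet emitted */
  int bits = 0;       /* how many bits acc holds */
  size_t n = 0;
  assert(in != NULL && out != NULL);
  for (; *in && *in != '='; ++in) {
    if (*in == ' ' || *in == '\n' || *in == '\r' || *in == '\t')
      continue;
    int v = b64_val(*in);
    if (v < 0)
      return -1;
    acc = (acc << 6) | (unsigned)v;
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      if (n == cap)
        return -1;
      out[n++] = (unsigned char)(acc >> bits);
      acc &= (1u << bits) - 1;   /* drop the bits just emitted */
    }
  }
  return (int)n;
}

/* Does the computed digest equal the <DigestValue> content? */
static int digest_matches(const unsigned char* digest, size_t len,
                          const char* digest_value_b64) {
  unsigned char buf[64];   /* large enough for SHA-512 */
  int n = b64_decode(digest_value_b64, buf, sizeof buf);
  return n >= 0 && (size_t)n == len && memcmp(buf, digest, len) == 0;
}
```

For a SHA1 reference, len would be 20. A match here only shows that the referenced blob is intact. The public key check of <SignatureValue> over <SignedInfo> is still required.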

A nasty problem in exclusive canonicalization is that the namespaces
needed in the blob may actually be declared in the containing XML
structures. Thus, in order to know the correct meaning of a namespace
prefix, we need to perform the +seen+ computation for all elements
outside and above the blob of interest.<<footnote: This is yet another
indication of how botched the XML namespace concept is. Or this could
have been fixed in the exclusive canonicalization spec by not using
namespace prefixes at all.>>

To verify a signature, you have to do a certain amount of preparatory work
to locate the signature and the data that was signed. Generally, what
should be signed will be evident from protocol specifications or from
the security requirements of your application environment. Conversely,
if there is a signature but it does not reference the appropriate
elements, it's worthless, and you might as well reject the document
without even verifying the signature.

*Example*

  /* cf, ent and res are assumed to be declared elsewhere;
   * r is the root structure returned by the decoder. */
  struct zxsig_ref refs[1];
  cf = zxid_new_conf("/var/zxid/");
  ent = zxid_get_ent_from_file(cf, "YV7HPtu3bfqW3I4W_DZr-_DKMP4.");

  refs[0].ref = r->Envelope->Body->ArtifactResolve
                 ->Signature->SignedInfo->Reference;
  refs[0].blob = (struct zx_elem_s*)r->Envelope->Body->ArtifactResolve;
  res = zxsig_validate(cf->ctx, ent->sign_cert,
                       r->Envelope->Body->ArtifactResolve->Signature,
                       1, refs);
  if (res == ZXSIG_OK) {
    D("sig vfy ok %d", res);
  } else {
    ERR("sig vfy failed due to(%d)", res);
  }

This code illustrates the following points:

1. You have to determine who signed and provide the entity
   object that corresponds to the signer. Often you
   would determine the entity from the <Issuer> element somewhere
   inside the message.

   The entity is used for retrieving the signing certificate.
   Alternatively, the signature itself may contain
   a <KeyInfo> element from which you extract the certificate.
   You would still need a way to know whether you
   trust the certificate.

2. You have to prepare the refs array. It contains pairs of
   <SignedInfo><Reference> specifications combined with the
   actual elements that are signed. Generally, the URI
   XML attribute of the <Reference> element points to the
   data that was signed. However, which type of ID XML attribute
   the URI actually references is application dependent,
   and the URI could even reference something outside the
   document. It would be far too unreliable for
   zxsig_validate() to attempt to guess how to locate the
   signed data, so we push the responsibility to
   you. Your code will have to walk the data to locate
   all referenced bits and pieces.

   In the above example, locating the one signed bit was
   very easy: the specification says where it is (and this
   location is fixed, so there really is no need to check
   the URI either).

   You pass the length of the refs array and the array
   itself as the last two arguments to zxsig_validate().

3. You need to locate the <Signature> element in the document
   and pass it as an argument to zxsig_validate(). Usually
   a protocol specification will say where the <Signature>
   element is to be found, so locating it is not difficult.

4. The return value indicates the validation status. ZXSIG_OK,
   which has the numerical value 0, indicates success. Other
   nonzero values indicate various kinds of failure.

12.4.3 Certificate Validation and Trust Model
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Trust models for TLS and signature validation are separate. The TLS layer
is handled mainly by libcurl or, in the case of ClientTLS, by the https web
server (which is not part of zxid).

In signature validation, the primary trust mechanism is that the entity's
metadata specifies the signing certificate; there is no
Certification Authority check at all.<<footnote: If you develop a CA
check, please submit patches to the ZXID project.>>
This model works well if you control admission
to your CoT. However, ZXID ships by default with the
automatic CoT feature turned on, thus anyone can get
added to the CoT, and a signature made with whatever
certificate they declare is then considered "valid". This is hardly
acceptable for anything involving money.

12.5 Data Accessor Functions
----------------------------

Simple read access to data should, in C, be done by
simply referencing the fields of the struct, e.g.

  if (!r->EntitiesDescriptor->EntityDescriptor)
    goto bad_md;

*** TBW

12.6 Memory Allocation and Free
-------------------------------

*** TBW

12.7 Walking the data structure
-------------------------------

*** TBW

12.9 Thread Safety
------------------

All generated libraries are designed to be thread safe, provided
that the underlying libc APIs, such as malloc(3), are thread safe.


15 Creating New Interfaces Using ZXID Methodology
=================================================

The ZXID code generation methodology can be used to create
interfaces to any XML document or protocol that can be
described as a Schema Grammar (which includes any document
that can be described with XML Schema, i.e. XSD). The general
steps are:

1. Convert the .xsd file to .sg, or write the .sg directly. For conversion,
   you would typically use a command like

     ~/pd/xsd2sg.pl <foo.xsd >foo.sg

2. Tweak and rationalize the resulting .sg file.
   In an ideal world, any construct expressible in .xsd would be
   nicely representable, but in practice some constructs work better
   than others, so you can create a much nicer interface if you
   invest in some manual tweaking.

   Note that the tweaked .sg is still able to represent the
   same documents as the original .xsd described, though
   the tweaking often causes some relaxation.

   The most common tweaks are:

   a. If the .xsd is written so that the targeted namespace is
      also the default namespace, you should introduce
      a namespace prefix, because one is needed during
      code generation to keep different C identifiers
      from clashing with each other. Ideally, you
      should coordinate the namespace prefixes globally
      so that even two different projects will not clash.

   b. Where the choice construct is used, indicated
      by the pipe symbol (|) in the .sg file, you
      should refactor it into a sequence of
      zero-or-one occurrence (?) instances of the alternatives
      of the choice. This is needed because, for the foreseeable
      future, xsd2sg.pl has a limitation in its code generation
      for this construct. If the choice has maxOccurs="unbounded",
      you should use (*) instead.

   c. xml:lang and other similar attributes may need to
      be factored open so that they are simply of type %xs:string.
      This is a bug in xsd2sg.pl.

3. "Connect" the schema to the bigger framework. Usually this
   means adding your schema grammar to the ZX_SG variable
   in zxid/Makefile and supplying additional -r flags
   in the ZX_ROOT variable. This allows your new schema to
   be visible at the top level.

   If your schema is meant to extend leaves or interior nodes of
   the parse tree, such as the SOAP Body, you would edit
   the SOAP schema to accept your
   new protocol elements in the Body.
   Or you would make the generic SOAP
   header accept your specific header schemata, or make
   the SAML attribute definitions accept your kind of
   attributes - whatever makes sense in your context.

   An alternative to this is to create an entirely new
   monolithic encoder/decoder, i.e. instead of extending
   the existing ZXID project to accommodate your new
   protocol, you just start a new project that uses the
   same methodology. You should look at how the SAML protocol
   part is separated from the SAML metadata parsing and
   from the WSF parsing in the existing project.

17 Code Generation Tools
========================

The main workhorse of code generation is xsd2sg.pl, which serves multiple
purposes:

1. Build hashes of all declarations in the .sg input. Each hash element consists
   of an array of elements and attributes, as well as groups and attribute groups.
   The type of each array element is determined from its prefix, per the .sg rules.
2. Expand groups and attribute groups.
3. Evaluate each element with respect to its type and generate
   a. C data structures
   b. Decoder grammar
   c. Token descriptions for perfect hash and lexical analyzer
   d. Encoder C code

The code to build hashes is interwoven in the code that generates .xsd
from .sg. The rest of the generation happens in a function called
generate().
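
To make steps 3a and 3d more concrete, the sketch below hand-writes a
simplified version of the kind of structure and encoder that generation
produces, for a hypothetical element ex:Item with an optional id attribute
and a repeating ex:Name child. All ex_* names are invented for illustration;
the real generated declarations are in c/*-data.h and differ in detail. As
described for the generated structures, the attribute is kept as a string
node, missing data is a NULL pointer, and the repeating child is a linked
list held in reverse order of appearance, so the encoder recurses to the
tail of the list first to emit the children in document order. XML escaping
of values is omitted.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical string node: attributes and simple elements are both
 * kept as strings, even if they hold numbers or booleans. */
struct ex_str_s {
  struct ex_str_s* n;  /* next node when the value repeats */
  const char* s;       /* the value itself */
};

/* Hypothetical node for ex:Item with attribute id and children ex:Name. */
struct ex_Item_s {
  struct ex_str_s* id;    /* optional attribute: NULL if absent */
  struct ex_str_s* Name;  /* repeating child: newest first */
};

/* Prepend a child, as a decoder would on seeing another <ex:Name>. */
void ex_add_Name(struct ex_Item_s* x, struct ex_str_s* node)
{
  node->n = x->Name;
  x->Name = node;
}

/* Append s to the NUL terminated buf of capacity cap.
 * Returns 0 on success, -1 if the result would not fit. */
static int ex_append(char* buf, size_t cap, const char* s)
{
  size_t used = strlen(buf), add = strlen(s);
  if (used + add + 1 > cap)
    return -1;
  memcpy(buf + used, s, add + 1);
  return 0;
}

/* Emit the list in document order: recurse to the tail (oldest node)
 * before emitting the current node. */
static int ex_enc_Names(const struct ex_str_s* p, char* buf, size_t cap)
{
  if (!p)
    return 0;
  if (ex_enc_Names(p->n, buf, cap))
    return -1;
  if (ex_append(buf, cap, "<ex:Name>") || ex_append(buf, cap, p->s)
      || ex_append(buf, cap, "</ex:Name>"))
    return -1;
  return 0;
}

/* Encode x into buf (cap bytes). Returns 0 on success, -1 on overflow. */
int ex_enc_Item(const struct ex_Item_s* x, char* buf, size_t cap)
{
  assert(x && buf && cap > 0);
  buf[0] = '\0';
  if (ex_append(buf, cap, "<ex:Item"))
    return -1;
  /* Optional attribute: omitted entirely when NULL. */
  if (x->id && (ex_append(buf, cap, " id=\"") || ex_append(buf, cap, x->id->s)
                || ex_append(buf, cap, "\"")))
    return -1;
  if (ex_append(buf, cap, ">") || ex_enc_Names(x->Name, buf, cap)
      || ex_append(buf, cap, "</ex:Item>"))
    return -1;
  return 0;
}
```

Prepending keeps list construction cheap while decoding; the price is
recursion depth proportional to the number of children when encoding, which
is acceptable when cardinality is low.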

Typical command line (to generate the SAML 2.0 protocol engine):

  ~/plaindoc/xsd2sg.pl -d -gen saml2 -p zx_ \
     -r saml:Assertion -r se:Envelope \
     -S \
     sg/saml-schema-assertion-2.0.sg \
     sg/saml-schema-protocol-2.0.sg \
     sg/xmldsig-core.sg \
     sg/xenc-schema.sg \
     sg/soap11.sg \
     >/dev/null

<<ignore: ~/plaindoc/xsd2sg.pl -d -gen saml2 -p zx_ -r saml:Assertion -r se:Envelope -S sg/saml-schema-assertion-2.0.sg sg/saml-schema-protocol-2.0.sg sg/xmldsig-core.sg sg/xenc-schema.sg sg/soap11.sg >/dev/null >>

To generate the SAML 2.0 Metadata engine, you would issue

  ~/plaindoc/xsd2sg.pl -d -gen saml2md -p zx_ \
     -r md:EntityDescriptor -r md:EntitiesDescriptor \
     -S \
     sg/saml-schema-assertion-2.0.sg \
     sg/saml-schema-metadata-2.0.sg \
     sg/xmldsig-core.sg \
     sg/xenc-schema.sg \
     >/dev/null

<<ignore: ~/plaindoc/xsd2sg.pl -d -gen saml2md -p zx_ -r md:EntityDescriptor -r md:EntitiesDescriptor -S sg/saml-schema-assertion-2.0.sg sg/saml-schema-metadata-2.0.sg sg/xmldsig-core.sg sg/xenc-schema.sg >/dev/null >>

17.1 Special Support for Specific Programming Languages
-------------------------------------------------------

While C code generation is the main output, and this can always be
made available to other languages using SWIG, sometimes a more natural
language interface can be built by generating it directly.

We plan to enhance the code generation to do something like this. At
least direct generation of hash-of-hashes-of-arrays-of-hashes style data
structures, for the benefit of some scripting languages, is planned.

<<if: ZXIDBOOK>>
<<else: >>

18 ZXID SP
==========

*** warning: not checked lately, may be wrong!

<<table: ZXID SP URLs
URL          Description
============ =======================================================
/zxid        Same as o=M. Main convenience entry point
/zxid?o=M    SSO with CDC; or management if already logged in
/zxid?o=C    Common Domain Cookie (CDC) reader, usually under common domain host name.
/zxid?o=E    SSO after CDC read; or management if already logged in.
/zxid?o=P    HTTP POST end point. Used for forms and last part of POST profile SSO.
/zxid?o=Q    HTTP binding (POST or redirect) request end point (e.g. SLO, MNI).
/zxid?o=S    SOAP end point (HTTP POST)
/zxid?o=B    Get SP metadata (or combined SP and IdP metadata if proxying).
>>

96 License
==========

Copyright (c) 2006-2009 Symlabs (symlabs@symlabs.com), All Rights Reserved.
Author: Sampo Kellomäki (sampo@iki.fi)

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

96.2 Specification IPR
----------------------

ZXID is based on open SAML and Liberty specifications. The parties
that have developed these specifications, including Symlabs, have made
Royalty Free (RF) licensing commitments. Please ask OASIS and the Liberty
Alliance for the specifics of their IPR policies and IPR disclosures.

Some protocols, such as WS-Trust and WS-Federation, enjoy Microsoft's
pledge<<footnote: If you have a reference to where this pledge can be
found, please let me know so it can be included here.>> that they will
not sue you even if you implement these specifications. You should
evaluate for yourself whether this is good enough for your situation.

<<zxid-ref.pd>>

<<doc-end.pd>>
<<notapath: TCP/IP a.k.a xBSD/Unix n/a Perl/mod_perl PHP/mod_php Java/Tomcat>>
<<EOF: >>
<<fi: >>