These notes attempt to explain how to use the ASN.1 infrastructure to
add new ASN.1 types. ASN.1 is complicated and easy to get wrong, so
it is best to verify your results against another tool (such as asn1c)
if at all possible. These notes are up to date as of 2012-02-13.

If you are trying to debug a problem that shows up in the ASN.1
encoder or decoder, skip to the last section.


General
-------

For the moment, a developer must hand-translate the ASN.1 module into
macro invocations that generate data structures used by the encoder
and decoder. Ideally we would have a tool to compile an ASN.1 module
(and probably some additional information about C identifier mappings)
and generate the macro invocations.

Currently the ASN.1 infrastructure is not visible to applications or
plugins. For plugin modules shipped as part of the krb5 tree, the
types can be added to asn1_k_encode.c and exported from libkrb5.
Plugin modules built separately from the krb5 tree must use another
tool (such as asn1c) for now if they need to do ASN.1 encoding or
decoding.


Tags
----

Before you start writing macro invocations, it is important to
understand a little bit about ASN.1 tags. You will most commonly see
tag notation in a sequence definition, like:

    TypeName ::= SEQUENCE {
        field-name [0] IMPLICIT OCTET STRING OPTIONAL
    }

Contrary to intuition, the tag notation "[0] IMPLICIT" is not a
property of the sequence field; instead, it specifies a type that
wraps the type to the right (OCTET STRING).
The right way to think about the above definition is:

    TypeName is defined as a sequence type
        which has an optional field named field-name
            whose type is a tagged type
                the tag's class is context-specific (by default)
                the tag's number is 0
                it is an implicit tag
                the tagged type wraps OCTET STRING

The other place you are likely to see tag notation is something like:

    AS-REQ ::= [APPLICATION 10] KDC-REQ

This example defines AS-REQ to be a tagged type whose class is
application, whose tag number is 10, and whose base type is KDC-REQ.
The tag may be implicit or explicit depending on the module's tag
environment, which we will get to in a moment.

Tags can have one of four classes: universal, application, private,
and context-specific. Universal tags are used for built-in ASN.1
types. Application and context-specific tags are the most common to
see in ASN.1 modules; private is rarely used. If no tag class is
specified, the default is context-specific.

Tags can be explicit or implicit, and the distinction is important to
the wire encoding. If a tag's closing bracket is followed by the word
IMPLICIT or EXPLICIT, then it is clear which kind of tag it is, but
usually there will be no such annotation. If not, the default depends
on the header of the ASN.1 module. Look at the top of the module for
the word DEFINITIONS. It may be followed by one of three phrases:

* EXPLICIT TAGS -- in this case, tags default to explicit
* IMPLICIT TAGS -- in this case, tags default to implicit (usually)
* AUTOMATIC TAGS -- tags default to implicit (usually) and are also
  automatically added to sequence fields (usually)

If none of those phrases appears, the default is explicit tags.

Even if a module defaults to implicit tags, a tag defaults to explicit
if its base type is a choice type or ANY type (or the information
object equivalent of an ANY type).
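
To make the explicit/implicit distinction concrete, here is a minimal
C sketch of the DER bytes produced by the two kinds of context tag
wrapped around a short OCTET STRING. It is not part of the krb5
infrastructure: the names identifier_octet, put_tlv,
encode_implicit_octetstring, and encode_explicit_octetstring are made
up for illustration, and the sketch assumes tag numbers below 31 and
contents short enough for single-octet lengths.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Tag classes, pre-shifted into bits 8-7 of the identifier octet. */
enum tag_class {
    CLASS_UNIVERSAL = 0x00,
    CLASS_APPLICATION = 0x40,
    CLASS_CONTEXT_SPECIFIC = 0x80,
    CLASS_PRIVATE = 0xC0
};

#define CONSTRUCTED 0x20        /* bit 6 of the identifier octet */
#define TAG_OCTETSTRING 4       /* universal tag number of OCTET STRING */

/* Build a single-octet identifier; only tag numbers below 31 fit. */
static unsigned char
identifier_octet(enum tag_class cls, int constructed, unsigned int tagnum)
{
    assert(tagnum < 31);
    return (unsigned char)(cls | (constructed ? CONSTRUCTED : 0) | tagnum);
}

/* Write identifier, short-form length, and contents to out.  Return
 * the number of bytes written.  Only lengths below 128 are handled. */
static size_t
put_tlv(unsigned char *out, unsigned char ident,
        const unsigned char *contents, size_t len)
{
    assert(len < 128);
    out[0] = ident;
    out[1] = (unsigned char)len;
    memcpy(out + 2, contents, len);
    return len + 2;
}

/* [tagnum] IMPLICIT OCTET STRING: the context tag replaces the
 * universal tag, and the encoding stays primitive. */
static size_t
encode_implicit_octetstring(unsigned char *out, unsigned int tagnum,
                            const unsigned char *data, size_t len)
{
    return put_tlv(out, identifier_octet(CLASS_CONTEXT_SPECIFIC, 0, tagnum),
                   data, len);
}

/* [tagnum] EXPLICIT OCTET STRING: a constructed context tag wraps the
 * complete universal encoding of the base type. */
static size_t
encode_explicit_octetstring(unsigned char *out, unsigned int tagnum,
                            const unsigned char *data, size_t len)
{
    unsigned char inner[130];
    size_t inner_len;

    inner_len = put_tlv(inner,
                        identifier_octet(CLASS_UNIVERSAL, 0, TAG_OCTETSTRING),
                        data, len);
    return put_tlv(out, identifier_octet(CLASS_CONTEXT_SPECIFIC, 1, tagnum),
                   inner, inner_len);
}
```

For the two-byte string "hi" and tag number 0, the implicit form is
80 02 68 69 and the explicit form is A0 04 04 02 68 69. Likewise,
identifier_octet(CLASS_APPLICATION, 1, 10) yields 6A, the identifier
of an explicit [APPLICATION 10] tag.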

If the module's default is AUTOMATIC TAGS, sequence and set fields
should have ascending context-specific tags wrapped around the field
types, starting from 0, unless one of the fields of the sequence or
set is already a tagged type. See ITU X.680 section 24.2 for details,
particularly if COMPONENTS OF is used in the sequence definition.


Basic types
-----------

In our infrastructure, a type descriptor specifies a mapping between
an ASN.1 type and a C type. The first step is to ensure that type
descriptors are defined for the basic types used by your ASN.1 module,
as mapped to the C types used in your structures, in asn1_k_encode.c.
If any are missing, you will need to create them. For a BOOLEAN or
INTEGER ASN.1 type, you will use one of these macros:

    DEFBOOLTYPE(descname, ctype)
    DEFINTTYPE(descname, ctype)
    DEFUINTTYPE(descname, ctype)

where "descname" is an identifier you make up and "ctype" is the
integer type of the C object you want to map the ASN.1 value to. For
integers, use DEFINTTYPE if the C type is a signed integer type and
DEFUINTTYPE if it is an unsigned type. (For booleans, the distinction
is unimportant since all integer types can hold the values 0 and 1.)
We don't generally define integer mappings for every typedef name of
an integer type. For example, we use the type descriptor int32, which
maps an ASN.1 INTEGER to an int32_t, for krb5_enctype values.

String types are a little more complicated. Our practice is to store
strings in a krb5_data structure (rather than a zero-terminated C
string), so our infrastructure currently assumes that all strings are
represented as "counted types", meaning the C representation is a
combination of a pointer and an integer type.
So, first you must declare a counted type descriptor (we will describe
those in more detail later) with something like:

    DEFCOUNTEDSTRINGTYPE(generalstring, char *, unsigned int,
                         k5_asn1_encode_bytestring, k5_asn1_decode_bytestring,
                         ASN1_GENERALSTRING);

The first parameter is an identifier you make up. The second and
third parameters are the C types of the pointer and integer holding
the string; for a krb5_data object, those should be the types in the
example. The pointer type must be char * or uint8_t *. The fourth
and fifth parameters reference primitive encoder and decoder
functions; these should almost always be the ones in the example,
unless the ASN.1 type is BIT STRING. The sixth parameter is the
universal tag number of the ASN.1 type, as defined in krbasn1.h.

Once you have defined the counted type, you can define a normal type
descriptor to wrap it in a krb5_data structure with something like:

    DEFCOUNTEDTYPE(gstring_data, krb5_data, data, length, generalstring);


Sequences
---------

In our infrastructure, we model ASN.1 sequences using an array of
normal type descriptors. Each type descriptor is applied in turn to
the C object to generate (or consume) an encoding of an ASN.1 value.

Of course, each value needs to be stored in a different place within
the C object, or they would just overwrite each other. To address
this, you must create an offset type wrapper for each sequence field:

    DEFOFFSETTYPE(descname, structuretype, fieldname, basedesc)

where "descname" is an identifier you make up, "structuretype" and
"fieldname" are used to compute the offset and type-check the
structure field, and "basedesc" is the type of the ASN.1 object to be
stored at that offset.
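
The idea behind offset wrappers can be sketched in plain C. The
following is a simplified illustration, not the actual krb5
implementation: struct offset_desc, load_fn, struct sample, and
load_fields are hypothetical names, the "base descriptor" is reduced
to a function that reads one C type, and the compile-time type check
performed by the real macro is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a base type descriptor: it knows how to
 * read a value of one particular C type through an untyped pointer. */
typedef int64_t (*load_fn)(const void *val);

static int64_t
load_int32(const void *val)
{
    return *(const int32_t *)val;
}

static int64_t
load_uint16(const void *val)
{
    return *(const uint16_t *)val;
}

/* Hypothetical stand-in for an offset wrapper: a base descriptor plus
 * the offset of the structure field it should be applied to. */
struct offset_desc {
    size_t offset;
    load_fn base;
};

/* A sample C structure with one field per sequence element. */
struct sample {
    int32_t etype;
    uint16_t flags;
};

/* One offset wrapper per field, collected in an array. */
static const struct offset_desc sample_fields[] = {
    { offsetof(struct sample, etype), load_int32 },
    { offsetof(struct sample, flags), load_uint16 }
};

/* Apply each descriptor in turn to the same C object, as a sequence
 * encoder would.  out receives one value per field. */
static void
load_fields(const void *obj, const struct offset_desc *fields, size_t n,
            int64_t *out)
{
    size_t i;

    for (i = 0; i < n; i++)
        out[i] = fields[i].base((const char *)obj + fields[i].offset);
}
```

Because every descriptor carries its own offset, the same object
pointer can be handed to each one without the values colliding.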

If your C structure contains a pointer to another C object, you will
need to first define a pointer wrapper, which is very simple:

    DEFPTRTYPE(descname, basedesc)

Then wrap the defined pointer type in an offset type as described
above. Once a pointer descriptor is defined for a base descriptor, it
can be reused many times, so pointer descriptors are usually defined
right after the types they wrap. When decoding, pointer wrappers
cause a pointer to be allocated with a block of memory equal to the
size of the C type corresponding to the base type. (For offset types,
the corresponding C type is the structure type inside which the offset
is computed.) It is okay for several fields of a sequence to
reference the same pointer field within a structure, as long as the
pointer types all wrap base types with the same corresponding C type.

If the sequence field has a context tag attached to its type, you will
also need to create a tag wrapper for it:

    DEFCTAGGEDTYPE(descname, tagnum, basedesc)
    DEFCTAGGEDTYPE_IMPLICIT(descname, tagnum, basedesc)

Use the first macro for explicit context tags and the second for
implicit context tags. "tagnum" is the number of the context-specific
tag, and "basedesc" is the name you chose for the offset type above.

You don't actually need to separately write out DEFOFFSETTYPE and
DEFCTAGGEDTYPE for each field.
The combination of offset and context tag is so common that we have a
macro to combine them:

    DEFFIELD(descname, structuretype, fieldname, tagnum, basedesc)
    DEFFIELD_IMPLICIT(descname, structuretype, fieldname, tagnum, basedesc)

Once you have defined tag and offset wrappers for each sequence field,
combine them together in an array and use the DEFSEQTYPE macro to
define the sequence type descriptor:

    static const struct atype_info *my_sequence_fields[] = {
        &k5_atype_my_sequence_0, &k5_atype_my_sequence_1,
    };
    DEFSEQTYPE(my_sequence, structuretype, my_sequence_fields)

Each field name must be prefixed by "&k5_atype_" to get a pointer to
the actual variable used to hold the type descriptor.

ASN.1 sequence types may or may not be defined to be extensible, and
may group extensions together in blocks which must appear together.
Our model does not distinguish these cases. Our decoder treats all
sequence types as extensible. Extension blocks must be modeled by
making all of the extension fields optional, and the decoder will not
enforce that they appear together.

If your ASN.1 sequence contains optional fields, keep reading.


Optional sequence fields
------------------------

ASN.1 sequence fields can be annotated with OPTIONAL or, less
commonly, with DEFAULT VALUE. (Be aware that if DEFAULT VALUE is
specified for a sequence field, DER mandates that fields with that
value not be encoded within the sequence. Most standards in the
Kerberos ecosystem avoid the use of DEFAULT VALUE for this reason.)
Although optionality is a property of sequence or set fields, not
types, we still model optional sequence fields using type wrappers.
Optional type wrappers must only be used as members of a sequence,
although they can be nested in offset or pointer wrappers first.

The simplest way to represent an optional value in a C structure is
with a pointer which takes the value NULL if the field is not present.
In this case, you can just use DEFOPTIONALZEROTYPE to wrap the pointer
type:

    DEFPTRTYPE(ptr_basetype, basetype);
    DEFOPTIONALZEROTYPE(opt_ptr_basetype, ptr_basetype);

and then use opt_ptr_basetype in the DEFFIELD invocation for the
sequence field. DEFOPTIONALZEROTYPE can also be used for integer
types, if it is okay for the value 0 to represent that the
corresponding ASN.1 value is omitted. Optional-zero wrappers, like
pointer wrappers, are usually defined just after the types they wrap.

For null-terminated sequences, you can use a wrapper like this:

    DEFOPTIONALEMPTYTYPE(opt_seqof_basetype, seqof_basetype)

to omit the sequence if it is either NULL or of zero length.

A more general way to wrap optional types is:

    DEFOPTIONALTYPE(descname, predicatefn, initfn, basedesc);

where "predicatefn" has the signature "int (*fn)(const void *p)" and
is used by the encoder to test whether the ASN.1 value is present in
the C object. "initfn" has the signature "void (*fn)(void *p)" and is
used by the decoder to initialize the C object field if the
corresponding ASN.1 value is omitted in the wire encoding. "initfn"
can be NULL, in which case the C object will simply be left alone.
All C objects are initialized to zero-filled memory when they are
allocated by the decoder.

An optional string type, represented in a krb5_data structure, can be
wrapped using the nonempty_data function already defined in
asn1_k_encode.c, like so:

    DEFOPTIONALTYPE(opt_ostring_data, nonempty_data, NULL, ostring_data);


Sequence-of types
-----------------

ASN.1 sequence-of types can be represented as C types in two ways.
The simplest is to use an array of pointers terminated in a null
pointer.
A descriptor for a sequence-of represented this way is defined in
three steps:

    DEFPTRTYPE(ptr_basetype, basetype);
    DEFNULLTERMSEQOFTYPE(seqof_basetype, ptr_basetype);
    DEFPTRTYPE(ptr_seqof_basetype, seqof_basetype);

If the C type corresponding to basetype is "ctype", then the C type
corresponding to ptr_seqof_basetype will be "ctype **". The middle
type sort of corresponds to "ctype *", but not exactly, as it
describes an object of variable size.

You can also use DEFNONEMPTYNULLTERMSEQOFTYPE in the second step. In
this case, the encoder will throw an error if the sequence is empty.
For historical reasons, the decoder will *not* throw an error if the
sequence is empty, so the calling code must check before assuming a
first element is present.

The other way of representing sequences is through a combination of
pointer and count. This pattern is most often used for compactness
when the base type is an integer type. A descriptor for a sequence-of
represented this way is defined using a counted type descriptor:

    DEFCOUNTEDSEQOFTYPE(descname, lentype, basedesc)

where "lentype" is the C type of the length and "basedesc" is a
pointer wrapper for the sequence element type (*not* the element type
itself). For example, an array of 32-bit signed integers is defined
as:

    DEFINTTYPE(int32, int32_t);
    DEFPTRTYPE(int32_ptr, int32);
    DEFCOUNTEDSEQOFTYPE(cseqof_int32, int32_t, int32_ptr);

To use a counted sequence-of type in a sequence, use DEFCOUNTEDTYPE:

    DEFCOUNTEDTYPE(descname, structuretype, ptrfield, lenfield, cdesc)

where "structuretype", "ptrfield", and "lenfield" are used to compute
the field offsets and type-check the structure fields, and "cdesc" is
the name of the counted type descriptor.
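
As a rough illustration of the two C representations described above,
the following sketch shows how calling code might walk each one. The
names nullterm_count, first_or_null, struct int32_list, and
int32_list_sum are hypothetical and do not appear in the krb5 tree;
treating a null list pointer as an empty list is an assumption of this
sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Null-terminated representation ("ctype **"): count the elements of
 * an array of pointers that ends in a null pointer. */
static size_t
nullterm_count(const char *const *list)
{
    size_t n = 0;

    if (list == NULL)
        return 0;
    while (list[n] != NULL)
        n++;
    return n;
}

/* Return the first element, or NULL if the list is empty.  Callers of
 * a decoder need a check like this before using the first element. */
static const char *
first_or_null(const char *const *list)
{
    return (list == NULL) ? NULL : list[0];
}

/* Counted representation: a pointer to the elements plus a length,
 * both stored as fields of the containing structure. */
struct int32_list {
    int32_t *elems;
    int32_t count;
};

/* Walk the counted representation using the stored length. */
static int64_t
int32_list_sum(const struct int32_list *l)
{
    int64_t sum = 0;
    int32_t i;

    for (i = 0; i < l->count; i++)
        sum += l->elems[i];
    return sum;
}
```

The counted form avoids one allocation per element, which is why it
suits arrays of integers.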

The combination of DEFCOUNTEDTYPE and DEFCTAGGEDTYPE can be
abbreviated using DEFCNFIELD:

    DEFCNFIELD(descname, structuretype, ptrfield, lenfield, tagnum, cdesc)


Tag wrappers
------------

We've previously covered DEFCTAGGEDTYPE and DEFCTAGGEDTYPE_IMPLICIT,
which are used to define context-specific tag wrappers. There are
two other macros for creating tag wrappers. The first is:

    DEFAPPTAGGEDTYPE(descname, tagnum, basedesc)

Use this macro to model an "[APPLICATION tagnum]" tag wrapper in an
ASN.1 module.

There is also a general tag wrapper macro:

    DEFTAGGEDTYPE(descname, class, construction, tag, implicit, basedesc)

where "class" is one of UNIVERSAL, APPLICATION, CONTEXT_SPECIFIC, or
PRIVATE, "construction" is one of PRIMITIVE or CONSTRUCTED, "tag" is
the tag number, "implicit" is 1 for an implicit tag and 0 for an
explicit tag, and "basedesc" is the wrapped type. Note that
primitive vs. constructed is not a concept within the abstract ASN.1
type model, but is instead a concept used in DER. In general, all
explicit tags should be constructed (but see the section on "Dirty
tricks" below). The construction parameter is ignored for implicit
tags.


Choice types
------------

ASN.1 CHOICE types are represented in C using a signed integer
distinguisher and a union. Modeling a choice type happens in three
steps:

1. Define type descriptors for each alternative of the choice,
typically using DEFCTAGGEDTYPE to create a tag wrapper for an existing
type. There is no need to create offset type wrappers, as union
fields always have an offset of 0. For example:

    DEFCTAGGEDTYPE(my_choice_0, 0, firstbasedesc);
    DEFCTAGGEDTYPE(my_choice_1, 1, secondbasedesc);

2. Assemble them into an array, similar to how you would for a
sequence, and use DEFCHOICETYPE to create a counted type descriptor:

    static const struct atype_info *my_choice_alternatives[] = {
        &k5_atype_my_choice_0, &k5_atype_my_choice_1
    };
    DEFCHOICETYPE(my_choice, union my_choice_choices, enum my_choice_selector,
                  my_choice_alternatives);

The second and third parameters to DEFCHOICETYPE are the C types of
the union and distinguisher fields.

3. Wrap the counted type descriptor in a type descriptor for the
structure containing the distinguisher and union:

    DEFCOUNTEDTYPE_SIGNED(descname, structuretype, u, choice, my_choice);

The third and fourth parameters to DEFCOUNTEDTYPE_SIGNED are the field
names of the union and distinguisher fields within structuretype.

ASN.1 choice types may be defined to be extensible, or may not be.
Our model does not distinguish between the two cases. Our decoder
treats all choice types as extensible.

Our encoder will throw an error if the distinguisher is not within the
range of valid offsets of the alternatives array. Our decoder will
set the distinguisher to -1 if the tag of the ASN.1 value is not
matched by any of the alternatives, and will leave the union
zero-filled in that case.


Counted type descriptors
------------------------

Several times in earlier sections we've referred to the notion of
"counted type descriptors" without defining what they are. Counted
type descriptors live in a separate namespace from normal type
descriptors, and specify a mapping between an ASN.1 type and two C
objects, one of them having integer type.
There are four kinds of counted type descriptors, defined using the
following macros:

    DEFCOUNTEDSTRINGTYPE(descname, ptrtype, lentype, encfn, decfn, tagnum)
    DEFCOUNTEDDERTYPE(descname, ptrtype, lentype)
    DEFCOUNTEDSEQOFTYPE(descname, lentype, baseptrdesc)
    DEFCHOICETYPE(descname, uniontype, distinguishertype, fields)

DEFCOUNTEDDERTYPE is described in the "Dirty tricks" section below.
The other three kinds of counted types have been covered previously.

Counted types are always used by wrapping them in a normal type
descriptor with one of these macros:

    DEFCOUNTEDTYPE(descname, structuretype, datafield, countfield, cdesc)
    DEFCOUNTEDTYPE_SIGNED(descname, structuretype, datafield, countfield, cdesc)

These macros are similar in concept to an offset type, only with two
offsets. Use DEFCOUNTEDTYPE if the count field is unsigned,
DEFCOUNTEDTYPE_SIGNED if it is signed.


Defining encoder and decoder functions
--------------------------------------

After you have created type descriptors for your types, you need to
create encoder or decoder functions for the ones you want calling code
to be able to process. Do this with one of the following macros:

    MAKE_ENCODER(funcname, desc)
    MAKE_DECODER(funcname, desc)
    MAKE_CODEC(typename, desc)

MAKE_ENCODER and MAKE_DECODER allow you to choose function names.
MAKE_CODEC defines encoder and decoder functions with the names
"encode_typename" and "decode_typename".

If you are defining functions for a null-terminated sequence, use the
descriptor created with DEFNULLTERMSEQOFTYPE or
DEFNONEMPTYNULLTERMSEQOFTYPE, rather than the pointer to it. This is
because encoder and decoder functions implicitly traffic in pointers
to the C object being encoded or decoded.

Encoder and decoder functions must be prototyped separately, either in
k5-int.h or in a subsidiary header included by it.
Encoder functions have the prototype:

    krb5_error_code encode_typename(const ctype *rep, krb5_data **code_out);

where "ctype" is the C type corresponding to desc. Decoder functions
have the prototype:

    krb5_error_code decode_typename(const krb5_data *code, ctype **rep_out);

Decoder functions allocate a container for the C type of the object
being decoded and return a pointer to it in *rep_out.


Writing test cases
------------------

New ASN.1 types in libkrb5 will typically only be accepted with test
cases. Our current test framework lives in src/tests/asn.1. Adding
new types to this framework involves the following steps:

1. Define an initializer for a sample value of the type in ktest.c,
named ktest_make_sample_typename(). Also define a contents-destructor
for it, named ktest_empty_typename(). Prototype these functions in
ktest.h.

2. Define an equality test for the type in ktest_equal.c. Prototype
this in ktest_equal.h. (This step is not necessary if the type has no
decoder.)

3. Add a test case to krb5_encode_test.c, following the examples of
existing test cases there. Update reference_encode.out and
trval_reference.out to contain the output generated by your test case.

4. Add a test case to krb5_decode_test.c, following the examples of
existing test cases there, and using the output generated by your
encode test.

5. Add a test case to krb5_decode_leak.c, following the examples of
existing test cases there.

Following these steps will not ensure the correctness of your
translation of the ASN.1 module to macro invocations; it only lets us
detect unintentional changes to the encodings after they are defined.
To ensure that your translations are correct, you should extend
tests/asn.1/make-vectors.c and use "make test-vectors" to create
vectors using asn1c.


Dirty tricks
------------

In rare cases you may want to represent the raw DER encoding of a
value in the C structure. If so, you can use DEFCOUNTEDDERTYPE (or
more likely, the existing der_data type descriptor). The encoder and
decoder will throw errors if the wire encoding doesn't have a valid
outermost tag, so be sure to use valid DER encodings in your test
cases (see ktest_make_sample_algorithm_identifier for an example).

Conversely, the ASN.1 module may define an OCTET STRING wrapper around
a DER encoding which you want to represent as the decoded value. (The
existing example of this is in PKINIT hash agility, where the
PartyUInfo and PartyVInfo fields of OtherInfo are defined as octet
strings which contain the DER encodings of KRB5PrincipalName values.)
In this case you can use a DEFTAGGEDTYPE wrapper like so:

    DEFTAGGEDTYPE(descname, UNIVERSAL, PRIMITIVE, ASN1_OCTETSTRING, 0,
                  basedesc)


Limitations
-----------

We cannot currently encode or decode SET or SET OF types.

We cannot model self-referential types (like "MATHSET ::= SET OF
MATHSET").

If a sequence uses an optional field that is a choice field (without
a context tag wrapper), or an optional field that uses a stored DER
encoding (again, without a context tag wrapper), our decoder may
assign a value to the choice or stored-DER field when the correct
behavior is to skip that field and assign the value to a subsequent
field. It should be very rare for ASN.1 modules to use choice or open
types this way.


Debugging
---------

If you are looking at a stack trace with a bunch of ASN.1 encoder or
decoder calls at the top, here are some notes that might help with
debugging:

1. You may have noticed that the entry point into the encoder is
defined by a macro like MAKE_CODEC.
Don't worry about this; those macros just define thin wrappers around
k5_asn1_full_encode and k5_asn1_full_decode. If you are stepping
through code and hit a wrapper function, just enter "step" to get into
the actual encoder or decoder function.

2. If you are in the encoder, look for stack frames in
encode_sequence(), and print the value of i within those stack frames.
You should be able to subtract 1 from those values and match them up
with the sequence field offsets in asn1_k_encode.c for the type being
encoded. For example, if an as-req is being encoded and the i values
(starting with the one closest to encode_krb5_as_req) are 4, 2, and 2,
you could match those up as follows:

* as_req_encode wraps untagged_as_req, whose field at offset 3 is the
  descriptor for kdc_req_4, which wraps kdc_req_body.

* kdc_req_body is a function wrapper around kdc_req_hack, whose field
  at offset 1 is the descriptor for req_body_1, which wraps
  opt_principal.

* opt_principal wraps principal, which wraps principal_data, whose
  field at offset 1 is the descriptor for princname_1.

* princname_1 is a sequence of general strings represented in the data
  and length fields of the krb5_principal_data structure.

So the problem would likely be in the data components of the client
principal in the kdc_req structure.

3. If you are in the decoder, look for stack frames in
decode_sequence(), and again print the values of i. You can match
these up just as above, except without subtracting 1 from the i
values.