These notes attempt to explain how to use the ASN.1 infrastructure to
add new ASN.1 types.  ASN.1 is complicated and easy to get wrong, so
it is best to verify your results against another tool (such as asn1c)
if at all possible.  These notes are up to date as of 2012-02-13.

If you are trying to debug a problem that shows up in the ASN.1
encoder or decoder, skip to the last section.


General
-------

For the moment, a developer must hand-translate the ASN.1 module into
macro invocations that generate data structures used by the encoder
and decoder.  Ideally we would have a tool to compile an ASN.1 module
(and probably some additional information about C identifier mappings)
and generate the macro invocations.

Currently the ASN.1 infrastructure is not visible to applications or
plugins.  For plugin modules shipped as part of the krb5 tree, the
types can be added to asn1_k_encode.c and exported from libkrb5.
Plugin modules built separately from the krb5 tree must use another
tool (such as asn1c) for now if they need to do ASN.1 encoding or
decoding.


Tags
----

Before you start writing macro invocations, it is important to
understand a little bit about ASN.1 tags.  You will most commonly see
tag notation in a sequence definition, like:

  TypeName ::= SEQUENCE {
    field-name [0] IMPLICIT OCTET STRING OPTIONAL
  }

Contrary to intuition, the tag notation "[0] IMPLICIT" is not a
property of the sequence field; instead, it specifies a type that
wraps the type to the right (OCTET STRING).  The right way to think
about the above definition is:

  TypeName is defined as a sequence type
    which has an optional field named field-name
      whose type is a tagged type
        the tag's class is context-specific (by default)
        the tag's number is 0
        it is an implicit tag
        the tagged type wraps OCTET STRING

The other place you are likely to see tag notation is in a definition
like:

  AS-REQ ::= [APPLICATION 10] KDC-REQ

This example defines AS-REQ to be a tagged type whose class is
application, whose tag number is 10, and whose base type is KDC-REQ.
The tag may be implicit or explicit depending on the module's tag
environment, which we will get to in a moment.

Tags can have one of four classes: universal, application, private,
and context-specific.  Universal tags are used for built-in ASN.1
types.  Application and context-specific tags are the most common to
see in ASN.1 modules; private is rarely used.  If no tag class is
specified, the default is context-specific.
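
As a concrete picture of where the class ends up on the wire, the
sketch below packs a class, the primitive/constructed bit, and a tag
number into the identifier octet that begins every DER encoding.  The
enum and function names are hypothetical (they are not part of the
krb5 tree), and only tag numbers 0 through 30, which fit in a single
octet, are handled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; the bit values are those of the DER identifier octet. */
enum tag_class {
    CLASS_UNIVERSAL        = 0x00,
    CLASS_APPLICATION      = 0x40,
    CLASS_CONTEXT_SPECIFIC = 0x80,
    CLASS_PRIVATE          = 0xC0
};

#define CONSTRUCTED_BIT 0x20

/* Build the identifier octet for a tag number in the low-tag-number form
 * (0..30).  Larger tag numbers need the multi-octet form, omitted here. */
static uint8_t identifier_octet(enum tag_class cls, int constructed,
                                unsigned int tagnum)
{
    assert(tagnum <= 30);
    return (uint8_t)(cls | (constructed ? CONSTRUCTED_BIT : 0) | tagnum);
}
```

For instance, the Kerberos module in RFC 4120 uses EXPLICIT TAGS, so
the [APPLICATION 10] tag of AS-REQ is constructed and an encoded AS-REQ
begins with the octet 0x6A.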

Tags can be explicit or implicit, and the distinction is important to
the wire encoding.  If a tag's closing bracket is followed by the word
IMPLICIT or EXPLICIT, then it is clear which kind of tag it is, but
usually there will be no such annotation.  If not, the default depends
on the header of the ASN.1 module.  Look at the top of the module for
the word DEFINITIONS.  It may be followed by one of three phrases:

* EXPLICIT TAGS -- in this case, tags default to explicit
* IMPLICIT TAGS -- in this case, tags default to implicit (usually)
* AUTOMATIC TAGS -- tags default to implicit (usually) and are also
  automatically added to sequence fields (usually)

If none of those phrases appear, the default is explicit tags.

Even if a module defaults to implicit tags, a tag defaults to explicit
if its base type is a choice type or ANY type (or the information
object equivalent of an ANY type).
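
To illustrate why the distinction matters, this sketch encodes the
two-octet OCTET STRING "ab" under [0] IMPLICIT and under [0] EXPLICIT.
An implicit tag replaces the identifier octet of the base encoding; an
explicit tag wraps the entire base encoding in a new constructed TLV.
The helper names are hypothetical, and only short-form lengths (under
128 octets) are supported.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write an OCTET STRING (universal tag 4, primitive) TLV; returns its length. */
static size_t encode_octet_string(const uint8_t *val, size_t len, uint8_t *out)
{
    assert(len < 128);
    out[0] = 0x04;
    out[1] = (uint8_t)len;
    memcpy(out + 2, val, len);
    return len + 2;
}

/* [n] IMPLICIT: overwrite class and tag number in place, keeping the base
 * type's primitive/constructed bit (0x20). */
static void apply_implicit_ctag(unsigned int tagnum, uint8_t *tlv)
{
    assert(tagnum <= 30);
    tlv[0] = (uint8_t)(0x80 | (tlv[0] & 0x20) | tagnum);
}

/* [n] EXPLICIT: wrap the whole base TLV in a constructed context tag. */
static size_t apply_explicit_ctag(unsigned int tagnum, const uint8_t *tlv,
                                  size_t tlvlen, uint8_t *out)
{
    assert(tagnum <= 30 && tlvlen < 128);
    out[0] = (uint8_t)(0xA0 | tagnum);
    out[1] = (uint8_t)tlvlen;
    memcpy(out + 2, tlv, tlvlen);
    return tlvlen + 2;
}
```

With these helpers, the implicit form comes out as 80 02 61 62 and the
explicit form as A0 04 04 02 61 62, two octets longer.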

If the module's default is AUTOMATIC TAGS, sequence and set fields
should have ascending context-specific tags wrapped around the field
types, starting from 0, unless one of the fields of the sequence or
set is already a tagged type.  See ITU X.680 section 24.2 for details,
particularly if COMPONENTS OF is used in the sequence definition.


Basic types
-----------

In our infrastructure, a type descriptor specifies a mapping between
an ASN.1 type and a C type.  The first step is to ensure that type
descriptors are defined for the basic types used by your ASN.1 module,
as mapped to the C types used in your structures, in asn1_k_encode.c.
If not, you will need to create them.  For a BOOLEAN or INTEGER ASN.1
type, you will use one of these macros:

  DEFBOOLTYPE(descname, ctype)
  DEFINTTYPE(descname, ctype)
  DEFUINTTYPE(descname, ctype)

where "descname" is an identifier you make up and "ctype" is the
integer type of the C object you want to map the ASN.1 value to.  For
integers, use DEFINTTYPE if the C type is a signed integer type and
DEFUINTTYPE if it is an unsigned type.  (For booleans, the distinction
is unimportant since all integer types can hold the values 0 and 1.)
We don't generally define integer mappings for every typedef name of
an integer type.  For example, we use the type descriptor int32, which
maps an ASN.1 INTEGER to an int32_t, for krb5_enctype values.
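
The signed/unsigned choice matters because a DER INTEGER is always a
two's-complement value.  The hypothetical helpers below compute INTEGER
content octets (tag and length omitted) for 64-bit values and show
that the bit pattern 0xFFFFFFFF encodes as the single octet FF when the
C type is signed (the value is -1), but as 00 FF FF FF FF when it is
unsigned (the value is 4294967295).  This is a sketch, not the krb5
primitive encoder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal two's-complement content octets for a signed value. */
static size_t der_int_contents(int64_t val, uint8_t out[9])
{
    uint8_t buf[8];
    size_t i, start = 0;

    for (i = 0; i < 8; i++)
        buf[7 - i] = (uint8_t)((uint64_t)val >> (8 * i));
    /* Drop leading octets that only repeat the sign bit of the next one. */
    while (start < 7 &&
           ((buf[start] == 0x00 && !(buf[start + 1] & 0x80)) ||
            (buf[start] == 0xFF && (buf[start + 1] & 0x80))))
        start++;
    for (i = start; i < 8; i++)
        out[i - start] = buf[i];
    return 8 - start;
}

/* Content octets for an unsigned value; a 0x00 pad is kept whenever the
 * top bit of the first significant octet is set, so the value is not
 * read back as negative. */
static size_t der_uint_contents(uint64_t val, uint8_t out[9])
{
    uint8_t buf[9];
    size_t i, start = 0;

    buf[0] = 0x00;
    for (i = 0; i < 8; i++)
        buf[8 - i] = (uint8_t)(val >> (8 * i));
    while (start < 8 && buf[start] == 0x00 && !(buf[start + 1] & 0x80))
        start++;
    for (i = start; i < 9; i++)
        out[i - start] = buf[i];
    return 9 - start;
}
```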

String types are a little more complicated.  Our practice is to store
strings in a krb5_data structure (rather than a zero-terminated C
string), so our infrastructure currently assumes that all strings are
represented as "counted types", meaning the C representation is a
combination of a pointer and an integer type.  So, first you must
declare a counted type descriptor (we will describe those in more
detail later) with something like:

  DEFCOUNTEDSTRINGTYPE(generalstring, char *, unsigned int,
                       k5_asn1_encode_bytestring, k5_asn1_decode_bytestring,
                       ASN1_GENERALSTRING);

The first parameter is an identifier you make up.  The second and
third parameters are the C types of the pointer and integer holding
the string; for a krb5_data object, those should be the types in the
example.  The pointer type must be char * or uint8_t *.  The fourth
and fifth parameters reference primitive encoder and decoder
functions; these should almost always be the ones in the example,
unless the ASN.1 type is BIT STRING.  The sixth parameter is the
universal tag number of the ASN.1 type, as defined in krbasn1.h.

Once you have defined the counted type, you can define a normal type
descriptor to wrap it in a krb5_data structure with something like:

  DEFCOUNTEDTYPE(gstring_data, krb5_data, data, length, generalstring);
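
A counted type is just a pointer paired with a length, with no
terminator assumed.  The sketch below encodes a GeneralString
(universal tag number 27) directly from a hypothetical pointer/length
structure standing in for krb5_data; the structure, macro, and
function names are invented for illustration, and only short-form
lengths are handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a krb5_data-style pointer/length pair. */
struct counted_data {
    unsigned int length;
    char *data;
};

#define UNIVERSAL_GENERALSTRING 27   /* universal tag number of GeneralString */

/* Encode d->data[0..length) as a GeneralString TLV; returns bytes written.
 * The bytes are copied as-is, so embedded zero bytes are preserved. */
static size_t encode_generalstring(const struct counted_data *d, uint8_t *out)
{
    assert(d->length < 128);          /* short-form length only */
    out[0] = UNIVERSAL_GENERALSTRING; /* universal class, primitive */
    out[1] = (uint8_t)d->length;
    if (d->length > 0)
        memcpy(out + 2, d->data, d->length);
    return (size_t)d->length + 2;
}
```

Given the buffer "EXAMPLE" with a length of 3, only "EXA" is encoded
(1B 03 45 58 41), because the length field rather than a zero byte
determines where the string ends.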


Sequences
---------

In our infrastructure, we model ASN.1 sequences using an array of
normal type descriptors.  Each type descriptor is applied in turn to
the C object to generate (or consume) an encoding of an ASN.1 value.

Of course, each value needs to be stored in a different place within
the C object, or they would just overwrite each other.  To address
this, you must create an offset type wrapper for each sequence field:

  DEFOFFSETTYPE(descname, structuretype, fieldname, basedesc)

where "descname" is an identifier you make up, "structuretype" and
"fieldname" are used to compute the offset and type-check the
structure field, and "basedesc" is the type of the ASN.1 object to be
stored at that offset.

If your C structure contains a pointer to another C object, you will
need to first define a pointer wrapper, which is very simple:

  DEFPTRTYPE(descname, basedesc)

Then wrap the defined pointer type in an offset type as described
above.  Once a pointer descriptor is defined for a base descriptor, it
can be reused many times, so pointer descriptors are usually defined
right after the types they wrap.  When decoding, a pointer wrapper
causes a block of memory the size of the C type corresponding to the
base type to be allocated and assigned to the pointer.  (For offset
types, the corresponding C type is the structure type inside which the
offset is computed.)  It is okay for several fields of a sequence to
reference the same pointer field within a structure, as long as the
pointer types all wrap base types with the same corresponding C type.

If the sequence field has a context tag attached to its type, you will
also need to create a tag wrapper for it:

  DEFCTAGGEDTYPE(descname, tagnum, basedesc)
  DEFCTAGGEDTYPE_IMPLICIT(descname, tagnum, basedesc)

Use the first macro for explicit context tags and the second for
implicit context tags.  "tagnum" is the number of the context-specific
tag, and "basedesc" is the name you chose for the offset type above.

You don't actually need to separately write out DEFOFFSETTYPE and
DEFCTAGGEDTYPE for each field.  The combination of offset and context
tag is so common that we have a macro to combine them:

  DEFFIELD(descname, structuretype, fieldname, tagnum, basedesc)
  DEFFIELD_IMPLICIT(descname, structuretype, fieldname, tagnum, basedesc)

Once you have defined tag and offset wrappers for each sequence field,
combine them together in an array and use the DEFSEQTYPE macro to
define the sequence type descriptor:

  static const struct atype_info *my_sequence_fields[] = {
      &k5_atype_my_sequence_0, &k5_atype_my_sequence_1,
  };
  DEFSEQTYPE(my_sequence, structuretype, my_sequence_fields)

Each field descriptor name must be prefixed by "&k5_atype_" to get a
pointer to the actual variable used to hold the type descriptor.
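
Conceptually, a sequence descriptor is an ordered table of field
wrappers, each pairing a context tag with an offset into the C
structure.  The following is a hypothetical miniature of that idea,
not the krb5 implementation: it encodes a two-field structure as
SEQUENCE { [0] INTEGER, [1] INTEGER } with explicit tags, limited to
values 0 through 127 so that every length fits in one octet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C structure that the two-field sequence maps to. */
struct my_sequence {
    int32_t first;
    int32_t second;
};

/* Hypothetical field wrapper: explicit context tag plus structure offset. */
struct field_desc {
    unsigned int tagnum;
    size_t offset;
};

static const struct field_desc my_sequence_fields[] = {
    { 0, offsetof(struct my_sequence, first) },
    { 1, offsetof(struct my_sequence, second) },
};
#define NFIELDS (sizeof(my_sequence_fields) / sizeof(my_sequence_fields[0]))

/* Apply each field wrapper in turn; returns the number of bytes written. */
static size_t encode_my_sequence(const struct my_sequence *obj, uint8_t *out)
{
    const unsigned char *base = (const unsigned char *)obj;
    size_t i, pos = 2;              /* reserve room for SEQUENCE tag + length */

    for (i = 0; i < NFIELDS; i++) {
        int32_t val;
        memcpy(&val, base + my_sequence_fields[i].offset, sizeof(val));
        assert(val >= 0 && val <= 127 && my_sequence_fields[i].tagnum <= 30);
        out[pos++] = (uint8_t)(0xA0 | my_sequence_fields[i].tagnum); /* [n] */
        out[pos++] = 3;             /* length of the inner INTEGER TLV */
        out[pos++] = 0x02;          /* INTEGER */
        out[pos++] = 1;
        out[pos++] = (uint8_t)val;
    }
    out[0] = 0x30;                  /* SEQUENCE, constructed */
    out[1] = (uint8_t)(pos - 2);
    return pos;
}
```

For a structure holding 7 and 9, this produces
30 0A A0 03 02 01 07 A1 03 02 01 09.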

ASN.1 sequence types may or may not be defined to be extensible, and
may group extensions together in blocks which must appear together.
Our model does not distinguish these cases.  Our decoder treats all
sequence types as extensible.  Extension blocks must be modeled by
making all of the extension fields optional, and the decoder will not
enforce that they appear together.

If your ASN.1 sequence contains optional fields, keep reading.


Optional sequence fields
------------------------

ASN.1 sequence fields can be annotated with OPTIONAL or, less
commonly, with DEFAULT VALUE.  (Be aware that if DEFAULT VALUE is
specified for a sequence field, DER mandates that fields with that
value not be encoded within the sequence.  Most standards in the
Kerberos ecosystem avoid the use of DEFAULT VALUE for this reason.)
Although optionality is a property of sequence or set fields, not
types, we still model optional sequence fields using type wrappers.
Optional type wrappers must only be used as members of a sequence,
although they can be nested in offset or pointer wrappers first.

The simplest way to represent an optional value in a C structure is
with a pointer which takes the value NULL if the field is not present.
In this case, you can just use DEFOPTIONALZEROTYPE to wrap the pointer
type:

  DEFPTRTYPE(ptr_basetype, basetype);
  DEFOPTIONALZEROTYPE(opt_ptr_basetype, ptr_basetype);

and then use opt_ptr_basetype in the DEFFIELD invocation for the
sequence field.  DEFOPTIONALZEROTYPE can also be used for integer
types, if it is okay for the value 0 to represent that the
corresponding ASN.1 value is omitted.  Optional-zero wrappers, like
pointer wrappers, are usually defined just after the types they wrap.

For null-terminated sequences, you can use a wrapper like this:

  DEFOPTIONALEMPTYTYPE(opt_seqof_basetype, seqof_basetype)

to omit the sequence if it is either NULL or of zero length.

A more general way to wrap optional types is:

  DEFOPTIONALTYPE(descname, predicatefn, initfn, basedesc);

where "predicatefn" has the signature "int (*fn)(const void *p)" and
is used by the encoder to test whether the ASN.1 value is present in
the C object.  "initfn" has the signature "void (*fn)(void *p)" and is
used by the decoder to initialize the C object field if the
corresponding ASN.1 value is omitted in the wire encoding.  "initfn"
can be NULL, in which case the C object will simply be left alone.
All C objects are initialized to zero-filled memory when they are
allocated by the decoder.
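
The roles of the two callbacks can be sketched as follows.  The
structure, the default value of 5, and the function names are
hypothetical, and for simplicity each callback here receives the whole
structure; the actual nonempty_data function and the pointer that krb5
passes may differ.  The sketch shows only how an encoder consults the
predicate and how a decoder falls back on the initializer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C structure with an optional counted string and an
 * optional integer that should read as 5 when absent on the wire. */
struct my_message {
    unsigned int name_length;
    char *name_data;
    int32_t retries;
};

/* Predicate, signature int (*)(const void *): is the optional string present? */
static int name_present(const void *p)
{
    const struct my_message *m = p;
    return m->name_data != NULL && m->name_length > 0;
}

/* Initializer, signature void (*)(void *): run by the decoder when the
 * optional field is omitted from the wire encoding. */
static void retries_init(void *p)
{
    struct my_message *m = p;
    m->retries = 5;
}

/* Decoder side: call initfn only when the field was absent.  A NULL initfn
 * leaves the (already zero-filled) C object alone. */
static void handle_absent(void *obj, void (*initfn)(void *), int was_present)
{
    if (!was_present && initfn != NULL)
        initfn(obj);
}
```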

An optional string type, represented in a krb5_data structure, can be
wrapped using the nonempty_data function already defined in
asn1_k_encode.c, like so:

  DEFOPTIONALTYPE(opt_ostring_data, nonempty_data, NULL, ostring_data);


Sequence-of types
-----------------

ASN.1 sequence-of types can be represented as C types in two ways.
The simplest is to use an array of pointers terminated in a null
pointer.  A descriptor for a sequence-of represented this way is
defined in three steps:

  DEFPTRTYPE(ptr_basetype, basetype);
  DEFNULLTERMSEQOFTYPE(seqof_basetype, ptr_basetype);
  DEFPTRTYPE(ptr_seqof_basetype, seqof_basetype);

If the C type corresponding to basetype is "ctype", then the C type
corresponding to ptr_seqof_basetype will be "ctype **".  The middle
type sort of corresponds to "ctype *", but not exactly, as it
describes an object of variable size.

You can also use DEFNONEMPTYNULLTERMSEQOFTYPE in the second step.  In
this case, the encoder will throw an error if the sequence is empty.
For historical reasons, the decoder will *not* throw an error if the
sequence is empty, so the calling code must check before assuming a
first element is present.
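
Since a null-terminated sequence-of carries no explicit count, its
length is found by walking to the terminating null pointer.  The
hypothetical helpers below do that and implement the "NULL or of zero
length" test described for DEFOPTIONALEMPTYTYPE; the element type and
function names are illustrative only.  Calling code can use the same
kind of count check before assuming a first element is present.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical element type; the sequence-of is represented as "ctype **". */
struct elem {
    int32_t value;
};

/* Count elements up to the terminating null pointer. */
static size_t nullterm_count(struct elem **list)
{
    size_t n = 0;

    if (list == NULL)
        return 0;
    while (list[n] != NULL)
        n++;
    return n;
}

/* An optional sequence-of is omitted when it is NULL or has no elements. */
static int nullterm_present(struct elem **list)
{
    return nullterm_count(list) > 0;
}
```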

The other way of representing sequences is through a combination of
pointer and count.  This pattern is most often used for compactness
when the base type is an integer type.  A descriptor for a sequence-of
represented this way is defined using a counted type descriptor:

  DEFCOUNTEDSEQOFTYPE(descname, lentype, basedesc)

where "lentype" is the C type of the length and "basedesc" is a
pointer wrapper for the sequence element type (*not* the element type
itself).  For example, an array of 32-bit signed integers is defined
as:

  DEFINTTYPE(int32, int32_t);
  DEFPTRTYPE(int32_ptr, int32);
  DEFCOUNTEDSEQOFTYPE(cseqof_int32, int32_t, int32_ptr);

To use a counted sequence-of type in a sequence, use DEFCOUNTEDTYPE:

  DEFCOUNTEDTYPE(descname, structuretype, ptrfield, lenfield, cdesc)

where "structuretype", "ptrfield", and "lenfield" are used to compute
the field offsets and type-check the structure fields, and "cdesc" is
the name of the counted type descriptor.

The combination of DEFCOUNTEDTYPE and DEFCTAGGEDTYPE can be
abbreviated using DEFCNFIELD:

  DEFCNFIELD(descname, structuretype, ptrfield, lenfield, tagnum, cdesc)


Tag wrappers
------------

We've previously covered DEFCTAGGEDTYPE and DEFCTAGGEDTYPE_IMPLICIT,
which are used to define context-specific tag wrappers.  There are
two other macros for creating tag wrappers.  The first is:

  DEFAPPTAGGEDTYPE(descname, tagnum, basedesc)

Use this macro to model an "[APPLICATION tagnum]" tag wrapper in an
ASN.1 module.

There is also a general tag wrapper macro:

  DEFTAGGEDTYPE(descname, class, construction, tag, implicit, basedesc)

where "class" is one of UNIVERSAL, APPLICATION, CONTEXT_SPECIFIC, or
PRIVATE, "construction" is one of PRIMITIVE or CONSTRUCTED, "tag" is
the tag number, "implicit" is 1 for an implicit tag and 0 for an
explicit tag, and "basedesc" is the wrapped type.  Note that
primitive vs. constructed is not a concept within the abstract ASN.1
type model, but is instead a concept used in DER.  In general, all
explicit tags should be constructed (but see the section on "Dirty
tricks" below).  The construction parameter is ignored for implicit
tags.


Choice types
------------

ASN.1 CHOICE types are represented in C using a signed integer
distinguisher and a union.  Modeling a choice type happens in three
steps:

1. Define type descriptors for each alternative of the choice,
typically using DEFCTAGGEDTYPE to create a tag wrapper for an existing
type.  There is no need to create offset type wrappers, as union
fields always have an offset of 0.  For example:

  DEFCTAGGEDTYPE(my_choice_0, 0, firstbasedesc);
  DEFCTAGGEDTYPE(my_choice_1, 1, secondbasedesc);

2. Assemble them into an array, similar to how you would for a
sequence, and use DEFCHOICETYPE to create a counted type descriptor:

  static const struct atype_info *my_choice_alternatives[] = {
      &k5_atype_my_choice_0, &k5_atype_my_choice_1
  };
  DEFCHOICETYPE(my_choice, union my_choice_choices, enum my_choice_selector,
                my_choice_alternatives);

The second and third parameters to DEFCHOICETYPE are the C types of
the union and distinguisher fields.

3. Wrap the counted type descriptor in a type descriptor for the
structure containing the distinguisher and union:

  DEFCOUNTEDTYPE_SIGNED(descname, structuretype, u, choice, my_choice);

The third and fourth parameters to DEFCOUNTEDTYPE_SIGNED are the field
names of the union and distinguisher fields within structuretype.

ASN.1 choice types may be defined to be extensible, or may not be.
Our model does not distinguish between the two cases.  Our decoder
treats all choice types as extensible.

Our encoder will throw an error if the distinguisher is not within the
range of valid offsets of the alternatives array.  Our decoder will
set the distinguisher to -1 if the tag of the ASN.1 value is not
matched by any of the alternatives, and will leave the union
zero-filled in that case.
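
The distinguisher rules above can be sketched for a hypothetical
two-alternative choice.  The type names, the alternative tags, and
both helpers are illustrative assumptions rather than krb5 code: the
encoder-side check rejects a distinguisher outside the alternatives
array, and the decoder-side lookup returns -1 for an unmatched tag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C representation of CHOICE { first [0] INTEGER,
 * second [1] OCTET STRING }: a signed distinguisher plus a union. */
enum my_choice_selector {
    MY_CHOICE_UNKNOWN = -1,
    MY_CHOICE_FIRST = 0,
    MY_CHOICE_SECOND = 1
};

struct my_choice {
    enum my_choice_selector choice;
    union {
        int32_t first;
        struct { unsigned int length; char *data; } second;
    } u;
};

/* Context tag numbers of the alternatives, indexed by distinguisher. */
static const unsigned int my_choice_tags[] = { 0, 1 };
#define MY_CHOICE_NALTS (sizeof(my_choice_tags) / sizeof(my_choice_tags[0]))

/* Encoder side: a distinguisher outside the alternatives array is an error. */
static int distinguisher_valid(const struct my_choice *c)
{
    return (int)c->choice >= 0 && (size_t)c->choice < MY_CHOICE_NALTS;
}

/* Decoder side: map the received context tag to a distinguisher, or -1 if
 * no alternative matches (the union is then left zero-filled). */
static enum my_choice_selector select_alternative(unsigned int tagnum)
{
    size_t i;

    for (i = 0; i < MY_CHOICE_NALTS; i++) {
        if (my_choice_tags[i] == tagnum)
            return (enum my_choice_selector)i;
    }
    return MY_CHOICE_UNKNOWN;
}
```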


Counted type descriptors
------------------------

Several times in earlier sections we've referred to the notion of
"counted type descriptors" without defining what they are.  Counted
type descriptors live in a separate namespace from normal type
descriptors, and specify a mapping between an ASN.1 type and two C
objects, one of them having integer type.  There are four kinds of
counted type descriptors, defined using the following macros:

  DEFCOUNTEDSTRINGTYPE(descname, ptrtype, lentype, encfn, decfn, tagnum)
  DEFCOUNTEDDERTYPE(descname, ptrtype, lentype)
  DEFCOUNTEDSEQOFTYPE(descname, lentype, baseptrdesc)
  DEFCHOICETYPE(descname, uniontype, distinguishertype, fields)

DEFCOUNTEDDERTYPE is described in the "Dirty tricks" section below.
The other three kinds of counted types have been covered previously.

Counted types are always used by wrapping them in a normal type
descriptor with one of these macros:

  DEFCOUNTEDTYPE(descname, structuretype, datafield, countfield, cdesc)
  DEFCOUNTEDTYPE_SIGNED(descname, structuretype, datafield, countfield, cdesc)

These macros are similar in concept to an offset type, only with two
offsets.  Use DEFCOUNTEDTYPE if the count field is unsigned, and
DEFCOUNTEDTYPE_SIGNED if it is signed.
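
To picture the "offset type with two offsets" idea, the sketch below
uses a hypothetical descriptor that records where the data pointer and
the count live in a structure, and refuses a negative value from a
signed count field before treating it as a length.  It illustrates the
concept only and is not the krb5 implementation of
DEFCOUNTEDTYPE_SIGNED.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical structure with a signed count field. */
struct int_list {
    int32_t *values;
    int count;
};

/* Hypothetical two-offset descriptor: where the pointer and count live. */
struct counted_desc {
    size_t dataoff;
    size_t countoff;
};

static const struct counted_desc int_list_desc = {
    offsetof(struct int_list, values),
    offsetof(struct int_list, count)
};

/* Fetch pointer and count through the descriptor; returns -1 on a
 * negative count, 0 on success. */
static int load_counted(const void *obj, const struct counted_desc *d,
                        int32_t **ptr_out, size_t *count_out)
{
    const unsigned char *base = obj;
    int count;

    memcpy(ptr_out, base + d->dataoff, sizeof(*ptr_out));
    memcpy(&count, base + d->countoff, sizeof(count));
    if (count < 0)
        return -1;
    *count_out = (size_t)count;
    return 0;
}
```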


Defining encoder and decoder functions
--------------------------------------

After you have created a type descriptor for your types, you need to
create encoder or decoder functions for the ones you want calling code
to be able to process.  Do this with one of the following macros:

  MAKE_ENCODER(funcname, desc)
  MAKE_DECODER(funcname, desc)
  MAKE_CODEC(typename, desc)

MAKE_ENCODER and MAKE_DECODER allow you to choose function names.
MAKE_CODEC defines encoder and decoder functions with the names
"encode_typename" and "decode_typename".

If you are defining functions for a null-terminated sequence, use the
descriptor created with DEFNULLTERMSEQOFTYPE or
DEFNONEMPTYNULLTERMSEQOFTYPE, rather than the pointer to it.  This is
because encoder and decoder functions implicitly traffic in pointers
to the C object being encoded or decoded.

Encoder and decoder functions must be prototyped separately, either in
k5-int.h or in a subsidiary header included by it.  Encoder functions
have the prototype:

  krb5_error_code encode_typename(const ctype *rep, krb5_data **code_out);

where "ctype" is the C type corresponding to desc.  Decoder functions
have the prototype:

  krb5_error_code decode_typename(const krb5_data *code, ctype **rep_out);

Decoder functions allocate a container for the C type of the object
being decoded and return a pointer to it in *rep_out.
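
The calling convention (the caller passes a pointer to its object, the
function allocates the result, and the caller frees it) can be tried
out with a self-contained toy.  Everything here is hypothetical:
toy_error and struct toy_data stand in for krb5_error_code and
krb5_data, and the "codec" handles only INTEGER values 0 through 127.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for krb5_error_code and krb5_data. */
typedef int toy_error;
struct toy_data {
    unsigned int length;
    unsigned char *data;
};

/* Same shape as encode_typename(const ctype *rep, krb5_data **code_out):
 * allocates a container holding the encoding. */
static toy_error encode_small_int(const int32_t *rep, struct toy_data **code_out)
{
    struct toy_data *d;

    *code_out = NULL;
    if (*rep < 0 || *rep > 127)
        return -1;
    d = malloc(sizeof(*d));
    if (d == NULL)
        return -1;
    d->data = malloc(3);
    if (d->data == NULL) {
        free(d);
        return -1;
    }
    d->data[0] = 0x02;                  /* INTEGER */
    d->data[1] = 0x01;                  /* length */
    d->data[2] = (unsigned char)*rep;
    d->length = 3;
    *code_out = d;
    return 0;
}

/* Same shape as decode_typename(const krb5_data *code, ctype **rep_out):
 * allocates a container for the decoded C object. */
static toy_error decode_small_int(const struct toy_data *code, int32_t **rep_out)
{
    int32_t *rep;

    *rep_out = NULL;
    if (code->length != 3 || code->data[0] != 0x02 || code->data[1] != 0x01 ||
        (code->data[2] & 0x80))
        return -1;
    rep = malloc(sizeof(*rep));
    if (rep == NULL)
        return -1;
    *rep = code->data[2];
    *rep_out = rep;
    return 0;
}
```

In this toy the caller releases both results with free().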


Writing test cases
------------------

New ASN.1 types in libkrb5 will typically only be accepted with test
cases.  Our current test framework lives in src/tests/asn.1.  Adding
new types to this framework involves the following steps:

1. Define an initializer for a sample value of the type in ktest.c,
named ktest_make_sample_typename().  Also define a contents-destructor
for it, named ktest_empty_typename().  Prototype these functions in
ktest.h.

2. Define an equality test for the type in ktest_equal.c.  Prototype
this in ktest_equal.h.  (This step is not necessary if the type has no
decoder.)

3. Add a test case to krb5_encode_test.c, following the examples of
existing test cases there.  Update reference_encode.out and
trval_reference.out to contain the output generated by your test case.

4. Add a test case to krb5_decode_test.c, following the examples of
existing test cases there, and using the output generated by your
encode test.

5. Add a test case to krb5_decode_leak.c, following the examples of
existing test cases there.

Following these steps will not ensure the correctness of your
translation of the ASN.1 module to macro invocations; it only lets us
detect unintentional changes to the encodings after they are defined.
To ensure that your translations are correct, you should extend
tests/asn.1/make-vectors.c and use "make test-vectors" to create
vectors using asn1c.


Dirty tricks
------------

In rare cases you may want to represent the raw DER encoding of a
value in the C structure.  If so, you can use DEFCOUNTEDDERTYPE (or
more likely, the existing der_data type descriptor).  The encoder and
decoder will throw errors if the wire encoding doesn't have a valid
outermost tag, so be sure to use valid DER encodings in your test
cases (see ktest_make_sample_algorithm_identifier for an example).

Conversely, the ASN.1 module may define an OCTET STRING wrapper around
a DER encoding which you want to represent as the decoded value.  (The
existing example of this is in PKINIT hash agility, where the
PartyUInfo and PartyVInfo fields of OtherInfo are defined as octet
strings which contain the DER encodings of KRB5PrincipalName values.)
In this case you can use a DEFTAGGEDTYPE wrapper like so:

  DEFTAGGEDTYPE(descname, UNIVERSAL, PRIMITIVE, ASN1_OCTETSTRING, 0,
                basedesc)
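
On the wire, this trick yields an OCTET STRING whose contents are
themselves a complete DER encoding: a primitive universal tag 4 header
is prepended to the inner TLV, which is left untouched.  The helper
below is a hypothetical sketch with short-form lengths only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Emulate an explicit, primitive OCTET STRING tag around an existing DER
 * encoding: prepend the header (0x04, length) without altering the inner
 * bytes.  Returns bytes written. */
static size_t wrap_in_octet_string(const uint8_t *der, size_t derlen,
                                   uint8_t *out)
{
    assert(derlen < 128);   /* short-form length only */
    out[0] = 0x04;          /* universal 4, primitive (bit 0x20 clear) */
    out[1] = (uint8_t)derlen;
    memcpy(out + 2, der, derlen);
    return derlen + 2;
}
```

Wrapping the inner encoding 30 03 02 01 05 (an arbitrary SEQUENCE
containing INTEGER 5, not a real KRB5PrincipalName) gives
04 05 30 03 02 01 05.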


Limitations
-----------

We cannot currently encode or decode SET or SET OF types.

We cannot model self-referential types (like "MATHSET ::= SET OF
MATHSET").

If a sequence uses an optional field that is a choice field (without
a context tag wrapper), or an optional field that uses a stored DER
encoding (again, without a context tag wrapper), our decoder may
assign a value to the choice or stored-DER field when the correct
behavior is to skip that field and assign the value to a subsequent
field.  It should be very rare for ASN.1 modules to use choice or open
types this way.

For historical interoperability reasons, our decoder accepts the
indefinite length form for constructed tags, which is allowed by BER
but not DER.  We still require the primitive forms of basic scalar
types, however, so we do not accept all BER encodings of ASN.1 values.
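
The difference shows up in the length octets.  DER always uses a
definite length: one octet below 0x80, or 0x80 plus n followed by n
big-endian length octets.  BER additionally allows a lone 0x80 for
constructed encodings, meaning the length is indefinite and the
contents end with the end-of-contents octets 00 00.  The hypothetical
parser below distinguishes these cases; it does not enforce DER's
minimal-length rule.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LEN_INDEFINITE ((size_t)-1)

/* Parse the length octets at in[0..inlen).  On success store the length
 * (or LEN_INDEFINITE) in *len_out, the number of octets consumed in
 * *used_out, and return 0; return -1 on malformed or oversized input. */
static int parse_length(const uint8_t *in, size_t inlen,
                        size_t *len_out, size_t *used_out)
{
    size_t i, n, len = 0;

    if (inlen < 1)
        return -1;
    if (in[0] < 0x80) {                  /* short definite form */
        *len_out = in[0];
        *used_out = 1;
        return 0;
    }
    if (in[0] == 0x80) {                 /* indefinite form (BER only) */
        *len_out = LEN_INDEFINITE;
        *used_out = 1;
        return 0;
    }
    n = in[0] & 0x7F;                    /* long definite form */
    if (n > sizeof(size_t) - 1 || inlen < 1 + n)
        return -1;
    for (i = 0; i < n; i++)
        len = (len << 8) | in[1 + i];
    *len_out = len;
    *used_out = 1 + n;
    return 0;
}
```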


Debugging
---------

If you are looking at a stack trace with a bunch of ASN.1 encoder or
decoder calls at the top, here are some notes that might help with
debugging:

1. You may have noticed that the entry point into the encoder is
defined by a macro like MAKE_CODEC.  Don't worry about this; those
macros just define thin wrappers around k5_asn1_full_encode and
k5_asn1_full_decode.  If you are stepping through code and hit a
wrapper function, just enter "step" to get into the actual encoder or
decoder function.

2. If you are in the encoder, look for stack frames in
encode_sequence(), and print the value of i within those stack frames.
You should be able to subtract 1 from those values and match them up
with the sequence field offsets in asn1_k_encode.c for the type being
encoded.  For example, if an as-req is being encoded and the i values
(starting with the one closest to encode_krb5_as_req) are 4, 2, and 2,
you could match those up as follows:

* as_req_encode wraps untagged_as_req, whose field at offset 3 is the
  descriptor for kdc_req_4, which wraps kdc_req_body.

* kdc_req_body is a function wrapper around kdc_req_hack, whose field
  at offset 1 is the descriptor for req_body_1, which wraps
  opt_principal.

* opt_principal wraps principal, which wraps principal_data, whose
  field at offset 1 is the descriptor for princname_1.

* princname_1 is a sequence of general strings represented in the data
  and length fields of the krb5_principal_data structure.

So the problem would likely be in the data components of the client
principal in the kdc_req structure.

3. If you are in the decoder, look for stack frames in
decode_sequence(), and again print the values of i.  You can match
these up just as above, except without subtracting 1 from the i
values.
578