1#
2# Reflects Oren's comments, adds yamlbyte.h at the bottom
3#
4subject: Revision #4 of YAML Bytecodes
5summary: >
6 This proposal defines a 'preparsed' format where a YAML syntax
7 is converted into a series of events, as bytecodes. Each bytecode
8 appears on its own line, starting with a single character and ending
9 with a line feed character, '\n'.
10codes:
11 #
12 # Primary Bytecodes (Capital Letters)
13 #
14 # These bytecodes form the minimum needed to represent YAML information
15 # from the serial model (ie, without format and comments)
16 #
17 'D':
18 name: Document
19 desc: >
 Indicates that a document has begun, either because this is
 the beginning of a YAML stream or because a '---' has been
 found. Thus, an empty document is expressed
 as "D\n"
24 'V':
25 name: Directive
26 desc: >
27 This represents any YAML directives immediately following
28 a 'D' bytecode. For example '--- %YAML:1.0' produces the
29 bytecode "D\nVYAML:1.0\n".
30 'P':
31 name: Pause Stream
32 desc: >
33 This is the instruction when a document is terminated, but
34 another document has not yet begun. Thus, it is optional,
35 and typically used to pause parsing. For example,
36 a stream starting with an empty document, but then in a
37 hold state for the next document would be: "D\nP\n"
38 '\z':
39 name: Finish (end stream)
40 desc: >
41 YAML bytecodes are meant to be passable as a single "C"
42 string, and thus the null terminator can optionally be
43 used to signal the end of a stream. When writing bytecodes
44 out to a flat file, the file need not contain a null
45 terminator; however, when read into memory it should
46 always have a null terminator.
47 'M':
48 name: Mapping
49 desc: >
 Indicates the beginning of a mapping; children of the
51 mapping are provided as a series of K1,V1,K2,V2
52 pairs as they are found in the input stream. For
53 example, the bytecodes for "{ a: b, c: d }" would
54 be "M\nSa\nSb\nSc\nSd\nE\n"
55 'Q':
56 name: Sequence
57 desc: >
 Indicates the beginning of a sequence; children follow
 until the matching 'E' bytecode is encountered. So, the
 bytecodes for "[ one, two ]" would be "Q\nSone\nStwo\nE\n"
61 'E':
62 name: End Collection
63 desc: >
 This closes the innermost open collection (mapping or sequence).
 Note that a document has one and only one node following
 it, so the document itself is not a branch and is not closed by 'E'.
67 'S':
68 name: Scalar
69 desc: >
70 This indicates the start of a scalar value, which can
71 be continued by the 'N' and 'C' bytecodes. This bytecode
72 is used for sequence entries, keys, values, etc.
73 'C':
74 name: Scalar Continuation
75 desc: >
76 Since a scalar may not fit within a buffer, and since it
77 may not contain a \n character, it may have to be broken
78 into several chunks.
79 'N':
80 name: Normalized New Line (in a scalar value)
81 desc: >
82 Scalar values must be chunked so that new lines and
83 null values do not occur within a 'S' or 'C' bytecode
84 (in the bytecodes, all other C0 need not be escaped).
85 This bytecode is then used to represent one or more
86 newlines, with the number of newlines optionally
87 following. For example,
88 "Hello\nWorld" would be "SHello\nN\nCWorld\n", and
89 "Hello\n\n\nWorld" is "SHello\nN3\nCWorld\n"
90
 If the new line is an LS or a PS, the N bytecode can
 be followed with an L or P. Thus, "Hello\PWorld\L" is
 reported "SHello\nNP\nCWorld\nNL\n"
94
95 'Z':
96 name: Null Character (in a scalar value)
97 desc: >
 As with normalized new lines above, since the null character
 cannot be used in the bytecodes, it must be escaped, ie,
 "Hello\zWorld" would be "SHello\nZ\nCWorld\n".
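The chunking rules for 'S', 'C', 'N' and 'Z' can be sketched in a few lines
of Python. This is only an illustration under stated assumptions: the name
encode_scalar is hypothetical, LS/PS line breaks (the "NL"/"NP" forms) are
not handled, and chunks are never split to fit a buffer.

```python
import re

def encode_scalar(text):
    """Split one scalar into S/C/N/Z bytecode lines, each ending in a line feed."""
    # Tokens: a run of newlines, a single NUL, or a run of ordinary text.
    tokens = re.findall(r"\n+|\x00|[^\n\x00]+", text)
    out = []
    # A scalar always opens with 'S'; leading text, if any, rides along with it.
    if tokens and tokens[0][0] not in "\n\x00":
        out.append("S" + tokens.pop(0) + "\n")
    else:
        out.append("S\n")
    for token in tokens:
        if token[0] == "\n":
            # One newline is a bare 'N'; several carry their count.
            count = len(token)
            out.append("N\n" if count == 1 else "N%d\n" % count)
        elif token == "\x00":
            out.append("Z\n")
        else:
            out.append("C" + token + "\n")
    return "".join(out)
```

Under these assumptions, encode_scalar("Hello\n\n\nWorld") gives
"SHello\nN3\nCWorld\n", matching the 'N' example above.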
101 'A':
102 name: Alias
103 desc: >
 This is used whenever there is an alias node. For
 example, "[ &X one, *X ]" would be normalized
 to "Q\nAX\nSone\nRX\nE\n" -- in this example, the
 anchor bytecode applies to the very next content
 bytecode.
109 'R':
110 name: Reference (Anchor)
111 desc: >
112 This bytecode associates an anchor with the very next
113 content node, see the 'A' alias bytecode.
114 'T':
115 name: Transfer
116 desc: >
117 This is the transfer method. If the value begins with
118 a '!', then it is not normalized. Otherwise, the value
 is a fully qualified URL, with a colon. The transfer
120 method applies only to the node immediately following,
121 and thus it can be seen as a modifier like the anchor.
122 For example, "Ttag:yaml.org,2002:str\nSstring\n" is
123 normalized, "T!str\nSstring\n" is not.
124 #
125 # Formatting bytecodes (lower case)
126 #
127 # The following bytecodes are purely at the syntax level and
128 # useful for pretty printers and emitters. Since the range of
129 # lower case letters is contiguous, it could be easy for a
130 # processor to simply ignore all bytecodes in this range.
131 #
132 'c':
133 name: Comment
134 desc: >
135 This is a single line comment. It is terminated like all
136 of the other variable length items, with a '\n'.
137 'i':
138 name: Indent
139 desc: >
140 Specifies number of additional spaces to indent for
141 subsequent block style nodes, "i4\n" specifies 4 char indent.
142 's':
143 name: Scalar styling
144 desc: >
 This bytecode is followed by one of the following
146 items to indicate the style to be used for the very
147 next content node. It is an error to specify a style for
148 a scalar other than double quoted when it must be escaped.
149 Furthermore, there must be agreement between the style
150 and the very next content node, in other words, a scalar
151 style requires that the next content node be an S.
152
153 > flow scalar
154 " double quoted scalar
155 ' single quoted scalar
156 | literal scalar
157 p plain scalar
158 { inline mapping
159 [ inline sequence
 b block style (for mappings and sequences)
161
162 #
163 # Advanced bytecodes (not alphabetic)
164 #
165 # These are optional goodies which one could find useful.
166 #
167 '#':
168 name: Line Number
169 desc: >
170 This bytecode allows the line number of the very next
171 node to be reported.
172 '!':
173 name: Notice
174 desc: >
175 This is a message sent from the producer to the consumer
176 regarding the state of the stream or document. It does
 not necessarily end a stream, as the 'finish' bytecode can
178 be used for this purpose. This signal has a packed format,
179 with the error number, a comma, and a textual message:
180 "#22\n!73,Indentation mismatch\n"
181 "#132\n!84,Tabs are illegal for indentation\n"
182 ',':
183 name: Span
184 desc: >
185 This bytecode gives the span of the very next 'S', 'M',
186 or 'Q' bytecode -- including its subordinates. For scalars,
187 it includes the span of all subordinate 'N' and 'C' codes.
188 For mappings or sequences, this gives the length all the
189 way to the corresponding 'E' bytecode so that the entire
190 branch can be skipped. The length is given starting at
191 the corresponding 'S', 'M' or 'Q' bytecode and extends
192 to the first character following subordinate nodes.
193
194 Since this length instruction is meant to be used to 'speed'
195 things up, and since calculating the length via hand is not
196 really ideal, the length is expressed in Hex. This will allow
197 programs to easily convert the length to an actual value
198 (converting from hex to integers is easier than decimal).
199 Furthermore, all leading x's are ignored (so that they can
200 be filled in later) and if the bytecode value is all x's,
201 then the length is unknown. Lastly, this length is expressed
202 in 8 bit units for UTF-8, and 16 bit units for UTF-16.
203
 For example,
 --- [[one, two], three]
 is expressed as,
 ",25\nD\n,x1E\nQ\n,xxE\nQ\nSone\nStwo\nE\nSthree\nE\n"

 Thus it is seen that the address of D plus 37 is the null
 terminator for the string, the first 'Q' plus 30 also
 gives the null terminator, and the second 'Q' plus
 14 jumps to the opening 'S' for the third scalar.
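The span arithmetic can be checked with a short Python sketch. It assumes
the ',' span bytecode as named in this entry, an ASCII stream (so one
character is one 8 bit unit), and two hypothetical helper names, parse_span
and skip_target.

```python
def parse_span(arg):
    """Return the span as an int, or None when it is unknown (all x's)."""
    digits = arg.lstrip("x")          # leading x's are padding and are ignored
    return int(digits, 16) if digits else None

def skip_target(stream, span_offset):
    """Offset reached by skipping the node described by the ',' line at span_offset."""
    end = stream.index("\n", span_offset)
    span = parse_span(stream[span_offset + 1:end])
    if span is None:
        return None
    # The span is counted from the bytecode that follows the span line.
    return end + 1 + span

EXAMPLE = ",25\nD\n,x1E\nQ\n,xxE\nQ\nSone\nStwo\nE\nSthree\nE\n"
```

With EXAMPLE, the first two spans land on the end of the string (where the
null terminator would sit) and the third lands on the 'S' of "Sthree".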
213 '@':
214 name: Allocate
215 desc: >
216 This is a hint telling the processor how many items
217 are in the following collection (mapping pairs, or
218 sequence values), or how many character units need
 to be allocated to hold the next value. Clearly this
 is an encoding-specific value. The length which
221 follows is in hex (not decimal).
222
223 For example, "one", could be "@x3\nSone"
224
225design:
226 -
227 name: streaming support
228 problem: >
229 The interface should ideally allow for a YAML document to be
230 moved incrementally as a stream through a process. In particular,
 YAML is inherently line oriented, thus the interface should
232 probably reflect this fundamental character.
233 solution: >
234 The bytecodes deliver scalars as chunks, each chunk limited to
235 at most one line. While this is not ideal for passing large
236 binary objects, it is simple and easy to understand.
237 -
238 name: push
239 problem: >
240 The most common 'parsers' out there for YAML are push style, where
241 the producer owns the 'C' program stack, and the consumer keeps
242 its state as a heap object. Ideal use of a push interface is an
243 emitter, since this allows the sender (the application program)
244 to use the program stack and thus keep its state on the call stack
245 in local, automatic variables.
246 solution: >
247 A push interface simply can call a single event handler with a
248 (bytecode, payload) tuple. Since the core complexity is in the
249 bytecodes, the actual function signature is straight-forward
250 allowing for relative language independence. Since the bytecode
251 is always one character, the event handler could just receive
252 a string where the tuple is implicit.
253 -
254 name: pull
255 problem: >
256 The other alternative for a streaming interface is a 'pull' mechanism,
257 or iterator model where the consumer owns the C stack and the producer
258 keeps any state needed as a heap object. Ideal use of a pull
259 interface is a parser, since this allows the receiver (the application
260 program) to use the program stack, keeping its state on the call stack
261 in local variables.
262 solution: >
263 A pull interface would also be a simple function, that when called
 fills a buffer with binary node(s). Or, in a language with
265 garbage collection, could be implemented as an iterator returning
266 a string containing the bytecode line (bytecode followed immediately
267 by the bytecode argument as a single string) or as a tuple.
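As a rough illustration of the iterator flavour of the pull interface, the
Python sketch below yields (bytecode, args) tuples from any file-like object
of bytecode lines. The function name pull and the choice to stop at a line
starting with the null terminator are assumptions made for this example.

```python
import io

def pull(stream):
    """Yield (bytecode, args) tuples from a file-like object of bytecode lines."""
    for line in stream:
        if line.startswith("\x00"):   # null terminator: end of the stream
            return
        line = line.rstrip("\n")
        if line:                      # the first character is the bytecode
            yield line[0], line[1:]

events = list(pull(io.StringIO("D\nQ\nSone\nStwo\nE\n")))
```

Here the consumer drives the loop and keeps its own state in local
variables, while the producer's state lives inside the generator object.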
268 -
269 name: pull2push
270 problem: >
271 This is done easily via a small loop which pulls from the
272 iterator and pushes to the event handler.
273 solution: >
 For python, assuming the parser is implemented as an iterator
 where one can 'pull' (bytecode, args) tuples, and assuming the
 emitter has an event callback taking a (bytecode, args) tuple,
 we have:

 def pull2push(parser, emitter):
     for (bytecode, args) in parser:
         emitter.push(bytecode, args)
282
283 -
284 name: push2pull
285 problem: >
 This requires the entire YAML stream be cached in memory, or
287 each of the two stages in a thread or different continuation
288 with shared memory or pipe between them.
289 solution: >
290 This use case seems much easier with a binary stream; that is,
291 one need not convert the style of functions between the push
292 vs pull pattern. And, for languages supporting continuations,
293 (ruby) perhaps push vs pull is not even an issue... for a
294 language like python, one would use the threaded Queue object,
295 one thread pushes (bytecode, args) tuples into the Queue, while
296 the other thread pulls the tuples out. Simple.
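A minimal Python sketch of the threaded Queue approach follows. The name
push2pull, the sentinel object and the run_producer callback convention (a
function that is handed a push callable) are assumptions for illustration;
errors raised inside the producer thread are not forwarded to the consumer.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream

def push2pull(run_producer, maxsize=64):
    """Turn a push-style producer into an iterator of (bytecode, args) tuples."""
    q = queue.Queue(maxsize)

    def worker():
        try:
            # The producer pushes events; each push lands in the queue.
            run_producer(lambda bytecode, args: q.put((bytecode, args)))
        finally:
            q.put(_DONE)

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = q.get()
        if item is _DONE:
            return
        yield item

def demo_producer(push):
    """Hypothetical push producer emitting the bytecodes for "[ one ]"."""
    for event in [("D", ""), ("Q", ""), ("S", "one"), ("E", "")]:
        push(*event)

pulled = list(push2pull(demo_producer))
```

The bounded queue keeps the producer from running arbitrarily far ahead of
the consumer, so the whole stream need not be held in memory.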
297 -
298 name: neutrality
299 problem: >
 It would be ideal if the C Program interface was simple enough
301 to be independent of programming language. In an ideal case,
302 imagine a flow of YAML structured data through various processing
303 stages on a server; where each processing stage is written in
304 a different programming language.
305 solution: >
306 While it may be hard for each language to write a syntax parser
307 filled with all of the little details, it would be much much
308 easier to write a parser for these bytecodes; as it involves
309 simple string handling, dispatching on the first character in
310 each string.
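To give a feel for how small such a bytecode parser can be, here is a hedged
Python sketch that dispatches on the first character of each line. The name
load is hypothetical, and the sketch assumes well-formed input, handles only
D, M, Q, E, S, C, N and Z, accepts only a numeric count after 'N', and
ignores every other bytecode (anchors, aliases and transfers included).

```python
def load(bytecode):
    """Build Python values from the primary bytecodes of a bytecode string."""
    docs, stack = [], []   # finished root nodes; currently open collections
    scalar = None          # chunks of the scalar being assembled, if any

    def place(value):
        # A value belongs to the innermost open collection, else it is a root.
        if stack:
            stack[-1][1].append(value)
        else:
            docs.append(value)

    def finish_scalar():
        nonlocal scalar
        if scalar is not None:
            place("".join(scalar))
            scalar = None

    for line in bytecode.split("\n"):
        if not line:
            continue
        code, arg = line[0], line[1:]
        if code == "\x00":
            break
        if code == "S":
            finish_scalar()
            scalar = [arg]
        elif code == "C":
            scalar.append(arg)
        elif code == "N":
            scalar.append("\n" * int(arg or "1"))
        elif code == "Z":
            scalar.append("\x00")
        elif code in "MQ":
            finish_scalar()
            stack.append((code, []))
        elif code == "E":
            finish_scalar()
            kind, items = stack.pop()
            # Mapping children arrive as K1,V1,K2,V2; pair them up.
            place(dict(zip(items[0::2], items[1::2])) if kind == "M" else items)
        elif code == "D":
            finish_scalar()
        # Any other bytecode is skipped in this sketch.
    finish_scalar()
    return docs
```

For instance, load("D\nM\nSa\nSb\nSc\nSd\nE\n") returns a one-element list
holding the mapping of a to b and c to d.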
311 -
312 name: tools
313 problem: >
314 A goal of mine is to have a YPATH expression language, a schema
315 language, and a transformation language. I would like these items
316 to be reusable by a great number of platforms/languages, and in
317 particular as its own callable processing stage.
318 solution: >
319 If such an expression language was written on top of a bytecode
320 format like this, via a simple pull function (/w adapters for
321 push2pull and pull2push) quite a bit of reusability could emerge.
322 Imagine a schema validator which is injected into the bytecode stream
323 and it is an identity operation unless an exception occurs, in
324 which case, it terminates the document and makes the next document
325 be a description of the validation error.
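The injected validator idea might look like the Python sketch below. It is
an illustration only: validating_filter and check are hypothetical names,
check returns either None or an error message, and the sketch closes open
collections without worrying about a mapping left with a key but no value.

```python
def validating_filter(events, check):
    """Pass (bytecode, args) events through unchanged until check() objects."""
    depth = 0  # number of collections currently open
    for bytecode, args in events:
        problem = check(bytecode, args)
        if problem is not None:
            for _ in range(depth):      # terminate the current document
                yield ("E", "")
            yield ("D", "")             # the next document describes the error
            yield ("S", problem)
            return
        if bytecode in ("M", "Q"):
            depth += 1
        elif bytecode == "E":
            depth -= 1
        yield (bytecode, args)

def no_bad(bytecode, args):
    """Hypothetical rule: the scalar 'bad' is not allowed."""
    if (bytecode, args) == ("S", "bad"):
        return "scalar 'bad' is not allowed"
    return None

stream = [("D", ""), ("Q", ""), ("S", "ok"), ("S", "bad"), ("E", "")]
filtered = list(validating_filter(stream, no_bad))
```

Because the filter is itself an iterator of the same tuples, it can be
dropped between any pull producer and its consumer.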
326 -
327 name: encoding
328 problem: >
329 Text within the bytecode format must be given an encoding. There are
330 several considerations at hand listed below.
331 solution: >
332 The YAML bytecode format uses the same encodings as YAML itself,
333 and thus is independent of actual encoding. A parser library should
334 have several functions to convert between the encodings.
335examples:
336 -
337 yaml: |
338 ---
339 - plain
340 - >
341 this is a flow scalar
342 - >
343 another flow scalar which is continued
344 on a second line and indented 2 spaces
345 - &001 !str |
346 This is a block scalar, both typed
347 and anchored
348 - *001 # this was an alias
349 - "This is a \"double quoted\" scalar"
350 bytecode: |
351 D
352 Q
353 Splain
 s>
355 Sthis is a flow scalar
356 Sanother flow scalar which is continued
357 Con a second line and indented 2 spaces
 s|
 A001
 T!str
361 SThis is a block scalar, both typed
362 N
363 Cand anchored
364 R001
365 cthis was an alias
 s"
367 SThis is a "double quoted" scalar
368 E
369cheader: |
370 /* yamlbyte.h
371 *
372 * The YAML bytecode "C" interface header file. See the YAML bytecode
373 * reference for bytecode sequence rules and for the meaning of each
374 * bytecode.
375 */
376
377 #ifndef YAMLBYTE_H
378 #define YAMLBYTE_H
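 #include <stddef.h>
 #include <string.h> /* memset, used by YAML_PULL2PUSH */
 #include <assert.h> /* assert, used by YAML_PULL2PUSH */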
380 /* list out the various YAML bytecodes */
381 typedef enum {
382 /* content bytecodes */
383 YAML_FINISH = 0,
384 YAML_DOCUMENT = 'D',
385 YAML_DIRECTIVE = 'V',
386 YAML_PAUSE = 'P',
387 YAML_MAPPING = 'M',
 YAML_SEQUENCE = 'Q',
389 YAML_ENDMAPSEQ = 'E',
390 YAML_SCALAR = 'S',
391 YAML_CONTINUE = 'C',
392 YAML_NEWLINE = 'N',
393 YAML_NULLCHAR = 'Z',
394 YAML_ALIAS = 'A',
395 YAML_ANCHOR = 'R',
396 YAML_TRANSFER = 'T',
397 /* formatting bytecodes */
398 YAML_COMMENT = 'c',
399 YAML_INDENT = 'i',
400 YAML_STYLE = 's',
401 /* other bytecodes */
402 YAML_LINENUMBER = '#',
403 YAML_NOTICE = '!',
404 YAML_SPAN = ',',
405 YAML_ALLOC = '@'
406 } yaml_code_t;
407
408 /* additional modifiers for the YAML_STYLE bytecode */
409 typedef enum {
410 YAML_FLOW = '>',
411 YAML_LITERAL = '|',
412 YAML_BLOCK = 'b',
413 YAML_PLAIN = 'p',
414 YAML_INLINE_MAPPING = '{',
 YAML_INLINE_SEQUENCE = '[',
416 YAML_SINGLE_QUOTED = 39,
417 YAML_DOUBLE_QUOTED = '"'
418 } yaml_style_t;
419
420 typedef unsigned char yaml_utf8_t;
421 typedef unsigned short yaml_utf16_t;
422 #ifdef YAML_UTF8
423 #ifdef YAML_UTF16
424 #error Must only define YAML_UTF8 or YAML_UTF16
425 #endif
426 typedef yaml_utf8_t yaml_char_t;
427 #else
428 #ifdef YAML_UTF16
429 typedef yaml_utf16_t yaml_char_t;
430 #else
431 #error Must define YAML_UTF8 or YAML_UTF16
432 #endif
433 #endif
434
435 /* return value for push function, tell parser if you want to stop */
436 typedef enum {
437 YAML_MORE = 1, /* producer should continue to fire events */
438 YAML_STOP = 0 /* producer should stop firing events */
439 } yaml_more_t;
440
441 /* push bytecodes from a producer to a consumer
442 * where arg is null terminated /w a length */
443 typedef void * yaml_consumer_t;
444 typedef
445 yaml_more_t
446 (*yaml_push_t)(
447 yaml_consumer_t self,
448 yaml_code_t code,
449 const yaml_char_t *arg,
450 size_t arglen
451 );
452
 /* pull bytecodes from the producer into the consumer's buffer,
  * where producer must null terminate buff and return the number
  * of yaml_char_t units used */
456 typedef void * yaml_producer_t;
457 typedef
458 size_t
459 (*yaml_pull_t)(
460 yaml_producer_t self,
461 yaml_code_t *code,
462 yaml_char_t *buff, /* at least 1K buffer */
463 size_t buffsize
464 ); /* returns number of bytes used in the buffer */
465
 /* canonical helper to show how to hook up a parser (as a pull
  * producer) to an emitter (as a push consumer); requires
  * <string.h> for memset and <assert.h> for assert */
 #define YAML_PULL2PUSH(pull, producer, push, consumer) \
 do { \
     yaml_code_t code = YAML_NOTICE; \
     yaml_more_t more = YAML_MORE; \
     yaml_char_t buff[1024]; \
     size_t size = 0; \
     memset(buff, 0, 1024 * sizeof(yaml_char_t)); \
     while( code && more) { \
         size = (pull)((producer),&code, buff, 1024); \
         assert(size < 1024 && !buff[size]); \
         more = (push)((consumer),code, buff, size); \
     } \
     buff[0] = 0; \
     (push)((consumer),YAML_FINISH, buff, 0); \
 } while(0)
483
484 #endif
485
1$Id$
2
3This is the documentation for libsyck and describes how to extend it.
4
5= Overview =
6
7Syck is designed to take a YAML stream and a symbol table and move
8data between the two. Your job is to simply provide callback functions which
9understand the symbol table you are keeping.
10
11Syck also includes a simple symbol table implementation.
12
13== About the Source ==
14
15The Syck distribution is laid out as follows:
16
17 lib/ libsyck source (core API)
18 bytecode.re lexer for YAML bytecode (re2c)
19 emitter.c emitter functions
20 gram.y grammar for YAML documents (bison)
21 handler.c internal handlers which glue the lexer and grammar
22 implicit.re lexer for builtin YAML types (re2c)
23 node.c node allocation and access
24 syck.c parser funcs, central funcs
25 syck.h libsyck definitions
26 syck_st.c symbol table functions
27 syck_st.h symbol table definitions
28 token.re lexer for YAML plaintext (re2c)
29 yaml2byte.c simple bytecode emitter
30 ext/ ruby, python, php, cocoa extensions
31 tests/ unit tests for libsyck
32 YTS.c.rb generates YAML Testing Suite unit test
33 (use: ruby YTS.c.rb > YTS.c)
34 Basic.c allocation and buffering tests
35 Parse.c parser sanity
36 Emit.c emitter sanity
37
38== Using SyckNodes ==
39
40The SyckNode is the structure which YAML data is loaded into
41while parsing. It's also a good structure to use while emitting,
42however you may choose to emit directly from your native types
43if your extension is very small.
44
45SyckNodes are designed to be used in conjunction with a symbol
46table. More on that in a moment. For now, think of a symbol
47table as a library which stores nodes, assigning each node a
48unique identifier.
49
50This identifier is called the SYMID in Syck. Nodes refer to
51each other by SYMIDs, rather than pointers. This way, the
52nodes can be free'd as the parser goes.
53
54To be honest, SYMIDs are used because this is the way Ruby
55works. And this technique means Syck can use Ruby's symbol
56table directly. But the included symbol table is lightweight,
57solves the problem of keeping too much data in memory, and
58simply pairs SYMIDs with your native object type (such as
59PyObject pointers.)
60
61Three kinds of SyckNodes are available:
62
631. scalar nodes (syck_str_kind):
64 These nodes store a string, a length for the string
65 and a style (indicating the format used in the YAML
66 document).
67
682. sequence nodes (syck_seq_kind):
69 Sequences are YAML's array or list type.
 These nodes store a list of items, whose allocation
 is handled by syck functions.
72
733. mapping nodes (syck_map_kind):
74 Mappings are YAML's dictionary or hashtable type.
 These nodes store a list of pairs, whose allocation
 is handled by syck functions.
77
78The syck_kind_tag enum specifies the above enumerations,
79which can be tested against the SyckNode.kind field.
80
PLEASE leave the SyckNode.shortcut field alone!! It's
used by the parser to work around parser ambiguities!!
83
84=== Node API ===
85
86 SyckNode *
87 syck_alloc_str()
88 syck_alloc_seq()
 syck_alloc_map()
90
91 Allocates a node of a given type and initializes its
92 internal union to emptiness. When left as-is, these
93 nodes operate as a valid empty string, empty sequence
94 and empty map.
95
96 Remember that the node's id (SYMID) isn't set by the
97 allocation functions OR any other node functions herein.
98 It's up to your handler function to do that.
99
100 void
101 syck_free_node( SyckNode *n )
102
103 While the Syck parser will free nodes it creates, use
104 this to free your own nodes. This function will free
 all of its internals, its type_id and its anchor. If
 you don't want those members freed, please be sure they
 are set to NULL.
108
109 SyckNode *
110 syck_new_str( char *str, enum scalar_style style )
111 syck_new_str2( char *str, long len, enum scalar_style style )
112
113 Creates scalar nodes from C strings. The first function
114 will call strlen() to determine length.
115
116 void
117 syck_replace_str( SyckNode *n, char *str, enum scalar_style style )
118 syck_replace_str2( SyckNode *n, char *str, long len, enum scalar_style style )
119
120 Replaces the string content of a node `n', while keeping
121 the node's type_id, anchor and id.
122
123 char *
124 syck_str_read( SyckNode *n )
125
126 Returns a pointer to the null-terminated string inside scalar node
127 `n'. Normally, you might just want to use:
128
 char *ptr = n->data.str->ptr;
 long len = n->data.str->len;
131
132 SyckNode *
133 syck_new_map( SYMID key, SYMID value )
134
135 Allocates a new map with an initial pair of nodes.
136
137 void
138 syck_map_empty( SyckNode *n )
139
140 Empties the set of pairs for a mapping node.
141
142 void
143 syck_map_add( SyckNode *n, SYMID key, SYMID value )
144
145 Pushes a key-value pair on the mapping. While the ordering
146 of pairs DOES affect the ordering of pairs on output, loaded
147 nodes are deliberately out of order (since YAML mappings do
148 not preserve ordering.)
149
150 See YAML's builtin !omap type for ordering in mapping nodes.
151
152 SYMID
153 syck_map_read( SyckNode *n, enum map_part, long index )
154
155 Loads a specific key or value from position `index' within
156 a mapping node. Great for iteration:
157
 for ( i = 0; i < syck_map_count( n ); i++ ) {
     SYMID key = syck_map_read( n, map_key, i );
     SYMID val = syck_map_read( n, map_value, i );
 }
162
163 void
164 syck_map_assign( SyckNode *n, enum map_part, long index, SYMID id )
165
166 Replaces a specific key or value at position `index' within
167 a mapping node. Useful for replacement only, will not allocate
168 more room when assigned beyond the end of the pair list.
169
170 long
171 syck_map_count( SyckNode *n )
172
173 Returns a count of the pairs contained by the mapping node.
174
175 void
176 syck_map_update( SyckNode *n, SyckNode *n2 )
177
178 Combines all pairs from mapping node `n2' into mapping node
179 `n'.
180
181 SyckNode *
182 syck_new_seq( SYMID val )
183
184 Allocates a new seq with an entry `val'.
185
186 void
187 syck_seq_empty( SyckNode *n )
188
189 Empties a sequence node `n'.
190
191 void
192 syck_seq_add( SyckNode *n, SYMID val )
193
194 Pushes a new item `val' onto the end of the sequence.
195
196 void
197 syck_seq_assign( SyckNode *n, long index, SYMID val )
198
 Replaces the item at position `index' in the sequence
 node with item `val'. Useful for replacement only, will not allocate
 more room when assigned beyond the end of the item list.
202
203 SYMID
204 syck_seq_read( SyckNode *n, long index )
205
206 Reads the item at position `index' in the sequence node.
207 Again, for iteration:
208
 for ( i = 0; i < syck_seq_count( n ); i++ ) {
     SYMID val = syck_seq_read( n, i );
 }
212
213 long
214 syck_seq_count( SyckNode *n )
215
216 Returns a count of items contained by sequence node `n'.
217
218== YAML Parser ==
219
220Syck's YAML parser is extremely simple. After setting up a
221SyckParser struct, along with callback functions for loading
222node data, use syck_parse() to start reading data. Since
223syck_parse() only reads single documents, the stream can be
224managed by calling syck_parse() repeatedly for an IO source.
225
226The parser has four callbacks: one for reading from the IO
227source, one for handling errors that show up, one for
228handling nodes as they come in, one for handling bad
229anchors in the document. Nodes are loaded in the order they
230appear in the YAML document, however nested nodes are loaded
231before their parent.
232
233=== How to Write a Node Handler ===
234
235Inside the node handler, the normal process should be:
236
2371. Convert the SyckNode data to a structure meaningful
238 to your application.
239
2402. Check for the bad anchor caveat described in the
241 next section.
242
2433. Add the new structure to the symbol table attached
244 to the parser. Found at parser->syms.
245
2464. Return the SYMID reserved in the symbol table.
247
248=== Nodes and Memory Allocation ===
249
250One thing about SyckNodes passed into your handler:
251Syck WILL free the node once your handler is done with it.
252The node is temporary. So, if you plan on keeping a node
253around, you'll need to make yourself a new copy.
254
255And you'll probably need to reassign all the items
256in a sequence and pairs in a map. You can do this
257with syck_seq_assign() and syck_map_assign(). But, before
258you do that, you might consider using your own node structure
259that fits your application better.
260
261=== A Note About Anchors in Parsing ===
262
263YAML anchors can be recursive. This means deeper alias nodes
264can be loaded before the anchor. This is the trickiest part
265of the loading process.
266
267Assuming this YAML document:
268
269 --- &a [*a]
270
271The loading process is:
272
2731. Load alias *a by calling parser->bad_anchor_handler, which
274 reserves a SYMID in the symbol table.
275
2762. The `a' anchor is added to Syck's own anchor table,
277 referencing the SYMID above.
278
2793. When the anchor &a is found, the SyckNode created is
280 given the SYMID of the bad anchor node above. (Usually
281 nodes created at this stage have the `id' blank.)
282
2834. The parser->handler function is called with that node.
284 Check for node->id in the handler and overwrite the
285 bad anchor node with the new node.
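
The four steps can be modelled outside of C. The Python sketch below is a
language-neutral model, not libsyck code: SymbolTable, BadAnchor and
load_recursive_example are hypothetical names, SYMIDs are plain integers,
and the node is a small dict standing in for a SyckNode.

```python
class SymbolTable:
    """Toy stand-in for a symbol table: pairs integer SYMIDs with objects."""
    def __init__(self):
        self._objects = {}

    def add(self, obj):
        symid = len(self._objects) + 1
        self._objects[symid] = obj
        return symid

    def replace(self, symid, obj):
        self._objects[symid] = obj

    def lookup(self, symid):
        return self._objects[symid]

class BadAnchor:
    """Placeholder stored for an alias that is seen before its anchor."""
    def __init__(self, name):
        self.name = name

def load_recursive_example():
    """Model the loading of the document "--- &a [*a]"."""
    syms, anchors = SymbolTable(), {}
    # Step 1: alias *a arrives first; the bad anchor handler reserves a SYMID.
    reserved = syms.add(BadAnchor("a"))
    # Step 2: the anchor table remembers that SYMID under the name a.
    anchors["a"] = reserved
    # Step 3: the sequence node for &a is created carrying the reserved id.
    node = {"id": anchors["a"], "items": [reserved]}
    # Step 4: the node handler sees the id is set and overwrites the
    # placeholder first, so that the self reference resolves to the new list.
    value = []
    syms.replace(node["id"], value)
    for item in node["items"]:
        value.append(syms.lookup(item))
    return syms.lookup(reserved)

result = load_recursive_example()
```

In this model the resulting list contains itself as its only item, which is
what the recursive document describes.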
286
287=== Parser API ===
288
289 See <syck.h> for layouts of SyckParser and SyckNode.
290
291 SyckParser *
292 syck_new_parser()
293
294 Creates a new Syck parser.
295
296 void
297 syck_free_parser( SyckParser *p )
298
299 Frees the parser, as well as associated symbol tables
300 and buffers.
301
302 void
303 syck_parser_implicit_typing( SyckParser *p, int on )
304
305 Toggles implicit typing of builtin YAML types. If
306 this is passed a zero, YAML builtin types will be
307 ignored (!int, !float, etc.) The default is 1.
308
309 void
310 syck_parser_taguri_expansion( SyckParser *p, int on )
311
312 Toggles expansion of types in full taguri. This
313 defaults to 1 and is recommended to stay as 1.
314 Turning this off removes a layer of abstraction
315 that will cause incompatibilities between YAML
316 documents of differing versions.
317
318 void
319 syck_parser_handler( SyckParser *p, SyckNodeHandler h )
320
321 Assign a callback function as a node handler. The
322 SyckNodeHandler signature looks like this:
323
324 SYMID node_handler( SyckParser *p, SyckNode *n )
325
326 void
327 syck_parser_error_handler( SyckParser *p, SyckErrorHandler h )
328
329 Assign a callback function as an error handler. The
330 SyckErrorHandler signature looks like this:
331
332 void error_handler( SyckParser *p, char *str )
333
334 void
335 syck_parser_bad_anchor_handler( SyckParser *p, SyckBadAnchorHandler h )
336
337 Assign a callback function as a bad anchor handler.
338 The SyckBadAnchorHandler signature looks like this:
339
340 SyckNode *bad_anchor_handler( SyckParser *p, char *anchor )
341
342 void
343 syck_parser_file( SyckParser *p, FILE *f, SyckIoFileRead r )
344
345 Assigns a FILE pointer as an IO source and a callback function
346 which handles buffering of that IO source.
347
348 The SyckIoFileRead signature looks like this:
349
350 long SyckIoFileRead( char *buf, SyckIoFile *file, long max_size, long skip );
351
352 Syck comes with a default FILE handler named `syck_io_file_read'. You
353 can assign this default handler explicitly or by simply passing in NULL
354 as the `r' parameter.
355
356 void
357 syck_parser_str( SyckParser *p, char *ptr, long len, SyckIoStrRead r )
358
359 Assigns a string as the IO source with a callback function `r'
360 which handles buffering of the string.
361
362 The SyckIoStrRead signature looks like this:
363
 long SyckIoStrRead( char *buf, SyckIoStr *str, long max_size, long skip );
365
366 Syck comes with a default string handler named `syck_io_str_read'. You
367 can assign this default handler explicitly or by simply passing in NULL
368 as the `r' parameter.
369
370 void
371 syck_parser_str_auto( SyckParser *p, char *ptr, SyckIoStrRead r )
372
373 Same as the above, but uses strlen() to determine string size.
374
375
376 SYMID
377 syck_parse( SyckParser *p )
378
379 Parses a single document from the YAML stream, returning the SYMID for
380 the root node.
381
382== YAML Emitter ==
383
384Since the YAML 0.50 release, Syck has featured a new emitter API. The idea
385here is to let Syck figure out shortcuts that will clean up output, detect
386builtin YAML types and -- especially -- determine the best way to format
387outgoing strings.
388
389The trick with the emitter is to learn its functions and let it do its
390job. If you don't like the formatting Syck is producing, please get in
contact with the author and pitch your ideas!!
392
393Like the YAML parser, the emitter has a couple of callbacks: namely,
394one for IO output and one for handling nodes. Nodes aren't necessarily
395SyckNodes. Since we're ultimately worried about creating a string, SyckNodes
396become sort of unnecessary.
397
398=== The Emitter Process ===
399
4001. Traverse the structure you will be emitting, registering all nodes
401 with the emitter using syck_emitter_mark_node(). This step will
402 determine anchors and aliases in advance.
403
4042. Call syck_emit() to begin emitting the root node.
405
4063. Within your emitter handler, use the syck_emit_* convenience methods
407 to build the document.
408
4094. Call syck_emit_flush() to end the document and push the remaining
410 document to the IO stream. Or continue to add documents to the output
411 stream with syck_emit().
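
The reason for the marking pass in step 1 can be shown with a small model.
The Python sketch below is not the Syck emitter and calls none of its
functions: emit is a hypothetical name, only lists and dicts can be
anchored, output is flow style, and scalars are written unquoted.

```python
def emit(root):
    """Two-pass emit: mark repeated collections, then write anchors and aliases."""
    seen, repeated = set(), set()

    def mark(node):
        # Pass 1: any collection reached twice will need an anchor.
        if not isinstance(node, (list, dict)):
            return
        if id(node) in seen:
            repeated.add(id(node))
            return
        seen.add(id(node))
        children = node if isinstance(node, list) else [x for kv in node.items() for x in kv]
        for child in children:
            mark(child)

    anchors = {}

    def write(node):
        # Pass 2: the first visit of a repeated node gets &name, later visits *name.
        if not isinstance(node, (list, dict)):
            return str(node)
        prefix = ""
        if id(node) in repeated:
            if id(node) in anchors:
                return "*" + anchors[id(node)]
            anchors[id(node)] = "id%03d" % (len(anchors) + 1)
            prefix = "&" + anchors[id(node)] + " "
        if isinstance(node, list):
            return prefix + "[" + ", ".join(write(c) for c in node) + "]"
        pairs = ("%s: %s" % (write(k), write(v)) for k, v in node.items())
        return prefix + "{" + ", ".join(pairs) + "}"

    mark(root)
    return "--- " + write(root) + "\n"

shared = ["x"]
text = emit([shared, shared])
```

Without the first pass, the writer could not know at the first visit of
`shared' that an anchor is needed there.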
412
413=== Emitter API ===
414
415 See <syck.h> for the layout of SyckEmitter.
416
417 SyckEmitter *
418 syck_new_emitter()
419
420 Creates a new Syck emitter.
421
422 SYMID
423 syck_emitter_mark_node( SyckEmitter *e, st_data_t node )
424
425 Adds an outgoing node to the symbol table, allocating an anchor
426 for it if it has repeated in the document and scanning the type
427 tag for auto-shortcut.
428
429 void
430 syck_output_handler( SyckEmitter *e, SyckOutputHandler out )
431
432 Assigns a callback as the output handler.
433
434 void *out_handler( SyckEmitter *e, char * ptr, long len );
435
436 Receives the emitter object, pointer to the buffer and a count
437 of bytes which should be read from the buffer.
438
439 void
440 syck_emitter_handler( SyckEmitter *e, SyckEmitterHandler
441
442
443 void
444 syck_free_emitter
445