1
2@section Introduction
3
4This chapter is under construction!
5
6
7This chapter describes some of the internals of @command{vasm}
8 and tries to explain
9what has to be done to write a cpu module, a syntax module
10or an output module for @command{vasm}.
However, if you want to write one, I suggest contacting me first,
so that it can be integrated into the source tree.
13
14Note that this documentation may mention explicit values when introducing
15symbolic constants. This is due to copying and pasting from the source
16code. These values may not be up to date and in some cases can be overridden.
Therefore, never use the absolute values; always use the symbolic
representations.
19
20
21@section Building vasm
22
23This section deals with the steps necessary to build the typical
24@command{vasm} executable from the sources.
25
26@subsection Directory Structure
27
28    The vasm-directory contains the following important files and
29    directories:
30@table @file
31@item    vasm/
32The main directory containing the assembler sources.
33
34@item    vasm/Makefile
35The Makefile used to build @command{vasm}.
36
37@item    vasm/syntax/<syntax-module>/
38Directories for the syntax modules.
39
40@item    vasm/cpus/<cpu-module>/
41Directories for the cpu modules.
42
43@item vasm/obj/
Directory in which the object modules will be stored.
45
46@end table
47
48    All compiling is done from the main directory and
49    the executables will be placed there as well.
50    The main assembler for a combination of @code{<cpu>} and
51    @code{<syntax>} will
52    be called @command{vasm<cpu>_<syntax>}. All output modules are
53    usually integrated in every executable and can be selected at
54    runtime.
55
56@subsection Adapting the Makefile
57
58    Before building anything you have to insert correct values for
59    your compiler and operating system in the @file{Makefile}.
60
61@table @code
62    @item TARGET
63       Here you may define an extension which is appended to the executable's
       name. Useful if you build various targets in the same directory.
65
66    @item TARGETEXTENSION
67       Defines the file name extension for executable files. Not needed for
68       most operating systems. For Windows it would be @file{.exe}.
69
70    @item CC
       Here you have to insert the command that invokes the ANSI C
       compiler you want to use to build vasm. It must support
       the @option{-I} option in the same way as, e.g., @command{vc} or
       @command{gcc}.
75
76    @item COPTS
77       Here you will usually define an option like @option{-c} to instruct
78       the compiler to generate an object file.
       Additional options, like the optimization level, should be
       inserted here as well. When the host operating system is not
       a Unix (Mac OS X and MiNT count as Unix), you have to define one of the
       following preprocessor macros:
83       @table @code
84          @item -DAMIGA
85          AmigaOS (M68k or PPC), MorphOS, AROS.
86          @item -DATARI
87          Atari TOS.
88          @item -DMSDOS
89          CP/M, MS-DOS, Windows.
90       @end table
91
92    @item CCOUT
93       Here you define the option which is used to specify the name of
94       an output file, which is usually @option{-o}.
95
96    @item LD
       Here you insert a command which starts the linker. This may be
       the same as under @code{CC}.
99
100    @item LDFLAGS
101       Here you have to add options which are necessary for linking.
102       E.g. some compilers need special libraries for floating-point.
103
104    @item LDOUT
105       Here you define the option which is used by the linker to specify
106       the output file name.
107
108    @item RM
109      Specify a command to delete a file, e.g. @code{rm -f}.
110@end table
111
112    An example for the Amiga using @command{vbcc} would be:
113@example
114      TARGET = _os3
115      TARGETEXTENSION =
116      CC = vc +aos68k
117      CCOUT = -o
118      COPTS = -c -c99 -cpu=68020 -DAMIGA -O1
119      LD = $(CC)
120      LDOUT = $(CCOUT)
121      LDFLAGS = -lmieee
122      RM = delete force quiet
123@end example
124
125    An example for a typical Unix-installation would be:
126@example
127      TARGET =
128      TARGETEXTENSION =
129      CC = gcc
130      CCOUT = -o
131      COPTS = -c -O2
132      LD = $(CC)
133      LDOUT = $(CCOUT)
134      LDFLAGS = -lm
135      RM = rm -f
136@end example
137
Open/Net/Free/Any BSD i386 systems will probably require
an additional @option{-D_ANSI_SOURCE} in @code{COPTS}.
140
141
142@subsection Building vasm
143
Note to users of Open/Free/Any BSD i386 systems: You will probably have to use
GNU make instead of BSD make, i.e. in the following examples replace
@command{make} with @command{gmake}.
147
148    Type:
149@example
150      make CPU=<cpu> SYNTAX=<syntax>
151@end example
152    For example:
153@example
154      make CPU=ppc SYNTAX=std
155@end example
156
157The following CPU modules can be selected:
158@itemize
159@item @code{CPU=6502}
160@item @code{CPU=6800}
161@item @code{CPU=arm}
162@item @code{CPU=c16x}
163@item @code{CPU=jagrisc}
164@item @code{CPU=m68k}
165@item @code{CPU=ppc}
166@item @code{CPU=test}
167@item @code{CPU=tr3200}
168@item @code{CPU=vidcore}
169@item @code{CPU=x86}
170@item @code{CPU=z80}
171@end itemize
172
173The following syntax modules can be selected:
174@itemize
175@item @code{SYNTAX=std}
176@item @code{SYNTAX=mot}
177@item @code{SYNTAX=madmac}
178@item @code{SYNTAX=oldstyle}
179@item @code{SYNTAX=test}
180@end itemize
181
Makefiles for Windows and various Amiga targets are already included.
You may either copy one of them over the default @file{Makefile}, or select
it explicitly with @command{make}'s @option{-f} option:
185@example
186    make -f Makefile.OS4 CPU=ppc SYNTAX=std
187@end example
188
189
190@section General data structures
191
192This section describes the fundamental data structures used in vasm
193which are usually necessary to understand for writing any kind of
194module (cpu, syntax or output). More detailed information is given in
195the respective sections on writing specific modules where necessary.
196
197@subsection Source
198
199A source structure represents a source text module, which can be
200either the main source text, an included file or a macro. There is
201always a link to the parent source from where the current source context
202was included or called.
203
204@table @code
205@item struct source *parent;
206        Pointer to the parent source context. Assembly continues there
207        when the current source context ends.
208
209@item int parent_line;
210        Line number in the parent source context, from where we were called.
211        This information is needed, because line numbers are only reliable
212        during parsing and later from the atoms. But an include directive
213        doesn't create an atom.
214
215@item char *name;
216        File name of the main source or include file, or macro name.
217
218@item char *text;
219        Pointer to the source text start.
220
221@item size_t size;
222        Size of the source text to assemble in bytes.
223
224@item macro *macro;
225        Pointer to macro structure, when currently inside a macro
226        (see also @code{num_params}).
227
228@item unsigned long repeat;
229        Number of repetitions of this source text. Usually this is 1, but
230        for text blocks between a @code{rept} and @code{endr} directive
        it allows any number of repetitions, which is decremented every time
232        the end of this source text block is reached.
233
234@item char *irpname;
235        Name of the iterator symbol in special repeat loops which use a
236        sequence of arbitrary values, being assigned to this symbol within
237        the loop. Example: @code{irp} directive in std-syntax.
238
239@item struct macarg *irpvals;
240        A list of arbitrary values to iterate over in a loop. With each
241        iteration the frontmost value is removed from the list until it is
242        empty.
243
244@item int cond_level;
245        Current level of conditional nesting while entering this source
246        text. It is automatically restored to the previous level when
247        leaving the source prematurely through @code{end_source()}.
248
249@item struct macarg *argnames;
250        The current list of named macro arguments.
251
252@item int num_params;
253        Number of macro parameters passed at the invocation point from
254        the parent source. For normal source files this entry will be -1.
255        For macros 0 (no parameters) or higher.
256
257@item char *param[MAXMACPARAMS];
258        Pointer to the macro parameters.
259
260@item int param_len[MAXMACPARAMS];
261        Number of characters per macro parameter.
262
263@item int num_quals;
        (If @code{MAX_QUALIFIERS!=0}.) Number of qualifiers for a macro.
        When not passed on invocation, these are the default qualifiers.
266
267@item char *qual[MAX_QUALIFIERS];
268        (If @code{MAX_QUALIFIERS!=0}.) Pointer to macro qualifiers.
269
270@item int qual_len[MAX_QUALIFIERS];
271        (If @code{MAX_QUALIFIERS!=0}.) Number of characters per macro qualifier.
272
273@item unsigned long id;
274        Every source has its unique id. Useful for macros supporting
275        the special @code{\@@} parameter.
276
277@item char *srcptr;
278        The current source text pointer, pointing to the beginning of
279        the next line to assemble.
280
281@item int line;
282        Line number in the current source context. After parsing the
283        line number of the current atom is stored here.
284
285@item size_t bufsize;
286        Current size of the line buffer (@code{linebuf}). The size of the
287        line buffer is extended automatically, when an overflow happens.
288
289@item char *linebuf;
290        A buffer for the current line being assembled
291        in this source text. A child-source, like a macro, can refer to
292        arguments from this buffer, so every source has got its own.
293        When returning to the parent source, the linebuf is deallocated
294        to save memory.
295
296@item expr *cargexp;
297        (If @code{CARGSYM} was defined.) Pointer to the current expression
298        assigned to the CARG-symbol (used to select a macro argument) in
299        this source instance. So it can be restored when reentering this
300        instance.
301
302@item long reptn;
303        (If @code{REPTNSYM} was defined.) Current value of the repetition
304        counter symbol in this source instance. So it can be restored when
305        reentering this instance.
306@end table
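
The @code{parent} and @code{parent_line} members are enough to reconstruct
the chain of includes and macro calls that led to the current line. The
following is only a minimal sketch of that idea: @code{demo_source} is a
hypothetical, reduced stand-in for the real structure and
@code{demo_backtrace()} is not a vasm function.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, reduced stand-in for vasm's source structure.
   Only the members needed for this sketch are mirrored. */
struct demo_source {
  struct demo_source *parent;  /* context we were included/called from */
  int parent_line;             /* line in the parent context */
  const char *name;            /* file or macro name */
  int line;                    /* current line in this context */
};

/* Write "name:line" for every nesting level, innermost first,
   and return the number of levels written. */
int demo_backtrace(const struct demo_source *s, char *buf, size_t len)
{
  size_t used = 0;
  int depth = 0;
  int line;

  assert(buf != NULL && len > 0);
  buf[0] = '\0';
  line = s ? s->line : 0;

  while (s != NULL) {
    int n = snprintf(buf + used, len - used, "%s%s:%d",
                     depth ? " <- " : "", s->name, line);
    if (n < 0 || (size_t)n >= len - used) {
      buf[used] = '\0';        /* drop the truncated entry */
      break;
    }
    used += (size_t)n;
    line = s->parent_line;     /* the line to report for the parent */
    s = s->parent;
    depth++;
  }
  return depth;
}
```

For a macro @code{PUSH} called from line 3 of an include file which was
itself included from line 10 of the main source, the buffer would describe
three levels, starting with the macro.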
307
308@subsection Sections
309
310One of the top level structures is a linked list of sections describing
311continuous blocks of memory. A section is specified by an object of
312type @code{section} with the following members that can be accessed by
313the modules:
314
315@table @code
316@item  struct section *next;
317        A pointer to the next section in the list.
318
319@item  char *name;
320        The name of the section.
321
322@item  char *attr;
        A string describing the section flags in ELF notation (see,
        for example, the documentation of the @code{.section} directive of
        the standard syntax module).
326
327@item  atom *first;
328@itemx atom *last;
329        Pointers to the first and last atom of the section. See following
330        sections for information on atoms.
331
332@item  taddr align;
333        Alignment of the section in bytes.
334
335@item  uint32_t flags;
336        Flags of the section. Currently available flags are:
337@table @code
338@item HAS_SYMBOLS
339        At least one symbol is defined in this section.
340@item RESOLVE_WARN
        The current atom changed its size multiple times, so @code{atom_size()}
342        is now called with this flag set in its section to make the
343        backend (e.g. @code{instruction_size()}) aware of it and do less
344        aggressive optimizations.
345@item UNALLOCATED
346        Section is unallocated, which means it doesn't use any memory space
347        in the output file. Such a section will be removed before creating
348        the output file and all its labels converted into absolute expression
349        symbols. Used for "offset" sections. Refer to
350        @code{switch_offset_section()}.
351@item LABELS_ARE_LOCAL
352        As long as this flag is set new labels in a section are defined
353        as local labels, with the section name as global parent label.
354@item ABSOLUTE
355        Section is loaded at an absolute address in memory.
356@item PREVABS
357        Remembers state of the @code{ABSOLUTE} flag before entering
358        relocated-org mode (@code{IN_RORG}). So it can be restored later.
359@item IN_RORG
360        Section has entered relocated-org mode, which also sets the
361        @code{ABSOLUTE} flag. In this mode code is written into the current
        section, but relocated to an absolute address. No relocation
        information is generated.
364@item NEAR_ADDRESSING
365        Section is marked as suitable for cpu-specific "near" addressing
366        modes. For example, base-register relative. The cpu backend can use
367        this information as an optimization hint when referencing symbols
368        from this section.
369@end table
370
371@item  taddr org;
372        Start address of a section. Usually zero.
373
374@item  taddr pc;
375        Current address in this section. Can be used
376        while traversing through the section. Has to be updated by a
377        module using it. Is set to @code{org} at the beginning.
378
379@item   unsigned long idx;
380        A member usable by the output module for private purposes.
381
382@end table
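
A module typically walks the section list through @code{next} and, before
traversing the atoms, resets @code{pc} to @code{org}. The sketch below
illustrates both operations on a hypothetical, reduced
@code{demo_section}; the helper names and the @code{demo_taddr} type are
assumptions made for this example and are not part of vasm.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef long demo_taddr;       /* stand-in for vasm's taddr */

/* Hypothetical, reduced stand-in for the section structure. */
struct demo_section {
  struct demo_section *next;
  const char *name;
  demo_taddr org;
  demo_taddr pc;
};

/* Return the first section called 'name', or NULL if there is none. */
struct demo_section *demo_find_section(struct demo_section *first,
                                       const char *name)
{
  struct demo_section *s;

  assert(name != NULL);
  for (s = first; s != NULL; s = s->next)
    if (strcmp(s->name, name) == 0)
      return s;
  return NULL;
}

/* Prepare all sections for a new traversal: pc starts at org. */
void demo_rewind_sections(struct demo_section *first)
{
  struct demo_section *s;

  for (s = first; s != NULL; s = s->next)
    s->pc = s->org;
}
```

Since @code{pc} has to be maintained by the module using it, rewinding it
explicitly before each traversal avoids depending on what a previous pass
left behind.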
383
384@subsection Symbols
385
386Symbols are represented by a linked list of type @code{symbol} with the
following members that can be accessed by the modules:
388
389@table @code
390
391@item  int type;
392        Type of the symbol. Available are:
393@table @code
394@item #define LABSYM 1
395        The symbol is a label defined at a specific location.
396
397@item #define IMPORT 2
398        The symbol is imported from another file.
399
400@item #define EXPRESSION 3
401        The symbol is defined using an expression.
402@end table
403
404@item  uint32_t flags;
405        Flags of this symbol. Available are:
406@table @code
407@item #define TYPE_UNKNOWN  0
408        The symbol has no type information.
409
410@item #define TYPE_OBJECT   1
411        The symbol defines an object.
412
413@item #define TYPE_FUNCTION 2
414        The symbol defines a function.
415
416@item #define TYPE_SECTION  3
417        The symbol defines a section.
418
419@item #define TYPE_FILE     4
420      The symbol defines a file.
421
422@item #define EXPORT (1<<3)
423        The symbol is exported to other files.
424
425@item #define INEVAL (1<<4)
426        Used internally.
427
428@item #define COMMON (1<<5)
429        The symbol is a common symbol.
430
431@item #define WEAK (1<<6)
432        The symbol is weak, which means the linker may overwrite it with
433        any global definition of the same name. Weak symbols may also stay
434        undefined, in which case the linker would assign them a value of
435        zero.
436
437@item #define LOCAL (1<<7)
        Only informational. A symbol can be explicitly declared as local
439        by a syntax-module directive.
440
441@item #define VASMINTERN (1<<8)
442        Vasm-internal symbol, which is usually not exported into an output
443        file.
444
445@item #define PROTECTED (1<<9)
446        Used internally to protect the current-PC symbol from deletion.
447
448@item #define REFERENCED (1<<10)
449        Symbol was referenced in the source and a relocation entry has
450        been created.
451
452@item #define ABSLABEL (1<<11)
453        Label was defined inside an absolute section, or during
454        relocated-org mode. So it has an absolute address and will not
455        generate a relocation entry when being referenced.
456
457@item #define EQUATE (1<<12)
        Symbols flagged as @code{EQUATE} are constant and their values must
        not be changed.
460
461@item #define REGLIST (1<<13)
462        Symbol is a register list definition.
463
464@item #define USED (1<<14)
        Symbol appeared in an expression. Symbols which were only defined
        (as label or equate) and never used throughout the whole source
        don't get this flag set.
468
469@item #define NEAR (1<<15)
470        Symbol may be referenced by "near" addressing mode. For example,
471        base register relative. Used as an optimization hint in the cpu
472        backend.
473
474@item #define RSRVD_S (1L<<24)
475        The range from bit 24 to 27 (counted from the LSB) is reserved for
476        use by the syntax module.
477
478@item #define RSRVD_O (1L<<28)
479        The range from bit 28 to 31 (counted from the LSB) is reserved for
480        use by the output module.
481@end table
482
483The type-flags can be extracted using the @code{TYPE()} macro which
484expects a pointer to a symbol as argument.
485
486@item  char *name;
487        The name of the symbol.
488
489@item   expr *expr;
490        The expression in case of @code{EXPRESSION} symbols.
491
492@item   expr *size;
493        The size of the symbol, if specified.
494
495@item  section *sec;
496        The section a @code{LABSYM} symbol is defined in.
497
498@item  taddr pc;
499        The address of a @code{LABSYM} symbol.
500
501@item  taddr align;
502        The alignment of the symbol in bytes.
503
504@item  unsigned long idx;
505        A member usable by the output module for private purposes.
506
507@end table
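
The symbol @code{flags} member mixes a small type code with independent
flag bits, so the two have to be tested differently. The following sketch
assumes that the type code occupies the three bits below @code{EXPORT},
which is what the values listed above suggest. All @code{DEMO_} names are
hypothetical copies made so that the example compiles on its own; a real
module should use @code{TYPE()} and the symbolic constants from the vasm
headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical copies of the values quoted in the text above. */
#define DEMO_TYPE_OBJECT   1
#define DEMO_TYPE_FUNCTION 2
#define DEMO_EXPORT        (1 << 3)
#define DEMO_WEAK          (1 << 6)

/* Assumption: the type code lives in the bits below EXPORT. */
#define DEMO_TYPE_MASK     7
#define DEMO_TYPE(sym)     ((sym)->flags & DEMO_TYPE_MASK)

/* Reduced stand-in for the symbol structure. */
struct demo_symbol {
  uint32_t flags;
  const char *name;
};

/* True for functions that are visible to other files,
   either exported or weak. */
int demo_is_visible_function(const struct demo_symbol *sym)
{
  assert(sym != NULL);
  /* The type is a code and is compared for equality ... */
  if (DEMO_TYPE(sym) != DEMO_TYPE_FUNCTION)
    return 0;
  /* ... while the remaining flags are single bits. */
  return (sym->flags & (DEMO_EXPORT | DEMO_WEAK)) != 0;
}
```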
508
509@subsection Register symbols
510
511Optional register symbols are available when the backend defines
512@code{HAVE_REGSYMS} in @file{cpu.h} together with the hash table size.
513Example:
514@example
515#define HAVE_REGSYMS
516#define REGSYMHTSIZE 256
517@end example
518
519A register symbol is defined by an object of type @code{regsym}
520with the following members that can be accessed by the modules:
521
522@table @code
523@item char *reg_name;
524      Symbol name.
525@item int reg_type;
526      Optional type of register.
527@item unsigned int reg_flags;
528      Optional register symbol flags.
529@item unsigned int reg_num;
530      Register number or value.
531@end table
532
533Refer to @file{symbol.h} for functions to create and find register
534symbols.
535
536@subsection Atoms
537
538The contents of each section are a linked list built out of non-separable
539atoms. The general structure of an atom is:
540
541@example
542typedef struct atom @{
543  struct atom *next;
544  int type;
545  taddr align;
546  taddr lastsize;
547  unsigned changes;
548  source *src;
549  int line;
550  listing *list;
551  union @{
552    instruction *inst;
553    dblock *db;
554    symbol *label;
555    sblock *sb;
556    defblock *defb;
557    void *opts;
558    int srcline;
559    char *ptext;
560    printexpr *pexpr;
561    expr *roffs;
562    taddr *rorg;
563    assertion *assert;
564    aoutnlist *nlist;
565  @} content;
566@} atom;
567@end example
568
569The members have the following meaning:
570
571@table @code
572@item  struct atom *next;
573Pointer to the following atom (0 if last).
574
575@item  int type;
576The type of the atom. Can be one of
577@table @code
578@item #define LABEL 1
579A label is defined here.
580
581@item #define DATA  2
582Some data bytes of fixed length and constant data are put here.
583
584@item #define INSTRUCTION 3
Generally refers to a machine instruction or pseudo-opcode. These atoms
586can change length during optimization passes and will be translated to
587@code{DATA}-atoms later.
588
589@item #define SPACE 4
590Defines a block of data filled with one value (byte). BSS sections usually
591contain only such atoms, but they are also sometimes useful as shorter
592versions of @code{DATA}-atoms in other sections.
593
594@item #define DATADEF 5
595Defines data of fixed size which can contain cpu specific operands and
596expressions. Will be translated to @code{DATA}-atoms later.
597
598@item #define LINE 6
599A source text line number (usually from a high level language) is bound
600to the atom's address. Useful for source level debugging in certain ABIs.
601
602@item #define OPTS 7
603A means to change assembler options at a specific source text line.
604For example optimization settings, or the cpu type to generate code for.
605The cpu module has to define @code{HAVE_CPU_OPTS} and export the required
606functions if it wants to use this type of atom.
607
608@item #define PRINTTEXT 8
609A string is printed to stdout during the final assembler pass. A newline
610is automatically appended.
611
612@item #define PRINTEXPR 9
613Prints the value of an expression during the final assembler pass to stdout.
614
615@item #define ROFFS 10
616Set the program counter to an address relative to the section's start
617address. These atoms will be translated into @code{SPACE} atoms in the
618final pass.
619
620@item #define RORG 11
621Assemble this block under the given base address, while the code is still
622written into the original memory region.
623
624@item #define RORGEND 12
Ends a @code{RORG} block and returns to the original addressing.
626
627@item #define ASSERT 13
628The assertion expression is checked in the final pass and an error message
629is generated (using the expression string and an optional message out of
630this atom) when it evaluates to 0.
631
632@item #define NLIST 14
633Defines a stab-entry for the a.out object file format. nlist-style stabs
634can also occur embedded in other object file formats, like ELF.
635@end table
636
637@item taddr align;
The alignment of this atom. Its address must be divisible by @code{align}.
639
640@item taddr lastsize;
641The size of this atom in the last resolver pass. When the size has
642changed in the current pass, the assembler will request another resolver
643run through the section.
644
645@item unsigned changes;
646Number of changes in the size of this atom since pass number
647@code{FASTOPTPHASE}. An increasing number usually indicates a problem in
648the cpu backend's optimizer and will be flagged by setting
@code{RESOLVE_WARN} in the section flags as soon as @code{changes} exceeds
@code{MAXSIZECHANGES}. The backend can then choose not to optimize this atom
as aggressively as before.
652
653@item source *src;
654Pointer to the source text object to which this atom belongs.
655
656@item  int line;
657The source line number that created this atom.
658
659@item listing *list;
Pointer to the listing object to which this atom belongs.
661
662@item    instruction *inst;
663(In union @code{content}.) Pointer to an instruction structure in the case
664of an @code{INSTRUCTION}-atom. Contains the following elements:
665@table @code
666@item  int code;
667The cpu specific code of this instruction.
668
669@item  char *qualifiers[MAX_QUALIFIERS];
670(If @code{MAX_QUALIFIERS!=0}.) Pointer to the qualifiers of this instruction.
671
672@item  operand *op[MAX_OPERANDS];
673(If @code{MAX_OPERANDS!=0}.) The cpu-specific operands of this instruction.
674
675@item  instruction_ext ext;
676(If the cpu module defines @code{HAVE_INSTRUCTION_EXTENSION}.)
677A cpu-module-specific structure. Typically used to store appropriate
opcodes, allowed addressing modes, supported cpu derivatives, etc.
679@end table
680
681@item    dblock *db;
682(In union @code{content}.) Pointer to a dblock structure in the case
683of a @code{DATA}-atom. Contains the following elements:
684@table @code
685@item  taddr size;
686The number of bytes stored in this atom.
687
688@item  char *data;
689A pointer to the data.
690
691@item  rlist *relocs;
692A pointer to relocation information for the data.
693@end table
694
695@item    symbol *label;
696(In union @code{content}.) Pointer to a symbol structure in the case
697of a @code{LABEL}-atom.
698
699@item    sblock *sb;
700(In union @code{content}.) Pointer to a sblock structure in the case
701of a @code{SPACE}-atom. Contains the following elements:
702@table @code
703@item  taddr space;
704The size of the empty/filled space in bytes.
705
706@item expr *space_exp;
707The above size as an expression, which will be evaluated during assembly
708and copied to @code{space} in the final pass.
709
710@item  int size;
711The size of each space-element and of the fill-pattern in bytes.
712
713@item  unsigned char fill[MAXBYTES];
714The fill pattern, up to MAXBYTES bytes.
715
716@item expr *fill_exp;
717Optional. Evaluated and copied to @code{fill} in the final pass, when not null.
718
719@item rlist *relocs;
720A pointer to relocation information for the space.
721
722@item taddr maxalignbytes;
723An optional number of maximum padding bytes to fulfil the atom's alignment
724requirement. Zero means there is no restriction.
725@end table
726
727@item    defblock *defb;
728(In union @code{content}.) Pointer to a defblock structure in the case
729of a @code{DATADEF}-atom. Contains the following elements:
730@table @code
731@item  taddr bitsize;
732The size of the definition in bits.
733
734@item  operand *op;
735Pointer to a cpu-specific operand structure.
736
737@end table
738
739@item    void *opts;
740(In union @code{content}.) Points to a cpu module specific options object
in the case of an @code{OPTS}-atom.
742
743@item    int srcline;
744(In union @code{content}.) Line number for source level debugging in the
745case of a @code{LINE}-atom.
746
747@item    char *ptext;
748(In union @code{content}.) A string to print to stdout in case of a
749@code{PRINTTEXT}-atom.
750
751@item    printexpr *pexpr;
752(In union @code{content}.) Pointer to a printexpr structure in the case of
753a @code{PRINTEXPR}-atom. Contains the following elements:
754@table @code
755@item expr *print_exp;
756Pointer to an expression to evaluate and print.
757
758@item short type;
759Format type of the printed value. We can print as hexadecimal
760(@code{PEXP_HEX}), signed decimal (@code{PEXP_SDEC}),
unsigned decimal (@code{PEXP_UDEC}), binary (@code{PEXP_BIN}) or
762ASCII (@code{PEXP_ASC}).
763
764@item short size;
Size (precision) of the printed value in bits. Excess bits are
masked out, and the value is sign-extended when requested.
767@end table
768
769@item    expr *roffs;
770(In union @code{content}.) The expression holds the relative section offset
771to align to in case of a @code{ROFFS}-atom.
772
773@item    taddr *rorg;
774(In union @code{content}.) Assemble the code under the base address in
775@code{rorg} in case of a @code{RORG}-atom.
776
777@item    assertion *assert;
778(In union @code{content}.) Pointer to an assertion structure in the case of
779an @code{ASSERT}-atom. Contains the following elements:
780@table @code
781@item expr *assert_exp;
782Pointer to an expression which should evaluate to non-zero.
783
784@item char *exprstr;
785Pointer to the expression as text (to be used in the output).
786
787@item char *msgstr;
788Pointer to the message, which would be printed when @code{assert_exp} evaluates
789to zero.
790@end table
791
792@item    aoutnlist *nlist;
793(In union @code{content}.) Pointer to an nlist structure, describing an
794aout stab entry, in case of an @code{NLIST}-atom. Contains the following
795elements:
796@table @code
797@item char *name;
798Name of the stab symbol.
799@item int type;
800Symbol type. Refer to @code{stabs.h} for definitions.
801@item int other;
802Defines the nature of the symbol (function, object, etc.).
803@item int desc;
804Debugger information.
805@item expr *value;
806Symbol's value.
807@end table
808
809@end table
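
To illustrate how @code{align} and @code{lastsize} interact with the
section's @code{pc}, here is a minimal sketch of a traversal which assigns
addresses to a list of atoms. It uses hypothetical, reduced structures
(@code{demo_atom}, @code{demo_section}) and simply trusts @code{lastsize};
it is not the resolver used by vasm, which recomputes atom sizes in every
pass.

```c
#include <assert.h>
#include <stddef.h>

typedef long demo_taddr;       /* stand-in for vasm's taddr */

/* Hypothetical, reduced stand-ins for atom and section. */
struct demo_atom {
  struct demo_atom *next;
  demo_taddr align;            /* address must be divisible by this */
  demo_taddr lastsize;         /* size from the last resolver pass */
};

struct demo_section {
  struct demo_atom *first;
  demo_taddr org;
  demo_taddr pc;
};

/* Walk all atoms of a section, padding pc up to each atom's
   alignment, and return the address after the last atom. */
demo_taddr demo_layout(struct demo_section *sec)
{
  struct demo_atom *a;

  assert(sec != NULL);
  sec->pc = sec->org;
  for (a = sec->first; a != NULL; a = a->next) {
    assert(a->align >= 1);
    /* round pc up to the next multiple of align */
    sec->pc = (sec->pc + a->align - 1) / a->align * a->align;
    sec->pc += a->lastsize;
  }
  return sec->pc;
}
```

With four atoms of sizes 0, 6, 1 and 4 and alignments 1, 2, 1 and 2, the
last atom is padded from address 7 to 8, so the section ends at address 12.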
810
811@subsection Relocations
812
813@code{DATA} and @code{SPACE} atoms can have a relocation list attached
814that describes how this data must be modified when linking/relocating.
815They always refer to the data in this atom only.
816
817There are a number of predefined standard relocations and it is possible
to add other cpu-specific relocations. Note, however, that it is always
preferable to use standard relocations, if possible. Chances that an
820output module supports a certain relocation are much higher if it is a
821standard relocation.
822
823A relocation list uses this structure:
824
825@example
826typedef struct rlist @{
827  struct rlist *next;
828  void *reloc;
829  int type;
830@} rlist;
831@end example
832
The @code{type} member identifies the relocation type. All the standard
relocations have type numbers between @code{FIRST_STANDARD_RELOC} and
@code{LAST_STANDARD_RELOC}. Consult @file{reloc.h} to see which
standard relocations are available.
837
The detailed information can be accessed
839via the pointer @code{reloc}. It will point to a structure that depends
840on the relocation type, so a module must only use it if it knows the
841relocation type.
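
As a sketch of this rule, an output module which only understands standard
relocations could check the @code{type} of every list entry before touching
@code{reloc}. The numeric range below is hypothetical (the real limits come
from @file{reloc.h}), the range is assumed to be inclusive, and the
@code{demo_} names are not part of vasm.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical limits; the real values are defined in reloc.h. */
enum { DEMO_FIRST_STANDARD_RELOC = 1, DEMO_LAST_STANDARD_RELOC = 16 };

/* Same layout as the rlist structure shown above. */
typedef struct demo_rlist {
  struct demo_rlist *next;
  void *reloc;
  int type;
} demo_rlist;

/* Assumption: both limits belong to the standard range. */
int demo_is_standard_reloc(const demo_rlist *rl)
{
  assert(rl != NULL);
  return rl->type >= DEMO_FIRST_STANDARD_RELOC &&
         rl->type <= DEMO_LAST_STANDARD_RELOC;
}

/* Count the entries whose 'reloc' pointer must not be interpreted
   by a module that only knows the standard relocations. */
int demo_count_nonstandard(const demo_rlist *rl)
{
  int n = 0;

  for (; rl != NULL; rl = rl->next)
    if (!demo_is_standard_reloc(rl))
      n++;
  return n;
}
```

A module would normally report an error for every entry counted here
instead of guessing at the layout behind @code{reloc}.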
842
843All standard relocations point to a type @code{nreloc} with the following
844members:
845@table @code
846@item  size_t byteoffset;
847Offset in bytes, from the start of the current @code{DATA} atom, to the
848beginning of the relocation field. This may also be the address which is
849used as a basis for PC-relative relocations. Or a common basis for several
850separated relocation fields, which will be translated into a single
851relocation type by the output module.
852
853@item  size_t bitoffset;
854Offset in bits to the beginning of the relocation field, adds to
855@code{byteoffset*bitsperbyte}. Bits are counted in a bit-stream from lower
856to higher address bytes. But note, that inside a little-endian byte they
857are counted from the LSB to the MSB, while they are counted from the MSB to
858the LSB for big-endian targets.
859
860@item  int size;
861The size of the relocation field in bits.
862
863@item  taddr mask;
864The mask defines which portion of the relocated value is set by this
865relocation field.
866
867@item taddr addend;
868Value to be added to the symbol value.
869
870@item  symbol *sym;
The symbol referred to by this relocation.
872
873@end table
874
875To describe the meaning of these entries, we will define the steps that
876shall be executed when performing a relocation:
877
878@enumerate 1
879@item Extract the @code{size} bits from the data atom, starting with bit
880        number @code{byteoffset*bitsperbyte+bitoffset}. We start counting
881        bits from the lowest to the highest numbered byte in memory.
882        Inside a big-endian byte we count from the MSB to the LSB. Inside
883        a little-endian byte we count from the LSB to the MSB.
884
885@item Determine the relocation value of the symbol. For a simple absolute
886        relocation, this will be the value of the symbol @code{sym} plus
887        the @code{addend}. For other relocation types, more complex
888        calculations will be needed.
889        For example, in a program-counter relative relocation,
890        the value will be obtained by subtracting the address of the data
891        atom plus @code{byteoffset} from the value
892        of @code{sym} plus @code{addend}.
893
894@item Calculate the bit-wise "and" of the value obtained in the step above
895        and the @code{mask} value.
896
897@item Normalize, i.e. shift the value above right as many bit positions as
898        there are low order zero bits in @code{mask}.
899
900@item Add this value to the value extracted in step 1.
901
902@item Insert the low order @code{size} bits of this value into the data atom
903        starting with bit @code{byteoffset*bitsperbyte+bitoffset}.
904@end enumerate
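
The six steps above can be sketched in C for the simple absolute case on a
big-endian target with 8 bits per byte. This is only an illustration under
those assumptions: @code{demo_nreloc} is a hypothetical, reduced stand-in
for @code{nreloc}, the symbol is replaced by its already resolved value,
and none of the function names exist in vasm.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { DEMO_BITS_PER_BYTE = 8 };

/* Hypothetical, reduced stand-in for nreloc (no symbol pointer). */
typedef struct {
  size_t byteoffset;
  size_t bitoffset;
  int size;                    /* width of the field in bits */
  uint64_t mask;
  int64_t addend;
} demo_nreloc;

/* Read 'size' bits from a big-endian bit stream (MSB first). */
static uint64_t demo_get_bits(const unsigned char *d, size_t pos, int size)
{
  uint64_t v = 0;
  int i;

  for (i = 0; i < size; i++, pos++) {
    unsigned shift = 7u - (unsigned)(pos % DEMO_BITS_PER_BYTE);
    v = (v << 1) | ((d[pos / DEMO_BITS_PER_BYTE] >> shift) & 1u);
  }
  return v;
}

/* Write the low 'size' bits of v into a big-endian bit stream. */
static void demo_put_bits(unsigned char *d, size_t pos, int size, uint64_t v)
{
  int i;

  for (i = size - 1; i >= 0; i--, pos++) {
    unsigned shift = 7u - (unsigned)(pos % DEMO_BITS_PER_BYTE);
    unsigned char bit = (unsigned char)(1u << shift);
    if ((v >> i) & 1u)
      d[pos / DEMO_BITS_PER_BYTE] |= bit;
    else
      d[pos / DEMO_BITS_PER_BYTE] &= (unsigned char)~bit;
  }
}

/* Apply a simple absolute relocation to the atom's data. */
void demo_apply_abs_reloc(unsigned char *data, const demo_nreloc *r,
                          uint64_t symval)
{
  size_t pos = r->byteoffset * DEMO_BITS_PER_BYTE + r->bitoffset;
  uint64_t field, val, m;

  assert(r->mask != 0 && r->size > 0 && r->size <= 64);
  field = demo_get_bits(data, pos, r->size);   /* step 1 */
  val = symval + (uint64_t)r->addend;          /* step 2 */
  val &= r->mask;                              /* step 3 */
  for (m = r->mask; !(m & 1u); m >>= 1)        /* step 4 */
    val >>= 1;
  val += field;                                /* step 5 */
  demo_put_bits(data, pos, r->size, val);      /* step 6 */
}
```

With a mask of @code{0xFFFF0000} and a field size of 16, for instance, only
the upper half of a 32-bit address ends up in the field, which is how a
relocation for a "high 16 bits" operand could be described.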
905
906
907@subsection Errors
908
909Each module can provide a list of possible error messages contained
910e.g. in @file{syntax_errors.h} or @file{cpu_errors.h}. They are a
911comma-separated list of a printf-format string and error flags. Allowed
912flags are @code{WARNING}, @code{ERROR}, @code{FATAL}, @code{MESSAGE} and
913@code{NOLINE}.
They can be combined using bitwise or (@code{|}). @code{NOLINE} has to be set for
error messages during initialization or while writing the output, when
no source text is available. Errors cause the assembler to return false.
917@code{FATAL} causes the assembler to terminate
918immediately.
919
920The errors can be emitted using the function @code{syntax_error(int n,...)},
921@code{cpu_error(int n,...)} or @code{output_error(int n,...)}. The first
922argument is the number of the error message (starting from zero). Additional
923arguments must be passed according to the format string of the
924corresponding error message.
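
As a rough, self-contained illustration of this scheme, the following C
sketch builds a small message list and a reporting function. Everything in
it is an assumption made for the example: the flag values, the table
@code{demo_errors} and the function @code{demo_error()} are hypothetical and
only mimic the idea of indexing a list of format strings and flags. A real
module would put its list into @file{cpu_errors.h} or
@file{syntax_errors.h} and call @code{cpu_error()} or @code{syntax_error()}.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical flag values for this sketch only; always use the
   symbolic names in real code. */
enum { WARNING = 1, ERROR = 2, FATAL = 4, MESSAGE = 8, NOLINE = 16 };

struct err_entry {
    const char *text;   /* printf-format string */
    int flags;          /* combination of the flags above */
};

/* Messages are addressed by their position, starting from zero. */
static const struct err_entry demo_errors[] = {
    { "illegal operand", ERROR },               /* 0 */
    { "value %d out of range", WARNING },       /* 1 */
    { "cannot open \"%s\"", FATAL | NOLINE },   /* 2 */
};

/* Format message n into buf and return its flags, or -1 if n is not
   a valid message number. */
int demo_error(char *buf, size_t len, int n, ...)
{
    va_list ap;

    if (n < 0 || n >= (int)(sizeof demo_errors / sizeof demo_errors[0]))
        return -1;
    va_start(ap, n);
    vsnprintf(buf, len, demo_errors[n].text, ap);
    va_end(ap);
    return demo_errors[n].flags;
}
```

The additional arguments have to match the format string of the selected
message, e.g. one @code{int} for message 1 in this sketch.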
925
926@section Syntax modules
927
928A new syntax module must have its own subdirectory under @file{vasm/syntax}.
929At least the files @file{syntax.h}, @file{syntax.c} and @file{syntax_errors.h}
930must be written.
931
932@subsection The file @file{syntax.h}
933
934@table @code
935
936@item #define ISIDSTART(x)/ISIDCHAR(x)
937These macros should return non-zero if and only if the argument is a
938valid character to start an identifier or a valid character inside an
939identifier, respectively.
940@code{ISIDCHAR} must be a superset of @code{ISIDSTART}.
941
942@item #define ISBADID(p,l)
Even with @code{ISIDSTART} and @code{ISIDCHAR} checked, there may be
combinations of characters which do not form a valid identifier (for
example, a single character). This macro returns non-zero when this is
the case. The first argument is a pointer to the new identifier and the
second is its length.
948
949@item #define ISEOL(x)
This macro returns true when @code{x} points to either
a comment character or the end of the line.
952
953@item #define CHKIDEND(s,e) chkidend((s),(e))
954Defines an optional function to be called at the end of the identifier
955recognition process. It allows you to adjust the length of the identifier
956by returning a modified @code{e}. Default is to return @code{e}. The
957function is defined as @code{char *chkidend(char *startpos,char *endpos)}.
958
959@item #define BOOLEAN(x) -(x)
960Defines the result of boolean operations. Usually this is @code{(x)}, as
961in C, or @code{-(x)} to return -1 for True.
962
963@item #define NARGSYM "NARG"
964Defines the name of an optional symbol which contains the number of
965arguments in a macro.
966
967@item #define CARGSYM "CARG"
968Defines the name of an optional symbol which can be used to select a
969specific macro argument with @code{\.}, @code{\+} and @code{\-}.
970
971@item #define REPTNSYM "REPTN"
972Defines the name of an optional symbol containing the counter of the
973current repeat iteration.
974
975@item #define EXPSKIP() s=exp_skip(s)
Defines an optional replacement for @code{skip()} to be used in @file{expr.c}, to skip
blanks in an expression. Useful to forbid blanks in an expression and to
ignore the rest of the line (e.g. to treat the rest as comment). The
function is defined as @code{char *exp_skip(char *stream)}.
980
981@item #define IGNORE_FIRST_EXTRA_OP 1
982Should be defined when the syntax module wants to ignore the operand field
983on instructions without an operand. Useful, when everything following
984an operand should be regarded as comment, without a comment character.
985
986@item #define MAXMACPARAMS 35
987Optionally defines the maximum number of macro arguments, if you need more than
988the default number of 9.
989
990@item #define SKIP_MACRO_ARGNAME(p) skip_identifier(p)
991An optional function to skip a named macro argument in the macro
992definition.
993Argument is the current source stream pointer.
994The default is to skip an identifier.
995
996@item #define MACRO_ARG_OPTS(m,n,a,p) NULL
997An optional function to parse and skip options, default values and
998qualifiers for each macro argument. Returns @code{NULL} when no argument
999options have been found.
1000Arguments are:
1001  @table @code
1002    @item struct macro *m;
1003      Pointer to the macro structure being currently defined.
1004    @item int n;
1005      Argument index, starting with zero.
1006    @item char *a;
1007      Name of this argument.
1008    @item char *p;
1009      Current source stream pointer. An updated pointer will be returned.
1010  @end table
1011Defaults to unused.
1012
1013@item #define MACRO_ARG_SEP(p) (*p==',' ? skip(p+1) : NULL)
1014An optional function to skip a separator between the macro argument
1015names in the macro definition. Returns NULL when no valid separator is
1016found.
1017Argument is the current source stream pointer.
1018Defaults to using comma as the only valid separator.
1019
1020@item #define MACRO_PARAM_SEP(p) (*p==',' ? skip(p+1) : NULL)
1021An optional function to skip a separator between the macro parameters
1022in a macro call. Returns NULL when no valid separator is found.
1023Argument is the current source stream pointer.
1024Defaults to using comma as the only valid separator.
1025
1026@item #define EXEC_MACRO(s)
1027An optional function to be called just before a macro starts execution.
1028Parameters and qualifiers are already parsed.
1029Argument is the @code{source} pointer of the new macro.
1030Defaults to unused.
1031
1032@end table
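
A few of these macros might look as follows in a very small
@file{syntax.h}. This is a sketch under assumptions, not taken from an
existing syntax module: the chosen character classes are arbitrary, the
local @code{commentchar} variable stands in for the one exported by
@file{syntax.c}, and @code{ident_len()} is a hypothetical helper that only
shows how the macros work together.

```c
#include <ctype.h>
#include <stddef.h>

/* Stand-in for the comment character exported by syntax.c. */
static char commentchar = ';';

/* Identifiers start with a letter, '.' or '_'. */
#define ISIDSTART(x) ((x) == '.' || (x) == '_' || isalpha((unsigned char)(x)))
/* ISIDCHAR has to be a superset of ISIDSTART, so it is built on top of it. */
#define ISIDCHAR(x)  (ISIDSTART(x) || isdigit((unsigned char)(x)))
/* A single '.' or '_' is rejected as identifier in this sketch. */
#define ISBADID(p,l) ((l) == 1 && (*(p) == '.' || *(p) == '_'))
/* End of line or start of a comment. */
#define ISEOL(x)     (*(x) == '\0' || *(x) == commentchar)
/* True is represented as -1. */
#define BOOLEAN(x)   -(x)

/* Hypothetical helper: length of the identifier at s, 0 if there is none. */
size_t ident_len(const char *s)
{
    const char *p = s;

    if (!ISIDSTART(*p))
        return 0;
    do
        p++;
    while (ISIDCHAR(*p));
    return ISBADID(s, p - s) ? 0 : (size_t)(p - s);
}
```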
1033
1034@subsection The file @file{syntax.c}
1035
A syntax module has to provide the following elements (all other functions
should be @code{static} to prevent name clashes):
1038
1039@table @code
1040
1041@item char *syntax_copyright;
1042A string that will be emitted as part of the copyright message.
1043
1044@item hashtable *dirhash;
1045A pointer to the hash table with all directives.
1046
1047@item char commentchar;
1048A character used to introduce a comment until the end of the line.
1049
1050@item char *defsectname;
1051Name of a default section which vasm creates when a label or code occurs
1052in the source, but the programmer forgot to specify a section. Assigning
1053NULL means that there is no default and vasm will show an error in this
1054case.
1055
1056@item char *defsecttype;
1057Type of the default section (see above). May be NULL.
1058
1059@item int init_syntax();
Will be called during startup, after argument parsing. Must return zero if
initializations failed, non-zero otherwise.
1062
1063@item int syntax_args(char *);
1064This function will be called with the command line arguments (unless they
1065were already recognized by other modules). If an argument was recognized,
1066return non-zero.
1067
1068@item char *skip(char *);
1069A function to skip whitespace etc.
1070
1071@item char *skip_operand(char *);
1072A function to skip an instruction's operand. Will terminate at end of line
1073or the next comma, returning a pointer to the rest of the line behind
1074the comma.
1075
1076@item void eol(char *);
This function should check that the argument points to the end of a line
(only comments or whitespace following). If not, an error or warning
message should be emitted.
1080
1081@item char *const_prefix(char *,int *);
1082Check if the first argument points to the start of a constant. If yes
1083return a pointer to the real start of the number (i.e. skip a prefix
1084that may indicate the base) and write the base of the number through the
1085pointer passed as second argument. Return zero if it does not point to a
1086number.
1087
1088@item char *const_suffix(char *,char *);
1089First argument points to the start of the constant (including prefix) and
1090the second argument to first character after the constant (excluding suffix).
1091Checks for a constant-suffix and skips it. Return pointer to the first
1092character after that constant. Example: constants with a 'h' suffix to
1093indicate a hexadecimal base.
1094
1095@item void parse(void);
1096This is the main parsing function. It has to read lines via
1097the @code{read_next_line()} function, parse them and create sections,
1098atoms and symbols. Pseudo directives are usually handled by the syntax
1099module. Instructions can be parsed by the cpu module using
1100@code{parse_instruction()}.
1101
1102@item char *parse_macro_arg(struct macro *,char *,struct namelen *,struct namelen *);
1103Called to parse a macro parameter by using the source stream pointer in
1104the second argument. The start pointer and length of a single passed
1105parameter is written to the first @code{struct namelen}, while the optionally
1106selected named macro argument is passed in the second @code{struct namelen}.
When the @code{len} field of the second @code{namelen} is zero, then the
argument is selected by position instead of by name. Returns the updated
source stream pointer after successful parsing.
1110
1111@item int expand_macro(source *,char **,char *,int);
Expand parameters and special commands inside a macro source. The second
argument is a pointer to the current source stream pointer, which is
updated on any successful expansion. In this case the function returns the
number of characters written to the destination buffer (third argument).
Returning @code{-1} means that no expansion took place.
The last argument defines the space in characters which is left in the
destination buffer.
1119
1120@item char *get_local_label(char **);
1121Gets a pointer to the current source pointer. Has to check if a valid
1122local label is found at this point. If yes return a pointer to the
1123vasm-internal symbol name representing the local label and update
1124the current source pointer to point behind the label.
1125
1126Have a look at the support functions provided by the frontend to help.
1127
1128@end table
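
To give an impression of one of these functions, here is a possible
@code{const_prefix()} for a hypothetical syntax which accepts @code{$} and
@code{0x} as hexadecimal prefixes and @code{%} as binary prefix. The
choice of prefixes is an assumption made for this sketch only. Each syntax
module defines its own rules.

```c
#include <ctype.h>
#include <stddef.h>

/* Returns a pointer to the first digit of the constant at s and stores
   its base in *base. Returns NULL when s does not point to a number. */
char *const_prefix(char *s, int *base)
{
    if (s[0] == '$' && isxdigit((unsigned char)s[1])) {
        *base = 16;             /* $ff */
        return s + 1;
    }
    if (s[0] == '%' && (s[1] == '0' || s[1] == '1')) {
        *base = 2;              /* %1010 */
        return s + 1;
    }
    if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X') &&
        isxdigit((unsigned char)s[2])) {
        *base = 16;             /* 0xff */
        return s + 2;
    }
    if (isdigit((unsigned char)s[0])) {
        *base = 10;             /* no prefix: decimal */
        return s;
    }
    return NULL;                /* not a constant */
}
```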
1129
1130@section CPU modules
1131
1132A new cpu module must have its own subdirectory under @file{vasm/cpus}.
1133At least the files @file{cpu.h}, @file{cpu.c} and @file{cpu_errors.h}
1134must be written.
1135
1136@subsection The file @file{cpu.h}
1137
1138A cpu module has to provide the following elements (all other functions
1139should be @code{static} to prevent name clashes) in @code{cpu.h}:
1140
1141@table @code
1142@item #define MAX_OPERANDS 3
1143Maximum number of operands of one instruction.
1144
1145@item #define MAX_QUALIFIERS 0
1146Maximum number of mnemonic-qualifiers per mnemonic.
1147
1148@item #define NO_MACRO_QUALIFIERS
1149Define this, when qualifiers shouldn't be allowed for macros. For some
1150architectures, like ARM, macro qualifiers make no sense.
1151
1152@item typedef int32_t taddr;
Data type to represent a target-address. Preferably use the ones from
@file{stdint.h}.
1155
1156@item typedef uint32_t utaddr;
1157Unsigned data type to represent a target-address.
1158
1159@item #define LITTLEENDIAN 1
1160@itemx #define BIGENDIAN 0
Define these according to the target endianness. For CPUs which support
both big- and little-endian, you may assign a global variable here. So be
aware of it, and never use @code{#if BIGENDIAN}, but always
@code{if(BIGENDIAN)} in your code.
1165
1166@item #define VASM_CPU_<cpu> 1
1167Insert the cpu specifier.
1168
1169@item #define INST_ALIGN 2
1170Minimum instruction alignment.
1171
1172@item #define DATA_ALIGN(n) ...
1173Default alignment for @code{n}-bit data. Can also be a function.
1174
1175@item #define DATA_OPERAND(n) ...
1176Operand class for n-bit data definitions. Can also be a function.
1177Negative values denote a floating point data definition of -n bits.
1178
1179@item typedef ... operand;
1180Structure to store an operand.
1181
1182@item typedef ... mnemonic_extension;
1183Mnemonic extension.
1184@end table
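
Put together, the mandatory part of a @file{cpu.h} for a hypothetical
bi-endian target could look like the following sketch. All concrete values,
the @code{OP_DATA} operand classes, the layout of @code{operand} and
@code{mnemonic_extension}, and the variable @code{demo_big_endian} are
assumptions for illustration. A real module would declare such a variable
@code{extern} in @file{cpu.h} and define it in @file{cpu.c}. The helper
@code{store16()} only demonstrates why the endianness has to be tested with
@code{if()}.

```c
#include <stdint.h>

#define MAX_OPERANDS   2
#define MAX_QUALIFIERS 0

typedef int32_t  taddr;
typedef uint32_t utaddr;

/* Endianness selectable at run time (hypothetical variable). */
static int demo_big_endian = 1;
#define BIGENDIAN    (demo_big_endian)
#define LITTLEENDIAN (!demo_big_endian)

#define VASM_CPU_DEMO 1
#define INST_ALIGN    2
#define DATA_ALIGN(n) ((n) <= 8 ? 1 : 2)

/* Hypothetical operand classes; floating point data (negative n)
   is not handled in this sketch. */
enum { OP_DATA8 = 1, OP_DATA16, OP_DATA32 };
#define DATA_OPERAND(n) \
    ((n) == 8 ? OP_DATA8 : (n) == 16 ? OP_DATA16 : OP_DATA32)

typedef struct {
    int   type;     /* operand class */
    taddr value;    /* register number or constant */
} operand;

typedef struct {
    unsigned opcode;
} mnemonic_extension;

/* BIGENDIAN may expand to a variable, so it is tested with if(). */
static void store16(uint8_t *p, unsigned v)
{
    if (BIGENDIAN) {
        p[0] = (uint8_t)(v >> 8);
        p[1] = (uint8_t)v;
    } else {
        p[0] = (uint8_t)v;
        p[1] = (uint8_t)(v >> 8);
    }
}
```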
1185
1186Optional features, which can be enabled by defining the following macros:
1187
1188@table @code
1189@item #define HAVE_INSTRUCTION_EXTENSION 1
1190If cpu-specific data should be added to all instruction atoms.
1191
1192@item typedef ... instruction_ext;
1193Type for the above extension.
1194
1195@item #define NEED_CLEARED_OPERANDS 1
1196Backend requires a zeroed operand structure when calling @code{parse_operand()}
1197for the first time. Defaults to undefined.
1198
1199@item START_PARENTH(x)
1200Valid opening parenthesis for instruction operands. Defaults to @code{'('}.
1201
1202@item END_PARENTH(x)
1203Valid closing parenthesis for instruction operands. Defaults to @code{')'}.
1204
1205@item #define MNEMONIC_VALID(i)
1206An optional function with the arguments @code{(int idx)}. Returns true
1207when the mnemonic with index @code{idx} is valid for the current state of
1208the backend (e.g. it is available for the selected cpu architecture).
1209
1210@item #define MNEMOHTABSIZE 0x4000
You can optionally override the default hash table size defined in
@file{vasm.h}. May be necessary for larger mnemonic tables.
1213
1214@item #define OPERAND_OPTIONAL(p,t)
1215When defined, this is a function with the arguments
1216@code{(operand *op,int type)}, which returns true when the given operand
1217type (@code{type}) is optional. The function is only called for missing
1218operands and should also initialize @code{op} with default values (e.g. 0).
1219@end table
1220
1221Implementing additional target-specific unary operations is done by defining
1222the following optional macros:
1223
1224@table @code
1225@item #define EXT_UNARY_NAME(s)
1226Should return True when the string in @code{s} points to an operation name
1227we want to handle.
1228
1229@item #define EXT_UNARY_TYPE(s)
1230Returns the operation type code for the string in @code{s}. Note that the
1231last valid standard operation is defined as @code{LAST_EXP_TYPE}, so the
1232target-specific types will start with @code{LAST_EXP_TYPE+1}.
1233
1234@item #define EXT_UNARY_EVAL(t,v,r,c)
Defines a function with the arguments @code{(int t, taddr v, taddr *r, int c)}
to handle the operation type @code{t} returning an @code{int} to indicate
whether this type has been handled or not. Your operation will be applied to
the value @code{v} and the result is stored in @code{*r}. The flag @code{c}
is passed as 1 when the value is constant (no relocatable addresses involved).
1240
1241@item #define EXT_FIND_BASE(b,e,s,p)
1242Defines a function with the arguments
1243@code{(symbol **b, expr *e, section *s, taddr p)}
to save a pointer to the base symbol of expression @code{e} into the
symbol pointer, pointed to by @code{b}. The type of this base is given
by an @code{int} return code. Further on, @code{e->type} has to be checked
to be one of the operations to handle.
1248The section pointer @code{s} and the current pc @code{p} are needed to call
1249the standard @code{find_base()} function.
1250@end table
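
The first three macros could be used like this to add two hypothetical
operations, @code{lo8} and @code{hi8}, which select the low and high byte
of a 16-bit value. The operation names, the type codes @code{EXT_LO8} and
@code{EXT_HI8} and the function @code{ext_unary_eval()} are inventions for
this sketch. @code{taddr} and @code{LAST_EXP_TYPE} are replaced by local
stand-ins so that the example is self-contained, and the flag @code{c} is
simply ignored.

```c
#include <stdint.h>
#include <string.h>

typedef int32_t taddr;       /* stand-in for the typedef from cpu.h */
#define LAST_EXP_TYPE 100    /* stand-in; the real value is defined by vasm */

/* Target-specific operation types start after the standard ones. */
enum { EXT_LO8 = LAST_EXP_TYPE + 1, EXT_HI8 };

#define EXT_UNARY_NAME(s) \
    (!strncmp((s), "lo8", 3) || !strncmp((s), "hi8", 3))
#define EXT_UNARY_TYPE(s) ((s)[0] == 'l' ? EXT_LO8 : EXT_HI8)
#define EXT_UNARY_EVAL(t,v,r,c) ext_unary_eval((t), (v), (r), (c))

/* Returns non-zero when operation type t was handled. */
int ext_unary_eval(int t, taddr v, taddr *r, int c)
{
    (void)c;    /* constant/relocatable distinction ignored in this sketch */

    switch (t) {
    case EXT_LO8:
        *r = (taddr)((uint32_t)v & 0xff);
        return 1;
    case EXT_HI8:
        *r = (taddr)(((uint32_t)v >> 8) & 0xff);
        return 1;
    }
    return 0;   /* not one of our operations */
}
```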
1251
1252@subsection The file @file{cpu.c}
1253
1254A cpu module has to provide the following elements (all other functions
1255and data should be @code{static} to prevent name clashes) in @code{cpu.c}:
1256
1257@table @code
1258@item int bitsperbyte;
1259The number of bits per byte of the target cpu.
1260
1261@item int bytespertaddr;
1262The number of bytes per @code{taddr}.
1263
1264@item mnemonic mnemonics[];
1265The mnemonic table keeps a list of mnemonic names and operand types the
1266assembler will match against using @code{parse_operand()}. It may also
1267include a target specific @code{mnemonic_extension}.
1268
1269@item char *cpu_copyright;
1270A string that will be emitted as part of the copyright message.
1271
1272@item char *cpuname;
1273A string describing the target cpu.
1274
1275@item int init_cpu();
1276Will be called during startup, after argument parsing. Must return zero if
1277initializations failed, non-zero otherwise.
1278
1279@item int cpu_args(char *);
1280This function will be called with the command line arguments (unless they
1281were already recognized by other modules). If an argument was recognized,
1282return non-zero.
1283
1284@item char *parse_cpu_special(char *);
This function will be called with a source line as argument and allows
the cpu module to handle cpu-specific directives etc. Functions like
@code{eol()} and @code{skip()}, which are provided by the syntax module,
should be used to keep the syntax consistent.
1289
1290@item operand *new_operand();
1291Allocate and initialize a new operand structure.
1292
1293@item int parse_operand(char *text,int len,operand *out,int requires);
1294Parses the source at @code{text} with length @code{len} to fill the target
1295specific operand structure pointed to by @code{out}. Returns @code{PO_MATCH}
1296when the operand matches the operand-type passed in @code{requires} and
1297@code{PO_NOMATCH} otherwise. When the source is definitely identified as
1298garbage, the function may return @code{PO_CORRUPT} to tell the assembler
1299that it is useless to try matching against any other operand types.
1300Another special case is @code{PO_SKIP}, which is also a match, but skips
1301the next operand from the mnemonic table (because it was already handled
1302together with the current operand).
1303
1304@item taddr instruction_size(instruction *ip, section *sec, taddr pc);
1305Returns the size of the instruction @code{ip} in bytes, which must be
1306identical to the number of bytes written by @code{eval_instruction()}
1307(see below).
1308
1309@item dblock *eval_instruction(instruction *ip, section *sec, taddr pc);
1310Converts the instruction @code{ip} into a DATA atom, including relocations,
1311if necessary.
1312
1313@item dblock *eval_data(operand *op, taddr bitsize, section *sec, taddr pc);
1314Converts a data operand into a DATA atom, including relocations.
1315
1316@item void init_instruction_ext(instruction_ext *);
1317(If @code{HAVE_INSTRUCTION_EXTENSION} is set.)
1318Initialize an instruction extension.
1319
1320@item char *parse_instruction(char *,int *,char **,int *,int *);
1321(If @code{MAX_QUALIFIERS} is greater than 0.)
1322Parses instruction and saves extension locations.
1323
1324@item int set_default_qualifiers(char **,int *);
1325(If @code{MAX_QUALIFIERS} is greater than 0.)
1326Saves pointers and lengths of default qualifiers for the selected CPU and
1327returns the number of default qualifiers. Example: for a M680x0 CPU this
1328would be a single qualifier, called "w". Used by @code{execute_macro()}.
1329
1330@item cpu_opts_init(section *);
1331(If @code{HAVE_CPU_OPTS} is set.)
1332Gives the cpu module the chance to write out @code{OPTS} atoms with
1333initial settings before the first atom is generated.
1334
1335@item cpu_opts(void *);
1336(If @code{HAVE_CPU_OPTS} is set.)
1337Apply option modifications from an @code{OPTS} atom. For example:
1338change cpu type or optimization flags.
1339
1340@item print_cpu_opts(FILE *,void *);
1341(If @code{HAVE_CPU_OPTS} is set.)
1342Called from @code{print_atom()} to print an @code{OPTS} atom's contents.
1343
1344@end table
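
The return codes of @code{parse_operand()} can be illustrated with a tiny
example for an imaginary cpu which knows registers @code{r0} to @code{r7}
and decimal immediate values written as @code{#n}. The numeric values of
the @code{PO_} codes, the operand classes @code{OP_REG} and @code{OP_IMM}
and the @code{operand} layout are assumptions for this sketch. A real
backend would take the codes from the frontend headers and would usually
parse expressions with the frontend's expression parser instead of reading
digits itself.

```c
#include <ctype.h>

/* Stand-in values; use the symbolic names provided by vasm. */
enum { PO_CORRUPT = -1, PO_NOMATCH = 0, PO_MATCH = 1, PO_SKIP = 2 };

/* Hypothetical operand classes and operand structure. */
enum { OP_REG = 1, OP_IMM = 2 };
typedef struct {
    int  type;
    long value;
} operand;

int parse_operand(char *text, int len, operand *out, int requires)
{
    int i;
    long v = 0;

    if (len <= 0)
        return PO_CORRUPT;      /* nothing to parse at all */

    if (requires == OP_REG) {
        if (len == 2 && text[0] == 'r' && text[1] >= '0' && text[1] <= '7') {
            out->type = OP_REG;
            out->value = text[1] - '0';
            return PO_MATCH;
        }
        return PO_NOMATCH;      /* might still match another type */
    }

    if (requires == OP_IMM) {
        if (text[0] != '#' || len < 2)
            return PO_NOMATCH;
        for (i = 1; i < len; i++) {
            if (!isdigit((unsigned char)text[i]))
                return PO_CORRUPT;  /* "#" followed by garbage */
            v = v * 10 + (text[i] - '0');
        }
        out->type = OP_IMM;
        out->value = v;
        return PO_MATCH;
    }

    return PO_NOMATCH;
}
```

The assembler calls the function once for each operand type listed in the
mnemonic table until a match is found, which is why @code{PO_NOMATCH} and
@code{PO_CORRUPT} are kept apart.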
1345
1346
1347@section Output modules
1348
1349Output modules can be chosen at runtime rather than compile time. Therefore,
1350several output modules are linked into one vasm executable and their
1351structure differs somewhat from syntax and cpu modules.
1352
1353Usually, an output module for some object format @code{fmt} should be contained
1354in a file @file{output_<fmt>.c} (it may use/include other files if necessary).
1355To automatically include this format in the build process, the @file{make.rules}
1356has to be extended. The module should be added to the @code{OBJS} variable
1357at the start of @file{make.rules}. Also, a dependency line should be added
1358(see the existing output modules).
1359
1360An output module must only export a single function which will return
1361pointers to necessary data/functions. This function should have the
1362following prototype:
1363@example
1364int init_output_<fmt>(
1365      char **copyright,
1366      void (**write_object)(FILE *,section *,symbol *),
1367      int (**output_args)(char *)
1368    );
1369@end example
1370
In case of an error, zero must be returned.
Otherwise, it should perform all necessary initializations, return non-zero
and pass back the following output parameters via the pointers passed as arguments:
1374
1375@table @code
1376@item copyright
1377A pointer to the copyright string.
1378
1379@item write_object
1380A pointer to a function emitting the output. It will be called after the
1381assembler has completed and will receive pointers to the output file,
1382to the first section of the section list and to the first symbol
1383in the symbol list. See the section on general data structures for further
1384details.
1385
1386
1387@item output_args
1388A pointer to a function checking arguments. It will be called with all
1389command line arguments (unless already handled by other modules). If the
1390output module recognizes an appropriate option, it has to handle it
1391and return non-zero. If it is not an option relevant to this output module,
1392zero must be returned.
1393
1394@end table
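
A skeleton for an output module following this scheme might look like the
sketch below. The format name @code{demo}, the option
@code{-demo-verbose} and the contents of the static functions are
hypothetical, and @code{section} and @code{symbol} are declared as opaque
stand-ins so that the example compiles on its own. A real module gets these
types from the @command{vasm} headers.

```c
#include <stdio.h>
#include <string.h>

/* Opaque stand-ins for the vasm data structures. */
typedef struct section section;
typedef struct symbol symbol;

static char *copyright = "vasm demo output module (sketch)";
static int verbose;

/* Called after assembling has completed. */
static void write_output(FILE *f, section *sec, symbol *sym)
{
    (void)sec;      /* a real module walks the section list ... */
    (void)sym;      /* ... and the symbol list here */
    fputs(verbose ? "demo object (verbose)\n" : "demo object\n", f);
}

/* Returns non-zero when the option belongs to this module. */
static int output_args(char *arg)
{
    if (strcmp(arg, "-demo-verbose") == 0) {
        verbose = 1;
        return 1;
    }
    return 0;
}

/* The only exported function of the module. */
int init_output_demo(char **cp,
                     void (**wo)(FILE *, section *, symbol *),
                     int (**oa)(char *))
{
    *cp = copyright;
    *wo = write_output;
    *oa = output_args;
    return 1;       /* zero would signal an initialization error */
}
```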
1395
Finally, a call to @code{init_output_<fmt>} has to be added in the
@code{init_output()} function in @file{vasm.c} (should be self-explanatory).
1398
1399Some remarks:
1400@itemize @minus
1401
1402@item
Some output modules cannot handle all supported CPUs. Nevertheless,
they have to be written in a way that they can be compiled. If code
references CPU-specifics, it has to be enclosed in
@code{#ifdef VASM_CPU_MYCPU} ... @code{#endif} or similar.
1407
1408Also, if the selected CPU is not supported, the init function should fail.
1409
1410@item
1411Error/warning messages can be emitted with the @code{output_error} function.
1412As all output modules are linked together, they have a common list of error
1413messages in the file @file{output_errors.h}. If a new message is needed, this
1414file has to be extended (see the section on general data structures for
1415details).
1416
1417@item
1418@command{vasm} has a mechanism to specify rather complex relocations in a
1419standard way (see the section on general data structures). They can be
1420extended with CPU specific relocations, but usually CPU modules will
1421try to create standard relocations (sometimes several standard relocations
1422can be used to implement a CPU specific relocation). An output
1423module should try to find appropriate relocations supported by the
1424object format. The goal is to avoid special CPU specific
1425relocations as much as possible.
1426
1427@end itemize
1428
1429Volker Barthelmann                                      vb@@compilers.de
1430
1431@bye
1432