@section Introduction

This chapter is under construction!

This chapter describes some of the internals of @command{vasm}
and tries to explain
what has to be done to write a cpu module, a syntax module
or an output module for @command{vasm}.
However, if you want to write one, I suggest contacting me first,
so that it can be integrated into the source tree.

Note that this documentation may mention explicit values when introducing
symbolic constants. This is due to copying and pasting from the source
code. These values may not be up to date and in some cases can be overridden.
Therefore never use the absolute values; use the symbolic
representations instead.


@section Building vasm

This section deals with the steps necessary to build the typical
@command{vasm} executable from the sources.

@subsection Directory Structure

The vasm directory contains the following important files and
directories:
@table @file
@item vasm/
The main directory containing the assembler sources.

@item vasm/Makefile
The Makefile used to build @command{vasm}.

@item vasm/syntax/<syntax-module>/
Directories for the syntax modules.

@item vasm/cpus/<cpu-module>/
Directories for the cpu modules.

@item vasm/obj/
Directory the object modules will be stored in.

@end table

All compiling is done from the main directory and
the executables will be placed there as well.
The main assembler for a combination of @code{<cpu>} and
@code{<syntax>} will
be called @command{vasm<cpu>_<syntax>}. All output modules are
usually integrated in every executable and can be selected at
runtime.

@subsection Adapting the Makefile

Before building anything you have to insert correct values for
your compiler and operating system in the @file{Makefile}.

@table @code
@item TARGET
Here you may define an extension which is appended to the executable's
name.
Useful if you build various targets in the same directory.

@item TARGETEXTENSION
Defines the file name extension for executable files. Not needed for
most operating systems. For Windows it would be @file{.exe}.

@item CC
Here you have to insert a command that invokes the ANSI C
compiler you want to use to build vasm. It must support
the @option{-I} option in the same way as e.g. @command{vc} or
@command{gcc}.

@item COPTS
Here you will usually define an option like @option{-c} to instruct
the compiler to generate an object file.
Additional options, like the optimization level, should
be inserted here as well. When the host operating system is not
a Unix (MacOSX and MiNT are Unix), you have to define one of the
following preprocessor macros:
@table @code
@item -DAMIGA
AmigaOS (M68k or PPC), MorphOS, AROS.
@item -DATARI
Atari TOS.
@item -DMSDOS
CP/M, MS-DOS, Windows.
@end table

@item CCOUT
Here you define the option which is used to specify the name of
an output file, which is usually @option{-o}.

@item LD
Here you insert a command which starts the linker. This may be
the same as under @code{CC}.

@item LDFLAGS
Here you have to add options which are necessary for linking.
E.g. some compilers need special libraries for floating-point.

@item LDOUT
Here you define the option which is used by the linker to specify
the output file name.

@item RM
Specify a command to delete a file, e.g. @code{rm -f}.
@end table

An example for the Amiga using @command{vbcc} would be:
@example
TARGET = _os3
TARGETEXTENSION =
CC = vc +aos68k
CCOUT = -o
COPTS = -c -c99 -cpu=68020 -DAMIGA -O1
LD = $(CC)
LDOUT = $(CCOUT)
LDFLAGS = -lmieee
RM = delete force quiet
@end example

An example for a typical Unix installation would be:
@example
TARGET =
TARGETEXTENSION =
CC = gcc
CCOUT = -o
COPTS = -c -O2
LD = $(CC)
LDOUT = $(CCOUT)
LDFLAGS = -lm
RM = rm -f
@end example

Open/Net/Free/Any BSD i386 systems will probably require
an additional @option{-D_ANSI_SOURCE} in @code{COPTS}.


@subsection Building vasm

Note to users of Open/Free/Any BSD i386 systems: You will probably have to use
GNU make instead of BSD make, i.e. in the following examples replace "make"
with "gmake".

Type:
@example
make CPU=<cpu> SYNTAX=<syntax>
@end example
For example:
@example
make CPU=ppc SYNTAX=std
@end example

The following CPU modules can be selected:
@itemize
@item @code{CPU=6502}
@item @code{CPU=6800}
@item @code{CPU=arm}
@item @code{CPU=c16x}
@item @code{CPU=jagrisc}
@item @code{CPU=m68k}
@item @code{CPU=ppc}
@item @code{CPU=test}
@item @code{CPU=tr3200}
@item @code{CPU=vidcore}
@item @code{CPU=x86}
@item @code{CPU=z80}
@end itemize

The following syntax modules can be selected:
@itemize
@item @code{SYNTAX=std}
@item @code{SYNTAX=mot}
@item @code{SYNTAX=madmac}
@item @code{SYNTAX=oldstyle}
@item @code{SYNTAX=test}
@end itemize

For Windows and various Amiga targets there are already Makefiles included,
which you may either copy on top of the default @file{Makefile}, or call
explicitly with @command{make}'s @option{-f} option:
@example
make -f Makefile.OS4 CPU=ppc SYNTAX=std
@end example


@section General data structures

This section describes the fundamental data structures used in vasm
which are usually necessary to understand for writing any kind of
module (cpu, syntax or output). More detailed information is given in
the respective sections on writing specific modules where necessary.

@subsection Source

A source structure represents a source text module, which can be
either the main source text, an included file or a macro. There is
always a link to the parent source from where the current source context
was included or called.

@table @code
@item struct source *parent;
Pointer to the parent source context. Assembly continues there
when the current source context ends.

@item int parent_line;
Line number in the parent source context, from where we were called.
This information is needed, because line numbers are only reliable
during parsing and later from the atoms. But an include directive
doesn't create an atom.

@item char *name;
File name of the main source or include file, or macro name.

@item char *text;
Pointer to the source text start.

@item size_t size;
Size of the source text to assemble in bytes.

@item macro *macro;
Pointer to macro structure, when currently inside a macro
(see also @code{num_params}).

@item unsigned long repeat;
Number of repetitions of this source text. Usually this is 1, but
for text blocks between a @code{rept} and @code{endr} directive
it allows any number of repetitions, which is decremented every time
the end of this source text block is reached.

@item char *irpname;
Name of the iterator symbol in special repeat loops which use a
sequence of arbitrary values, being assigned to this symbol within
the loop. Example: @code{irp} directive in std-syntax.

@item struct macarg *irpvals;
A list of arbitrary values to iterate over in a loop.
With each
iteration the frontmost value is removed from the list until it is
empty.

@item int cond_level;
Current level of conditional nesting while entering this source
text. It is automatically restored to the previous level when
leaving the source prematurely through @code{end_source()}.

@item struct macarg *argnames;
The current list of named macro arguments.

@item int num_params;
Number of macro parameters passed at the invocation point from
the parent source. For normal source files this entry will be -1.
For macros 0 (no parameters) or higher.

@item char *param[MAXMACPARAMS];
Pointer to the macro parameters.

@item int param_len[MAXMACPARAMS];
Number of characters per macro parameter.

@item int num_quals;
(If @code{MAX_QUALIFIERS!=0}.) Number of qualifiers for a macro.
When not passed on invocation these are the default qualifiers.

@item char *qual[MAX_QUALIFIERS];
(If @code{MAX_QUALIFIERS!=0}.) Pointer to macro qualifiers.

@item int qual_len[MAX_QUALIFIERS];
(If @code{MAX_QUALIFIERS!=0}.) Number of characters per macro qualifier.

@item unsigned long id;
Every source has its unique id. Useful for macros supporting
the special @code{\@@} parameter.

@item char *srcptr;
The current source text pointer, pointing to the beginning of
the next line to assemble.

@item int line;
Line number in the current source context. After parsing the
line number of the current atom is stored here.

@item size_t bufsize;
Current size of the line buffer (@code{linebuf}). The size of the
line buffer is extended automatically, when an overflow happens.

@item char *linebuf;
A buffer for the current line being assembled
in this source text. A child-source, like a macro, can refer to
arguments from this buffer, so every source has got its own.
When returning to the parent source, the linebuf is deallocated
to save memory.

@item expr *cargexp;
(If @code{CARGSYM} was defined.) Pointer to the current expression
assigned to the CARG-symbol (used to select a macro argument) in
this source instance. So it can be restored when reentering this
instance.

@item long reptn;
(If @code{REPTNSYM} was defined.) Current value of the repetition
counter symbol in this source instance. So it can be restored when
reentering this instance.
@end table

@subsection Sections

One of the top level structures is a linked list of sections describing
contiguous blocks of memory. A section is specified by an object of
type @code{section} with the following members that can be accessed by
the modules:

@table @code
@item struct section *next;
A pointer to the next section in the list.

@item char *name;
The name of the section.

@item char *attr;
A string describing the section flags in ELF notation (see,
for example, the documentation of the @code{.section} directive of
the standard syntax module).

@item atom *first;
@itemx atom *last;
Pointers to the first and last atom of the section. See following
sections for information on atoms.

@item taddr align;
Alignment of the section in bytes.

@item uint32_t flags;
Flags of the section. Currently available flags are:
@table @code
@item HAS_SYMBOLS
At least one symbol is defined in this section.
@item RESOLVE_WARN
The current atom changed its size multiple times, so @code{atom_size()}
is now called with this flag set in its section to make the
backend (e.g. @code{instruction_size()}) aware of it and do less
aggressive optimizations.
@item UNALLOCATED
Section is unallocated, which means it doesn't use any memory space
in the output file.
Such a section will be removed before creating
the output file and all its labels converted into absolute expression
symbols. Used for "offset" sections. Refer to
@code{switch_offset_section()}.
@item LABELS_ARE_LOCAL
As long as this flag is set new labels in a section are defined
as local labels, with the section name as global parent label.
@item ABSOLUTE
Section is loaded at an absolute address in memory.
@item PREVABS
Remembers state of the @code{ABSOLUTE} flag before entering
relocated-org mode (@code{IN_RORG}). So it can be restored later.
@item IN_RORG
Section has entered relocated-org mode, which also sets the
@code{ABSOLUTE} flag. In this mode code is written into the current
section, but relocated to an absolute address. No relocation
information is generated.
@item NEAR_ADDRESSING
Section is marked as suitable for cpu-specific "near" addressing
modes. For example, base-register relative. The cpu backend can use
this information as an optimization hint when referencing symbols
from this section.
@end table

@item taddr org;
Start address of a section. Usually zero.

@item taddr pc;
Current address in this section. Can be used
while traversing through the section. Has to be updated by a
module using it. Is set to @code{org} at the beginning.

@item unsigned long idx;
A member usable by the output module for private purposes.

@end table

@subsection Symbols

Symbols are represented by a linked list of type @code{symbol} with the
following members that can be accessed by the modules:

@table @code

@item int type;
Type of the symbol. Available are:
@table @code
@item #define LABSYM 1
The symbol is a label defined at a specific location.

@item #define IMPORT 2
The symbol is imported from another file.

@item #define EXPRESSION 3
The symbol is defined using an expression.
@end table

@item uint32_t flags;
Flags of this symbol. Available are:
@table @code
@item #define TYPE_UNKNOWN 0
The symbol has no type information.

@item #define TYPE_OBJECT 1
The symbol defines an object.

@item #define TYPE_FUNCTION 2
The symbol defines a function.

@item #define TYPE_SECTION 3
The symbol defines a section.

@item #define TYPE_FILE 4
The symbol defines a file.

@item #define EXPORT (1<<3)
The symbol is exported to other files.

@item #define INEVAL (1<<4)
Used internally.

@item #define COMMON (1<<5)
The symbol is a common symbol.

@item #define WEAK (1<<6)
The symbol is weak, which means the linker may overwrite it with
any global definition of the same name. Weak symbols may also stay
undefined, in which case the linker would assign them a value of
zero.

@item #define LOCAL (1<<7)
Only informational. A symbol can be explicitly declared as local
by a syntax-module directive.

@item #define VASMINTERN (1<<8)
Vasm-internal symbol, which is usually not exported into an output
file.

@item #define PROTECTED (1<<9)
Used internally to protect the current-PC symbol from deletion.

@item #define REFERENCED (1<<10)
Symbol was referenced in the source and a relocation entry has
been created.

@item #define ABSLABEL (1<<11)
Label was defined inside an absolute section, or during
relocated-org mode. So it has an absolute address and will not
generate a relocation entry when being referenced.

@item #define EQUATE (1<<12)
Symbols flagged as @code{EQUATE} are constant and their value must
not be changed.

@item #define REGLIST (1<<13)
Symbol is a register list definition.

@item #define USED (1<<14)
Symbol appeared in an expression.
Symbols which were only defined
(as label or equate) and never used throughout the whole source
don't get this flag set.

@item #define NEAR (1<<15)
Symbol may be referenced by "near" addressing mode. For example,
base register relative. Used as an optimization hint in the cpu
backend.

@item #define RSRVD_S (1L<<24)
The range from bit 24 to 27 (counted from the LSB) is reserved for
use by the syntax module.

@item #define RSRVD_O (1L<<28)
The range from bit 28 to 31 (counted from the LSB) is reserved for
use by the output module.
@end table

The type-flags can be extracted using the @code{TYPE()} macro which
expects a pointer to a symbol as argument.

@item char *name;
The name of the symbol.

@item expr *expr;
The expression in case of @code{EXPRESSION} symbols.

@item expr *size;
The size of the symbol, if specified.

@item section *sec;
The section a @code{LABSYM} symbol is defined in.

@item taddr pc;
The address of a @code{LABSYM} symbol.

@item taddr align;
The alignment of the symbol in bytes.

@item unsigned long idx;
A member usable by the output module for private purposes.

@end table

@subsection Register symbols

Optional register symbols are available when the backend defines
@code{HAVE_REGSYMS} in @file{cpu.h} together with the hash table size.
Example:
@example
#define HAVE_REGSYMS
#define REGSYMHTSIZE 256
@end example

A register symbol is defined by an object of type @code{regsym}
with the following members that can be accessed by the modules:

@table @code
@item char *reg_name;
Symbol name.
@item int reg_type;
Optional type of register.
@item unsigned int reg_flags;
Optional register symbol flags.
@item unsigned int reg_num;
Register number or value.
@end table

Refer to @file{symbol.h} for functions to create and find register
symbols.

@subsection Atoms

The contents of each section are a linked list built out of non-separable
atoms. The general structure of an atom is:

@example
typedef struct atom @{
  struct atom *next;
  int type;
  taddr align;
  taddr lastsize;
  unsigned changes;
  source *src;
  int line;
  listing *list;
  union @{
    instruction *inst;
    dblock *db;
    symbol *label;
    sblock *sb;
    defblock *defb;
    void *opts;
    int srcline;
    char *ptext;
    printexpr *pexpr;
    expr *roffs;
    taddr *rorg;
    assertion *assert;
    aoutnlist *nlist;
  @} content;
@} atom;
@end example

The members have the following meaning:

@table @code
@item struct atom *next;
Pointer to the following atom (0 if last).

@item int type;
The type of the atom. Can be one of
@table @code
@item #define LABEL 1
A label is defined here.

@item #define DATA 2
Some data bytes of fixed length and constant data are put here.

@item #define INSTRUCTION 3
Generally refers to a machine instruction or pseudo/opcode. These atoms
can change length during optimization passes and will be translated to
@code{DATA}-atoms later.

@item #define SPACE 4
Defines a block of data filled with one value (byte). BSS sections usually
contain only such atoms, but they are also sometimes useful as shorter
versions of @code{DATA}-atoms in other sections.

@item #define DATADEF 5
Defines data of fixed size which can contain cpu specific operands and
expressions. Will be translated to @code{DATA}-atoms later.

@item #define LINE 6
A source text line number (usually from a high level language) is bound
to the atom's address. Useful for source level debugging in certain ABIs.

@item #define OPTS 7
A means to change assembler options at a specific source text line.
For example optimization settings, or the cpu type to generate code for.
The cpu module has to define @code{HAVE_CPU_OPTS} and export the required
functions if it wants to use this type of atom.

@item #define PRINTTEXT 8
A string is printed to stdout during the final assembler pass. A newline
is automatically appended.

@item #define PRINTEXPR 9
Prints the value of an expression during the final assembler pass to stdout.

@item #define ROFFS 10
Set the program counter to an address relative to the section's start
address. These atoms will be translated into @code{SPACE} atoms in the
final pass.

@item #define RORG 11
Assemble this block under the given base address, while the code is still
written into the original memory region.

@item #define RORGEND 12
Ends a RORG block and returns to the original addressing.

@item #define ASSERT 13
The assertion expression is checked in the final pass and an error message
is generated (using the expression string and an optional message out of
this atom) when it evaluates to 0.

@item #define NLIST 14
Defines a stab-entry for the a.out object file format. nlist-style stabs
can also occur embedded in other object file formats, like ELF.
@end table

@item taddr align;
The alignment of this atom. Address must be divisible by @code{align}.

@item taddr lastsize;
The size of this atom in the last resolver pass. When the size has
changed in the current pass, the assembler will request another resolver
run through the section.

@item unsigned changes;
Number of changes in the size of this atom since pass number
@code{FASTOPTPHASE}.
An increasing number usually indicates a problem in
the cpu backend's optimizer and will be flagged by setting
@code{RESOLVE_WARN} in the section flags, as soon as @code{changes} exceeds
@code{MAXSIZECHANGES}. So the backend can choose not to optimize this atom
as aggressively as before.

@item source *src;
Pointer to the source text object to which this atom belongs.

@item int line;
The source line number that created this atom.

@item listing *list;
Pointer to the listing object to which this atom belongs.

@item instruction *inst;
(In union @code{content}.) Pointer to an instruction structure in the case
of an @code{INSTRUCTION}-atom. Contains the following elements:
@table @code
@item int code;
The cpu specific code of this instruction.

@item char *qualifiers[MAX_QUALIFIERS];
(If @code{MAX_QUALIFIERS!=0}.) Pointer to the qualifiers of this instruction.

@item operand *op[MAX_OPERANDS];
(If @code{MAX_OPERANDS!=0}.) The cpu-specific operands of this instruction.

@item instruction_ext ext;
(If the cpu module defines @code{HAVE_INSTRUCTION_EXTENSION}.)
A cpu-module-specific structure. Typically used to store appropriate
opcodes, allowed addressing modes, supported cpu derivatives etc.
@end table

@item dblock *db;
(In union @code{content}.) Pointer to a dblock structure in the case
of a @code{DATA}-atom. Contains the following elements:
@table @code
@item taddr size;
The number of bytes stored in this atom.

@item char *data;
A pointer to the data.

@item rlist *relocs;
A pointer to relocation information for the data.
@end table

@item symbol *label;
(In union @code{content}.) Pointer to a symbol structure in the case
of a @code{LABEL}-atom.

@item sblock *sb;
(In union @code{content}.) Pointer to a sblock structure in the case
of a @code{SPACE}-atom.
Contains the following elements:
@table @code
@item taddr space;
The size of the empty/filled space in bytes.

@item expr *space_exp;
The above size as an expression, which will be evaluated during assembly
and copied to @code{space} in the final pass.

@item int size;
The size of each space-element and of the fill-pattern in bytes.

@item unsigned char fill[MAXBYTES];
The fill pattern, up to @code{MAXBYTES} bytes.

@item expr *fill_exp;
Optional. Evaluated and copied to @code{fill} in the final pass, when not null.

@item rlist *relocs;
A pointer to relocation information for the space.

@item taddr maxalignbytes;
An optional number of maximum padding bytes to fulfil the atom's alignment
requirement. Zero means there is no restriction.
@end table

@item defblock *defb;
(In union @code{content}.) Pointer to a defblock structure in the case
of a @code{DATADEF}-atom. Contains the following elements:
@table @code
@item taddr bitsize;
The size of the definition in bits.

@item operand *op;
Pointer to a cpu-specific operand structure.

@end table

@item void *opts;
(In union @code{content}.) Points to a cpu module specific options object
in the case of a @code{OPTS}-atom.

@item int srcline;
(In union @code{content}.) Line number for source level debugging in the
case of a @code{LINE}-atom.

@item char *ptext;
(In union @code{content}.) A string to print to stdout in case of a
@code{PRINTTEXT}-atom.

@item printexpr *pexpr;
(In union @code{content}.) Pointer to a printexpr structure in the case of
a @code{PRINTEXPR}-atom. Contains the following elements:
@table @code
@item expr *print_exp;
Pointer to an expression to evaluate and print.

@item short type;
Format type of the printed value.
We can print as hexadecimal
(@code{PEXP_HEX}), signed decimal (@code{PEXP_SDEC}),
unsigned decimal (@code{PEXP_UDEC}), binary (@code{PEXP_BIN}) or
ASCII (@code{PEXP_ASC}).

@item short size;
Size (precision) of the printed value in bits. Excessive bits will be
masked out, and sign-extended when requested.
@end table

@item expr *roffs;
(In union @code{content}.) The expression holds the relative section offset
to align to in case of a @code{ROFFS}-atom.

@item taddr *rorg;
(In union @code{content}.) Assemble the code under the base address in
@code{rorg} in case of a @code{RORG}-atom.

@item assertion *assert;
(In union @code{content}.) Pointer to an assertion structure in the case of
an @code{ASSERT}-atom. Contains the following elements:
@table @code
@item expr *assert_exp;
Pointer to an expression which should evaluate to non-zero.

@item char *exprstr;
Pointer to the expression as text (to be used in the output).

@item char *msgstr;
Pointer to the message, which would be printed when @code{assert_exp} evaluates
to zero.
@end table

@item aoutnlist *nlist;
(In union @code{content}.) Pointer to an nlist structure, describing an
aout stab entry, in case of an @code{NLIST}-atom. Contains the following
elements:
@table @code
@item char *name;
Name of the stab symbol.
@item int type;
Symbol type. Refer to @code{stabs.h} for definitions.
@item int other;
Defines the nature of the symbol (function, object, etc.).
@item int desc;
Debugger information.
@item expr *value;
Symbol's value.
@end table

@end table

@subsection Relocations

@code{DATA} and @code{SPACE} atoms can have a relocation list attached
that describes how this data must be modified when linking/relocating.
They always refer to the data in this atom only.
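
As a rough illustration of how a module might find these relocation
lists, the following sketch walks the atoms of one section and counts the
relocation entries attached to its @code{DATA} and @code{SPACE} atoms.
It is not taken from the vasm sources. The types are simplified stand-ins
that only mirror the member names documented in this chapter,
@code{taddr} is assumed to be a @code{long}, and the function names
@code{count_relocs} and @code{demo_count_relocs} are hypothetical.

@verbatim
```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the vasm types described in this chapter.
   Only the members used below are modelled; the real definitions in
   the vasm sources contain more fields. */
typedef long taddr;            /* assumption: the real taddr is target-dependent */

typedef struct rlist {
  struct rlist *next;
  void *reloc;
  int type;
} rlist;

typedef struct dblock { taddr size; char *data; rlist *relocs; } dblock;
typedef struct sblock { taddr space; rlist *relocs; } sblock;

#define LABEL 1
#define DATA  2
#define SPACE 4

typedef struct atom {
  struct atom *next;
  int type;
  union {
    dblock *db;
    sblock *sb;
  } content;
} atom;

typedef struct section {
  struct section *next;
  atom *first;
} section;

/* Count the relocation entries attached to all DATA and SPACE atoms
   of one section. Atoms of any other type are skipped. */
int count_relocs(const section *sec)
{
  int n = 0;
  const atom *a;
  const rlist *r;

  for (a = sec->first; a != NULL; a = a->next) {
    r = NULL;
    if (a->type == DATA)
      r = a->content.db->relocs;
    else if (a->type == SPACE)
      r = a->content.sb->relocs;
    for (; r != NULL; r = r->next)
      n++;
  }
  return n;
}

/* Hypothetical demo: a LABEL atom, a DATA atom carrying two
   relocations and a SPACE atom carrying none. */
int demo_count_relocs(void)
{
  static char bytes[8];
  rlist r1, r2;
  dblock db;
  sblock sb;
  atom a1, a2, a3;
  section sec;

  r2.next = NULL; r2.reloc = NULL; r2.type = 0;
  r1.next = &r2;  r1.reloc = NULL; r1.type = 0;
  db.size = 8;    db.data = bytes; db.relocs = &r1;
  sb.space = 16;  sb.relocs = NULL;

  a3.next = NULL; a3.type = SPACE; a3.content.sb = &sb;
  a2.next = &a3;  a2.type = DATA;  a2.content.db = &db;
  a1.next = &a2;  a1.type = LABEL; a1.content.db = NULL;

  sec.next = NULL;
  sec.first = &a1;
  return count_relocs(&sec);
}
```
@end verbatim

The type check before touching @code{content} matters because the
members share one union: only the atom's @code{type} tells which
pointer is valid.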

There are a number of predefined standard relocations and it is possible
to add other cpu-specific relocations. Note however, that it is always
preferable to use standard relocations, if possible. Chances that an
output module supports a certain relocation are much higher if it is a
standard relocation.

A relocation list uses this structure:

@example
typedef struct rlist @{
  struct rlist *next;
  void *reloc;
  int type;
@} rlist;
@end example

The @code{type} member identifies the relocation type. All the standard
relocations have
type numbers between @code{FIRST_STANDARD_RELOC} and
@code{LAST_STANDARD_RELOC}. Consult @file{reloc.h} to see which
standard relocations are available.

The detailed information can be accessed
via the pointer @code{reloc}. It will point to a structure that depends
on the relocation type, so a module must only use it if it knows the
relocation type.

All standard relocations point to a type @code{nreloc} with the following
members:
@table @code
@item size_t byteoffset;
Offset in bytes, from the start of the current @code{DATA} atom, to the
beginning of the relocation field. This may also be the address which is
used as a basis for PC-relative relocations. Or a common basis for several
separated relocation fields, which will be translated into a single
relocation type by the output module.

@item size_t bitoffset;
Offset in bits to the beginning of the relocation field, adds to
@code{byteoffset*bitsperbyte}. Bits are counted in a bit-stream from lower
to higher address bytes. But note, that inside a little-endian byte they
are counted from the LSB to the MSB, while they are counted from the MSB to
the LSB for big-endian targets.

@item int size;
The size of the relocation field in bits.

@item taddr mask;
The mask defines which portion of the relocated value is set by this
relocation field.

@item taddr addend;
Value to be added to the symbol value.

@item symbol *sym;
The symbol referred to by this relocation.

@end table

To describe the meaning of these entries, we will define the steps that
shall be executed when performing a relocation:

@enumerate 1
@item Extract the @code{size} bits from the data atom, starting with bit
 number @code{byteoffset*bitsperbyte+bitoffset}. We start counting
 bits from the lowest to the highest numbered byte in memory.
 Inside a big-endian byte we count from the MSB to the LSB. Inside
 a little-endian byte we count from the LSB to the MSB.

@item Determine the relocation value of the symbol. For a simple absolute
 relocation, this will be the value of the symbol @code{sym} plus
 the @code{addend}. For other relocation types, more complex
 calculations will be needed.
 For example, in a program-counter relative relocation,
 the value will be obtained by subtracting the address of the data
 atom plus @code{byteoffset} from the value
 of @code{sym} plus @code{addend}.

@item Calculate the bit-wise "and" of the value obtained in the step above
 and the @code{mask} value.

@item Normalize, i.e. shift the value above right as many bit positions as
 there are low order zero bits in @code{mask}.

@item Add this value to the value extracted in step 1.

@item Insert the low order @code{size} bits of this value into the data atom
 starting with bit @code{byteoffset*bitsperbyte+bitoffset}.
@end enumerate


@subsection Errors

Each module can provide a list of possible error messages contained
e.g. in @file{syntax_errors.h} or @file{cpu_errors.h}. They are a
comma-separated list of a printf-format string and error flags. Allowed
flags are @code{WARNING}, @code{ERROR}, @code{FATAL}, @code{MESSAGE} and
@code{NOLINE}.
They can be combined using or (@code{|}).
@code{NOLINE} has to be set for
error messages during initialization or while writing the output, when
no source text is available. Errors cause the assembler to return false.
@code{FATAL} causes the assembler to terminate
immediately.

The errors can be emitted using the function @code{syntax_error(int n,...)},
@code{cpu_error(int n,...)} or @code{output_error(int n,...)}. The first
argument is the number of the error message (starting from zero). Additional
arguments must be passed according to the format string of the
corresponding error message.

@section Syntax modules

A new syntax module must have its own subdirectory under @file{vasm/syntax}.
At least the files @file{syntax.h}, @file{syntax.c} and @file{syntax_errors.h}
must be written.

@subsection The file @file{syntax.h}

@table @code

@item #define ISIDSTART(x)/ISIDCHAR(x)
These macros should return non-zero if and only if the argument is a
valid character to start an identifier or a valid character inside an
identifier, respectively.
@code{ISIDCHAR} must be a superset of @code{ISIDSTART}.

@item #define ISBADID(p,l)
Even with @code{ISIDSTART} and @code{ISIDCHAR} checked, there may be
combinations of characters which do not form a valid identifier (for
example, a single character). This macro returns non-zero, when this is
the case. First argument is a pointer to the new identifier and second
is its length.

@item #define ISEOL(x)
This macro returns true when the string pointed to by @code{x} starts
with either a comment character or end-of-line.

@item #define CHKIDEND(s,e) chkidend((s),(e))
Defines an optional function to be called at the end of the identifier
recognition process. It allows you to adjust the length of the identifier
by returning a modified @code{e}. Default is to return @code{e}. The
function is defined as @code{char *chkidend(char *startpos,char *endpos)}.

@item #define BOOLEAN(x) -(x)
Defines the result of boolean operations. Usually this is @code{(x)}, as
in C, or @code{-(x)} to return -1 for True.

@item #define NARGSYM "NARG"
Defines the name of an optional symbol which contains the number of
arguments in a macro.

@item #define CARGSYM "CARG"
Defines the name of an optional symbol which can be used to select a
specific macro argument with @code{\.}, @code{\+} and @code{\-}.

@item #define REPTNSYM "REPTN"
Defines the name of an optional symbol containing the counter of the
current repeat iteration.

@item #define EXPSKIP() s=exp_skip(s)
Defines an optional replacement for @code{skip()} to be used in
@file{expr.c}, to skip blanks in an expression. Useful to forbid blanks in
an expression and to ignore the rest of the line (e.g. to treat the rest as
comment). The function is defined as @code{char *exp_skip(char *stream)}.

@item #define IGNORE_FIRST_EXTRA_OP 1
Should be defined when the syntax module wants to ignore the operand field
on instructions without an operand. Useful when everything following
an operand should be regarded as comment, without a comment character.

@item #define MAXMACPARAMS 35
Optionally defines the maximum number of macro arguments, if you need more than
the default number of 9.

@item #define SKIP_MACRO_ARGNAME(p) skip_identifier(p)
An optional function to skip a named macro argument in the macro
definition.
Argument is the current source stream pointer.
The default is to skip an identifier.

@item #define MACRO_ARG_OPTS(m,n,a,p) NULL
An optional function to parse and skip options, default values and
qualifiers for each macro argument. Returns @code{NULL} when no argument
options have been found.
Arguments are:
  @table @code
  @item struct macro *m;
  Pointer to the macro structure currently being defined.
  @item int n;
  Argument index, starting with zero.
  @item char *a;
  Name of this argument.
  @item char *p;
  Current source stream pointer. An updated pointer will be returned.
  @end table
Defaults to unused.

@item #define MACRO_ARG_SEP(p) (*p==',' ? skip(p+1) : NULL)
An optional function to skip a separator between the macro argument
names in the macro definition. Returns @code{NULL} when no valid separator
is found.
Argument is the current source stream pointer.
Defaults to using comma as the only valid separator.

@item #define MACRO_PARAM_SEP(p) (*p==',' ? skip(p+1) : NULL)
An optional function to skip a separator between the macro parameters
in a macro call. Returns @code{NULL} when no valid separator is found.
Argument is the current source stream pointer.
Defaults to using comma as the only valid separator.

@item #define EXEC_MACRO(s)
An optional function to be called just before a macro starts execution.
Parameters and qualifiers are already parsed.
Argument is the @code{source} pointer of the new macro.
Defaults to unused.

@end table

@subsection The file @file{syntax.c}

A syntax module has to provide the following elements (all other functions
should be @code{static} to prevent name clashes):

@table @code

@item char *syntax_copyright;
A string that will be emitted as part of the copyright message.

@item hashtable *dirhash;
A pointer to the hash table with all directives.

@item char commentchar;
A character used to introduce a comment until the end of the line.

@item char *defsectname;
Name of a default section which vasm creates when a label or code occurs
in the source, but the programmer forgot to specify a section. Assigning
@code{NULL} means that there is no default and vasm will show an error in
this case.

@item char *defsecttype;
Type of the default section (see above). May be @code{NULL}.

@item int init_syntax();
Will be called during startup, after argument parsing. Must return zero if
initializations failed, non-zero otherwise.

@item int syntax_args(char *);
This function will be called with the command line arguments (unless they
were already recognized by other modules). If an argument was recognized,
return non-zero.

@item char *skip(char *);
A function to skip whitespace etc.

@item char *skip_operand(char *);
A function to skip an instruction's operand. Will terminate at end of line
or the next comma, returning a pointer to the rest of the line behind
the comma.

@item void eol(char *);
This function should check that the argument points to the end of a line
(only comments or whitespace following). If not, an error or warning
message should be emitted.

@item char *const_prefix(char *,int *);
Check if the first argument points to the start of a constant. If so,
return a pointer to the real start of the number (i.e. skip a prefix
that may indicate the base) and write the base of the number through the
pointer passed as second argument. Return zero if it does not point to a
number.

@item char *const_suffix(char *,char *);
First argument points to the start of the constant (including prefix) and
the second argument to the first character after the constant (excluding
suffix). Checks for a constant-suffix and skips it. Returns a pointer to
the first character after that constant. Example: constants with a 'h'
suffix to indicate a hexadecimal base.

@item void parse(void);
This is the main parsing function. It has to read lines via
the @code{read_next_line()} function, parse them and create sections,
atoms and symbols. Pseudo directives are usually handled by the syntax
module.
Instructions can be parsed by the cpu module using
@code{parse_instruction()}.

@item char *parse_macro_arg(struct macro *,char *,struct namelen *,struct namelen *);
Called to parse a macro parameter by using the source stream pointer in
the second argument. The start pointer and length of a single passed
parameter are written to the first @code{struct namelen}, while the optionally
selected named macro argument is passed in the second @code{struct namelen}.
When the @code{len} field of the second @code{namelen} is zero, then the
argument is selected by position instead of by name. Returns the updated
source stream pointer after successful parsing.

@item int expand_macro(source *,char **,char *,int);
Expand parameters and special commands inside a macro source. The second
argument is a pointer to the current source stream pointer, which is
updated on any successful expansion. The function will return the
number of characters written to the destination buffer (third argument)
in this case. Returning @code{-1} means: no expansion took place.
The last argument defines the space in characters which is left in the
destination buffer.

@item char *get_local_label(char **);
Gets a pointer to the current source pointer. Has to check if a valid
local label is found at this point. If so, return a pointer to the
vasm-internal symbol name representing the local label and update
the current source pointer to point behind the label.

Have a look at the support functions provided by the frontend to help.

@end table

@section CPU modules

A new cpu module must have its own subdirectory under @file{vasm/cpus}.
At least the files @file{cpu.h}, @file{cpu.c} and @file{cpu_errors.h}
must be written.
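
As a rough illustration of the error list format described in the Errors
subsection, a @file{cpu_errors.h} for a hypothetical cpu module might contain
entries like the following. The message texts are invented for this sketch
and not taken from any existing module. The trailing comma after each entry
is an assumption based on the list being comma-separated.

@example
  "illegal operand",ERROR,
  "displacement %ld out of range",WARNING,
  "unknown cpu model '%s'",ERROR|NOLINE,
@end example

With such a list, a call like @code{cpu_error(1,(long)disp)} would emit the
second message, because message numbers start from zero. Here @code{disp} is
a hypothetical variable holding the offending value. The third entry sets
@code{NOLINE}, as it would be reported while no source text is available.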

@subsection The file @file{cpu.h}

A cpu module has to provide the following elements (all other functions
should be @code{static} to prevent name clashes) in @file{cpu.h}:

@table @code
@item #define MAX_OPERANDS 3
Maximum number of operands of one instruction.

@item #define MAX_QUALIFIERS 0
Maximum number of mnemonic-qualifiers per mnemonic.

@item #define NO_MACRO_QUALIFIERS
Define this when qualifiers shouldn't be allowed for macros. For some
architectures, like ARM, macro qualifiers make no sense.

@item typedef int32_t taddr;
Data type to represent a target-address. Preferably use the ones from
@file{stdint.h}.

@item typedef uint32_t utaddr;
Unsigned data type to represent a target-address.

@item #define LITTLEENDIAN 1
@itemx #define BIGENDIAN 0
Define these according to the target endianness. For CPUs which support
big- and little-endian, you may assign a global variable here. So be aware of
it, and never use @code{#if BIGENDIAN}, but always @code{if(BIGENDIAN)} in
your code.

@item #define VASM_CPU_<cpu> 1
Insert the cpu specifier.

@item #define INST_ALIGN 2
Minimum instruction alignment.

@item #define DATA_ALIGN(n) ...
Default alignment for @code{n}-bit data. Can also be a function.

@item #define DATA_OPERAND(n) ...
Operand class for @code{n}-bit data definitions. Can also be a function.
Negative values denote a floating point data definition of @code{-n} bits.

@item typedef ... operand;
Structure to store an operand.

@item typedef ... mnemonic_extension;
Mnemonic extension.
@end table

Optional features, which can be enabled by defining the following macros:

@table @code
@item #define HAVE_INSTRUCTION_EXTENSION 1
If cpu-specific data should be added to all instruction atoms.

@item typedef ... instruction_ext;
Type for the above extension.

@item #define NEED_CLEARED_OPERANDS 1
Backend requires a zeroed operand structure when calling @code{parse_operand()}
for the first time. Defaults to undefined.

@item START_PARENTH(x)
Valid opening parenthesis for instruction operands. Defaults to @code{'('}.

@item END_PARENTH(x)
Valid closing parenthesis for instruction operands. Defaults to @code{')'}.

@item #define MNEMONIC_VALID(i)
An optional function with the arguments @code{(int idx)}. Returns true
when the mnemonic with index @code{idx} is valid for the current state of
the backend (e.g. it is available for the selected cpu architecture).

@item #define MNEMOHTABSIZE 0x4000
You can optionally override the default hash table size defined in
@file{vasm.h}. May be necessary for larger mnemonic tables.

@item #define OPERAND_OPTIONAL(p,t)
When defined, this is a function with the arguments
@code{(operand *op,int type)}, which returns true when the given operand
type (@code{type}) is optional. The function is only called for missing
operands and should also initialize @code{op} with default values (e.g. 0).
@end table

Implementing additional target-specific unary operations is done by defining
the following optional macros:

@table @code
@item #define EXT_UNARY_NAME(s)
Should return True when the string in @code{s} points to an operation name
we want to handle.

@item #define EXT_UNARY_TYPE(s)
Returns the operation type code for the string in @code{s}. Note that the
last valid standard operation is defined as @code{LAST_EXP_TYPE}, so the
target-specific types will start with @code{LAST_EXP_TYPE+1}.

@item #define EXT_UNARY_EVAL(t,v,r,c)
Defines a function with the arguments @code{(int t, taddr v, taddr *r, int c)}
to handle the operation type @code{t}, returning an @code{int} to indicate
whether this type has been handled or not. Your operation will be applied to
the value @code{v} and the result is stored in @code{*r}. The flag @code{c}
is passed as 1 when the value is constant (no relocatable addresses involved).

@item #define EXT_FIND_BASE(b,e,s,p)
Defines a function with the arguments
@code{(symbol **b, expr *e, section *s, taddr p)}
to save a pointer to the base symbol of expression @code{e} into the
symbol pointer, pointed to by @code{b}. The type of this base is given
by an @code{int} return code. Further on, @code{e->type} has to be checked
to be one of the operations to handle.
The section pointer @code{s} and the current pc @code{p} are needed to call
the standard @code{find_base()} function.
@end table

@subsection The file @file{cpu.c}

A cpu module has to provide the following elements (all other functions
and data should be @code{static} to prevent name clashes) in @file{cpu.c}:

@table @code
@item int bitsperbyte;
The number of bits per byte of the target cpu.

@item int bytespertaddr;
The number of bytes per @code{taddr}.

@item mnemonic mnemonics[];
The mnemonic table keeps a list of mnemonic names and operand types the
assembler will match against using @code{parse_operand()}. It may also
include a target specific @code{mnemonic_extension}.

@item char *cpu_copyright;
A string that will be emitted as part of the copyright message.

@item char *cpuname;
A string describing the target cpu.

@item int init_cpu();
Will be called during startup, after argument parsing. Must return zero if
initializations failed, non-zero otherwise.

@item int cpu_args(char *);
This function will be called with the command line arguments (unless they
were already recognized by other modules). If an argument was recognized,
return non-zero.

@item char *parse_cpu_special(char *);
This function will be called with a source line as argument and allows
the cpu module to handle cpu-specific directives etc. Functions like
@code{eol()} and @code{skip()}, provided by the syntax module, should be
used to keep the syntax consistent.

@item operand *new_operand();
Allocate and initialize a new operand structure.

@item int parse_operand(char *text,int len,operand *out,int requires);
Parses the source at @code{text} with length @code{len} to fill the target
specific operand structure pointed to by @code{out}. Returns @code{PO_MATCH}
when the operand matches the operand-type passed in @code{requires} and
@code{PO_NOMATCH} otherwise. When the source is definitely identified as
garbage, the function may return @code{PO_CORRUPT} to tell the assembler
that it is useless to try matching against any other operand types.
Another special case is @code{PO_SKIP}, which is also a match, but skips
the next operand from the mnemonic table (because it was already handled
together with the current operand).

@item taddr instruction_size(instruction *ip, section *sec, taddr pc);
Returns the size of the instruction @code{ip} in bytes, which must be
identical to the number of bytes written by @code{eval_instruction()}
(see below).

@item dblock *eval_instruction(instruction *ip, section *sec, taddr pc);
Converts the instruction @code{ip} into a DATA atom, including relocations,
if necessary.

@item dblock *eval_data(operand *op, taddr bitsize, section *sec, taddr pc);
Converts a data operand into a DATA atom, including relocations.

@item void init_instruction_ext(instruction_ext *);
(If @code{HAVE_INSTRUCTION_EXTENSION} is set.)
Initialize an instruction extension.

@item char *parse_instruction(char *,int *,char **,int *,int *);
(If @code{MAX_QUALIFIERS} is greater than 0.)
Parses instruction and saves extension locations.

@item int set_default_qualifiers(char **,int *);
(If @code{MAX_QUALIFIERS} is greater than 0.)
Saves pointers and lengths of default qualifiers for the selected CPU and
returns the number of default qualifiers. Example: for a M680x0 CPU this
would be a single qualifier, called "w". Used by @code{execute_macro()}.

@item cpu_opts_init(section *);
(If @code{HAVE_CPU_OPTS} is set.)
Gives the cpu module the chance to write out @code{OPTS} atoms with
initial settings before the first atom is generated.

@item cpu_opts(void *);
(If @code{HAVE_CPU_OPTS} is set.)
Apply option modifications from an @code{OPTS} atom. For example:
change cpu type or optimization flags.

@item print_cpu_opts(FILE *,void *);
(If @code{HAVE_CPU_OPTS} is set.)
Called from @code{print_atom()} to print an @code{OPTS} atom's contents.

@end table


@section Output modules

Output modules can be chosen at runtime rather than compile time. Therefore,
several output modules are linked into one vasm executable and their
structure differs somewhat from syntax and cpu modules.

Usually, an output module for some object format @code{fmt} should be contained
in a file @file{output_<fmt>.c} (it may use/include other files if necessary).
To automatically include this format in the build process, the @file{make.rules}
has to be extended. The module should be added to the @code{OBJS} variable
at the start of @file{make.rules}. Also, a dependency line should be added
(see the existing output modules).
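
The two additions to @file{make.rules} might look roughly like the following
sketch for a hypothetical format named @code{myfmt}. The variable names
@code{$(DIR)} and @code{$(CORE)} are assumptions about how the existing rules
are written and may differ in the actual file, while @code{CC}, @code{CCOUT}
and @code{COPTS} are the Makefile variables described in the section on
building vasm. The @code{...} stands for the object files already listed.

@example
# append the new module to the existing object list
OBJS = ... $(DIR)/output_myfmt.o

# dependency line and compile rule for the new module
$(DIR)/output_myfmt.o: output_myfmt.c $(CORE)
	$(CC) $(CCOUT)$(DIR)/output_myfmt.o $(COPTS) output_myfmt.c
@end example

When in doubt, copy the entry of an existing output module and only change
the file names.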

An output module must only export a single function which will return
pointers to necessary data/functions. This function should have the
following prototype:
@example
int init_output_<fmt>(
      char **copyright,
      void (**write_object)(FILE *,section *,symbol *),
      int (**output_args)(char *)
    );
@end example

In case of an error, zero must be returned.
Otherwise, it should perform all necessary initializations, return non-zero
and return the following output parameters via the pointers passed as arguments:

@table @code
@item copyright
A pointer to the copyright string.

@item write_object
A pointer to a function emitting the output. It will be called after the
assembler has completed and will receive pointers to the output file,
to the first section of the section list and to the first symbol
in the symbol list. See the section on general data structures for further
details.

@item output_args
A pointer to a function checking arguments. It will be called with all
command line arguments (unless already handled by other modules). If the
output module recognizes an appropriate option, it has to handle it
and return non-zero. If it is not an option relevant to this output module,
zero must be returned.

@end table

Finally, a call to @code{init_output_<fmt>()} has to be added in the
@code{init_output()} function in @file{vasm.c} (should be self-explanatory).

Some remarks:
@itemize @minus

@item
Some output modules cannot handle all supported CPUs. Nevertheless,
they have to be written in a way that they can be compiled. If code
references CPU-specifics, it has to be enclosed in
@code{#ifdef VASM_CPU_MYCPU} ... @code{#endif} or similar.

Also, if the selected CPU is not supported, the init function should fail.

@item
Error/warning messages can be emitted with the @code{output_error} function.
As all output modules are linked together, they have a common list of error
messages in the file @file{output_errors.h}. If a new message is needed, this
file has to be extended (see the section on general data structures for
details).

@item
@command{vasm} has a mechanism to specify rather complex relocations in a
standard way (see the section on general data structures). They can be
extended with CPU specific relocations, but usually CPU modules will
try to create standard relocations (sometimes several standard relocations
can be used to implement a CPU specific relocation). An output
module should try to find appropriate relocations supported by the
object format. The goal is to avoid special CPU specific
relocations as much as possible.

@end itemize

Volker Barthelmann                                      vb@@compilers.de

@bye