.EQ
delim $$
.EN
.SH 1 "Machine instructions"
.pp
The syntax of machine instruction statements accepted by
.i as
is generally similar to the syntax of \*(DM.
There are differences,
however.
.SH 2 "Character set"
.pp
.i As
uses the character
.q \*(DL
instead of
.q #
for immediate constants,
and the character
.q *
instead of
.q @
for indirection.
Opcodes and register names
are spelled with lower-case rather than upper-case letters.
.SH 2 "Specifying Displacement Lengths"
.pp
Under certain circumstances,
the following constructs are (optionally) recognized by
.i as
to indicate the number of bytes to allocate for
the displacement used when constructing
displacement and displacement deferred addressing modes:
.(b
.TS
center;
c c l
cb cb l.
primary	alternate	length
_
B\`	B^	byte (1 byte)
W\`	W^	word (2 bytes)
L\`	L^	long word (4 bytes)
.TE
.)b
.pp
One can also use lower case
.b b ,
.b w
or
.b l
instead of the upper
case letters.
There must be no space between the size specifier letter and the
.q "^"
or
.q "\`" .
The constructs
.b "S^"
and
.b "G^"
are not recognized
by
.i as ,
as they are by the \*(DM assembler.
It is preferred to use the
.q "\`"
displacement specifier,
so that the
.q "^"
is not
misinterpreted as the
.b xor
operator.
.pp
Literal values
(including floating-point literals used where the
hardware expects a floating-point operand)
are assembled as short
literals if possible,
so the
.b "S^"
\*(DM directive is not needed.
.pp
If the displacement length modifier is present,
then the displacement is
.b always
assembled with that length,
even if it will fit into a smaller field,
or if significance is lost.
If the length modifier is not present,
and if the value of the displacement is known exactly in
.i as 's
first pass,
then
.i as
determines the length automatically,
assembling it in the shortest possible way.
Otherwise,
.i as
will use the value specified by the
.b \-d
argument,
which defaults to 4 bytes.
.SH 2 "case\fIx\fP Instructions"
.pp
.i As
considers the instructions
.b caseb ,
.b casel ,
and
.b casew
to have three operands.
The displacements must be explicitly computed by
.i as ,
using one or more
.b .word
statements.
.SH 2 "Extended branch instructions"
.pp
These opcodes (formed in general
by substituting a
.q j
for the initial
.q b
of the standard opcodes)
take as branch destinations
the name of a label in the current subsegment.
It is an error if the destination is known to be in a different subsegment,
and it is a warning if the destination is not defined within
the object module being assembled.
.pp
If the branch destination is close enough,
then the corresponding
short branch
.q b
instruction is assembled.
Otherwise the assembler chooses a sequence
of one or more instructions which together have the same effect as if the
.q b
instruction had a larger span.
In general,
.i as
chooses the inverse branch followed by a
.b brw ,
but a
.b brw
is sometimes pooled among several
.q j
instructions with the same destination.
.pp
.i As
is unable to perform the same long/short branch generation
for other instructions with a fixed byte displacement,
such as the
.b sob
and
.b aob
families,
or for the
.b acbx
family of instructions which has a fixed word displacement.
This would be desirable,
but is prohibitive because of the complexity of these instructions.
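.pp
As a sketch only of the inverse-branch expansion described above,
suppose the hypothetical label
.b far
lies beyond the reach of a byte displacement.
The label
.b skip
is invented for this illustration,
and the sequence
.i as
actually emits may differ,
for instance when a
.b brw
is pooled among several branches.
.(b
# as written in the source:
	jeql	far
# one plausible equivalent when far is out of byte range:
	bneq	skip	# inverse of beql
	brw	far	# word displacement reaches far
skip:
.)b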
.pp
If the
.b \-J
assembler option is given,
a
.b jmp
instruction is used instead of a
.b brw
instruction
for
.b ALL
.q j
instructions with distant destinations.
This makes assembly of large (>32K bytes)
programs (inefficiently)
possible.
.i As
does not try to use clever combinations of
.b brb ,
.b brw
and
.b jmp
instructions.
The
.b jmp
instructions use PC relative addressing,
with the length of the offset given by the
.b \-d
assembler
option.
.pp
These are the extended branch instructions
.i as
recognizes:
.(b
.TS
center;
lb lb lb lb.
jeql	jeqlu	jneq	jnequ
jgeq	jgequ	jgtr	jgtru
jleq	jlequ	jlss	jlssu
jbcc	jbsc	jbcs	jbss

jlbc	jlbs
jcc	jcs
jvc	jvs
jbc	jbs
jbr
.TE
.)b
.pp
Note that
.b jbr
turns into
.b brb
if its target is close enough;
otherwise a
.b brw
is used.
.SH 1 "Diagnostics"
.pp
Diagnostics are intended to be self-explanatory and appear on
the standard output.
Diagnostics report either an
.i error
or a
.i warning .
Error diagnostics complain about lexical, syntactic and some
semantic errors, and abort the assembly.
.pp
The majority of the warnings complain about the use of \*(VX
features not supported by all implementations of the architecture.
.i As
will warn if new opcodes are used
or if
.q G
or
.q H
floating point numbers are used,
and will complain about mixed floating conversions.
.SH 1 "Limits"
.(b
.TS
center;
l l.
limit	what
_
Arbitrary\**	Files to assemble
BUFSIZ	Significant characters per name
BUFSIZ	Characters per input line
127	Characters per string
Arbitrary	Symbols
4	Text segments
4	Data segments
.TE
.)b
.(f
\**Although the number of characters available to the \fIargv\fP line
is restricted by \*(UX to 10240.
.)f
.SH 1 "Annoyances and Future Work"
.pp
Most of the annoyances deal with restrictions on the extended
branch instructions.
.pp
.i As
only uses a two level algorithm for resolving extended branch
instructions into short or long displacements.
What is really needed is a general mechanism
to turn a short conditional jump into a
reverse conditional jump over one of
.b two
possible unconditional branches,
either a
.b brw
or a
.b jmp
instruction.
Currently, the
.b \-J
option forces the
.b jmp
instruction to
.i always
be used,
even where the
shorter
.b brw
instruction would suffice.
.pp
The assembler should also recognize extended branch instructions for
.b sob ,
.b aob ,
and
.b acbx
instructions.
.b Sob
instructions will be easy,
.b aob
will be harder because the synthesized instruction
uses the index operand twice,
so one must be careful of side effects,
and the
.b acbx
family will be much harder (in the general case)
because the comparison depends on the sign of the addend operand,
and two operands are used more than once.
Augmenting
.i as
with these extended loop instructions
will allow the peephole optimizer to produce much better
loop optimizations,
since it currently assumes the worst
case about the size of the loop body.
.pp
There has been no experience with foreign programs using
the binary symbolic intermediate form.
.bp
.SH 1 "Appendix 1: Binary Symbolic Intermediate Format"
.pp
The binary symbolic (\c
.i bs
for short) intermediate
form for assembly language
closely follows the syntax of
.q human
symbolic assembly language.
However,
some of the expressive flexibility allowed in the
human symbolic assembly language is not allowed in the
.i bs
form,
to simplify the
.i bs
form as much as possible.
In addition,
concessions to the internals
of the assembler are made in the
.i bs
form.
This implementation decision
simplifies the assembler's internal buffering and
necessitates only one internal form.
.pp
.i Bs
is structured as a prefix linearized forest of description trees.
Each node in the description tree
is represented by a byte code.
The nodes may have up to six children.
Some of the nodes have semantic attributes;
some semantic attributes are of concern only to the assembler,
but must be in the
.i bs
form as place holders.
The semantic attributes immediately follow the byte code.
.SH 2 "Binary Symbolic Node Definitions"
.pp
Table 1
defines the symbolic names for the description nodes,
the type of the node,
the number of children to the node,
the restrictions on the kind of children,
and the mapping of the description node,
including its children,
to the human assembly format.
Table 2 defines the semantic attributes required for
all attributed nodes.
.pp
The restrictions on the children are encoded in the mapping string.
In addition,
the prefix left to right order of a node's children is identical
to the left to right enumeration of the children in the mapping string.
The restrictions are encoded in the mapping string as
.i printf
like escapes.
.(b
.TS
center;
l l.
escape	child requirement
_
%a	address mode node, ADDR
%b	Bignum (large scalar or floating)
%e	expression mode node, EXPR
%c	comma node for operands, CMTR
%n	name, BS\*(USNAME
%r	register, BS\*(USREG
%r	register expression, BS\*(USREGOP
%s	string, BS\*(USSTRING
%%	% sign

%I	print an integer constant
%N	print a name
%S	print a string
%R	print a register
%B	print a big number
%O	print an instruction
.TE
.)b
.pp
These are the node types used in Table 1:
.(b
.TS
center;
c l.
node type	description
_
ROOT	the node can only appear at the root of a tree
CMTR	the node is the only argument to an instruction
ADDR	an addressing mode
EXPR	an expression
VADDR	an illegal addressing mode
.TE
.)b
.bp
.ce 1
Table 1: Binary Symbolic Node Definitions
.ce 0
.sp 1
.TS
center;
l l n l l
l l n lb l.
node	type	arity	key	arguments
=
Root
_
 BS\*(USNL	ROOT	0		\en
 BS\*(USPARSEEOF	ROOT	0		<EOF>
 BS\*(USLABEL	ROOT	1		%n:
=
Directives
_
 BS\*(USABORT	ROOT	0	.ABORT;
 BS\*(USFILE	ROOT	1	.file	%s;
 BS\*(USLINENO	ROOT	1	.line	%e;
_
 BS\*(USDATA	ROOT	1	.data	%e;
 BS\*(USTEXT	ROOT	1	.text	%e;
_
 BS\*(USORG	ROOT	2	.org	%e,%e;
 BS\*(USALIGN	ROOT	2	.align	%e,%e;
 BS\*(USSPACE	ROOT	2	.space	%e,%e;
 BS\*(USFILL	ROOT	3	.fill	%e,%e,%e;
_
 BS\*(USBYTE	ROOT	1	.byte	%e;
 BS\*(USWORD	ROOT	1	.word	%e;
 BS\*(USLONG	ROOT	1	.long	%e;
 BS\*(USQUAD	ROOT	1	.quad	%b;
 BS\*(USOCTA	ROOT	1	.octa	%b;
 BS\*(USFFLOAT	ROOT	1	.ffloat	%b;
 BS\*(USDFLOAT	ROOT	1	.dfloat	%b;
 BS\*(USGFLOAT	ROOT	1	.gfloat	%b;
 BS\*(USHFLOAT	ROOT	1	.hfloat	%b;
 BS\*(USASCII	ROOT	1	.ascii	%s;
_
 BS\*(USCOMM	ROOT	2	.comm	%n,%e;
 BS\*(USLCOMM	ROOT	2	.lcomm	%n,%e;
 BS\*(USGLOBAL	ROOT	1	.global	%n;
 BS\*(USSET	ROOT	2	.set	%n,%e;
 BS\*(USLSYM	ROOT	2	.lsym	%n,%e;
_
 BS\*(USSTABN	ROOT	4	.stabn	%e,%e,%e,%e;
 BS\*(USSTABS	ROOT	5	.stabs	%s,%e,%e,%e,%e;
 BS\*(USSTABD	ROOT	3	.stabd	%e,%e,%e;
=
Leaves
_
 BS\*(USICON	EXPR	0	\&	<integer, in decimal>
 BS\*(USNAME	EXPR	0	\&	<name>
 BS\*(USSTRING	EXPR	0	\&	<quoted string>
 BS\*(USREG	EXPR	0	\&	r<integer>
_
 BS\*(USBNQ	EXPR	0		<quad scalar, in hex>
 BS\*(USBNO	EXPR	0	\&	<octa scalar, in hex>
 BS\*(USBNF	EXPR	0	\&	<F float, in hex>
 BS\*(USBND	EXPR	0	\&	<D float, in hex>
 BS\*(USBNG	EXPR	0	\&	<G float, in hex>
 BS\*(USBNH	EXPR	0	\&	<H float, in hex>
.bp
=
Operators
_
 BS\*(USREGOP	EXPR	1	\&	%%%e
_
 BS\*(USPLUS	EXPR	2	\&	(%e + %e)
 BS\*(USMINUS	EXPR	2	\&	(%e - %e)
 BS\*(USMUL	EXPR	2	\&	(%e * %e)
 BS\*(USDIV	EXPR	2	\&	(%e / %e)
 BS\*(USMOD	EXPR	2	\&	(%e %% %e)
_
 BS\*(USLSH	EXPR	2	\&	(%e < %e)
 BS\*(USRSH	EXPR	2	\&	(%e > %e)
_
 BS\*(USXOR	EXPR	2	\&	(%e ^ %e)
 BS\*(USIOR	EXPR	2	\&	(%e | %e)
 BS\*(USAND	EXPR	2	\&	(%e & %e)
 BS\*(USORNOT	EXPR	2	\&	(%e ! %e)
=
Instructions
_
 BS\*(USINST	ROOT	1	%O	%c;
 BS\*(USJXXX	ROOT	1	%O	%c;
_
 BS\*(USCM0	CMTR	0	\&
 BS\*(USCM1	CMTR	1	\&	%a
 BS\*(USCM2	CMTR	2	\&	%a,%a
 BS\*(USCM3	CMTR	3	\&	%a,%a,%a
 BS\*(USCM4	CMTR	4	\&	%a,%a,%a,%a
 BS\*(USCM5	CMTR	5	\&	%a,%a,%a,%a,%a
 BS\*(USCM6	CMTR	6	\&	%a,%a,%a,%a,%a,%a
.bp
=
Address modes
_
 AM\*(USIMM	ADDR	1	\&	\*(DL%e
 AMD(AM\*(USIMM)	VADDR	1	\&	snark
 AMI(AM\*(USIMM)	VADDR	1	\&	snark
 AMDD(AM\*(USIMM)	VADDR	1	\&	snark
_
 AM\*(USREG	ADDR	1	\&	%r
 AMD(AM\*(USREG)	ADDR	1	\&	(%r)
 AMI(AM\*(USREG)	VADDR	1	\&	snark
 AMDI(AM\*(USREG)	ADDR	2	\&	(%r)[%r]
_
 AM\*(USINCR	ADDR	1	\&	(%r)+
 AMD(AM\*(USINCR)	ADDR	1	\&	*(%r)+
 AMI(AM\*(USINCR)	ADDR	2	\&	(%r)+[%r]
 AMDI(AM\*(USINCR)	ADDR	2	\&	*(%r)+[%r]
_
 AM\*(USEXPR	ADDR	1	\&	%e
 AMD(AM\*(USEXPR)	ADDR	1	\&	*%e
 AMI(AM\*(USEXPR)	ADDR	2	\&	%e[%r]
 AMDI(AM\*(USEXPR)	ADDR	2	\&	*%e[%r]
_
 AM\*(USDECR	ADDR	1	\&	-(%r)
 AMD(AM\*(USDECR)	VADDR	1	\&	snark
 AMI(AM\*(USDECR)	ADDR	2	\&	-(%r)[%r]
 AMDI(AM\*(USDECR)	VADDR	2	\&	snark
_
 AM\*(USDISPA	ADDR	2	\&	%e(%r)
 AMD(AM\*(USDISPA)	ADDR	2	\&	*%e(%r)
 AMI(AM\*(USDISPA)	ADDR	3	\&	%e(%r)[%r]
 AMDI(AM\*(USDISPA)	ADDR	3	\&	*%e(%r)[%r]
_
 AM\*(USDISP1	ADDR	2	\&	b\`%e(%r)
 AMD(AM\*(USDISP1)	ADDR	2	\&	*b\`%e(%r)
 AMI(AM\*(USDISP1)	ADDR	3	\&	b\`%e(%r)[%r]
 AMDI(AM\*(USDISP1)	ADDR	3	\&	*b\`%e(%r)[%r]
_
 AM\*(USDISP2	ADDR	2	\&	w\`%e(%r)
 AMD(AM\*(USDISP2)	ADDR	2	\&	*w\`%e(%r)
 AMI(AM\*(USDISP2)	ADDR	3	\&	w\`%e(%r)[%r]
 AMDI(AM\*(USDISP2)	ADDR	3	\&	*w\`%e(%r)[%r]
_
 AM\*(USDISP4	ADDR	2	\&	l\`%e(%r)
 AMD(AM\*(USDISP4)	ADDR	2	\&	*l\`%e(%r)
 AMI(AM\*(USDISP4)	ADDR	3	\&	l\`%e(%r)[%r]
 AMDI(AM\*(USDISP4)	ADDR	3	\&	*l\`%e(%r)[%r]
.TE
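.pp
As a reading aid for Table 1,
the display below walks one instruction tree through its mapping strings.
The tree and the opcode
.b movl
are chosen only for illustration;
the attributes that carry the opcode,
the integer value and the register number
are those defined by Table 2,
and their encoding is not shown here.
Each line gives a node,
indented under its parent,
followed by its mapping string from Table 1.
.(b
BS\*(USINST (opcode movl)	%O %c;
  BS\*(USCM2	%a,%a
    AM\*(USIMM	\*(DL%e
      BS\*(USICON (value 5)	<integer, in decimal>
    AM\*(USREG	%r
      BS\*(USREG (register 0)	r<integer>
.)b
Reading the nodes in prefix order,
and substituting each child into its parent's mapping string from left to right,
gives the human form:
.(b
movl \*(DL5,r0;
.)b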