.EQ
delim $$
.EN
.SH 1 "Machine instructions"
.pp
The syntax of machine instruction statements accepted by
.i as
is generally similar to the syntax of \*(DM.
There are differences,
however.
.SH 2 "Character set"
.pp
.i As
uses the character
.q \*(DL
instead of
.q #
for immediate constants,
and the character
.q *
instead of
.q @
for indirection.
Opcodes and register names
are spelled with lower-case rather than upper-case letters.
.SH 2 "Specifying Displacement Lengths"
.pp
Under certain circumstances,
the following constructs are (optionally) recognized by
.i as
to indicate the number of bytes to allocate for
the displacement used when constructing
displacement and displacement deferred addressing modes:
.(b
.TS
center;
c c l
cb cb l.
primary	alternate	length
_
B\`	B^	byte (1 byte)
W\`	W^	word (2 bytes)
L\`	L^	long word (4 bytes)
.TE
.)b
.pp
One can also use lower case
.b b ,
.b w
or
.b l
instead of the upper
case letters.
There must be no space between the size specifier letter and the
.q "^"
or
.q "\`" .
The constructs
.b "S^"
and
.b "G^"
are not recognized
by
.i as ,
as they are by the \*(DM assembler.
It is preferred to use the
.q "\`"
displacement specifier,
so that the
.q "^"
is not
misinterpreted as the
.b xor
operator.
.pp
Literal values
(including floating-point literals used where the
hardware expects a floating-point operand)
are assembled as short
literals if possible,
so the
.b "S^"
\*(DM directive is not needed.
.pp
If the displacement length modifier is present,
then the displacement is
.b always
assembled with that length,
even if it would fit into a smaller field,
or if significance is lost.
If the length modifier is not present,
and if the value of the displacement is known exactly in
.i as 's
first pass,
then
.i as
determines the length automatically,
assembling it in the shortest possible way.
Otherwise,
.i as
will use the value specified by the
.b \-d
argument,
which defaults to 4 bytes.
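.pp
The length selection just described can be sketched as a small function.
This is only an illustrative model written in Python, not code taken from
.i as ;
the function name, its parameters, and the signed byte and word ranges
used for the automatic case are assumptions made for the example.

```python
def displacement_bytes(value, modifier=None, known_in_pass1=True, default_d=4):
    """Return the number of displacement bytes under the rules described above.

    All names here are hypothetical; default_d stands for the -d setting.
    """
    sizes = {"b": 1, "w": 2, "l": 4}
    if modifier is not None:
        # An explicit length modifier always wins, even if the value would
        # fit in a smaller field or does not fit at all.
        return sizes[modifier.lower()]
    if not known_in_pass1:
        # Value not known exactly in the first pass: use the -d setting.
        return default_d
    # Value known: pick the shortest field that holds it (assumed signed).
    if -128 <= value <= 127:
        return 1
    if -32768 <= value <= 32767:
        return 2
    return 4
```

With these assumptions a known displacement of 300 takes two bytes,
while the same displacement written with an explicit long modifier takes four.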
.SH 2 "case\fIx\fP Instructions"
.pp
.i As
considers the instructions
.b caseb ,
.b casel
and
.b casew
to have three operands.
The displacements must be explicitly computed by
.i as ,
using one or more
.b .word
statements.
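.pp
To make the layout concrete,
the following Python sketch generates the source lines for a case instruction
followed by its displacement table.
It assumes that each table entry is a word displacement measured from the
start of the table;
the function name and the label names are hypothetical.

```python
def case_table(opcode, selector, base, targets, table_label="Ltab"):
    """Return source lines for a casex instruction and its .word table."""
    limit = len(targets) - 1  # third operand: highest selector offset
    lines = [
        "%s %s,$%d,$%d" % (opcode, selector, base, limit),
        table_label + ":",
    ]
    for target in targets:
        # Assumed: each entry is the distance from the table start to the target.
        lines.append(".word %s-%s" % (target, table_label))
    return lines


if __name__ == "__main__":
    for line in case_table("casel", "r0", 0, ["La", "Lb", "Lc"]):
        print(line)
```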
.SH 2 "Extended branch instructions"
.pp
These opcodes (formed in general
by substituting a
.q j
for the initial
.q b
of the standard opcodes)
take as branch destinations
the name of a label in the current subsegment.
It is an error if the destination is known to be in a different subsegment,
and it is a warning if the destination is not defined within
the object module being assembled.
.pp
If the branch destination is close enough,
then the corresponding
short branch
.q b
instruction is assembled.
Otherwise the assembler chooses a sequence
of one or more instructions which together have the same effect as if the
.q b
instruction had a larger span.
In general,
.i as
chooses the inverse branch followed by a
.b brw ,
but a
.b brw
is sometimes pooled among several
.q j
instructions with the same destination.
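.pp
The general case, an inverse branch around a
.b brw ,
can be modelled as follows.
This Python sketch is an illustration only:
the function name, the skip label and the treatment of the distance as a
signed byte displacement are assumptions,
and the pooling of a shared
.b brw
is not modelled.

```python
# Each conditional branch condition paired with its inverse.
INVERSE = {
    "eql": "neq", "neq": "eql", "eqlu": "nequ", "nequ": "eqlu",
    "gtr": "leq", "leq": "gtr", "geq": "lss", "lss": "geq",
    "gtru": "lequ", "lequ": "gtru", "gequ": "lssu", "lssu": "gequ",
    "cc": "cs", "cs": "cc", "vc": "vs", "vs": "vc",
}


def expand_jxxx(opcode, dest, distance, skip_label="Lskip"):
    """Return the instructions standing in for a conditional j instruction.

    distance is the (assumed) byte displacement to dest.
    """
    cond = opcode[1:]  # "jeql" -> "eql"
    if -128 <= distance <= 127:
        # Close enough: the plain short branch is used.
        return ["b%s %s" % (cond, dest)]
    # Too far: branch around a brw on the inverse condition.
    return [
        "b%s %s" % (INVERSE[cond], skip_label),
        "brw %s" % dest,
        skip_label + ":",
    ]
```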
.pp
.i As
is unable to perform the same long/short branch generation
for other instructions with a fixed byte displacement,
such as the
.b sob
and
.b aob
families,
or for the
.b acbx
family of instructions which has a fixed word displacement.
This would be desirable,
but is prohibitive because of the complexity of these instructions.
.pp
If the
.b \-J
assembler option is given,
a
.b jmp
instruction is used instead of a
.b brw
instruction
for
.b ALL
.q j
instructions with distant destinations.
This makes assembly of large (>32K bytes)
programs (inefficiently)
possible.
.i As
does not try to use clever combinations of
.b brb ,
.b brw
and
.b jmp
instructions.
The
.b jmp
instructions use PC relative addressing,
with the length of the offset given by the
.b \-d
assembler
option.
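.pp
The cost of this choice can be illustrated by counting instruction bytes.
The Python sketch below assumes the usual encodings
(an opcode byte plus a word displacement for
.b brw ,
and an opcode byte, an operand specifier byte and the displacement for a
PC relative
.b jmp );
the function name and parameters are hypothetical.

```python
def far_jump(dest, use_jmp=False, d_bytes=4):
    """Return (instruction text, size in bytes) for a distant unconditional jump.

    use_jmp models the -J option and d_bytes the -d setting.
    """
    if use_jmp:
        # jmp: opcode byte + operand specifier byte + PC relative displacement
        return ("jmp %s" % dest, 2 + d_bytes)
    # brw: opcode byte + 16-bit displacement
    return ("brw %s" % dest, 3)
```

Under these assumptions every distant jump grows from three bytes to six
when the option is given with the default displacement length.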
.pp
These are the extended branch instructions
.i as
recognizes:
.(b
.TS
center;
lb lb lb lb.
jeql	jeqlu	jneq	jnequ
jgeq	jgequ	jgtr	jgtru
jleq	jlequ	jlss	jlssu
jbcc	jbsc	jbcs	jbss

jlbc	jlbs
jcc	jcs
jvc	jvs
jbc	jbs
jbr
.TE
.)b
.pp
Note that
.b jbr
turns into
.b brb
if its target is close enough;
otherwise a
.b brw
is used.
.SH 1 "Diagnostics"
.pp
Diagnostics are intended to be self-explanatory and appear on
the standard output.
Diagnostics report either an
.i error
or a
.i warning .
Error diagnostics complain about lexical, syntactic and some
semantic errors, and abort the assembly.
.pp
The majority of the warnings complain about the use of \*(VX
features not supported by all implementations of the architecture.
.i As
will warn if new opcodes are used,
if
.q G
or
.q H
floating point numbers are used
and will complain about mixed floating conversions.
.SH 1 "Limits"
.(b
.TS
center;
l l.
limit	what
_
Arbitrary\**	Files to assemble
BUFSIZ	Significant characters per name
BUFSIZ	Characters per input line
127	Characters per string
Arbitrary	Symbols
4	Text segments
4	Data segments
.TE
.)b
.(f
\**The number of characters available to the \fIargv\fP line
is, however, restricted by \*(UX to 10240.
.)f
.SH 1 "Annoyances and Future Work"
.pp
Most of the annoyances deal with restrictions on the extended
branch instructions.
.pp
.i As
uses only a two-level algorithm for resolving extended branch
instructions into short or long displacements.
What is really needed is a general mechanism
to turn a short conditional jump into a
reverse conditional jump over one of
.b two
possible unconditional branches,
either a
.b brw
or a
.b jmp
instruction.
Currently, the
.b \-J
option forces the
.b jmp
instruction to
.i always
be used,
even when the
shorter
.b brw
instruction would reach the destination.
.pp
The assembler should also recognize extended branch instructions for
.b sob ,
.b aob ,
and
.b acbx
instructions.
.b Sob
instructions will be easy,
.b aob
will be harder because the synthesized instruction
uses the index operand twice,
so one must be careful of side effects,
and the
.b acbx
family will be much harder (in the general case)
because the comparison depends on the sign of the addend operand,
and two operands are used more than once.
Augmenting
.i as
with these extended loop instructions
will allow the peephole optimizer to produce much better
loop optimizations,
since it currently assumes the worst
case about the size of the loop body.
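.pp
As a rough illustration of why
.b sob
is the easy case and
.b aob
the harder one,
the Python sketch below spells out one possible expansion for distant destinations.
Nothing here describes what
.i as
does today:
the function and table names are hypothetical,
the expansions are an assumption,
and they do not leave the condition codes exactly as the original instructions would.

```python
# Hypothetical expansions: (setup instructions, extended branch to finish with).
LOOP_EXPANSIONS = {
    "sobgtr": (["decl {index}"], "jgtr"),
    "sobgeq": (["decl {index}"], "jgeq"),
    # The index operand appears twice here, so an operand with side
    # effects (autoincrement, for example) would be evaluated twice.
    "aoblss": (["incl {index}", "cmpl {index},{limit}"], "jlss"),
    "aobleq": (["incl {index}", "cmpl {index},{limit}"], "jleq"),
}


def expand_loop(opcode, dest, index, limit=None):
    """Return a sketch of the instructions replacing a distant loop branch."""
    steps, branch = LOOP_EXPANSIONS[opcode]
    lines = [step.format(index=index, limit=limit) for step in steps]
    lines.append("%s %s" % (branch, dest))
    return lines
```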
.pp
There has been no experience with foreign programs using
the binary symbolic intermediate form.
.bp
.SH 1 "Appendix 1: Binary Symbolic Intermediate Format"
.pp
The binary symbolic (\c
.i bs
for short) intermediate
form for assembly language
closely follows the syntax of
.q human
symbolic assembly language.
However,
some of the expressive flexibility allowed in the
human symbolic assembly language is not allowed in the
.i bs
form,
to simplify the
.i bs
form as much as possible.
In addition,
concessions to the internals
of the assembler are made in the
.i bs
form.
This implementation decision
simplifies the assembler's internal buffering and
necessitates only one internal form.
.pp
.i Bs
is structured as a prefix linearized forest of description trees.
Each node in the description tree
is represented by a byte code.
The nodes may have up to six children.
Some of the nodes have semantic attributes;
some semantic attributes are of concern only to the assembler,
but must be in the
.i bs
form as place holders.
The semantic attributes immediately follow the byte code.
356The semantic attributes immediately follow the byte code.
357.SH 2 "Binary Symbolic Node Definitions"
358.pp
359Table 1
360defines the symbolic names for the description nodes,
361the type of the node,
362the number of children to the node,
363the restrictions on the kind of children,
364and the mapping of the description node,
365including its children,
366to the human assembly format.
367Table 2 defines the semantic attributes required for
368all attributed nodes.
369.pp
370The restrictions on the children are encoded in the mapping string.
371In addition,
372the prefix left to right order of a node's children is identical
373to the left to right enumeration of the children in the mapping string.
374The restrictions are encoded in the mapping string as
375.i printf
376like escapes.
.(b
.TS
center;
l l.
escape	child requirement
_
%a	address mode node, ADDR
%b	Bignum (large scalar or floating)
%e	expression mode node, EXPR
%c	comma node for operands, CMTR
%n	name, BS\*(USNAME
%r	register, BS\*(USREG
%r	register expression, BS\*(USREGOP
%s	string, BS\*(USSTRING
%%	% sign

%I	print an integer constant
%N	print a name
%S	print a string
%R	print a register
%B	print a big number
%O	print an instruction
.TE
.)b
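.pp
The way a mapping string turns a node and its children back into the human
format can be sketched in a few lines of Python.
The function below handles only the lower-case child escapes and the percent sign,
and assumes the children have already been rendered to text;
its name and interface are hypothetical.

```python
import re


def render(mapping, children):
    """Substitute rendered children into a mapping string, left to right."""
    remaining = iter(children)

    def substitute(match):
        if match.group(0) == "%%":
            return "%"  # escaped percent sign
        return next(remaining)  # next child, in enumeration order

    return re.sub("%[%abecnrs]", substitute, mapping)
```

For example, a node with the mapping string (%e + %e)
and children rendered as 1 and 2 comes out as (1 + 2).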
.pp
These are the node types used in Table 1:
.(b
.TS
center;
c l.
node type	description
_
ROOT	the node can only appear at the root of a tree
CMTR	the node is the only argument to an instruction
ADDR	an addressing mode
EXPR	an expression
VADDR	an illegal addressing mode
.TE
.)b
.bp
.ce 1
Table 1: Binary Symbolic Node Definitions
.ce 0
.sp 1
.TS
center;
l       l       n      l        l
l       l       n      lb       l.
node	type	arity	key	arguments
=
Root
_
 BS\*(USNL	ROOT	0	\en
 BS\*(USPARSEEOF	ROOT	0	<EOF>
 BS\*(USLABEL	ROOT	1	%n:
=
Directives
_
 BS\*(USABORT	ROOT	0	.ABORT;
 BS\*(USFILE	ROOT	1	.file	%s;
 BS\*(USLINENO	ROOT	1	.line	%e;
_
 BS\*(USDATA	ROOT	1	.data	%e;
 BS\*(USTEXT	ROOT	1	.text	%e;
_
 BS\*(USORG	ROOT	2	.org	%e,%e;
 BS\*(USALIGN	ROOT	2	.align	%e,%e;
 BS\*(USSPACE	ROOT	2	.space	%e,%e;
 BS\*(USFILL	ROOT	3	.fill	%e,%e,%e;
_
 BS\*(USBYTE	ROOT	1	.byte	%e;
 BS\*(USWORD	ROOT	1	.word	%e;
 BS\*(USLONG	ROOT	1	.long	%e;
 BS\*(USQUAD	ROOT	1	.quad	%b;
 BS\*(USOCTA	ROOT	1	.octa	%b;
 BS\*(USFFLOAT	ROOT	1	.ffloat	%b;
 BS\*(USDFLOAT	ROOT	1	.dfloat	%b;
 BS\*(USGFLOAT	ROOT	1	.gfloat	%b;
 BS\*(USHFLOAT	ROOT	1	.hfloat	%b;
 BS\*(USASCII	ROOT	1	.ascii	%s;
_
 BS\*(USCOMM	ROOT	2	.comm	%n,%e;
 BS\*(USLCOMM	ROOT	2	.lcomm	%n,%e;
 BS\*(USGLOBAL	ROOT	1	.global	%n;
 BS\*(USSET	ROOT	2	.set	%n,%e;
 BS\*(USLSYM	ROOT	2	.lsym	%n,%e;
_
 BS\*(USSTABN	ROOT	4	.stabn	%e,%e,%e,%e;
 BS\*(USSTABS	ROOT	5	.stabs	%s,%e,%e,%e,%e;
 BS\*(USSTABD	ROOT	3	.stabd	%e,%e,%e;
=
Leaves
_
 BS\*(USICON	EXPR	0	\&	<integer, in decimal>
 BS\*(USNAME	EXPR	0	\&	<name>
 BS\*(USSTRING	EXPR	0	\&	<quoted string>
 BS\*(USREG	EXPR	0	\&	r<integer>
_
 BS\*(USBNQ	EXPR	0	\&	<quad scalar, in hex>
 BS\*(USBNO	EXPR	0	\&	<octa scalar, in hex>
 BS\*(USBNF	EXPR	0	\&	<F float, in hex>
 BS\*(USBND	EXPR	0	\&	<D float, in hex>
 BS\*(USBNG	EXPR	0	\&	<G float, in hex>
 BS\*(USBNH	EXPR	0	\&	<H float, in hex>
.bp
=
Operators
_
 BS\*(USREGOP	EXPR	1	\&	%%%e
_
 BS\*(USPLUS	EXPR	2	\&	(%e + %e)
 BS\*(USMINUS	EXPR	2	\&	(%e - %e)
 BS\*(USMUL	EXPR	2	\&	(%e * %e)
 BS\*(USDIV	EXPR	2	\&	(%e / %e)
 BS\*(USMOD	EXPR	2	\&	(%e %% %e)
_
 BS\*(USLSH	EXPR	2	\&	(%e < %e)
 BS\*(USRSH	EXPR	2	\&	(%e > %e)
_
 BS\*(USXOR	EXPR	2	\&	(%e ^ %e)
 BS\*(USIOR	EXPR	2	\&	(%e | %e)
 BS\*(USAND	EXPR	2	\&	(%e & %e)
 BS\*(USORNOT	EXPR	2	\&	(%e ! %e)
=
Instructions
_
 BS\*(USINST	ROOT	1	%O	%c;
 BS\*(USJXXX	ROOT	1	%O	%c;
_
 BS\*(USCM0	CMTR	0	\&
 BS\*(USCM1	CMTR	1	\&	%a
 BS\*(USCM2	CMTR	2	\&	%a,%a
 BS\*(USCM3	CMTR	3	\&	%a,%a,%a
 BS\*(USCM4	CMTR	4	\&	%a,%a,%a,%a
 BS\*(USCM5	CMTR	5	\&	%a,%a,%a,%a,%a
 BS\*(USCM6	CMTR	6	\&	%a,%a,%a,%a,%a,%a
.bp
=
Address modes
_
 AM\*(USIMM	ADDR	1	\&	\*(DL%e
 AMD(AM\*(USIMM)	VADDR	1	\&	snark
 AMI(AM\*(USIMM)	VADDR	1	\&	snark
 AMDI(AM\*(USIMM)	VADDR	1	\&	snark
_
 AM\*(USREG	ADDR	1	\&	%r
 AMD(AM\*(USREG)	ADDR	1	\&	(%r)
 AMI(AM\*(USREG)	VADDR	1	\&	snark
 AMDI(AM\*(USREG)	ADDR	2	\&	(%r)[%r]
_
 AM\*(USINCR	ADDR	1	\&	(%r)+
 AMD(AM\*(USINCR)	ADDR	1	\&	*(%r)+
 AMI(AM\*(USINCR)	ADDR	2	\&	(%r)+[%r]
 AMDI(AM\*(USINCR)	ADDR	2	\&	*(%r)+[%r]
_
 AM\*(USEXPR	ADDR	1	\&	%e
 AMD(AM\*(USEXPR)	ADDR	1	\&	*%e
 AMI(AM\*(USEXPR)	ADDR	2	\&	%e[%r]
 AMDI(AM\*(USEXPR)	ADDR	2	\&	*%e[%r]
_
 AM\*(USDECR	ADDR	1	\&	-(%r)
 AMD(AM\*(USDECR)	VADDR	1	\&	snark
 AMI(AM\*(USDECR)	ADDR	2	\&	-(%r)[%r]
 AMDI(AM\*(USDECR)	VADDR	2	\&	snark
_
 AM\*(USDISPA	ADDR	2	\&	%e(%r)
 AMD(AM\*(USDISPA)	ADDR	2	\&	*%e(%r)
 AMI(AM\*(USDISPA)	ADDR	3	\&	%e(%r)[%r]
 AMDI(AM\*(USDISPA)	ADDR	3	\&	*%e(%r)[%r]
_
 AM\*(USDISP1	ADDR	2	\&	b\`%e(%r)
 AMD(AM\*(USDISP1)	ADDR	2	\&	*b\`%e(%r)
 AMI(AM\*(USDISP1)	ADDR	3	\&	b\`%e(%r)[%r]
 AMDI(AM\*(USDISP1)	ADDR	3	\&	*b\`%e(%r)[%r]
_
 AM\*(USDISP2	ADDR	2	\&	w\`%e(%r)
 AMD(AM\*(USDISP2)	ADDR	2	\&	*w\`%e(%r)
 AMI(AM\*(USDISP2)	ADDR	3	\&	w\`%e(%r)[%r]
 AMDI(AM\*(USDISP2)	ADDR	3	\&	*w\`%e(%r)[%r]
_
 AM\*(USDISP4	ADDR	2	\&	l\`%e(%r)
 AMD(AM\*(USDISP4)	ADDR	2	\&	*l\`%e(%r)
 AMI(AM\*(USDISP4)	ADDR	3	\&	l\`%e(%r)[%r]
 AMDI(AM\*(USDISP4)	ADDR	3	\&	*l\`%e(%r)[%r]
.TE
561.TE
562