xref: /original-bsd/old/as.vax/PSD.doc/asdocs1.me (revision 93ab02a6)
Copyright (c) 1982 The Regents of the University of California.
All rights reserved.

%sccs.include.redist.roff%

@(#)asdocs1.me 5.1 (Berkeley) 04/17/91

delim $$ .EN .(l C .i "\*(VS \*(AM" John F. Reiser Bell Laboratories, Holmdel, NJ .i and Robert R. Henry\** .(f \**Preparation of this paper supported in part by the National Science Foundation under grant MCS #78-07291. .)f Electronics Research Laboratory University of California Berkeley, CA 94720 November 5, 1979 .i Revised \*(TD .)l
1 Introduction
.pp This document describes the usage and input syntax of the \*(UX \*(VX-11 assembler .i as . .i As is designed for assembling the code produced by the \*(CL compiler; certain concessions have been made to handle code written directly by people, but in general little sympathy has been extended. This document is intended only for the writer of a compiler or a maintainer of the assembler.
2 "Assembler Revisions since November 5, 1979"
.pp There has been one major change to .i as since the last release. .i As has been updated to assemble the new instructions and data formats for .q G and .q H floating point numbers, as well as the new queue instructions.
2 "Features Supported, but No Longer Encouraged as of \*(TD"
.pp These feature(s) in .i as are supported, but no longer encouraged. .ip - The colon operator for field initialization is likely to disappear.
1 "Usage"
.pp .i As is invoked with these command arguments:

as [ .b -LVWJR ] [ .b -d $n$ ] [ .b -DTS ] [ .b -t .i directory ] [ .b -o .i output ] [ $name sub 1$ ] $...$ [ $name sub n$ ]

.pp The .b -L flag instructs the assembler to save labels beginning with a .q L in the symbol table portion of the .i output file. Labels are not saved by default, as the default action of the link editor .i ld is to discard them anyway. .pp The .b -V flag tells the assembler to place its interpass temporary file into virtual memory. In normal circumstances, the system manager will decide where the temporary file should lie. Our experiments with very large temporary files show that placing the temporary file into virtual memory will save about 13% of the assembly time, where the size of the temporary file is about 350K bytes. Most assembler sources will not be this long. .pp The .b -W turns of all warning error reporting. .pp The .b -J flag forces \*(UX style pseudo-branch instructions with destinations further away than a byte displacement to be turned into jump instructions with 4 byte offsets. The .b -J flag buys you nothing if .b -d2 is set. (See \(sc8.4, and future work described in \(sc11) .pp The .b -R flag effectively turns .q ".data $n$" directives into .q ".text $n$" directives. This obviates the need to run editor scripts on assembler source to .q "read-only" fix initialized data segments. Uninitialized data (via .b .lcomm and .b .comm directives) is still assembled into the data or bss segments. .pp The .b -d flag specifies the number of bytes which the assembler should allow for a displacement when the value of the displacement expression is undefined in the first pass. The possible values of .i n are 1, 2, or 4; the assembler uses 4 bytes if .b -d is not specified. See \(sc8.2. .pp Provided the .b -V flag is not set, the .b -t flag causes the assembler to place its single temporary file in the .i directory instead of in .i /tmp . .pp The .b -o flag causes the output to be placed on the file .i output . By default, the output of the assembler is placed in the file .i a.out in the current directory. .pp The input to the assembler is normally taken from the standard input. If file arguments occur, then the input is taken sequentially from the files $name sub 1$, $name sub 2~...~name sub n$ This is not to say that the files are assembled separately; $name sub 1$ is effectively concatenated to $name sub 2$, so multiple definitions cannot occur amongst the input sources. .pp .pp The .b -D (debug), .b -T (token trace), and the .b -S (symbol table) flags enable assembler trace information, provided that the assembler has been compiled with the debugging code enabled. The information printed is long and boring, but useful when debugging the assembler.

1 "Lexical conventions"
.pp Assembler tokens include identifiers (alternatively, .q symbols or .q names ), constants, and operators.
2 "Identifiers"
.pp An identifier consists of a sequence of alphanumeric characters (including period .q "\|.\|" , underscore .q "\*(US" , and dollar .q "\*(DL" ). The first character may not be numeric. Identifiers may be (practically) arbitrary long; all characters are significant.
2 "Constants"
3 "Scalar constants"
.pp All scalar (non floating point) constants are (potentially) 128 bits wide. Such constants are interpreted as two's complement numbers. Note that 64 bit (quad words) and 128 bit (octal word) integers are only partially supported by the \*(VX hardware. In addition, 128 bit integers are only supported by the extended \*(VX architecture. .i As supports 64 and 128 bit integers only so they can be used as immediate constants or to fill initialized data space. .i As can not perform arithmetic on constants larger than 32 bits. .pp Scalar constants are initially evaluated to a full 128 bits, but are pared down by discarding high order copies of the sign bit and categorizing the number as a long, quad or octal integer. Numbers with less precision than 32 bits are treated as 32 bit quantities. .pp The digits are .q 0123456789abcdefABCDEF with the obvious values. .pp An octal constant consists of a sequence of digits with a leading zero. .pp A decimal constant consists of a sequence of digits without a leading zero. .pp A hexadecimal constant consists of the characters .q 0x (or .q 0X ) followed by a sequence of digits. .pp A single-character constant consists of a single quote .q "\|\(fm\|" followed by an \*(AC character, including \*(AC newline. The constant's value is the code for the given character.
3 "Floating Point Constants"
.pp Floating point constants are internally represented in the \*(VX floating point format that is specified by the lexical form of the constant. Using the meta notation that [dec] is a decimal digit (\c .q "0123456789" ), [expt] is a type specification character (\c, .q "fFdDhHgG" ), [expe] is a exponent delimiter and type specification character (\c, .q "eEfFdDhHgG" ), $x sup roman "*"$ means 0 or more occurences of $x$, $x sup +$ means 1 or more occurences of $x$, then the general lexical form of a floating point number is:

1 0[expe]([+-])$roman "[dec]" sup +$(.)($roman "[dec]" sup roman "*"$)([expt]([+-])($roman "dec]" sup +$))

0 The standard semantic interpretation is used for the signed integer, fraction and signed power of 10 exponent. If the exponent delimiter is specified, it must be either an .q e or .q E , or must agree with the initial type specification character that is used. The type specification character specifies the type and representation of the constructed number, as follows: .(b

type character floating representation size (bits)
f, F F format floating 32
d, D D format floating 64
g, G G format floating 64
h, H H format floating 128
.)b Note that .q G and .q H format floating point numbers are not supported by all implementations of the \*(VX architecture. .i As does not require the augmented architecture in order to run. .pp The assembler uses the library routine .i atof() to convert .q F and .q D numbers, and uses its own conversion routine (derived from .i atof , and believed to be numerically accurate) to convert .q G and .q H floating point numbers. .pp Collectively, all floating point numbers, together with quad and octal scalars are called .i Bignums . When .i as requires a Bignum, a 32 bit scalar quantity may also be used.
3 "String Constants"
.pp A string constant is defined using the same syntax and semantics as the \*(CL language uses. Strings begin and end with a .q "''" (double quote). The \*(DM assembler conventions for flexible string quoting is not implemented. All \*(CL backslash conventions are observed; the backslash conventions peculiar to the \*(PD assembler are not observed. Strings are known by their value and their length; the assembler does not implicitly end strings with a null byte.
2 "Operators"
.pp There are several single-character operators; see \(sc6.1.
2 "Blanks"
.pp Blank and tab characters may be interspersed freely between tokens, but may not be used within tokens (except character constants). A blank or tab is required to separate adjacent identifiers or constants not otherwise separated.
2 "Scratch Mark Comments"
.pp The character .q "#" introduces a comment, which extends through the end of the line on which it appears. Comments starting in column 1, having the format .q "# $expression~~string$" , are interpreted as an indication that the assembler is now assembling file .i string at line .i expression . Thus, one can use the \*(CL preprocessor on an assembly language source file, and use the .i #include and .i #define preprocessor directives. (Note that there may not be an assembler comment starting in column 1 if the assembler source is given to the \*(CL preprocessor, as it will be interpreted by the preprocessor in a way not intended.) Comments are otherwise ignored by the assembler.
2 "\*(CL Style Comments"
.pp The assembler will recognize \*(CL style comments, introduced with the prologue .b "/*" and ending with the epilogue .b "*/" . \*(CL style comments may extend across multiple lines, and are the preferred comment style to use if one chooses to use the \*(CL preprocessor.
1 "Segments and Location Counters"
.pp Assembled code and data fall into three segments: the text segment, the data segment, and the bss segment. The \*(UX operating system makes some assumptions about the content of these segments; the assembler does not. Within the text and data segments there are a number of sub-segments, distinguished by number (\c .q "text 0" , .q "text 1" , $...$ .q "data 0" , .q "data 1" , $...$). Currently there are four subsegments each in text and data. The subsegments are for programming convenience only. .pp Before writing the output file, the assembler zero-pads each text subsegment to a multiple of four bytes and then concatenates the subsegments in order to form the text segment; an analogous operation is done for the data segment. Requesting that the loader define symbols and storage regions is the only action allowed by the assembler with respect to the bss segment. Assembly begins in .q "text 0" . .pp Associated with each (sub)segment is an implicit location counter which begins at zero and is incremented by 1 for each byte assembled into the (sub)segment. There is no way to explicitly reference a location counter. Note that the location counters of subsegments other than .q "text 0" and .q "data 0" behave peculiarly due to the concatenation used to form the text and data segments.