Copyright (c) 1980, 1993
The Regents of the University of California. All rights reserved.


@(#)pumanA.n 8.2 (Berkeley) 06/01/94

Appendix to Wirth's Pascal Report

This section is an appendix to the definition of the Pascal language in Niklaus Wirth's "Pascal Report" and, with that Report, precisely defines the Berkeley implementation. This appendix includes a summary of extensions to the language, gives the ways in which the undefined specifications were resolved, gives limitations and restrictions of the current implementation, and lists the added functions and procedures available. It concludes with a list of differences from the commonly available Pascal 6000-3.4 implementation, and some comments on standard and portable Pascal.

Extensions to the language Pascal

This section defines non-standard language constructs available in Berkeley Pascal. The s standard Pascal option of the translators pi and pc can be used to detect these extensions in programs which are to be transported.

String padding

Berkeley Pascal will pad constant strings with blanks in expressions and as value parameters to make them as long as is required. The following is a legal Berkeley Pascal program:

    program x(output);
    var z : packed array [ 1 .. 13 ] of char;
    begin
        z := 'red';
        writeln(z)
    end.

The padded blanks are added on the right. Thus the assignment above is equivalent to:

    z := 'red          '

which is standard Pascal.
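The padding rule can be modeled outside Pascal. The following is a minimal Python sketch, not Berkeley Pascal code; the helper name pad_to is hypothetical, and rejecting an over-long constant is an assumption, since the text above only describes the padding case.

```python
def pad_to(constant: str, length: int) -> str:
    """Blank-pad a constant string on the right to the required length."""
    if len(constant) > length:
        # Assumption: the manual text only covers padding, so a constant
        # that is already too long is simply rejected in this model.
        raise ValueError("constant string is longer than the target")
    return constant.ljust(length)


# Models the assignment z := 'red' for z : packed array [1..13] of char.
z = pad_to("red", 13)
print(repr(z))
```

The result is 13 characters long: the three letters of the constant followed by ten blanks.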

Octal constants, octal and hexadecimal write

Octal constants may be given as a sequence of octal digits followed by the character `b' or `B'. The forms

    write(a:n oct)

and

    write(a:n hex)

cause the internal representation of expression a, which must be Boolean, character, integer, pointer, or a user-defined enumerated type, to be written in octal or hexadecimal respectively.
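As an illustration of the notation, here is a small Python model (not Berkeley Pascal code) of reading an octal constant and of the digits an octal or hexadecimal write would show. The function names are hypothetical. The 32-bit two's complement width is taken from the description of the type integer elsewhere in this appendix; field width and padding are deliberately not modeled.

```python
def parse_octal_constant(text: str) -> int:
    """Value of an octal constant: octal digits followed by 'b' or 'B'."""
    if len(text) < 2 or text[-1] not in "bB":
        raise ValueError("an octal constant must end in 'b' or 'B'")
    digits = text[:-1]
    if any(d not in "01234567" for d in digits):
        raise ValueError("only the digits 0-7 may appear")
    return int(digits, 8)


def internal_digits(value: int, base: int) -> str:
    """Digits of the 32-bit two's complement representation of value."""
    if base not in (8, 16):
        raise ValueError("base must be 8 (oct) or 16 (hex)")
    bits = value & 0xFFFFFFFF  # assumption: a 32-bit integer representation
    return format(bits, "o" if base == 8 else "x")


print(parse_octal_constant("17b"))
print(internal_digits(255, 16))
print(internal_digits(-1, 8))
```

Because the internal representation is written, a negative integer appears as its two's complement bit pattern rather than with a minus sign.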

Assert statement

An assert statement causes a Boolean expression to be evaluated each time the statement is executed. A runtime error results if the expression evaluates to false. The assert statement is treated as a comment if run-time tests are disabled. The syntax for assert is:

    assert <expr>

Enumerated type input-output

Enumerated types may be read and written. On output the string name associated with the enumerated value is output. If the value is out of range, a runtime error occurs. On input an identifier is read and looked up in a table of names associated with the type of the variable, and the appropriate internal value is assigned to the variable being read. If the name is not found in the table a runtime error occurs.

Structure returning functions

An extension has been added which allows functions to return arbitrary sized structures rather than just scalars as in the standard.

Separate compilation

The compiler pc has been extended to allow separate compilation of programs. Procedures and functions declared at the global level may be compiled separately. Type checking of calls to separately compiled routines is performed at load time to ensure that the program as a whole is consistent. See section 5.10 for details.

Resolution of the undefined specifications

File name - file variable associations

Each Pascal file variable is associated with a named UNIX file. Except for input and output, which are exceptions to some of the rules, a name can become associated with a file in any of three ways:

1) If a global Pascal file variable appears in the program statement then it is associated with the UNIX file of the same name.
2) If a file was reset or rewritten using the extended two-argument form of reset or rewrite then the given name is associated.
3) If a file which has never had a UNIX name associated is reset or rewritten without specifying a name via the second argument, then a temporary name of the form `tmp.x' is associated with the file. Temporary names start with `tmp.1' and continue by incrementing the last character in the USASCII ordering. Temporary files are removed automatically when their scope is exited.
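The sequence of temporary names in rule 3 can be sketched in Python (a model only, not Berkeley Pascal code; the function name temp_names is hypothetical). What happens when the last character runs off the end of the character set is not stated in the text, so this sketch just refuses to go past character code 127.

```python
def temp_names(count: int) -> list:
    """First `count` temporary names: tmp.1, tmp.2, ... in ASCII order."""
    names = []
    for k in range(count):
        code = ord("1") + k
        if code > 127:
            # Assumption: behavior past the 7-bit character set is not
            # described in the manual, so the model stops here.
            raise ValueError("ran past the 7-bit character set")
        names.append("tmp." + chr(code))
    return names


print(temp_names(10))
```

Note that the tenth name is not `tmp.10': the character following `9' in the USASCII ordering is `:', so the tenth name is `tmp.:'.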
The program statement

The syntax of the program statement is:

    program <id> ( <file id> { , <file id> } ) ;

The file identifiers (other than input and output) must be declared as variables of file type in the global declaration part.

The files input and output

The formal parameters input and output are associated with the UNIX standard input and output and have a somewhat special status. The following rules must be noted:

1) The program heading must contain the formal parameter output. If input is used, explicitly or implicitly, then it must also be declared here.
2) Unlike all other files, the Pascal files input and output must not be defined in a declaration, as their declaration is automatically:

    var input, output: text

3) The procedure reset may be used on input. If no UNIX file name has ever been associated with input, and no file name is given, then an attempt will be made to `rewind' input. If this fails, a run time error will occur. Rewrite calls to output act as for any other file, except that output initially has no associated file. This means that a simple

    rewrite(output)

associates a temporary name with output.
Details for files

If a file other than input is to be read, then reading must be initiated by a call to the procedure reset which causes the Pascal system to attempt to open the associated UNIX file for reading. If this fails, then a runtime error occurs. Writing of a file other than output must be initiated by a rewrite call, which causes the Pascal system to create the associated UNIX file and to then open the file for writing only.

Buffering

The buffering for output is determined by the value of the b option at the end of the program statement. If it has its default value 1, then output is buffered in blocks of up to 512 characters, flushed whenever a writeln occurs and at each reference to the file input. If it has the value 0, output is unbuffered. Any value of 2 or more gives block buffering without line or input reference flushing. All other output files are always buffered in blocks of 512 characters. All output buffers are flushed when the files are closed at scope exit, whenever the procedure message is called, and can be flushed using the built-in procedure flush.

An important point for an interactive implementation is the definition of `input↑'. If input is a teletype, and the Pascal system reads a character at the beginning of execution to define `input↑', then no prompt could be printed by the program before the user is required to type some input. For this reason, `input↑' is not defined by the system until its definition is needed, reading from a file occurring only when necessary.

The character set

Seven bit USASCII is the character set used on UNIX. The standard Pascal symbols `and', `or', `not', `<=', `>=', `<>', and the uparrow `↑' (for pointer qualification) are recognized.† Less portable are the synonyms tilde `~' for not, `&' for and, and `|' for or.

† On many terminals and printers, the up arrow is represented as a circumflex `^'. These are not distinct characters, but rather different graphic representations of the same internal codes.

Upper and lower case are considered to be distinct.* Keywords and built-in procedure and function names are composed of all lower case letters. Thus the identifiers GOTO and GOto are distinct both from each other and from the keyword goto. The standard type `boolean' is also available as `Boolean'.

* The proposed standard for Pascal considers them to be the same.

Character strings and constants may be delimited by the character `'' or by the character `#'; the latter is sometimes convenient when programs are to be transported. Note that the `#' character has special meaning when it is the first character on a line; see "Multi-file programs" below.

The standard types

The standard type integer is conceptually defined as

    type integer = minint .. maxint;

Integer is implemented with 32-bit two's complement arithmetic. Predefined constants of type integer are:

    const maxint = 2147483647; minint = -2147483648;

The standard type char is conceptually defined as

    type char = minchar .. maxchar;

Built-in character constants are `minchar' and `maxchar', `bell' and `tab'; ord(minchar) = 0, ord(maxchar) = 127.
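The predefined limits follow directly from the stated widths. The Python sketch below (a model, not Berkeley Pascal code) derives them from a 32-bit two's complement integer and a seven-bit character set; the constant and function names are hypothetical.

```python
INTEGER_BITS = 32  # integer is implemented with 32-bit two's complement
CHAR_BITS = 7      # seven bit USASCII

maxint = 2 ** (INTEGER_BITS - 1) - 1
minint = -(2 ** (INTEGER_BITS - 1))
ord_minchar = 0
ord_maxchar = 2 ** CHAR_BITS - 1


def fits_integer(n: int) -> bool:
    """True if n lies in the subrange minint .. maxint."""
    return minint <= n <= maxint


print(maxint, minint, ord_maxchar)
```

The asymmetry between maxint and minint is the usual property of two's complement: there is one more negative value than positive.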

The type real is implemented using 64 bit floating point arithmetic. The floating point arithmetic is done in `rounded' mode, and provides approximately 17 digits of precision with numbers as small as 10 to the negative 38th power and as large as 10 to the 38th power.

Comments

Comments can be delimited by either `{' and `}' or by `(*' and `*)'. If the character `{' appears in a comment delimited by `{' and `}', a warning diagnostic is printed. A similar warning will be printed if the sequence `(*' appears in a comment delimited by `(*' and `*)'. The restriction implied by this warning is not part of standard Pascal, but detects many otherwise subtle errors.

Option control

Options of the translators may be controlled in two distinct ways. A number of options may appear on the command line invoking the translator. These options are given as one or more strings of letters preceded by the character `-' and cause the default setting of each given option to be changed. This method of communication of options is expected to predominate for UNIX. Thus the command

    % pi -l -s foo.p

translates the file foo.p with the listing option enabled (as it normally is off), and with only standard Pascal features available.

If more control over the portions of the program where options are enabled is required, then option control in comments can and should be used. The format for option control in comments is identical to that used in Pascal 6000-3.4. One places the character `$' as the first character of the comment and follows it by a comma separated list of directives. Thus an equivalent to the command line example given above would be:

    {$l+,s+ listing on, standard Pascal}

as the first line of the program. The `l' option is more appropriately specified on the command line, since it is extremely unlikely in an interactive environment that one wants a listing of the program each time it is translated.

Directives consist of a letter designating the option, followed either by a `+' to turn the option on, or by a `-' to turn the option off. The b option takes a single digit instead of a `+' or `-'.
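To make the directive format concrete, here is a Python sketch of how such a comment could be read. It is a model under stated assumptions, not the translators' actual scanner: the function name is hypothetical, only the `{' comment form is handled, the directive list is assumed to end at the first text that is not a comma followed by another directive, and a digit is accepted after any letter although only the b option is documented as taking one.

```python
import re

# One directive: an option letter followed by '+', '-', or a single digit.
DIRECTIVE = re.compile(r"([a-zA-Z])([+\-]|\d)")


def parse_option_comment(comment: str) -> dict:
    """Map option letters to True/False (or an int for a digit setting)."""
    if not comment.startswith("{$"):
        return {}  # not an option-control comment
    options = {}
    pos = 2
    while True:
        match = DIRECTIVE.match(comment, pos)
        if match is None:
            break
        letter, setting = match.groups()
        options[letter] = int(setting) if setting.isdigit() else setting == "+"
        pos = match.end()
        # Assumption: the list continues only across a comma.
        if pos < len(comment) and comment[pos] == ",":
            pos += 1
        else:
            break
    return options


print(parse_option_comment("{$l+,s+ listing on, standard Pascal}"))
print(parse_option_comment("{$b2}"))
```

Under these assumptions the remainder of the comment, including the comma in the trailing remark, is treated as ordinary comment text.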

Notes on the listings

The first page of a listing includes a banner line indicating the version and date of generation of pi or pc. It also includes the UNIX path name supplied for the source file and the date of last modification of that file.

Within the body of the listing, lines are numbered consecutively and correspond to the line numbers for the editor. Currently, two special kinds of lines may be used to format the listing: a line consisting of a form-feed character, control-l, which causes a page eject in the listing, and a line with no characters which causes the line number to be suppressed in the listing, creating a truly blank line. These lines thus correspond to `eject' and `space' macros found in many assemblers. Non-printing characters are printed as the character `?' in the listing.†

† The character generated by a control-i indents to the next `tab stop'. Tab stops are set every 8 columns in UNIX. Tabs thus provide a quick way of indenting in the program.

The standard procedure write

If no minimum field length parameter is specified for a write, the following default values are assumed:

    integer    10
    real       22
    Boolean    length of `true' or `false'
    char       1
    string     length of the string
    oct        11
    hex        8

The end of each line in a text file should be explicitly indicated by `writeln(f)', where `writeln(output)' may be written simply as `writeln'. For UNIX, the built-in function `page(f)' puts a single ASCII form-feed character on the output file. For programs which are to be transported the filter pcc can be used to interpret carriage control, as UNIX does not normally do so.

Restrictions and limitations
Files

Files cannot be members of files or members of dynamically allocated structures.

Arrays, sets and strings

The calculations involving array subscripts and set elements are done with 16 bit arithmetic. This restricts the types over which arrays and sets may be defined. The lower bound of such a range must be greater than or equal to -32768, and the upper bound less than 32768. In particular, strings may have any length from 1 to 65535 characters, and sets may contain no more than 65535 elements.
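The bound restriction stated above can be written as a simple check. This Python sketch is a model only; the function name is hypothetical, and the requirement that the lower bound not exceed the upper bound is the usual Pascal subrange rule rather than something stated in this paragraph.

```python
def subrange_allowed(lower: int, upper: int) -> bool:
    """True if lower..upper may index an array or form the base of a set."""
    # Bounds must fit the stated 16-bit limits: lower >= -32768 and
    # upper < 32768. lower <= upper is the ordinary subrange requirement.
    return -32768 <= lower <= upper < 32768


print(subrange_allowed(1, 13))
print(subrange_allowed(0, 32768))
```

For instance, the index type 1..13 is acceptable, while 0..32768 is rejected because its upper bound is not less than 32768.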

Line and symbol length

There is no intrinsic limit on the length of identifiers. Identifiers are considered to be distinct if they differ in any single position over their entire length. There is a limit, however, on the maximum input line length. This limit is quite generous, currently exceeding 160 characters.

Procedure and function nesting and program size

At most 20 levels of procedure and function nesting are allowed. There is no fundamental, translator defined limit on the size of the program which can be translated. The ultimate limit is supplied by the hardware and thus, on the PDP-11, by the 16 bit address space. If one runs up against the `ran out of memory' diagnostic the program may yet translate if smaller procedures are used, as a lot of space is freed by the translator at the completion of each procedure or function in the current implementation.

On the VAX-11, there is an implementation defined limit of 65536 bytes per variable. There is no limit on the number of variables.

Overflow

There is currently no checking for overflow on arithmetic operations at run-time on the PDP-11. Overflow checking is performed on the VAX-11 by the hardware.

Added types, operators, procedures and functions

Additional predefined types

The type alfa is predefined as:

    type alfa = packed array [ 1..10 ] of char

The type intset is predefined as:

    type intset = set of 0..127

In most cases the context of an expression involving a constant set allows the translator to determine the type of the set, even though the constant set itself may not uniquely determine this type. In the cases where it is not possible to determine the type of the set from local context, the expression type defaults to a set over the entire base type unless the base type is integer.† In the latter case the type defaults to the current binding of intset, which must be ``type set of (a subrange of) integer'' at that point.

† The current translator makes a special case of the construct `if ... in [ ... ]' and enforces only the more lax restriction on 16 bit arithmetic given above in this case.

Note that if intset is redefined via:

    type intset = set of 0..58;

then the default integer set is the implicit intset of Pascal 6000-3.4.

Additional predefined operators

The relationals `<' and `>' of proper set inclusion are available. With a and b sets, note that

    (not (a < b)) <> (a >= b)

As an example consider the sets a = [0,2] and b = [1]. The only relation true between these sets is `<>'.
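The example can be checked mechanically. Python's built-in sets happen to use the same relational symbols for inclusion, so the sketch below (a model, not Berkeley Pascal code) evaluates all six relations for a = [0,2] and b = [1]; Python's != stands in for Pascal's <>.

```python
a = {0, 2}
b = {1}

relations = {
    "=": a == b,
    "<>": a != b,
    "<": a < b,    # proper subset
    "<=": a <= b,  # subset
    ">": a > b,    # proper superset
    ">=": a >= b,  # superset
}

true_relations = [name for name, holds in relations.items() if holds]
print(true_relations)

# not (a < b) holds, yet a >= b does not: the two are not equivalent.
print((not (a < b)), (a >= b))
```

Because set inclusion is only a partial order, two sets may be incomparable, which is why negating `<' does not yield `>='.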

Non-standard procedures
argv(i,a)
where i is an integer and a is a string variable, assigns the (possibly truncated or blank padded) i'th argument of the invocation of the current UNIX process to the variable a. The range of valid i is 0 to argc-1.
date(a)
assigns the current date to the alfa variable a in the format `dd mmm yy ', where `mmm' is the first three characters of the month, e.g. `Apr'.
flush(f)
writes the output buffered for Pascal file f into the associated UNIX file.
halt
terminates the execution of the program with a control flow backtrace.
linelimit(f,x)‡
with f a textfile and x an integer expression, causes the program to be abnormally terminated if more than x lines are written on file f. If x is less than 0 then no limit is imposed. (‡ Currently ignored by PDP-11 px.)
message(x,...)
causes the parameters, which have the format of those to the built-in procedure write, to be written unbuffered on the diagnostic unit 2, almost always the user's terminal.
null
a procedure of no arguments which does absolutely nothing. It is useful as a place holder, and is generated by pxp in place of the invisible empty statement.
remove(a)
where a is a string, causes the UNIX file whose name is a, with trailing blanks eliminated, to be removed.
reset(f,a)
where a is a string causes the file whose name is a (with blanks trimmed) to be associated with f in addition to the normal function of reset.
rewrite(f,a)
is analogous to `reset' above.
stlimit(i)
where i is an integer sets the statement limit to be i statements. Specifying the p option to pc disables statement limit counting.
time(a)
causes the current time in the form ` hh:mm:ss ' to be assigned to the alfa variable a.
Non-standard functions
argc
returns the count of arguments when the Pascal program was invoked. Argc is always at least 1.
card(x)
returns the cardinality of the set x, i.e. the number of elements contained in the set.
clock
returns an integer which is the number of central processor milliseconds of user time used by this process.
expo(x)
yields the integer valued exponent of the floating-point representation of x; expo(x) = entier(log2(abs(x))).
random(x)
where x is a real parameter, evaluated but otherwise ignored, invokes a linear congruential random number generator. Successive seeds are generated as (seed*a + c) mod m and the new random number is a normalization of the seed to the range 0.0 to 1.0; a is 62605, c is 113218009, and m is 536870912. The initial seed is 7774755.
seed(i)
where i is an integer sets the random number generator seed to i and returns the previous seed. Thus seed(seed(i)) has no effect except to yield value i.
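The generator behind random and seed can be modeled in a few lines. The Python sketch below is not Berkeley Pascal code: the class and attribute names are hypothetical, and dividing the seed by m is an assumption about what "normalization of the seed to the range 0.0 to 1.0" means. The constants a, c, m and the initial seed are those given in the description of random.

```python
A = 62605
C = 113218009
M = 536870912        # equal to 2**29
INITIAL_SEED = 7774755


class Generator:
    """Model of the linear congruential generator behind random and seed."""

    def __init__(self):
        self.current = INITIAL_SEED

    def random(self, x: float = 0.0) -> float:
        # The argument is evaluated but otherwise ignored, as in the manual.
        self.current = (self.current * A + C) % M
        # Assumption: normalization means dividing the new seed by m.
        return self.current / M

    def seed(self, i: int) -> int:
        """Set the seed to i and return the previous seed."""
        previous = self.current
        self.current = i
        return previous


gen = Generator()
first = gen.random()
print(first)

# seed(seed(i)) restores the old seed and yields i.
other = Generator()
print(other.seed(other.seed(42)), other.current == INITIAL_SEED)
```

Since the state is only the current seed, saving the value returned by seed and passing it back later replays the same sequence.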
sysclock
an integer function of no arguments returns the number of central processor milliseconds of system time used by this process.
undefined(x)
a Boolean function. Its argument is a real number and it always returns false.
wallclock
an integer function of no arguments returns the time in seconds since 00:00:00 GMT January 1, 1970.

Remarks on standard and portable Pascal

It is occasionally desirable to prepare Pascal programs which will be acceptable at other Pascal installations. While certain system dependencies are bound to creep in, judicious design and programming practice can usually eliminate most of the non-portable usages. Wirth's "Pascal Report" concludes with a standard for implementation and program exchange.

In particular, the following differences may cause trouble when attempting to transport programs between this implementation and Pascal 6000-3.4. Using the s translator option may serve to indicate many problem areas.†

† The s option does not, however, check that identifiers differ in the first 8 characters. Pi and pc also do not check the semantics of packed.

Features not available in Berkeley Pascal
Segmented files and associated functions and procedures.
The function trunc with two arguments.
Arrays whose indices exceed the capacity of 16 bit arithmetic.
Features available in Berkeley Pascal but not in Pascal 6000-3.4
The procedures reset and rewrite with file names.
The functions argc, seed, sysclock, and wallclock.
The procedures argv, flush, and remove.
Message with arguments other than character strings.
Write with keyword hex.
The assert statement.
Reading and writing of enumerated types.
Allowing functions to return structures.
Separate compilation of programs.
Comparison of records.
Other problem areas

Sets and strings are more general in Berkeley Pascal; see the restrictions given in the Jensen-Wirth "User Manual" for details on the 6000-3.4 restrictions.

The character set differences may cause problems, especially the use of the function chr, characters as arguments to ord, and comparisons of characters, since the character set ordering differs between the two machines.

The Pascal 6000-3.4 compiler uses a less strict notion of type equivalence. In Berkeley Pascal, types are considered identical only if they are represented by the same type identifier. Thus, in particular, unnamed types are unique to the variables/fields declared with them.

Pascal 6000-3.4 doesn't recognize our option flags, so it is wise to put the control of Berkeley Pascal options at the end of option lists or, better yet, restrict the option list length to one.

For Pascal 6000-3.4 the ordering of files in the program statement has significance. It is desirable to place input and output as the first two files in the program statement.

Acknowledgments

The financial support of William Joy and Susan Graham by the National Science Foundation under grants MCS74-07644-A04, MCS78-07291, and MCS80-05144, and of William Joy by an IBM Graduate Fellowship is gratefully acknowledged.