dis88 1 LOCAL
"NAME"
dis88 - 8088 symbolic disassembler
"SYNOPSIS"
dis88 [ -f ] [ -o ] ifile [ ofile ]
"DESCRIPTION"
Dis88 reads ifile, which must be in PC/IX a.out format. It interprets the binary opcodes and data locations, and writes corresponding assembler source code to stdout, or to ofile if specified. The program's output is in the format of, and fully compatible with, the PC/IX assembler, as(1). If a symbol table is present in ifile, labels and references will be symbolic in the output. If the input file lacks a symbol table, the fact will be noted, and the disassembly will proceed, with the disassembler generating synthetic labels as needed. If the input file has split I/D space, or if it is executable, the disassembler will make all necessary adjustments in address-reference calculations.

If the "-o" option appears, object code will be included in comments during disassembly of the text segment. This feature is used primarily for debugging the disassembler itself, but may provide information of passing interest to users.

If the "-f" option appears, dis88 will attempt to disassemble any file whatsoever, whether or not it is in a.out format. In this mode it must assume that the file begins at address zero.

The program always outputs the current machine address before disassembling an opcode. If a symbol table is present, this address is output as an assembler comment; otherwise, it is incorporated into the synthetic label which is generated internally. Since relative jumps, especially short ones, may target unlabelled locations, the program always outputs the physical target address as a comment, to assist the user in following the code.
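The target address shown in the comment for a short relative jump can be worked out by hand. The following is a minimal sketch of that arithmetic, not the disassembler's own code; the function name is hypothetical, and it assumes the usual 8086 encoding in which a short jump occupies two bytes and carries a signed 8-bit displacement measured from the end of the instruction.

```python
def short_jump_target(addr, disp_byte):
    """Return the target of a two-byte short jump located at addr.

    disp_byte is the raw displacement byte (0..255) from the object code.
    """
    # Sign-extend the 8-bit displacement.
    disp = disp_byte - 256 if disp_byte >= 128 else disp_byte
    # The displacement is relative to the address following the jump;
    # the result wraps within the 16-bit offset space.
    return (addr + 2 + disp) & 0xFFFF
```

For example, a short jump at address 0x0100 with displacement byte 0xfe targets 0x0100, that is, itself.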

The text segment of an object file is always padded to an even machine address. In addition, if the file has split I/D space, the text segment will be padded to a paragraph boundary (i.e., an address divisible by 16). As a result of this padding, the disassembler may produce a few spurious, but harmless, instructions at the end of the text segment.
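The padding rule above amounts to rounding the text size up to a multiple of 2, or of 16 when the file has split I/D space. A small sketch of the calculation follows; the function and parameter names are hypothetical and do not come from the dis88 source.

```python
def padded_text_size(text_size, split_id):
    """Round text_size up to the alignment described in the text."""
    # Paragraph (16-byte) alignment for split I/D, otherwise even alignment.
    align = 16 if split_id else 2
    return (text_size + align - 1) // align * align
```

The difference between the padded size and the true size is the number of filler bytes that may be disassembled as spurious instructions: at most 1 byte without split I/D, and at most 15 bytes with it.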

Disassembly of the data segment is a difficult matter. The information to which initialized data refers cannot be inferred from context, except in the special case of an external data or address reference, which will be reflected in the relocation table. Internal data and address references will already be resolved in the object file, and cannot be recreated. Therefore, the data segment is disassembled as a byte stream, with long stretches of null data represented by an appropriate ".zerow" pseudo-op. This limitation notwithstanding, labels (as opposed to symbolic references) are always output at appropriate points within the data segment.
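The byte-stream treatment of the data segment can be pictured with the sketch below. It is an illustration only: it assumes that ".zerow" takes a count of 16-bit words, the threshold at which a run of nulls is considered "long" (min_run) is invented for the example, and label output is omitted.

```python
def dump_data(data, min_run=4):
    """Render a data segment as .byte lines, collapsing long null runs.

    min_run is a hypothetical threshold: runs of at least this many
    zero bytes are emitted as a single .zerow directive.
    """
    lines = []
    i = 0
    while i < len(data):
        # Measure the run of zero bytes starting at position i.
        j = i
        while j < len(data) and data[j] == 0:
            j += 1
        run = j - i
        if run >= min_run:
            words = run // 2  # .zerow is assumed to count words
            lines.append(f".zerow {words}")
            i += words * 2    # an odd trailing zero falls through to .byte
        else:
            lines.append(f".byte 0x{data[i]:02x}")
            i += 1
    return lines
```

Under these assumptions, the seven bytes 01 00 00 00 00 00 02 would be rendered as one ".byte", one ".zerow 2", and two further ".byte" lines.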

If disassembly of the data segment is difficult, disassembly of the bss segment is quite easy, because uninitialized data is all zero by definition. No data is output in the bss segment, but symbolic labels are output as appropriate.

For each opcode which takes an operand, a particular symbol type (text, data, or bss) is appropriate. This tidy correspondence is complicated somewhat, however, by the existence of assembler symbolic constants and segment override opcodes. Therefore, the disassembler's symbol lookup routine attempts to apply a certain amount of intelligence when it is asked to find a symbol. If it cannot match on a symbol of the preferred type, it may return a symbol of some other type, depending on preassigned (and somewhat arbitrary) rankings within each type. Finally, if all else fails, it returns a string containing the address sought as a hex constant; this behavior allows calling routines to use the output of the lookup function regardless of the success of its search.
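The lookup strategy described above might be sketched as follows. This is not the disassembler's actual routine: the Symbol record, the type tags, the RANK table, and the exact spelling of the hex constant are all assumptions made for illustration.

```python
from typing import NamedTuple


class Symbol(NamedTuple):
    name: str
    value: int
    kind: str  # "text", "data", "bss", or "abs" (hypothetical type tags)


# Hypothetical fallback ranking; lower numbers are preferred.
RANK = {"text": 0, "data": 1, "bss": 2, "abs": 3}


def lookup(symtab, addr, preferred):
    """Return a name for addr, or a hex constant if no symbol matches."""
    best = None
    for sym in symtab:  # linear scan over the whole table
        if sym.value != addr:
            continue
        if sym.kind == preferred:
            return sym.name  # match on the preferred type
        if best is None or RANK[sym.kind] < RANK[best.kind]:
            best = sym  # remember the best-ranked alternative
    if best is not None:
        return best.name
    return f"0x{addr:04x}"  # all else failed: address as a hex constant
```

Because the function always returns a string, a caller can paste the result into an operand field without checking whether the search succeeded.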

It is worth noting, at this point, that the symbol lookup routine operates linearly, and has not been optimized in any way. Since every lookup scans the symbol table, and larger files tend to contain both more symbols and more references, execution time is likely to grow faster than linearly (roughly as the square) with input file size. The disassembler is internally limited to 1500 symbol table entries and 1500 relocation table entries; while these limits are generous (/unix, itself, has fewer than 800 symbols), they are not guaranteed to be adequate in all cases. If the symbol table or the relocation table overflows, the disassembly aborts.
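The fixed table limits and the cost of the linear search can be illustrated with a short sketch. The names below are hypothetical; only the limit of 1500 entries and the wording of the overflow messages are taken from this manual page.

```python
MAX_ENTRIES = 1500  # limit stated in the text, applied to each table


class TableOverflow(Exception):
    """Raised where the real program would abort."""


def load_table(entries, what, limit=MAX_ENTRIES):
    """Copy entries into a bounded table; what is "symbol" or "reloc"."""
    table = []
    for entry in entries:
        if len(table) >= limit:
            raise TableOverflow(f"{what} table overflow")
        table.append(entry)
    return table


def worst_case_comparisons(n_lookups, n_symbols):
    """Upper bound on comparisons when every lookup scans the table."""
    return n_lookups * n_symbols
```

With 800 symbols, each unsuccessful lookup costs 800 comparisons, so the total grows with the product of the number of operand references and the number of symbols.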

Finally, users should be aware of a bug in the assembler, which causes it not to parse the "esc" mnemonic, even though "esc" is a completely legitimate opcode which is documented in all the Intel literature. To accommodate this deficiency, the disassembler translates opcodes of the "esc" family to ".byte" directives, but notes the correct mnemonic in a comment for reference.
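The workaround for "esc" can be pictured as below. On the 8086/8088 the escape opcodes occupy the range 0xd8 through 0xdf; everything else in the sketch is an assumption, including the function names, the "|" comment character, and the layout of the output line, which may differ from what dis88 actually prints.

```python
def is_esc(opcode):
    """True if the first opcode byte belongs to the 8086 "esc" family."""
    return 0xD8 <= opcode <= 0xDF


def emit_esc(insn):
    """Render the raw bytes of an esc instruction as a .byte directive.

    The mnemonic is kept in a trailing comment; "|" is an assumed
    comment character.
    """
    body = ", ".join(f"0x{b:02x}" for b in insn)
    return f".byte {body}\t| esc"
```

Emitting the raw bytes keeps the output acceptable to the assembler while still telling the reader which instruction was intended.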

In all cases, it should be possible to submit the output of the disassembler program to the assembler, and assemble it without error. In most cases, the resulting object code will be identical to the original; in any event, it will be functionally equivalent.

"SEE ALSO"
adb(1), as(1), cc(1), ld(1).

"Assembler Reference Manual" in the PC/IX Programmer's Guide.

"DIAGNOSTICS"
"can't access input file" if the input file cannot be found, opened, or read.
"can't open output file" if the output file cannot be created.
"warning: host/cpu clash" if the program is run on a machine with a different CPU.
"input file not in object format" if the magic number does not correspond to that of a PC/IX object file.
"not an 8086/8088 object file" if the CPU ID of the file header is incorrect.
"reloc table overflow" if there are more than 1500 entries in the relocation table.
"symbol table overflow" if there are more than 1500 entries in the symbol table.
"lseek error" if the input file is corrupted (should never happen).
"warning: no symbols" if the symbol table is missing.
"can't reopen input file" if the input file is removed or altered during program execution (should never happen).

"BUGS"
Numeric co-processor (i.e., 8087) mnemonics are not currently supported. Instructions for the co-processor are disassembled as CPU escape sequences, or as interrupts, depending on how they were assembled in the first place.

Despite the program's best efforts, a symbol retrieved from the symbol table may sometimes be different from the symbol used in the original assembly.

The disassembler's internal tables are of fixed size, and the program aborts if they overflow.