.\" Copyright (c) 1991, 1993
.\"	The Regents of the University of California.  All rights reserved.
.\"
.\" This man page is derived from documentation contributed to Berkeley by
.\" Donn Seeley at UUNET Technologies, Inc.
.\"
.\" %sccs.include.redist.roff%
.\"
.\"	@(#)a.out.5	8.1 (Berkeley) 06/05/93
.\"
.Dd June 5, 1993
.Dt A.OUT 5
.Os
.Sh NAME
.Nm a.out
.Nd format of executable binary files
.Sh SYNOPSIS
.Fd #include <a.out.h>
.Sh DESCRIPTION
The include file
.Aq Pa a.out.h
declares three structures and several macros.
The structures describe the format of
executable machine code files
.Pq Sq binaries
on the system.
.Pp
A binary file consists of up to seven sections.
In order, these sections are:
.Bl -tag -width "text relocations"
.It exec header
Contains parameters used by the kernel
to load a binary file into memory and execute it,
and by the link editor
.Xr ld 1
to combine a binary file with other binary files.
This section is the only mandatory one.
.It text segment
Contains machine code and related data
that are loaded into memory when a program executes.
May be loaded read-only.
.It data segment
Contains initialized data; always loaded into writable memory.
.It text relocations
Contains records used by the link editor
to update pointers in the text segment when combining binary files.
.It data relocations
Like the text relocation section, but for data segment pointers.
.It symbol table
Contains records used by the link editor
to cross-reference the addresses of named variables and functions
.Pq Sq symbols
between binary files.
.It string table
Contains the character strings corresponding to the symbol names.
.El
.Pp
Every binary file begins with an
.Fa exec
structure:
.Bd -literal -offset indent
struct exec {
	unsigned short	a_mid;
	unsigned short	a_magic;
	unsigned long	a_text;
	unsigned long	a_data;
	unsigned long	a_bss;
	unsigned long	a_syms;
	unsigned long	a_entry;
	unsigned long	a_trsize;
	unsigned long	a_drsize;
};
.Ed
.Pp
The fields have the following functions:
.Bl -tag -width a_trsize
.It Fa a_mid
Contains a bit pattern that
identifies binaries that were built for
certain sub-classes of an architecture
.Pq Sq machine IDs
or variants of the operating system on a given architecture.
The kernel may not support all machine IDs
on a given architecture.
The
.Fa a_mid
field is not present on some architectures;
in this case, the
.Fa a_magic
field has type
.Em unsigned long .
.It Fa a_magic
Contains a bit pattern
.Pq Sq magic number
that uniquely identifies binary files
and distinguishes different loading conventions.
The field must contain one of the following values:
.Bl -tag -width ZMAGIC
.It Dv OMAGIC
The text and data segments immediately follow the header
and are contiguous.
The kernel loads both text and data segments into writable memory.
.It Dv NMAGIC
As with
.Dv OMAGIC ,
text and data segments immediately follow the header and are contiguous.
However, the kernel loads the text into read-only memory
and loads the data into writable memory at the next
page boundary after the text.
.It Dv ZMAGIC
The kernel loads individual pages on demand from the binary.
The header, text segment and data segment are all
padded by the link editor to a multiple of the page size.
Pages that the kernel loads from the text segment are read-only,
while pages from the data segment are writable.
.El
.It Fa a_text
Contains the size of the text segment in bytes.
.It Fa a_data
Contains the size of the data segment in bytes.
.It Fa a_bss
Contains the number of bytes in the
.Sq bss segment
and is used by the kernel to set the initial break
.Pq Xr brk 2
after the data segment.
The kernel loads the program so that this amount of writable memory
appears to follow the data segment and initially reads as zeroes.
.It Fa a_syms
Contains the size in bytes of the symbol table section.
.It Fa a_entry
Contains the address in memory of the entry point
of the program after the kernel has loaded it;
the kernel starts the execution of the program
from the machine instruction at this address.
.It Fa a_trsize
Contains the size in bytes of the text relocation table.
.It Fa a_drsize
Contains the size in bytes of the data relocation table.
.El
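.Pp
The loading conventions described for
.Fa a_magic
and
.Fa a_bss
can be sketched in C.
This is an illustration under stated assumptions, not kernel code:
the constants prefixed with X_ carry the traditional octal magic values,
and the helper names, the text load address and the page size
are hypothetical and supplied by the caller.
.Bd -literal -offset indent
#include <stdint.h>

/* Traditional octal magic numbers; assumed here, normally from <a.out.h>. */
#define X_OMAGIC	0407
#define X_NMAGIC	0410
#define X_ZMAGIC	0413

/* Round addr up to a multiple of pagesize (pagesize must be a power of 2). */
static uint32_t
page_round(uint32_t addr, uint32_t pagesize)
{
	return (addr + pagesize - 1) & ~(pagesize - 1);
}

/*
 * Address at which the data segment is loaded, given an assumed load
 * address for the text segment.  OMAGIC data is contiguous with the text;
 * ZMAGIC text is already padded to a page multiple, so it is treated the
 * same way here.  NMAGIC data starts on a page boundary after the text.
 */
static uint32_t
data_start(uint32_t magic, uint32_t text_addr, uint32_t a_text,
    uint32_t pagesize)
{
	uint32_t end = text_addr + a_text;

	return magic == X_NMAGIC ? page_round(end, pagesize) : end;
}

/* Initial break: the end of the zero-filled bss that follows the data. */
static uint32_t
initial_break(uint32_t data_addr, uint32_t a_data, uint32_t a_bss)
{
	return data_addr + a_data + a_bss;
}
.Ed
.Pp
With 4096-byte pages and text loaded at address 0, an
.Dv NMAGIC
binary with 5000 bytes of text has its data placed at address 8192,
whereas an
.Dv OMAGIC
binary has it at address 5000.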
.Pp
The
.Pa a.out.h
include file defines several macros which use an
.Fa exec
structure to test consistency or to locate section offsets in the binary file.
.Bl -tag -width N_BADMAG(exec)
.It Fn N_BADMAG exec
Nonzero if the
.Fa a_magic
field does not contain a recognized value.
.It Fn N_TXTOFF exec
The byte offset in the binary file of the beginning of the text segment.
.It Fn N_SYMOFF exec
The byte offset of the beginning of the symbol table.
.It Fn N_STROFF exec
The byte offset of the beginning of the string table.
.El
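.Pp
The offsets that these macros yield follow from the section order
given at the start of this page.
The following C fragment is a minimal sketch of that arithmetic,
not the definitions found in
.Pa a.out.h :
the
.Vt struct exec32
type, the
.Fn x_badmag ,
.Fn x_txtoff ,
.Fn x_symoff
and
.Fn x_stroff
helpers and the
.Fa pagesize
parameter are hypothetical.
The structure assumes 32-bit longs and the presence of
.Fa a_mid ,
and the magic values are the traditional octal ones.
.Bd -literal -offset indent
#include <stdint.h>

/* Hypothetical fixed-width mirror of struct exec (32 bytes, no padding). */
struct exec32 {
	uint16_t	a_mid;
	uint16_t	a_magic;
	uint32_t	a_text;
	uint32_t	a_data;
	uint32_t	a_bss;
	uint32_t	a_syms;
	uint32_t	a_entry;
	uint32_t	a_trsize;
	uint32_t	a_drsize;
};

/* Traditional octal magic numbers; assumed here, normally from <a.out.h>. */
#define X_OMAGIC	0407
#define X_NMAGIC	0410
#define X_ZMAGIC	0413

/* Nonzero if a_magic is none of the three recognized values. */
static int
x_badmag(const struct exec32 *e)
{
	return e->a_magic != X_OMAGIC && e->a_magic != X_NMAGIC &&
	    e->a_magic != X_ZMAGIC;
}

/* Text follows the header directly, or a header padded to one page. */
static uint32_t
x_txtoff(const struct exec32 *e, uint32_t pagesize)
{
	return e->a_magic == X_ZMAGIC ? pagesize : (uint32_t)sizeof(*e);
}

/* The symbol table follows text, data and both relocation tables. */
static uint32_t
x_symoff(const struct exec32 *e, uint32_t pagesize)
{
	return x_txtoff(e, pagesize) + e->a_text + e->a_data +
	    e->a_trsize + e->a_drsize;
}

/* The string table follows the symbol table. */
static uint32_t
x_stroff(const struct exec32 *e, uint32_t pagesize)
{
	return x_symoff(e, pagesize) + e->a_syms;
}
.Ed
.Pp
For an
.Dv OMAGIC
file with 100 bytes of text, 50 bytes of data,
8 and 16 bytes of text and data relocations and a 24-byte symbol table,
the sketch places the symbol table at offset 206
and the string table at offset 230.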
.Pp
Relocation records have a standard format which
is described by the
.Fa relocation_info
structure:
.Bd -literal -offset indent
struct relocation_info {
	int		r_address;
	unsigned int	r_symbolnum : 24,
			r_pcrel : 1,
			r_length : 2,
			r_extern : 1,
			: 4;
};
.Ed
.Pp
The
.Fa relocation_info
fields are used as follows:
.Bl -tag -width r_symbolnum
.It Fa r_address
Contains the byte offset of a pointer that needs to be link-edited.
Text relocation offsets are reckoned from the start of the text segment,
and data relocation offsets from the start of the data segment.
The link editor adds the value that is already stored at this offset
to the new value that it computes using this relocation record.
.It Fa r_symbolnum
Contains the ordinal number of a symbol structure
in the symbol table (it is
.Em not
a byte offset).
After the link editor resolves the absolute address for this symbol,
it adds that address to the pointer that is undergoing relocation.
(If the
.Fa r_extern
bit is clear, the situation is different; see below.)
.It Fa r_pcrel
If this is set,
the link editor assumes that it is updating a pointer
that is part of a machine code instruction using pc-relative addressing.
The address of the relocated pointer is implicitly added
to its value when the running program uses it.
.It Fa r_length
Contains the log base 2 of the length of the pointer in bytes;
0 for 1-byte displacements, 1 for 2-byte displacements,
2 for 4-byte displacements.
.It Fa r_extern
Set if this relocation requires an external reference;
the link editor must use a symbol address to update the pointer.
When the
.Fa r_extern
bit is clear, the relocation is
.Sq local ;
the link editor updates the pointer to reflect
changes in the load addresses of the various segments,
rather than changes in the value of a symbol.
In this case, the content of the
.Fa r_symbolnum
field is an
.Fa n_type
value (see below);
this type field tells the link editor
what segment the relocated pointer points into.
.El
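.Pp
The core of applying a relocation record, adding a value to the pointer
already stored at
.Fa r_address ,
can be sketched as follows.
This is not the code of
.Xr ld 1 :
.Vt struct reloc32
and
.Fn reloc_add
are hypothetical, the segment is assumed to be held in memory
in host byte order,
and the caller is assumed to have chosen
.Fa value
(a resolved symbol address when
.Fa r_extern
is set, otherwise the change in the load address of the target segment)
and to have accounted for
.Fa r_pcrel .
.Bd -literal -offset indent
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical mirror of struct relocation_info.  Bit-field layout is
 * compiler-dependent, so this models the record in memory only.
 */
struct reloc32 {
	int32_t		r_address;
	unsigned int	r_symbolnum : 24,
			r_pcrel : 1,
			r_length : 2,
			r_extern : 1,
			: 4;
};

/*
 * Add value to the pointer of (1 << r_length) bytes at seg + r_address.
 * Returns 0 on success, -1 for an unsupported length or a bad offset.
 */
static int
reloc_add(uint8_t *seg, size_t seglen, const struct reloc32 *r,
    uint32_t value)
{
	size_t width = (size_t)1 << r->r_length;
	uint8_t *p;
	uint16_t v16;
	uint32_t v32;

	if (r->r_length > 2 || r->r_address < 0 ||
	    (size_t)r->r_address > seglen ||
	    seglen - (size_t)r->r_address < width)
		return -1;
	p = seg + r->r_address;
	switch (r->r_length) {
	case 0:			/* 1-byte displacement */
		*p = (uint8_t)(*p + value);
		break;
	case 1:			/* 2-byte displacement */
		memcpy(&v16, p, sizeof(v16));
		v16 = (uint16_t)(v16 + value);
		memcpy(p, &v16, sizeof(v16));
		break;
	default:		/* 4-byte displacement */
		memcpy(&v32, p, sizeof(v32));
		v32 += value;
		memcpy(p, &v32, sizeof(v32));
		break;
	}
	return 0;
}
.Ed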
.Pp
Symbols map names to addresses (or more generally, strings to values).
Since the link editor adjusts addresses,
a symbol's name must be used to stand for its address
until an absolute value has been assigned.
Symbols consist of a fixed-length record in the symbol table
and a variable-length name in the string table.
The symbol table is an array of
.Fa nlist
structures:
.Bd -literal -offset indent
struct nlist {
	union {
		char	*n_name;
		long	n_strx;
	} n_un;
	unsigned char	n_type;
	char		n_other;
	short		n_desc;
	unsigned long	n_value;
};
.Ed
.Pp
The fields are used as follows:
.Bl -tag -width n_un.n_strx
.It Fa n_un.n_strx
Contains a byte offset into the string table
for the name of this symbol.
When a program accesses a symbol table with the
.Xr nlist 3
function,
this field is replaced with the
.Fa n_un.n_name
field, which is a pointer to the string in memory.
.It Fa n_type
Used by the link editor to determine
how to update the symbol's value.
The
.Fa n_type
field is broken down into three sub-fields using bitmasks.
The link editor treats symbols with the
.Dv N_EXT
type bit set as
.Sq external
symbols and permits references to them from other binary files.
The
.Dv N_TYPE
mask selects bits of interest to the link editor:
.Bl -tag -width N_TEXT
.It Dv N_UNDF
An undefined symbol.
The link editor must locate an external symbol with the same name
in another binary file to determine the absolute value of this symbol.
As a special case, if the
.Fa n_value
field is nonzero and no binary file in the link-edit defines this symbol,
the link editor will resolve this symbol to an address
in the bss segment,
reserving a number of bytes equal to
.Fa n_value .
If this symbol is undefined in more than one binary file
and the binary files do not agree on the size,
the link editor chooses the greatest size found across all binaries.
.It Dv N_ABS
An absolute symbol.
The link editor does not update an absolute symbol.
.It Dv N_TEXT
A text symbol.
This symbol's value is a text address and
the link editor will update it when it merges binary files.
.It Dv N_DATA
A data symbol; similar to
.Dv N_TEXT
but for data addresses.
The values for text and data symbols are not file offsets but
addresses; to recover the file offsets, it is necessary
to identify the loaded address of the beginning of the corresponding
section and subtract it, then add the offset of the section.
.It Dv N_BSS
A bss symbol; like text or data symbols but
with no corresponding offset in the binary file.
.It Dv N_FN
A filename symbol.
The link editor inserts this symbol before
the other symbols from a binary file when
merging binary files.
The name of the symbol is the filename given to the link editor,
and its value is the first text address from that binary file.
Filename symbols are not needed for link-editing or loading,
but are useful for debuggers.
.El
.Pp
The
.Dv N_STAB
mask selects bits of interest to symbolic debuggers
such as
.Xr gdb 1 ;
the values are described in
.Xr stab 5 .
.It Fa n_other
This field is currently unused.
.It Fa n_desc
Reserved for use by debuggers; passed untouched by the link editor.
Different debuggers use this field for different purposes.
.It Fa n_value
Contains the value of the symbol.
For text, data and bss symbols, this is an address;
for other symbols (such as debugger symbols),
the value may be arbitrary.
.El
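.Pp
The three sub-fields of
.Fa n_type
and the recovery of a file offset from a symbol address
can be sketched in C.
The constants prefixed with X_ carry the traditional BSD mask and type
values and are assumptions here; real programs should take them from
the system headers.
All function names and the
.Fa seg_addr
and
.Fa seg_off
parameters are hypothetical.
.Bd -literal -offset indent
#include <stdint.h>

/* Traditional BSD values, assumed here; normally from the headers. */
#define X_N_EXT		0x01	/* external bit */
#define X_N_TYPE	0x1e	/* bits of interest to the link editor */
#define X_N_STAB	0xe0	/* bits of interest to debuggers */
#define X_N_UNDF	0x00
#define X_N_ABS		0x02
#define X_N_TEXT	0x04
#define X_N_DATA	0x06
#define X_N_BSS		0x08

/* The link editor's view of the type, such as X_N_TEXT. */
static int
sym_kind(unsigned char n_type)
{
	return n_type & X_N_TYPE;
}

static int
sym_is_external(unsigned char n_type)
{
	return (n_type & X_N_EXT) != 0;
}

static int
sym_is_stab(unsigned char n_type)
{
	return (n_type & X_N_STAB) != 0;
}

/*
 * The N_UNDF special case: an undefined external symbol with a nonzero
 * value asks for that many bytes of bss if nothing else defines it.
 */
static int
sym_is_common(unsigned char n_type, uint32_t n_value)
{
	return !sym_is_stab(n_type) && sym_is_external(n_type) &&
	    sym_kind(n_type) == X_N_UNDF && n_value != 0;
}

/*
 * File offset of a text or data symbol: subtract the loaded address of
 * the section and add the file offset of the section.
 */
static uint32_t
sym_file_offset(uint32_t n_value, uint32_t seg_addr, uint32_t seg_off)
{
	return n_value - seg_addr + seg_off;
}
.Ed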
.Pp
The string table consists of an
.Em unsigned long
length followed by null-terminated symbol strings.
The length represents the size of the entire table in bytes,
so its minimum value (or the offset of the first string)
is always 4 on 32-bit machines.
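.Pp
A symbol name can be looked up from an
.Fa n_un.n_strx
offset along the following lines.
This is a sketch only:
.Fn strtab_name
is a hypothetical helper, the whole table (length word included)
is assumed to have been read into memory,
and the 4-byte length word of a 32-bit machine is assumed.
Offsets that fall inside the length word are treated here as
.Dq no name .
.Bd -literal -offset indent
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Size of the leading length word; 4 on 32-bit machines. */
#define X_STRTAB_HDR	4

/*
 * Return the name at byte offset strx in a string table of size bytes,
 * or NULL if the offset is out of range or the string is unterminated.
 */
static const char *
strtab_name(const uint8_t *tab, uint32_t size, uint32_t strx)
{
	if (strx < X_STRTAB_HDR || strx >= size)
		return NULL;
	if (memchr(tab + strx, 0, size - strx) == NULL)
		return NULL;
	return (const char *)(tab + strx);
}
.Ed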
.Sh SEE ALSO
.Xr ld 1 ,
.Xr execve 2 ,
.Xr nlist 3 ,
.Xr core 5 ,
.Xr dbx 5 ,
.Xr stab 5
.Sh HISTORY
The
.Pa a.out.h
include file appeared in
.At v7 .
.Sh BUGS
Since not all of the supported architectures use the
.Fa a_mid
field,
it can be difficult to determine what
architecture a binary will execute on
without examining its actual machine code.
Even with a machine identifier,
the byte order of the
.Fa exec
header is machine-dependent.
.Pp
Nobody seems to agree on what
.Em bss
stands for.
.Pp
New binary file formats may be supported in the future,
and they probably will not be compatible at any level
with this ancient format.