.\" Copyright (c) 1991, 1993
.\"	The Regents of the University of California.  All rights reserved.
.\"
.\" This man page is derived from documentation contributed to Berkeley by
.\" Donn Seeley at UUNET Technologies, Inc.
.\"
.\" %sccs.include.redist.roff%
.\"
.\"	@(#)a.out.5	8.2 (Berkeley) 06/01/94
.\"
.Dd
.Dt A.OUT 5
.Os
.Sh NAME
.Nm a.out
.Nd format of executable binary files
.Sh SYNOPSIS
.Fd #include <a.out.h>
.Sh DESCRIPTION
The include file
.Aq Pa a.out.h
declares three structures and several macros.
The structures describe the format of
executable machine code files
.Pq Sq binaries
on the system.
.Pp
A binary file consists of up to 7 sections.
In order, these sections are:
.Bl -tag -width "text relocations"
.It exec header
Contains parameters used by the kernel
to load a binary file into memory and execute it,
and by the link editor
.Xr ld 1
to combine a binary file with other binary files.
This section is the only mandatory one.
.It text segment
Contains machine code and related data
that are loaded into memory when a program executes.
May be loaded read-only.
.It data segment
Contains initialized data; always loaded into writable memory.
.It text relocations
Contains records used by the link editor
to update pointers in the text segment when combining binary files.
.It data relocations
Like the text relocation section, but for data segment pointers.
.It symbol table
Contains records used by the link editor
to cross reference the addresses of named variables and functions
.Pq Sq symbols
between binary files.
.It string table
Contains the character strings corresponding to the symbol names.
.El
.Pp
Every binary file begins with an
.Fa exec
structure:
.Bd -literal -offset indent
struct exec {
	unsigned short	a_mid;
	unsigned short	a_magic;
	unsigned long	a_text;
	unsigned long	a_data;
	unsigned long	a_bss;
	unsigned long	a_syms;
	unsigned long	a_entry;
	unsigned long	a_trsize;
	unsigned long	a_drsize;
};
.Ed
.Pp
The fields have the following functions:
.Bl -tag -width a_trsize
.It Fa a_mid
Contains a bit pattern that
identifies binaries that were built for
certain sub-classes of an architecture
.Pq Sq machine IDs
or variants of the operating system on a given architecture.
The kernel may not support all machine IDs
on a given architecture.
The
.Fa a_mid
field is not present on some architectures;
in this case, the
.Fa a_magic
field has type
.Em unsigned long .
.It Fa a_magic
Contains a bit pattern
.Pq Sq magic number
that uniquely identifies binary files
and distinguishes different loading conventions.
The field must contain one of the following values:
.Bl -tag -width ZMAGIC
.ne 1i
.It Dv OMAGIC
The text and data segments immediately follow the header
and are contiguous.
The kernel loads both text and data segments into writable memory.
.It Dv NMAGIC
As with
.Dv OMAGIC ,
text and data segments immediately follow the header and are contiguous.
However, the kernel loads the text into read-only memory
and loads the data into writable memory at the next
page boundary after the text.
.It Dv ZMAGIC
The kernel loads individual pages on demand from the binary.
The header, text segment and data segment are all
padded by the link editor to a multiple of the page size.
Pages that the kernel loads from the text segment are read-only,
while pages from the data segment are writable.
.El
.It Fa a_text
Contains the size of the text segment in bytes.
.It Fa a_data
Contains the size of the data segment in bytes.
.It Fa a_bss
Contains the number of bytes in the
.Sq bss segment
and is used by the kernel to set the initial break
.Pq Xr brk 2
after the data segment.
The kernel loads the program so that this amount of writable memory
appears to follow the data segment and initially reads as zeroes.
.It Fa a_syms
Contains the size in bytes of the symbol table section.
.It Fa a_entry
Contains the address in memory of the entry point
of the program after the kernel has loaded it;
the kernel starts the execution of the program
from the machine instruction at this address.
.It Fa a_trsize
Contains the size in bytes of the text relocation table.
.It Fa a_drsize
Contains the size in bytes of the data relocation table.
.El
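.Pp
As a rough illustration of the header fields just described,
the following C sketch copies the first bytes of a file image into a
fixed-width mirror of the exec structure and checks the magic number.
The type exec32, the AOUT_ constants and both function names are
hypothetical and exist only for this example.
The octal magic values are the traditional ones and are not stated on this page.
The sketch assumes a target with an a_mid field, 32-bit longs,
and a header stored in host byte order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Traditional magic values (assumption: not given in the text above). */
#define AOUT_OMAGIC 0407
#define AOUT_NMAGIC 0410
#define AOUT_ZMAGIC 0413

/* Hypothetical fixed-width mirror of struct exec (32 bytes, no padding). */
struct exec32 {
	uint16_t a_mid;
	uint16_t a_magic;
	uint32_t a_text;
	uint32_t a_data;
	uint32_t a_bss;
	uint32_t a_syms;
	uint32_t a_entry;
	uint32_t a_trsize;
	uint32_t a_drsize;
};

/* Name of the loading convention, or NULL for an unrecognized magic. */
static const char *magic_name(uint16_t magic)
{
	switch (magic) {
	case AOUT_OMAGIC: return "OMAGIC";
	case AOUT_NMAGIC: return "NMAGIC";
	case AOUT_ZMAGIC: return "ZMAGIC";
	default:          return NULL;
	}
}

/*
 * Copy the header out of a raw file image.
 * Returns 0 on success, -1 if the image is too short or the magic is bad.
 */
static int read_exec(const unsigned char *buf, size_t len, struct exec32 *out)
{
	if (len < sizeof(*out))
		return -1;
	memcpy(out, buf, sizeof(*out));
	return magic_name(out->a_magic) != NULL ? 0 : -1;
}
```

A real program would include
.Aq Pa a.out.h
and use
.Fn N_BADMAG
instead of a hand-written check; the copy through memcpy is used here
only to avoid alignment assumptions about the input buffer.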
.Pp
The
.Pa a.out.h
include file defines several macros which use an
.Fa exec
structure to test consistency or to locate section offsets in the binary file.
.Bl -tag -width N_BADMAG(exec)
.It Fn N_BADMAG exec
Nonzero if the
.Fa a_magic
field does not contain a recognized value.
.It Fn N_TXTOFF exec
The byte offset in the binary file of the beginning of the text segment.
.It Fn N_SYMOFF exec
The byte offset of the beginning of the symbol table.
.It Fn N_STROFF exec
The byte offset of the beginning of the string table.
.El
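.Pp
Because the sections appear in the file in the fixed order listed earlier,
the offsets these macros yield can be pictured as running sums of the
header's size fields.
The sketch below is not the header's implementation;
the structure and function names are hypothetical,
and the text offset is passed in as a parameter because its value
(the header size, or a padded size for demand-paged binaries)
depends on the magic number and the machine.

```c
#include <stdint.h>

/* Hypothetical subset of the exec header: section sizes in bytes. */
struct aout_sizes {
	uint32_t text, data, trsize, drsize, syms;
};

/* Hypothetical result: file offset of each section after the header. */
struct aout_layout {
	uint32_t text, data, trel, drel, sym, str;
};

/*
 * Compute section offsets by walking the sections in file order.
 * text_off stands for what N_TXTOFF would return (an assumption of
 * this sketch; the caller supplies it).
 */
static struct aout_layout aout_layout(const struct aout_sizes *s,
    uint32_t text_off)
{
	struct aout_layout l;

	l.text = text_off;
	l.data = l.text + s->text;    /* data follows text */
	l.trel = l.data + s->data;    /* text relocations follow data */
	l.drel = l.trel + s->trsize;  /* data relocations follow */
	l.sym  = l.drel + s->drsize;  /* symbol table, as N_SYMOFF */
	l.str  = l.sym + s->syms;     /* string table, as N_STROFF */
	return l;
}
```

The arithmetic is unchecked; a careful reader of untrusted files would
also verify that each sum stays within the file size.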
.Pp
Relocation records have a standard format which
is described by the
.Fa relocation_info
structure:
.Bd -literal -offset indent
struct relocation_info {
	int		r_address;
	unsigned int	r_symbolnum : 24,
			r_pcrel : 1,
			r_length : 2,
			r_extern : 1,
			: 4;
};
.Ed
.Pp
The
.Fa relocation_info
fields are used as follows:
.Bl -tag -width r_symbolnum
.It Fa r_address
Contains the byte offset of a pointer that needs to be link-edited.
Text relocation offsets are reckoned from the start of the text segment,
and data relocation offsets from the start of the data segment.
The link editor adds the value that is already stored at this offset
to the new value that it computes using this relocation record.
.ne 1i
.It Fa r_symbolnum
Contains the ordinal number of a symbol structure
in the symbol table (it is
.Em not
a byte offset).
After the link editor resolves the absolute address for this symbol,
it adds that address to the pointer that is undergoing relocation.
(If the
.Fa r_extern
bit is clear, the situation is different; see below.)
.It Fa r_pcrel
If this is set,
the link editor assumes that it is updating a pointer
that is part of a machine code instruction using pc-relative addressing.
The address of the relocated pointer is implicitly added
to its value when the running program uses it.
.It Fa r_length
Contains the log base 2 of the length of the pointer in bytes;
0 for 1-byte displacements, 1 for 2-byte displacements,
2 for 4-byte displacements.
.It Fa r_extern
Set if this relocation requires an external reference;
the link editor must use a symbol address to update the pointer.
When the
.Fa r_extern
bit is clear, the relocation is
.Sq local ;
the link editor updates the pointer to reflect
changes in the load addresses of the various segments,
rather than changes in the value of a symbol.
In this case, the content of the
.Fa r_symbolnum
field is an
.Fa n_type
value (see below);
this type field tells the link editor
what segment the relocated pointer points into.
.El
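.Pp
The way a stored value combines with a resolved address can be sketched
in C as follows.
This is a simplified illustration, not the link editor's code:
the function name and its parameters are hypothetical,
the segment is assumed to be little-endian,
and pc-relative adjustment is deliberately left out.
The width of the patched field is taken from the r_length encoding
described above.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Add `value` (a resolved symbol address or segment shift) to the
 * pointer stored at byte offset r_address inside a segment image.
 * r_length is log2 of the pointer size: 0, 1 or 2.
 * Assumes little-endian storage; ignores r_pcrel.
 * Returns 0 on success, -1 if the record does not fit the segment.
 */
static int apply_reloc(unsigned char *seg, size_t seg_len,
    uint32_t r_address, unsigned r_length, uint32_t value)
{
	size_t width, i;
	uint32_t v = 0;

	if (r_length > 2)
		return -1;
	width = (size_t)1 << r_length;
	if (r_address > seg_len || seg_len - r_address < width)
		return -1;

	/* Read the value already stored at the offset (the addend). */
	for (i = 0; i < width; i++)
		v |= (uint32_t)seg[r_address + i] << (8 * i);

	v += value;

	/* Write the sum back, truncated to the field width. */
	for (i = 0; i < width; i++)
		seg[r_address + i] = (unsigned char)((v >> (8 * i)) & 0xff);
	return 0;
}
```

For an external relocation the caller would pass the symbol's resolved
address as the value; for a local one, the amount by which the target
segment moved.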
.Pp
Symbols map names to addresses (or more generally, strings to values).
Since the link editor adjusts addresses,
a symbol's name must be used to stand for its address
until an absolute value has been assigned.
Symbols consist of a fixed-length record in the symbol table
and a variable-length name in the string table.
The symbol table is an array of
.Fa nlist
structures:
.Bd -literal -offset indent
struct nlist {
	union {
		char	*n_name;
		long	n_strx;
	} n_un;
	unsigned char	n_type;
	char		n_other;
	short		n_desc;
	unsigned long	n_value;
};
.Ed
.Pp
The fields are used as follows:
.Bl -tag -width n_un.n_strx
.It Fa n_un.n_strx
Contains a byte offset into the string table
for the name of this symbol.
When a program accesses a symbol table with the
.Xr nlist 3
function,
this field is replaced with the
.Fa n_un.n_name
field, which is a pointer to the string in memory.
.It Fa n_type
Used by the link editor to determine
how to update the symbol's value.
The
.Fa n_type
field is broken down into three sub-fields using bitmasks.
The link editor treats symbols with the
.Dv N_EXT
type bit set as
.Sq external
symbols and permits references to them from other binary files.
The
.Dv N_TYPE
mask selects bits of interest to the link editor:
.Bl -tag -width N_TEXT
.It Dv N_UNDF
An undefined symbol.
The link editor must locate an external symbol with the same name
in another binary file to determine the absolute value of this symbol.
As a special case, if the
.Fa n_value
field is nonzero and no binary file in the link-edit defines this symbol,
the link editor will resolve this symbol to an address
in the bss segment,
reserving a number of bytes equal to
.Fa n_value .
If this symbol is undefined in more than one binary file
and the binary files do not agree on the size,
the link editor chooses the greatest size found across all binaries.
.It Dv N_ABS
An absolute symbol.
The link editor does not update an absolute symbol.
.It Dv N_TEXT
A text symbol.
This symbol's value is a text address and
the link editor will update it when it merges binary files.
.It Dv N_DATA
A data symbol; similar to
.Dv N_TEXT
but for data addresses.
The values for text and data symbols are not file offsets but
addresses; to recover the file offsets, it is necessary
to identify the loaded address of the beginning of the corresponding
section and subtract it, then add the offset of the section.
.It Dv N_BSS
A bss symbol; like a text or data symbol, but
with no corresponding offset in the binary file.
.It Dv N_FN
A filename symbol.
The link editor inserts this symbol before
the other symbols from a binary file when
merging binary files.
The name of the symbol is the filename given to the link editor,
and its value is the first text address from that binary file.
Filename symbols are not needed for link-editing or loading,
but are useful for debuggers.
.El
.Pp
The
.Dv N_STAB
mask selects bits of interest to symbolic debuggers
such as
.Xr gdb 1 ;
the values are described in
.Xr stab 5 .
.It Fa n_other
This field is currently unused.
.It Fa n_desc
Reserved for use by debuggers; passed untouched by the link editor.
Different debuggers use this field for different purposes.
.It Fa n_value
Contains the value of the symbol.
For text, data and bss symbols, this is an address;
for other symbols (such as debugger symbols),
the value may be arbitrary.
.El
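.Pp
The three sub-fields of n_type can be separated with masks,
roughly as in the C sketch below.
The SYM_ constants carry the traditional numeric values of the
corresponding N_ names, which this page does not list,
so they should be checked against the local
.Aq Pa a.out.h .
N_FN is omitted because its value differs between systems.
All function names are hypothetical.

```c
#include <stdint.h>

/* Traditional mask and type values (assumption: verify locally). */
#define SYM_N_EXT   0x01	/* external bit */
#define SYM_N_TYPE  0x1e	/* bits used by the link editor */
#define SYM_N_STAB  0xe0	/* bits used by symbolic debuggers */

#define SYM_N_UNDF  0x00
#define SYM_N_ABS   0x02
#define SYM_N_TEXT  0x04
#define SYM_N_DATA  0x06
#define SYM_N_BSS   0x08

/* Nonzero if other binary files may refer to this symbol. */
static int sym_is_external(uint8_t n_type)
{
	return (n_type & SYM_N_EXT) != 0;
}

/* Short label for the symbol's class. */
static const char *sym_class(uint8_t n_type)
{
	if (n_type & SYM_N_STAB)
		return "debugger";
	switch (n_type & SYM_N_TYPE) {
	case SYM_N_UNDF: return "undefined";
	case SYM_N_ABS:  return "absolute";
	case SYM_N_TEXT: return "text";
	case SYM_N_DATA: return "data";
	case SYM_N_BSS:  return "bss";
	default:         return "other";
	}
}

/*
 * Nonzero if the symbol is an undefined symbol with a nonzero value,
 * which asks for that many bytes of bss when nothing defines it.
 * Requiring the external bit is an assumption of this sketch.
 */
static int sym_is_common(uint8_t n_type, uint32_t n_value)
{
	return !(n_type & SYM_N_STAB) &&
	    (n_type & SYM_N_TYPE) == SYM_N_UNDF &&
	    sym_is_external(n_type) && n_value != 0;
}
```

Testing the debugger bits first matters, since a debugger symbol may
reuse the low bits for its own purposes.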
.Pp
The string table consists of an
.Em unsigned long
length followed by null-terminated symbol strings.
The length represents the size of the entire table in bytes,
so its minimum value (or the offset of the first string)
is always 4 on 32-bit machines.
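.Pp
A lookup of a symbol name from its n_strx offset might be sketched
in C as shown below.
The function name is hypothetical,
and the sketch assumes a 4-byte length word in host byte order,
as on the 32-bit machines mentioned above.
It rejects offsets that point into the length word or past the table,
and strings that lack a terminating null byte.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Return the name stored at byte offset n_strx of a string table
 * image, or NULL if the offset or the table is not valid.
 * tab_len is the number of bytes actually available in memory.
 */
static const char *strtab_name(const unsigned char *tab, size_t tab_len,
    uint32_t n_strx)
{
	uint32_t total;

	if (tab_len < sizeof(total))
		return NULL;
	/* The leading word holds the size of the whole table. */
	memcpy(&total, tab, sizeof(total));
	if (total > tab_len)
		return NULL;
	/* Offsets count from the table start, so they begin at 4. */
	if (n_strx < sizeof(total) || n_strx >= total)
		return NULL;
	/* Make sure the string ends inside the table. */
	if (memchr(tab + n_strx, '\0', total - n_strx) == NULL)
		return NULL;
	return (const char *)(tab + n_strx);
}
```

A program using
.Xr nlist 3
does not need such a helper, since that function already replaces the
offset with a pointer to the string.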
.Sh SEE ALSO
.Xr ld 1 ,
.Xr execve 2 ,
.Xr nlist 3 ,
.Xr core 5 ,
.Xr dbx 5 ,
.Xr stab 5
.Sh HISTORY
The
.Pa a.out.h
include file appeared in
.At v7 .
.Sh BUGS
Since not all of the supported architectures use the
.Fa a_mid
field,
it can be difficult to determine what
architecture a binary will execute on
without examining its actual machine code.
Even with a machine identifier,
the byte order of the
.Fa exec
header is machine-dependent.
.Pp
Nobody seems to agree on what
.Em bss
stands for.
.Pp
New binary file formats may be supported in the future,
and they probably will not be compatible at any level
with this ancient format.