@c This summary of BFD is shared by the BFD and LD docs.
@c Copyright (C) 2012-2020 Free Software Foundation, Inc.

When an object file is opened, BFD subroutines automatically determine
the format of the input object file.  They then build a descriptor in
memory with pointers to routines that will be used to access elements of
the object file's data structures.

As different information from the object files is required,
BFD reads from different sections of the file and processes them.
For example, a very common operation for the linker is processing symbol
tables.  Each BFD back end provides a routine for converting
between the object file's representation of symbols and an internal
canonical format.  When the linker asks for the symbol table of an object
file, it calls through a pointer in that descriptor to the routine
supplied by the relevant BFD back end, which reads the table and converts
it into canonical form.  The linker then operates upon the canonical
form.  When the link is finished and the linker writes the output file's
symbol table, another BFD back end routine is called to take the newly
created symbol table and convert it into the chosen output format.
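
As a rough illustration of the dispatch described above, the following
Python sketch models a back end as a record of routines that a caller
reaches through the file's descriptor.  It is a conceptual model only,
not the BFD API: the names @code{BackEnd}, @code{Descriptor},
@code{open_object} and @code{convert}, the two toy formats, and their
magic prefixes are all assumptions invented for this example.

```python
# Conceptual model only: every name and both toy formats are hypothetical.

class BackEnd:
    """One object-file format: routines to read and write symbol tables."""
    def __init__(self, name, magic, read_symbols, write_symbols):
        self.name = name
        self.magic = magic
        self.read_symbols = read_symbols
        self.write_symbols = write_symbols

# Toy format A stores symbols as "name=value" separated by ';'.
def read_a(body):
    pairs = [item.split("=") for item in body.split(";") if item]
    return [(name, int(value)) for name, value in pairs]

def write_a(symbols):
    return ";".join("%s=%d" % (name, value) for name, value in symbols)

# Toy format B stores symbols as "value:name" separated by ','.
def read_b(body):
    pairs = [item.split(":") for item in body.split(",") if item]
    return [(name, int(value)) for value, name in pairs]

def write_b(symbols):
    return ",".join("%d:%s" % (value, name) for name, value in symbols)

BACK_ENDS = [
    BackEnd("toy-a", "A!", read_a, write_a),
    BackEnd("toy-b", "B!", read_b, write_b),
]

class Descriptor:
    """In-memory descriptor: file contents plus a pointer to the back end."""
    def __init__(self, back_end, body):
        self.back_end = back_end
        self.body = body

    def canonical_symbols(self):
        # The caller never names a format; it calls through the descriptor.
        return self.back_end.read_symbols(self.body)

def open_object(data):
    """Pick the back end whose magic prefix matches the data."""
    for back_end in BACK_ENDS:
        if data.startswith(back_end.magic):
            return Descriptor(back_end, data[len(back_end.magic):])
    raise ValueError("file format not recognized")

def convert(data, target_name):
    """Read symbols in any known format and write them in the target format."""
    symbols = open_object(data).canonical_symbols()
    target = next(b for b in BACK_ENDS if b.name == target_name)
    return target.magic + target.write_symbols(symbols)
```

In this sketch the caller of @code{convert} works only with the list of
(name, value) pairs, which stands in for the canonical form; the
format-specific parsing and printing stay inside the two back ends.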

@menu
* BFD information loss::	Information Loss
* Canonical format::		The BFD canonical object-file format
@end menu

@node BFD information loss
@subsection Information Loss

@emph{Information can be lost during output.}  The output formats
supported by BFD do not provide identical facilities, and
information which can be described in one format has nowhere to go in
another.  One example of this is alignment information in
@code{b.out}.  There is nowhere in an @code{a.out} format file to store
alignment information on the contained data, so when a file is linked
from @code{b.out} and an @code{a.out} image is produced, alignment
information will not propagate to the output file.  (The linker will
still use the alignment information internally, so the link is performed
correctly.)

Another example is COFF section names.  COFF files may contain an
unlimited number of sections, each one with a textual section name.  If
the target of the link is a format which does not have many sections (e.g.,
@code{a.out}) or has sections without names (e.g., the Oasys format), the
link cannot be done simply.  You can circumvent this problem by
describing the desired input-to-output section mapping with the linker command
language.

@emph{Information can be lost during canonicalization.}  The BFD
internal canonical form of the external formats is not exhaustive; there
are structures in input formats for which there is no direct
representation internally.  This means that the BFD back ends
cannot maintain all possible data richness through the transformation
from external to internal and back to external formats.

This limitation is only a problem when an application reads one
format and writes another.  Each BFD back end is responsible for
maintaining as much data as possible, and the internal BFD
canonical form has structures which are opaque to the BFD core,
and exported only to the back ends.  When a file is read in one format,
the canonical form is generated for BFD and the application.  At the
same time, the back end saves any information which might otherwise
be lost.  If the data is then written back in the same format, the back
end routine will be able to use the canonical form provided by the
BFD core as well as the information it prepared earlier.  Since
there is a great deal of commonality between back ends,
there is no information lost when
linking or copying big-endian COFF to little-endian COFF, or @code{a.out} to
@code{b.out}.  When a mixture of formats is linked, information is
lost only from the files whose format differs from the destination.
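
The round trip described above can be sketched in Python as follows.
This is a minimal model under stated assumptions, not BFD code: the
``rich'' and ``plain'' formats, the @code{CanonicalSymbol} class, and its
@code{owner} and @code{private} fields are hypothetical stand-ins for a
back end's opaque private data.

```python
# Conceptual model only: the formats, fields, and function names are
# hypothetical.

class CanonicalSymbol:
    def __init__(self, name, value, owner, private):
        self.name = name        # visible to the core and the application
        self.value = value      # visible to the core and the application
        self.owner = owner      # name of the back end that read the symbol
        self.private = private  # opaque to everyone except that back end

def read_rich(records):
    """Back end for a 'rich' format whose records carry an extra type string."""
    return [CanonicalSymbol(name, value, "rich", extra)
            for name, value, extra in records]

def write_rich(symbols):
    """Write the rich format, reusing private data only if this back end made it."""
    out = []
    for sym in symbols:
        extra = sym.private if sym.owner == "rich" else ""
        out.append((sym.name, sym.value, extra))
    return out

def read_plain(records):
    """Back end for a 'plain' format that has only names and values."""
    return [CanonicalSymbol(name, value, "plain", None)
            for name, value in records]

def write_plain(symbols):
    """Write the plain format; there is nowhere to put any private data."""
    return [(sym.name, sym.value) for sym in symbols]
```

Reading and writing the rich format reproduces every record, because the
writer finds the data its own reader saved.  Passing through the plain
format drops the extra field, and when rich and plain inputs are combined
into rich output, only the plain inputs lack it.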

@node Canonical format
@subsection The BFD canonical object-file format

The greatest potential for loss of information occurs when there is the least
overlap between the information provided by the source format, that
stored by the canonical format, and that needed by the
destination format.  A brief description of the canonical form may help
you understand which kinds of data you can count on preserving across
conversions.
@cindex BFD canonical format
@cindex internal object-file format

@table @emph
@item files
Information stored on a per-file basis includes target machine
architecture, particular implementation format type, a demand-pageable
bit, and a write-protected bit.  Information like Unix magic numbers is
not stored here, only the magic numbers' meaning, so a @code{ZMAGIC}
file would have both the demand-pageable bit and the write-protected
text bit set.  The byte order of the target is stored on a per-file
basis, so that big- and little-endian object files may be used with one
another.

@item sections
Each section in the input file contains the name of the section, the
section's original address in the object file, size and alignment
information, various flags, and pointers into other BFD data
structures.

@item symbols
Each symbol contains a pointer to the information for the object file
which originally defined it, its name, its value, and various flag
bits.  When a BFD back end reads in a symbol table, it relocates all
symbols to make them relative to the base of the section where they were
defined.  Doing this ensures that each symbol points to its containing
section.  Each symbol also has a varying amount of hidden private data
for the BFD back end.  Since the symbol points to the original file, the
private data format for that symbol is accessible.  @code{ld} can
operate on a collection of symbols of wildly different formats without
problems.

Normal global and simple local symbols are maintained on output, so an
output file (no matter its format) will retain symbols pointing to
functions and to global, static, and common variables.  Some symbol
information is not worth retaining; for example, in @code{a.out}, type
information is stored in the symbol table as long symbol names.  This
information would be useless to most COFF debuggers; the linker has
command-line switches to allow users to throw it away.

There is one word of type information within the symbol, so if the
format supports symbol type information within symbols (for example, COFF,
Oasys) and the type is simple enough to fit within one word
(nearly everything but aggregates), the information will be preserved.

@item relocation level
Each canonical BFD relocation record contains a pointer to the symbol to
relocate to, the offset of the data to relocate, the section the data
is in, and a pointer to a relocation type descriptor.  Relocation is
performed by passing messages through the relocation type
descriptor and the symbol pointer.  Therefore, relocations can be performed
on output data using a relocation method that is only available in one of the
input formats.  For instance, Oasys provides a byte relocation format.
A relocation record requesting this relocation type would point
indirectly to a routine to perform this, so the relocation may be
performed on a byte being written to a 68k COFF file, even though 68k COFF
has no such relocation type.

@item line numbers
Object formats can contain, for debugging purposes, some form of mapping
between symbols, source line numbers, and addresses in the output file.
These addresses have to be relocated along with the symbol information.
Each symbol with an associated list of line number records points to the
first record of the list.  The head of a line number list consists of a
pointer to the symbol, which allows finding out the address of the
function whose line number is being described.  The rest of the list is
made up of pairs: offsets into the section and line numbers.  Any format
which can simply derive this information can pass it successfully
between formats.
@end table
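
To make the symbol and relocation entries above more concrete, here is a
small Python sketch of section-relative symbols and of relocation records
that carry their own type descriptor.  It is an illustration under
assumptions, not the BFD data structures: the class names, the
@code{output_address} field, the two relocation types, and the choice of
big-endian byte order are all invented for this example.

```python
# Conceptual model only: class names, fields, and the two relocation
# types are hypothetical and far simpler than real BFD structures.

class Section:
    def __init__(self, name, input_address, size):
        self.name = name
        self.input_address = input_address   # address in the input file
        self.output_address = input_address  # reassigned when laid out
        self.data = bytearray(size)

class Symbol:
    def __init__(self, name, absolute_value, section):
        self.name = name
        self.section = section
        # Stored relative to the base of the defining section.
        self.value = absolute_value - section.input_address

    def address(self):
        return self.section.output_address + self.value

class RelocType:
    """Relocation type descriptor: field width plus the routine to apply it."""
    def __init__(self, name, size):
        self.name = name
        self.size = size

    def apply(self, data, offset, value):
        # Truncate to the field width and store big-endian (an assumption).
        mask = (1 << (8 * self.size)) - 1
        data[offset:offset + self.size] = (value & mask).to_bytes(self.size, "big")

BYTE_RELOC = RelocType("byte", 1)
WORD_RELOC = RelocType("word32", 4)

class Reloc:
    def __init__(self, symbol, section, offset, reloc_type):
        self.symbol = symbol          # symbol to relocate to
        self.section = section        # section holding the data
        self.offset = offset          # offset of the data in that section
        self.reloc_type = reloc_type  # pointer to the type descriptor

def relocate(relocs):
    """Apply each record through its own descriptor, whatever the output format."""
    for r in relocs:
        r.reloc_type.apply(r.section.data, r.offset, r.symbol.address())
```

Because each record points at the routine that knows how to apply it,
@code{relocate} needs no table of relocation kinds supported by the output
format.  Moving a section only changes its @code{output_address}; every
symbol defined in it follows, since its value is kept relative to the
section base.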