@c This summary of BFD is shared by the BFD and LD docs.
@c Copyright (C) 2012-2016 Free Software Foundation, Inc.

When an object file is opened, BFD subroutines automatically determine
the format of the input object file.  They then build a descriptor in
memory with pointers to routines that will be used to access elements of
the object file's data structures.

As different information from the object files is required,
BFD reads from different sections of the file and processes them.
For example, a very common operation for the linker is processing symbol
tables.  Each BFD back end provides a routine for converting
between the object file's representation of symbols and an internal
canonical format.  When the linker asks for the symbol table of an object
file, it calls through a pointer in the descriptor to the routine from the
relevant BFD back end, which reads and converts the table into canonical
form.  The linker then operates upon the canonical form.  When the link is
finished and the linker writes the output file's symbol table,
another BFD back end routine is called to take the newly
created symbol table and convert it into the chosen output format.
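
The dispatch described above can be illustrated with a minimal,
self-contained C sketch.  It is a toy model, not the BFD API: the two
formats, the structure names, and the functions @code{open_file} and
@code{get_symbols} are all hypothetical and exist only for illustration.

@verbatim
#include <stddef.h>

/* Hypothetical canonical symbol; not the real BFD symbol type.  */
struct canon_symbol
{
  const char *name;
  unsigned long value;
};

/* Hypothetical table of routines supplied by one back end.  */
struct backend_ops
{
  const char *format_name;
  /* Return nonzero if the buffer looks like this format.  */
  int (*matches) (const unsigned char *buf, size_t len);
  /* Convert external symbols to canonical form; return the count.  */
  size_t (*read_symbols) (const unsigned char *buf, size_t len,
                          struct canon_symbol *out, size_t max);
};

/* Hypothetical descriptor built when a file is opened.  */
struct file_descriptor
{
  const unsigned char *data;
  size_t len;
  const struct backend_ops *ops;
};

/* Toy format A: magic byte 'A', then one byte per symbol value.  */
static int
a_matches (const unsigned char *buf, size_t len)
{
  return len >= 1 && buf[0] == 'A';
}

static size_t
a_read_symbols (const unsigned char *buf, size_t len,
                struct canon_symbol *out, size_t max)
{
  size_t n = 0;
  for (size_t i = 1; i < len && n < max; i++, n++)
    {
      out[n].name = "toy_a";
      out[n].value = buf[i];
    }
  return n;
}

/* Toy format B: magic byte 'B', then two big-endian bytes per value.  */
static int
b_matches (const unsigned char *buf, size_t len)
{
  return len >= 1 && buf[0] == 'B';
}

static size_t
b_read_symbols (const unsigned char *buf, size_t len,
                struct canon_symbol *out, size_t max)
{
  size_t n = 0;
  for (size_t i = 1; i + 1 < len && n < max; i += 2, n++)
    {
      out[n].name = "toy_b";
      out[n].value = ((unsigned long) buf[i] << 8) | buf[i + 1];
    }
  return n;
}

static const struct backend_ops backends[] =
{
  { "toy-a", a_matches, a_read_symbols },
  { "toy-b", b_matches, b_read_symbols },
};

/* Determine the format and fill in the descriptor.
   Return 0 on success, -1 if no back end recognizes the data.  */
int
open_file (struct file_descriptor *fd,
           const unsigned char *data, size_t len)
{
  for (size_t i = 0; i < sizeof backends / sizeof backends[0]; i++)
    if (backends[i].matches (data, len))
      {
        fd->data = data;
        fd->len = len;
        fd->ops = &backends[i];
        return 0;
      }
  return -1;
}

/* The caller never names a format; it calls through the descriptor.  */
size_t
get_symbols (const struct file_descriptor *fd,
             struct canon_symbol *out, size_t max)
{
  return fd->ops->read_symbols (fd->data, fd->len, out, max);
}
@end verbatim

In this sketch a caller uses @code{open_file} once and then
@code{get_symbols} for any input, and receives canonical symbols whichever
toy back end was selected.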

@menu
* BFD information loss::	Information Loss
* Canonical format::		The BFD canonical object-file format
@end menu

@node BFD information loss
@subsection Information Loss

@emph{Information can be lost during output.} The output formats
supported by BFD do not provide identical facilities, and
information which can be described in one form has nowhere to go in
another format.  One example of this is alignment information in
@code{b.out}.  There is nowhere in an @code{a.out} format file to store
alignment information on the contained data, so when a file is linked
from @code{b.out} and an @code{a.out} image is produced, alignment
information will not propagate to the output file.  (The linker will
still use the alignment information internally, so the link is performed
correctly.)

Another example is COFF section names.  COFF files may contain an
unlimited number of sections, each one with a textual section name.  If
the target of the link is a format which does not have many sections (e.g.,
@code{a.out}) or has sections without names (e.g., the Oasys format), the
link cannot be done simply.  You can circumvent this problem by
describing the desired input-to-output section mapping with the linker command
language.

@emph{Information can be lost during canonicalization.} The BFD
internal canonical form of the external formats is not exhaustive; there
are structures in input formats for which there is no direct
representation internally.  This means that the BFD back ends
cannot maintain all possible data richness through the transformation
from external to internal and back to external formats.

This limitation is a problem only when an application reads one
format and writes another.  Each BFD back end is responsible for
maintaining as much data as possible, and the internal BFD
canonical form has structures which are opaque to the BFD core,
and exported only to the back ends.  When a file is read in one format,
the canonical form is generated for BFD and the application.  At the
same time, the back end saves away any information which may otherwise
be lost.  If the data is then written back in the same format, the back
end routine will be able to use the canonical form provided by the
BFD core as well as the information it prepared earlier.  Since
there is a great deal of commonality between back ends,
there is no information lost when
linking or copying big-endian COFF to little-endian COFF, or @code{a.out} to
@code{b.out}.  When a mixture of formats is linked, information is
lost only from the files whose format differs from the destination.
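
The role of back-end private data can be sketched in C as follows.  This
is a toy model under stated assumptions, not BFD code: the ``rich'' and
``plain'' formats, the @code{vendor_flags} field, and every structure and
function name are hypothetical.  The canonical record here deliberately
has no field for @code{vendor_flags}, so the value survives only when the
same back end both reads and writes the data.

@verbatim
#include <stddef.h>

enum toy_format { TOY_RICH, TOY_PLAIN };

/* Hypothetical canonical section record seen by the core.  */
struct canon_section
{
  const char *name;
  unsigned long size;
  enum toy_format origin;        /* Format the record was read from.  */
  const void *backend_private;   /* Opaque to the core.  */
};

/* Data only the "rich" back end understands.  */
struct rich_private
{
  unsigned vendor_flags;
};

/* External records of the two toy formats.  */
struct rich_external
{
  const char *name;
  unsigned long size;
  unsigned vendor_flags;
};

struct plain_external
{
  const char *name;
  unsigned long size;
};

/* Read a rich record: build the canonical form and save away
   what the canonical form cannot express.  */
void
rich_read (const struct rich_external *ext,
           struct canon_section *sec, struct rich_private *priv)
{
  priv->vendor_flags = ext->vendor_flags;
  sec->name = ext->name;
  sec->size = ext->size;
  sec->origin = TOY_RICH;
  sec->backend_private = priv;
}

/* Read a plain record: there is nothing extra to save.  */
void
plain_read (const struct plain_external *ext,
            struct canon_section *sec)
{
  sec->name = ext->name;
  sec->size = ext->size;
  sec->origin = TOY_PLAIN;
  sec->backend_private = NULL;
}

/* Write a rich record.  The private data is used only if this
   back end produced it; otherwise the field falls back to zero.  */
void
rich_write (const struct canon_section *sec,
            struct rich_external *out)
{
  out->name = sec->name;
  out->size = sec->size;
  out->vendor_flags = 0;
  if (sec->origin == TOY_RICH && sec->backend_private != NULL)
    {
      const struct rich_private *priv = sec->backend_private;
      out->vendor_flags = priv->vendor_flags;
    }
}
@end verbatim

Reading with @code{rich_read} and writing with @code{rich_write} keeps
@code{vendor_flags} intact, while a record that came in through
@code{plain_read} is written with the field cleared.  The canonical fields
@code{name} and @code{size} are preserved in both cases.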

@node Canonical format
@subsection The BFD canonical object-file format

The greatest potential for loss of information occurs when there is the least
overlap between the information provided by the source format, that
stored by the canonical format, and that needed by the
destination format.  A brief description of the canonical form may help
you understand which kinds of data you can count on preserving across
conversions.
@cindex BFD canonical format
@cindex internal object-file format

@table @emph
@item files
Information stored on a per-file basis includes target machine
architecture, particular implementation format type, a demand-pageable
bit, and a write-protected bit.  Information like Unix magic numbers is
not stored here, only the magic numbers' meaning, so a @code{ZMAGIC}
file would have both the demand-pageable bit and the write-protected
text bit set.  The byte order of the target is stored on a per-file
basis, so that big- and little-endian object files may be used with one
another.

@item sections
For each section in the input file, the canonical form records the name
of the section, the section's original address in the object file, size
and alignment information, various flags, and pointers into other BFD
data structures.

@item symbols
Each symbol contains a pointer to the information for the object file
which originally defined it, its name, its value, and various flag
bits.  When a BFD back end reads in a symbol table, it relocates all
symbols to make them relative to the base of the section where they were
defined.  Doing this ensures that each symbol points to its containing
section.  Each symbol also has a varying amount of hidden private data
for the BFD back end.  Since the symbol points to the original file, the
private data format for that symbol is accessible.  @code{ld} can
operate on a collection of symbols of wildly different formats without
problems.

Normal global and simple local symbols are maintained on output, so an
output file (no matter its format) will retain symbols pointing to
functions and to global, static, and common variables.  Some symbol
information is not worth retaining; in @code{a.out}, type information is
stored in the symbol table as long symbol names.  This information would
be useless to most COFF debuggers; the linker has command-line switches
to allow users to throw it away.

There is one word of type information within the symbol, so if the
format supports symbol type information within symbols (for example, COFF,
IEEE, Oasys) and the type is simple enough to fit within one word
(nearly everything but aggregates), the information will be preserved.

@item relocation level
Each canonical BFD relocation record contains a pointer to the symbol to
relocate to, the offset of the data to relocate, the section the data
is in, and a pointer to a relocation type descriptor.  Relocation is
performed by passing messages through the relocation type
descriptor and the symbol pointer.  Therefore, relocations can be performed
on output data using a relocation method that is only available in one of the
input formats.  For instance, Oasys provides a byte relocation format.
A relocation record requesting this relocation type would point
indirectly to a routine to perform this, so the relocation may be
performed on a byte being written to a 68k COFF file, even though 68k COFF
has no such relocation type.

@item line numbers
Object formats can contain, for debugging purposes, some form of mapping
between symbols, source line numbers, and addresses in the output file.
These addresses have to be relocated along with the symbol information.
Each symbol with an associated list of line number records points to the
first record of the list.  The head of a line number list consists of a
pointer to the symbol, which allows finding out the address of the
function whose line number is being described.  The rest of the list is
made up of pairs: offsets into the section and line numbers.  Any format
which can simply derive this information can pass it successfully
between formats (COFF, IEEE and Oasys).
@end table
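
The symbol and relocation entries above can be illustrated with a small
C sketch.  It is a simplified model rather than the actual BFD data
structures: all type and function names are hypothetical, the relocation
record omits the section pointer for brevity, and the byte relocation
simply stores the low byte of the symbol's address.

@verbatim
#include <stddef.h>

/* Hypothetical canonical section: only what this sketch needs.  */
struct canon_section
{
  const char *name;
  unsigned long output_base;   /* Address assigned at link time.  */
};

/* Hypothetical canonical symbol with a section-relative value.  */
struct canon_symbol
{
  const char *name;
  struct canon_section *section;
  unsigned long value;         /* Offset from the section base.  */
};

struct canon_reloc;

/* Hypothetical relocation type descriptor: carries the routine
   that knows how to perform this kind of relocation.  */
struct reloc_howto
{
  const char *name;
  void (*apply) (const struct canon_reloc *rel,
                 unsigned char *contents);
};

/* Hypothetical canonical relocation record.  */
struct canon_reloc
{
  struct canon_symbol *symbol;        /* Symbol to relocate to.  */
  size_t offset;                      /* Where in the data.  */
  const struct reloc_howto *howto;    /* How to relocate.  */
};

/* A section-relative symbol resolves against wherever its
   section ends up in the output.  */
unsigned long
symbol_address (const struct canon_symbol *sym)
{
  return sym->section->output_base + sym->value;
}

/* Byte relocation: store the low 8 bits of the address.  */
static void
apply_byte (const struct canon_reloc *rel, unsigned char *contents)
{
  contents[rel->offset]
    = (unsigned char) (symbol_address (rel->symbol) & 0xff);
}

const struct reloc_howto byte_howto = { "toy-byte", apply_byte };

/* The writer of the output needs no knowledge of the relocation
   type; it calls through the descriptor.  */
void
perform_reloc (const struct canon_reloc *rel,
               unsigned char *contents)
{
  rel->howto->apply (rel, contents);
}
@end verbatim

Because @code{perform_reloc} calls through the descriptor, the code that
writes the output data does not need to support the relocation type
itself, which mirrors the Oasys byte relocation example above.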