@c This summary of BFD is shared by the BFD and LD docs.
@c Copyright (C) 2012-2020 Free Software Foundation, Inc.

When an object file is opened, BFD subroutines automatically determine
the format of the input object file.  They then build a descriptor in
memory with pointers to routines that will be used to access elements of
the object file's data structures.

As different information from the object files is required,
BFD reads from different sections of the file and processes them.
For example, a very common operation for the linker is processing symbol
tables.  Each BFD back end provides a routine for converting
between the object file's representation of symbols and an internal
canonical format.  When the linker asks for the symbol table of an object
file, it calls through a memory pointer to the routine from the
relevant BFD back end, which reads and converts the table into a canonical
form.  The linker then operates upon the canonical form.  When the link is
finished and the linker writes the output file's symbol table,
another BFD back end routine is called to take the newly
created symbol table and convert it into the chosen output format.
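
The call through a pointer described above can be illustrated with a
minimal sketch.  It is not BFD's actual interface: the two toy formats
and the names @code{backend_vector}, @code{object_file}, and
@code{get_canonical_symbol} are assumptions made up for this example.
It only shows the idea that the caller reaches a format-specific routine
through the descriptor and receives the same canonical form either way.

@verbatim
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical canonical symbol: every back end converts into this form. */
typedef struct {
    char name[16];
    unsigned long value;
} canon_symbol;

/* Hypothetical back-end vector: one table of routines per object format. */
typedef struct {
    const char *format_name;
    /* Convert one raw symbol record into canonical form. */
    void (*read_symbol)(const unsigned char *raw, canon_symbol *out);
} backend_vector;

/* Copy the NUL-terminated name that follows the 4-byte value field. */
static void copy_name(const unsigned char *raw, canon_symbol *out)
{
    strncpy(out->name, (const char *)raw + 4, sizeof out->name - 1);
    out->name[sizeof out->name - 1] = '\0';
}

/* Toy format A: 4-byte big-endian value followed by the name. */
static void fmt_a_read_symbol(const unsigned char *raw, canon_symbol *out)
{
    out->value = ((unsigned long)raw[0] << 24) | ((unsigned long)raw[1] << 16)
               | ((unsigned long)raw[2] << 8) | (unsigned long)raw[3];
    copy_name(raw, out);
}

/* Toy format B: 4-byte little-endian value followed by the name. */
static void fmt_b_read_symbol(const unsigned char *raw, canon_symbol *out)
{
    out->value = ((unsigned long)raw[3] << 24) | ((unsigned long)raw[2] << 16)
               | ((unsigned long)raw[1] << 8) | (unsigned long)raw[0];
    copy_name(raw, out);
}

const backend_vector fmt_a_vec = { "toy-a", fmt_a_read_symbol };
const backend_vector fmt_b_vec = { "toy-b", fmt_b_read_symbol };

/* Descriptor built when a file is opened; it points at the chosen back end. */
typedef struct {
    const backend_vector *xvec;
    const unsigned char *raw_symbol;
} object_file;

/* The caller never inspects the format; it calls through the pointer. */
canon_symbol get_canonical_symbol(const object_file *f)
{
    canon_symbol s;
    assert(f != NULL && f->xvec != NULL);
    f->xvec->read_symbol(f->raw_symbol, &s);
    return s;
}
```
@end verbatim

A caller that holds two @code{object_file} descriptors, one per toy
format, gets identical canonical symbols from both without ever testing
which format it is reading.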

@menu
* BFD information loss::	Information Loss
* Canonical format::		The BFD canonical object-file format
@end menu

@node BFD information loss
@subsection Information Loss

@emph{Information can be lost during output.}  The output formats
supported by BFD do not provide identical facilities, and
information which can be described in one form has nowhere to go in
another format.  One example of this is alignment information in
@code{b.out}.  There is nowhere in an @code{a.out} format file to store
alignment information on the contained data, so when a file is linked
from @code{b.out} and an @code{a.out} image is produced, alignment
information will not propagate to the output file.  (The linker will
still use the alignment information internally, so the link is performed
correctly.)

Another example is COFF section names.  COFF files may contain an
unlimited number of sections, each one with a textual section name.  If
the target of the link is a format which does not have many sections (e.g.,
@code{a.out}) or has sections without names (e.g., the Oasys format), the
link cannot be done simply.  You can circumvent this problem by
describing the desired input-to-output section mapping with the linker command
language.

@emph{Information can be lost during canonicalization.}  The BFD
internal canonical form of the external formats is not exhaustive; there
are structures in input formats for which there is no direct
representation internally.  This means that the BFD back ends
cannot maintain all possible data richness through the transformation
from external to internal and back to external formats.

This limitation is only a problem when an application reads one
format and writes another.  Each BFD back end is responsible for
maintaining as much data as possible, and the internal BFD
canonical form has structures which are opaque to the BFD core,
and exported only to the back ends.  When a file is read in one format,
the canonical form is generated for BFD and the application.  At the
same time, the back end saves away any information which may otherwise
be lost.  If the data is then written back in the same format, the back
end routine will be able to use the canonical form provided by the
BFD core as well as the information it prepared earlier.  Since
there is a great deal of commonality between back ends,
there is no information lost when
linking or copying big-endian COFF to little-endian COFF, or @code{a.out} to
@code{b.out}.  When a mixture of formats is linked, the information is
only lost from the files whose format differs from the destination.
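
The following sketch illustrates the idea of a back end saving private
data next to the canonical form.  Everything in it is hypothetical: the
two toy formats, the @code{rich_private} record, and the function
@code{write_section} are invented for illustration and do not correspond
to BFD's real data structures.  It assumes one format can store an extra
field that the other cannot.

@verbatim
```c
#include <assert.h>
#include <stddef.h>

/* Two made-up formats: only FORMAT_RICH can store the extra flags. */
typedef enum { FORMAT_RICH, FORMAT_PLAIN } toy_format;

/* Data that has no slot in the canonical form (hypothetical). */
typedef struct {
    unsigned extra_flags;
} rich_private;

/* Canonical section plus an opaque pointer owned by the reading back end. */
typedef struct {
    unsigned long size;          /* canonical: visible to every back end */
    toy_format origin;           /* which back end read this section */
    const void *backend_private; /* opaque to the core */
} canon_section;

/* What ends up in the output file in this toy model. */
typedef struct {
    unsigned long size;
    unsigned extra_flags;        /* 0 when the information was lost */
} written_section;

/* Write a section; private data survives only a same-format round trip. */
written_section write_section(const canon_section *s, toy_format out)
{
    written_section w = { 0, 0 };
    assert(s != NULL);
    w.size = s->size;            /* the canonical part always carries over */
    if (out == FORMAT_RICH && s->origin == FORMAT_RICH
        && s->backend_private != NULL) {
        const rich_private *p = s->backend_private;
        w.extra_flags = p->extra_flags;
    }
    return w;
}
```
@end verbatim

Writing a section back in the format it came from recovers the saved
flags, while writing it in the other toy format keeps only the canonical
size.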

@node Canonical format
@subsection The BFD canonical object-file format

The greatest potential for loss of information occurs when there is the least
overlap between the information provided by the source format, that
stored by the canonical format, and that needed by the
destination format.  A brief description of the canonical form may help
you understand which kinds of data you can count on preserving across
conversions.
@cindex BFD canonical format
@cindex internal object-file format

@table @emph
@item files
Information stored on a per-file basis includes target machine
architecture, particular implementation format type, a demand pageable
bit, and a write protected bit.  Information like Unix magic numbers is
not stored here---only the magic numbers' meaning, so a @code{ZMAGIC}
file would have both the demand pageable bit and the write protected
text bit set.  The byte order of the target is stored on a per-file
basis, so that big- and little-endian object files may be used with one
another.

@item sections
Each section in the input file contains the name of the section, the
section's original address in the object file, size and alignment
information, various flags, and pointers into other BFD data
structures.

@item symbols
Each symbol contains a pointer to the information for the object file
which originally defined it, its name, its value, and various flag
bits.  When a BFD back end reads in a symbol table, it relocates all
symbols to make them relative to the base of the section where they were
defined.  Doing this ensures that each symbol points to its containing
section.  Each symbol also has a varying amount of hidden private data
for the BFD back end.  Since the symbol points to the original file, the
private data format for that symbol is accessible.  @code{ld} can
therefore operate on a collection of symbols of wildly different formats
without problems.

Normal global and simple local symbols are maintained on output, so an
output file (no matter its format) will retain symbols pointing to
functions and to global, static, and common variables.  Some symbol
information is not worth retaining; in @code{a.out}, type information is
stored in the symbol table as long symbol names.  This information would
be useless to most COFF debuggers; the linker has command-line switches
to allow users to throw it away.

There is one word of type information within the symbol, so if the
format supports symbol type information within symbols (for example, COFF,
Oasys) and the type is simple enough to fit within one word
(nearly everything but aggregates), the information will be preserved.

@item relocation level
Each canonical BFD relocation record contains a pointer to the symbol to
relocate to, the offset of the data to relocate, the section the data
is in, and a pointer to a relocation type descriptor.  Relocation is
performed by passing messages through the relocation type
descriptor and the symbol pointer.  Therefore, relocations can be performed
on output data using a relocation method that is only available in one of the
input formats.  For instance, Oasys provides a byte relocation format.
A relocation record requesting this relocation type would point
indirectly to a routine to perform this, so the relocation may be
performed on a byte being written to a 68k COFF file, even though 68k COFF
has no such relocation type.

@item line numbers
Object formats can contain, for debugging purposes, some form of mapping
between symbols, source line numbers, and addresses in the output file.
These addresses have to be relocated along with the symbol information.
Each symbol with an associated list of line number records points to the
first record of the list.  The head of a line number list consists of a
pointer to the symbol, which allows finding out the address of the
function whose line number is being described.  The rest of the list is
made up of pairs: offsets into the section and line numbers.  Any format
which can simply derive this information can pass it successfully
between formats.
@end table
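
The relocation records described in the table can be pictured with a
small sketch.  It is only an illustration under stated assumptions: the
names @code{toy_symbol}, @code{toy_reloc}, @code{toy_reloc_type}, and
@code{perform_relocs} are hypothetical and are not BFD's real types, and
each relocation simply adds the symbol value to the data.  The point is
that the record carries a pointer to a type descriptor, and the
descriptor carries the routine, so the code writing the output does not
need to know which input format defined the relocation.

@verbatim
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical symbol: just a name and a resolved value. */
typedef struct {
    const char *name;
    unsigned long value;
} toy_symbol;

struct toy_reloc;

/* Hypothetical relocation type descriptor: it carries the routine. */
typedef struct {
    const char *name;
    void (*apply)(const struct toy_reloc *r, unsigned char *data, size_t len);
} toy_reloc_type;

/* Canonical record: symbol pointer, data offset, type descriptor pointer. */
typedef struct toy_reloc {
    const toy_symbol *sym;
    size_t offset;
    const toy_reloc_type *type;
} toy_reloc;

/* Byte relocation: add the low 8 bits of the symbol value to one byte. */
static void apply_byte(const toy_reloc *r, unsigned char *data, size_t len)
{
    assert(r->offset < len);
    data[r->offset] =
        (unsigned char)((data[r->offset] + (r->sym->value & 0xffu)) & 0xffu);
}

/* 32-bit big-endian relocation: add the symbol value to a 4-byte field. */
static void apply_word32_be(const toy_reloc *r, unsigned char *data, size_t len)
{
    unsigned char *p;
    unsigned long v;
    assert(len >= 4 && r->offset <= len - 4);
    p = data + r->offset;
    v = ((unsigned long)p[0] << 24) | ((unsigned long)p[1] << 16)
      | ((unsigned long)p[2] << 8) | (unsigned long)p[3];
    v = (v + r->sym->value) & 0xffffffffUL;
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)((v >> 16) & 0xffu);
    p[2] = (unsigned char)((v >> 8) & 0xffu);
    p[3] = (unsigned char)(v & 0xffu);
}

const toy_reloc_type toy_byte_reloc = { "byte", apply_byte };
const toy_reloc_type toy_word_reloc = { "word32-be", apply_word32_be };

/* The writer applies every record through its own descriptor. */
void perform_relocs(const toy_reloc *relocs, size_t n,
                    unsigned char *data, size_t len)
{
    size_t i;
    for (i = 0; i < n; i++)
        relocs[i].type->apply(&relocs[i], data, len);
}
```
@end verbatim

Because @code{perform_relocs} only follows pointers, a byte relocation
that originated in one format can be applied to a buffer destined for a
format that has no such relocation type of its own.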