@c This summary of BFD is shared by the BFD and LD docs.
@c Copyright (C) 2012-2016 Free Software Foundation, Inc.

When an object file is opened, BFD subroutines automatically determine
the format of the input object file.  They then build a descriptor in
memory with pointers to routines that will be used to access elements of
the object file's data structures.

As different information from the object files is required,
BFD reads from different sections of the file and processes them.
For example, a very common operation for the linker is processing symbol
tables.  Each BFD back end provides a routine for converting
between the object file's representation of symbols and an internal
canonical format.  When the linker asks for the symbol table of an object
file, it calls through a pointer in the descriptor to the routine from the
relevant BFD back end, which reads and converts the table into canonical
form.  The linker then operates upon the canonical form.  When the link is
finished and the linker writes the output file's symbol table,
another BFD back end routine is called to take the newly
created symbol table and convert it into the chosen output format.
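
The dispatch described above can be sketched in a few lines of C.  This
is a minimal illustration under stated assumptions, not BFD's actual
interface: the names @code{struct backend}, @code{open_descriptor}, and
@code{get_symbol} are hypothetical, and so are the two toy formats (a
magic byte, a four-byte value, then a NUL-terminated name).

@verbatim
```c
#include <assert.h>
#include <stddef.h>

/* Canonical symbol: the format-neutral form the application sees. */
struct canon_symbol {
    const char *name;
    unsigned long value;
};

/* Hypothetical back end: one set of routines per object format. */
struct backend {
    const char *format_name;
    /* Returns nonzero if the raw bytes look like this format. */
    int (*recognize)(const unsigned char *raw, size_t len);
    /* Converts the format's symbol record to canonical form. */
    void (*read_symbol)(const unsigned char *raw, struct canon_symbol *out);
};

/* Hypothetical descriptor built when a file is opened. */
struct descriptor {
    const unsigned char *raw;
    const struct backend *be;
};

/* Toy format "big": magic 'B', big-endian value, then the name. */
static int big_recognize(const unsigned char *raw, size_t len)
{
    return len >= 6 && raw[0] == 'B';
}

static void big_read_symbol(const unsigned char *raw, struct canon_symbol *out)
{
    out->value = ((unsigned long)raw[1] << 24) | ((unsigned long)raw[2] << 16)
               | ((unsigned long)raw[3] << 8) | (unsigned long)raw[4];
    out->name = (const char *)raw + 5;
}

/* Toy format "little": magic 'L', little-endian value, then the name. */
static int little_recognize(const unsigned char *raw, size_t len)
{
    return len >= 6 && raw[0] == 'L';
}

static void little_read_symbol(const unsigned char *raw, struct canon_symbol *out)
{
    out->value = (unsigned long)raw[1] | ((unsigned long)raw[2] << 8)
               | ((unsigned long)raw[3] << 16) | ((unsigned long)raw[4] << 24);
    out->name = (const char *)raw + 5;
}

static const struct backend backends[] = {
    { "toy-big", big_recognize, big_read_symbol },
    { "toy-little", little_recognize, little_read_symbol },
};

/* Detect the format and fill in the descriptor; returns 1 on success. */
int open_descriptor(struct descriptor *d, const unsigned char *raw, size_t len)
{
    size_t i;
    for (i = 0; i < sizeof backends / sizeof backends[0]; i++) {
        if (backends[i].recognize(raw, len)) {
            d->raw = raw;
            d->be = &backends[i];
            return 1;
        }
    }
    return 0;
}

/* The caller never names a format: the call goes through the descriptor. */
void get_symbol(const struct descriptor *d, struct canon_symbol *out)
{
    assert(d != NULL && d->be != NULL);
    d->be->read_symbol(d->raw, out);
}
```
@end verbatim

A caller in this sketch passes the file's bytes to
@code{open_descriptor} and then calls @code{get_symbol}; the same call
yields the same canonical symbol whichever toy format was detected.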
@menu
* BFD information loss::	Information Loss
* Canonical format::		The BFD canonical object-file format
@end menu

@node BFD information loss
@subsection Information Loss

@emph{Information can be lost during output.} The output formats
supported by BFD do not provide identical facilities, and
information which can be described in one format may have nowhere to go
in another.  One example of this is alignment information in
@code{b.out}.  There is nowhere in an @code{a.out} format file to store
alignment information on the contained data, so when a file is linked
from @code{b.out} and an @code{a.out} image is produced, alignment
information will not propagate to the output file.  (The linker will
still use the alignment information internally, so the link is performed
correctly.)

Another example is COFF section names.  COFF files may contain an
unlimited number of sections, each one with a textual section name.  If
the target of the link is a format which does not have many sections (e.g.,
@code{a.out}) or has sections without names (e.g., the Oasys format), the
link cannot be done simply.  You can circumvent this problem by
describing the desired input-to-output section mapping with the linker command
language.

@emph{Information can be lost during canonicalization.} The BFD
internal canonical form of the external formats is not exhaustive; there
are structures in input formats for which there is no direct
representation internally.  This means that the BFD back ends
cannot maintain all possible data richness through the transformation
from external to internal and back to external formats.

This limitation is only a problem when an application reads one
format and writes another.  Each BFD back end is responsible for
maintaining as much data as possible, and the internal BFD
canonical form has structures which are opaque to the BFD core
and exported only to the back ends.  When a file is read in one format,
the canonical form is generated for BFD and the application.  At the
same time, the back end saves away any information which may otherwise
be lost.  If the data is then written back in the same format, the back
end routine will be able to use the canonical form provided by the
BFD core as well as the information it prepared earlier.  Since
there is a great deal of commonality between back ends,
there is no information lost when
linking or copying big endian COFF to little endian COFF, or @code{a.out} to
@code{b.out}.  When a mixture of formats is linked, information is
lost only from the files whose format differs from the destination.
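
The round trip described above can be illustrated with a small C
sketch.  Everything in it is hypothetical: the two formats
@code{FMT_X} and @code{FMT_Y}, their magic numbers, and the field
names are invented for the example.  It only shows the idea of a back
end keeping private data beside the canonical flags, and makes no claim
about how any real BFD back end stores such data.

@verbatim
```c
#include <assert.h>
#include <stddef.h>

enum format { FMT_X, FMT_Y };      /* two hypothetical formats */

#define FLAG_PAGED   0x1u          /* canonical "demand pageable" bit */
#define FLAG_WP_TEXT 0x2u          /* canonical "write protected text" bit */

struct canon_file {
    unsigned flags;                /* canonical meaning, visible to everyone */
    enum format origin;            /* which back end read the file */
    unsigned private_magic;        /* saved by that back end, opaque to the core */
};

/* Read: derive canonical flags and save the raw magic number privately.
   Assumed magics: X uses 0x10 (plain), 0x11 and 0x12 (both paged);
   Y uses 0x20 (plain) and 0x21 (paged). */
void read_file(struct canon_file *f, enum format fmt, unsigned magic)
{
    int paged;
    assert(f != NULL);
    if (fmt == FMT_X)
        paged = (magic == 0x11u || magic == 0x12u);
    else
        paged = (magic == 0x21u);
    f->flags = paged ? (FLAG_PAGED | FLAG_WP_TEXT) : 0u;
    f->origin = fmt;
    f->private_magic = magic;
}

/* Write: reuse the private data when the format is unchanged;
   otherwise only the canonical flags are available. */
unsigned write_file(const struct canon_file *f, enum format out)
{
    int paged;
    assert(f != NULL);
    if (f->origin == out)
        return f->private_magic;
    paged = (f->flags & FLAG_PAGED) != 0;
    if (out == FMT_X)
        return paged ? 0x11u : 0x10u;
    return paged ? 0x21u : 0x20u;
}
```
@end verbatim

In this sketch, a file read as @code{FMT_X} with magic @code{0x12} is
written back as @code{0x12} when the output is also @code{FMT_X}.
Passing it through @code{FMT_Y} and back yields @code{0x11}: the
canonical flags survive, but the distinction between the two paged
variants is lost.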
@node Canonical format
@subsection The BFD canonical object-file format

The greatest potential for loss of information occurs when there is the least
overlap between the information provided by the source format, that
stored by the canonical format, and that needed by the
destination format.  A brief description of the canonical form may help
you understand which kinds of data you can count on preserving across
conversions.
@cindex BFD canonical format
@cindex internal object-file format

@table @emph
@item files
Information stored on a per-file basis includes target machine
architecture, particular implementation format type, a demand-pageable
bit, and a write-protected text bit.  Information like Unix magic numbers is
not stored here---only the magic numbers' meaning, so a @code{ZMAGIC}
file would have both the demand-pageable bit and the write-protected
text bit set.  The byte order of the target is stored on a per-file
basis, so that big- and little-endian object files may be used with one
another.

@item sections
Each section in the input file contains the name of the section, the
section's original address in the object file, size and alignment
information, various flags, and pointers into other BFD data
structures.

@item symbols
Each symbol contains a pointer to the information for the object file
which originally defined it, its name, its value, and various flag
bits.  When a BFD back end reads in a symbol table, it relocates all
symbols to make them relative to the base of the section where they were
defined.  Doing this ensures that each symbol points to its containing
section.  Each symbol also has a varying amount of hidden private data
for the BFD back end.  Since the symbol points to the original file, the
private data format for that symbol is accessible.  @code{ld} can
operate on a collection of symbols of wildly different formats without
problems.

Normal global and simple local symbols are maintained on output, so an
output file (no matter its format) will retain symbols pointing to
functions and to global, static, and common variables.  Some symbol
information is not worth retaining; in @code{a.out}, type information is
stored in the symbol table as long symbol names.  This information would
be useless to most COFF debuggers; the linker has command line switches
to allow users to throw it away.

There is one word of type information within the symbol, so if the
format supports symbol type information within symbols (for example, COFF,
IEEE, Oasys) and the type is simple enough to fit within one word
(nearly everything but aggregates), the information will be preserved.

@item relocation level
Each canonical BFD relocation record contains a pointer to the symbol to
relocate to, the offset of the data to relocate, the section the data
is in, and a pointer to a relocation type descriptor.  Relocation is
performed by passing messages through the relocation type
descriptor and the symbol pointer.  Therefore, relocations can be performed
on output data using a relocation method that is only available in one of the
input formats.  For instance, Oasys provides a byte relocation format.
A relocation record requesting this relocation type would point
indirectly to a routine to perform this, so the relocation may be
performed on a byte being written to a 68k COFF file, even though 68k COFF
has no such relocation type.

@item line numbers
Object formats can contain, for debugging purposes, some form of mapping
between symbols, source line numbers, and addresses in the output file.
These addresses have to be relocated along with the symbol information.
Each symbol with an associated list of line number records points to the
first record of the list.  The head of a line number list consists of a
pointer to the symbol, which allows finding out the address of the
function whose line number is being described.  The rest of the list is
made up of pairs: offsets into the section and line numbers.  Any format
which can readily derive this information (COFF, IEEE, and Oasys) can
pass it successfully to another format.
@end table
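
The symbol and relocation entries in the table above can be made
concrete with a short C sketch.  It assumes hypothetical structures
(@code{struct symbol}, @code{struct reloc}, @code{struct reloc_type})
that only mirror the fields named in the text; they are not BFD's real
declarations.  The sketch shows a section-relative symbol value and a
relocation applied through its type descriptor, so a byte relocation can
be performed on output data regardless of what the output format itself
supports.

@verbatim
```c
#include <assert.h>
#include <stddef.h>

struct section {
    unsigned long base;             /* address assigned to the section */
};

struct symbol {
    const char *name;
    const struct section *section;  /* containing section */
    unsigned long value;            /* offset from the section base */
};

struct reloc;

/* Hypothetical relocation type descriptor: knows how to patch the data. */
struct reloc_type {
    const char *name;
    size_t size;                    /* number of bytes patched */
    void (*apply)(const struct reloc *r, unsigned char *data);
};

struct reloc {
    const struct symbol *sym;       /* symbol to relocate to */
    size_t offset;                  /* offset of the data to relocate */
    const struct reloc_type *type;  /* how to perform the relocation */
};

/* A symbol's address is its section base plus its relative value. */
static unsigned long symbol_address(const struct symbol *s)
{
    return s->section->base + s->value;
}

/* Byte relocation: add the low 8 bits of the address to one byte. */
static void apply_byte(const struct reloc *r, unsigned char *data)
{
    unsigned long v = data[r->offset] + symbol_address(r->sym);
    data[r->offset] = (unsigned char)(v & 0xffUL);
}

/* 32-bit big-endian relocation: add the address to the stored word. */
static void apply_word32(const struct reloc *r, unsigned char *data)
{
    unsigned char *p = data + r->offset;
    unsigned long v = ((unsigned long)p[0] << 24) | ((unsigned long)p[1] << 16)
                    | ((unsigned long)p[2] << 8) | (unsigned long)p[3];
    v = (v + symbol_address(r->sym)) & 0xffffffffUL;
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)((v >> 16) & 0xffUL);
    p[2] = (unsigned char)((v >> 8) & 0xffUL);
    p[3] = (unsigned char)(v & 0xffUL);
}

const struct reloc_type RELOC_BYTE = { "byte", 1, apply_byte };
const struct reloc_type RELOC_WORD32 = { "word32", 4, apply_word32 };

/* The caller does not inspect the type: it calls through the descriptor. */
void perform_reloc(const struct reloc *r, unsigned char *data, size_t len)
{
    assert(r->offset + r->type->size <= len);
    r->type->apply(r, data);
}
```
@end verbatim

Because @code{perform_reloc} only follows the pointer stored in the
record, adding another relocation type in this sketch means adding one
more descriptor; the code that writes the output data does not change.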