@c This summary of BFD is shared by the BFD and LD docs.
@c Copyright (C) 2012-2016 Free Software Foundation, Inc.

When an object file is opened, BFD subroutines automatically determine
the format of the input object file.  They then build a descriptor in
memory with pointers to routines that will be used to access elements of
the object file's data structures.

As different information from the object files is required,
BFD reads from different sections of the file and processes them.
For example, a very common operation for the linker is processing symbol
tables.  Each BFD back end provides a routine for converting
between the object file's representation of symbols and an internal
canonical format.  When the linker asks for the symbol table of an object
file, it calls through a memory pointer to the routine from the
relevant BFD back end which reads and converts the table into a canonical
form.  The linker then operates upon the canonical form.  When the link is
finished and the linker writes the output file's symbol table,
another BFD back end routine is called to take the newly
created symbol table and convert it into the chosen output format.
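
The dispatch described above can be pictured with a small toy model.
The following Python sketch is not the BFD API.  The @code{BackEnd}
record, the two made-up formats @code{toy-a} and @code{toy-b}, and every
function name are assumptions introduced purely for illustration.  It
only shows the shape of the idea: detect the format, call through a
table of routines to reach a canonical symbol list, then call another
back end's routine to write that list in the output format.

@example
```python
# Toy model only: none of these names are part of the real BFD API.
from typing import Callable, List, NamedTuple


class CanonicalSymbol(NamedTuple):
    name: str
    value: int


class BackEnd(NamedTuple):
    name: str
    matches: Callable[[bytes], bool]   # format detection
    read_symbols: Callable[[bytes], List[CanonicalSymbol]]
    write_symbols: Callable[[List[CanonicalSymbol]], bytes]


def read_a(data):
    # Hypothetical format A: "A1" magic, then "name:hexvalue;" entries.
    out = []
    for entry in data[2:].decode("ascii").split(";"):
        if entry:
            name, value = entry.split(":")
            out.append(CanonicalSymbol(name, int(value, 16)))
    return out


def write_a(symbols):
    body = "".join("%s:%x;" % (s.name, s.value) for s in symbols)
    return b"A1" + body.encode("ascii")


def read_b(data):
    # Hypothetical format B: "B1" magic, then "decimalvalue name" lines.
    out = []
    for line in data[2:].decode("ascii").splitlines():
        value, name = line.split(" ", 1)
        out.append(CanonicalSymbol(name, int(value)))
    return out


def write_b(symbols):
    body = "".join("%d %s\n" % (s.value, s.name) for s in symbols)
    return b"B1" + body.encode("ascii")


BACK_ENDS = [
    BackEnd("toy-a", lambda d: d.startswith(b"A1"), read_a, write_a),
    BackEnd("toy-b", lambda d: d.startswith(b"B1"), read_b, write_b),
]


def open_object(data):
    """Pick the back end whose format check accepts the data."""
    for back_end in BACK_ENDS:
        if back_end.matches(data):
            return back_end
    raise ValueError("unrecognized object format")


def convert(data, output_name):
    source = open_object(data)
    symbols = source.read_symbols(data)    # external -> canonical
    target = next(b for b in BACK_ENDS if b.name == output_name)
    return target.write_symbols(symbols)   # canonical -> external
```
@end example

In this sketch the caller of @code{convert} never inspects the input
bytes itself; it works only with the canonical list, which mirrors how
the linker operates on the canonical form rather than on any one
object-file format.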

@menu
* BFD information loss::	Information Loss
* Canonical format::		The BFD canonical object-file format
@end menu

@node BFD information loss
@subsection Information Loss

@emph{Information can be lost during output.} The output formats
supported by BFD do not provide identical facilities, and
information which can be described in one form has nowhere to go in
another format.  One example of this is alignment information in
@code{b.out}.  There is nowhere in an @code{a.out} format file to store
alignment information on the contained data, so when a file is linked
from @code{b.out} and an @code{a.out} image is produced, alignment
information will not propagate to the output file.  (The linker will
still use the alignment information internally, so the link is performed
correctly.)

Another example is COFF section names.  COFF files may contain an
unlimited number of sections, each one with a textual section name.  If
the target of the link is a format which does not have many sections (e.g.,
@code{a.out}) or has sections without names (e.g., the Oasys format), the
link cannot be done simply.  You can circumvent this problem by
describing the desired input-to-output section mapping with the linker command
language.

@emph{Information can be lost during canonicalization.} The BFD
internal canonical form of the external formats is not exhaustive; there
are structures in input formats for which there is no direct
representation internally.
This means that the BFD back ends
cannot maintain all possible data richness through the transformation
from external to internal and back to external formats.

This limitation is only a problem when an application reads one
format and writes another.  Each BFD back end is responsible for
maintaining as much data as possible, and the internal BFD
canonical form has structures which are opaque to the BFD core,
and exported only to the back ends.  When a file is read in one format,
the canonical form is generated for BFD and the application.  At the
same time, the back end saves away any information which may otherwise
be lost.  If the data is then written back in the same format, the back
end routine will be able to use the canonical form provided by the
BFD core as well as the information it prepared earlier.  Since
there is a great deal of commonality between back ends,
there is no information lost when
linking or copying big endian COFF to little endian COFF, or @code{a.out} to
@code{b.out}.  When a mixture of formats is linked, the information is
only lost from the files whose format differs from the destination.

@node Canonical format
@subsection The BFD canonical object-file format

The greatest potential for loss of information occurs when there is the least
overlap between the information provided by the source format, that
stored by the canonical format, and that needed by the
destination format.  A brief description of the canonical form may help
you understand which kinds of data you can count on preserving across
conversions.
@cindex BFD canonical format
@cindex internal object-file format

@table @emph
@item files
Information stored on a per-file basis includes target machine
architecture, particular implementation format type, a demand pageable
bit, and a write protected bit.  Information like Unix magic numbers is
not stored here---only the magic numbers' meaning, so a @code{ZMAGIC}
file would have both the demand pageable bit and the write protected
text bit set.  The byte order of the target is stored on a per-file
basis, so that big- and little-endian object files may be used with one
another.

@item sections
Each section in the input file contains the name of the section, the
section's original address in the object file, size and alignment
information, various flags, and pointers into other BFD data
structures.

@item symbols
Each symbol contains a pointer to the information for the object file
which originally defined it, its name, its value, and various flag
bits.  When a BFD back end reads in a symbol table, it relocates all
symbols to make them relative to the base of the section where they were
defined.  Doing this ensures that each symbol points to its containing
section.  Each symbol also has a varying amount of hidden private data
for the BFD back end.  Since the symbol points to the original file, the
private data format for that symbol is accessible.  @code{ld} can
operate on a collection of symbols of wildly different formats without
problems.

Normal global and simple local symbols are maintained on output, so an
output file (no matter its format) will retain symbols pointing to
functions and to global, static, and common variables.  Some symbol
information is not worth retaining; in @code{a.out}, type information is
stored in the symbol table as long symbol names.  This information would
be useless to most COFF debuggers; the linker has command line switches
to allow users to throw it away.

There is one word of type information within the symbol, so if the
format supports symbol type information within symbols (for example, COFF,
IEEE, Oasys) and the type is simple enough to fit within one word
(nearly everything but aggregates), the information will be preserved.

@item relocation level
Each canonical BFD relocation record contains a pointer to the symbol to
relocate to, the offset of the data to relocate, the section the data
is in, and a pointer to a relocation type descriptor.  Relocation is
performed by passing messages through the relocation type
descriptor and the symbol pointer.  Therefore, relocations can be performed
on output data using a relocation method that is only available in one of the
input formats.  For instance, Oasys provides a byte relocation format.
A relocation record requesting this relocation type would point
indirectly to a routine to perform this, so the relocation may be
performed on a byte being written to a 68k COFF file, even though 68k COFF
has no such relocation type.

@item line numbers
Object formats can contain, for debugging purposes, some form of mapping
between symbols, source line numbers, and addresses in the output file.
These addresses have to be relocated along with the symbol information.
Each symbol with an associated list of line number records points to the
first record of the list.  The head of a line number list consists of a
pointer to the symbol, which allows finding out the address of the
function whose line number is being described.  The rest of the list is
made up of pairs: offsets into the section and line numbers.  Any format
which can simply derive this information can pass it successfully
between formats (COFF, IEEE and Oasys).
@end table
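
As a rough illustration of the symbol and relocation entries in the
table above, the following Python sketch models section-relative
symbols and relocation through a type descriptor.  It is a toy model
under stated assumptions, not the BFD data structures: the records
@code{Section}, @code{Symbol}, @code{RelocType}, and @code{Reloc}, and
the helper functions, are hypothetical names chosen for this example.

@example
```python
# Toy model only: the names below are hypothetical, not the BFD API.
from typing import Callable, NamedTuple


class Section(NamedTuple):
    name: str
    address: int   # the section's original address in the input file


class Symbol(NamedTuple):
    name: str
    section: Section
    value: int     # offset from the base of the defining section


class RelocType(NamedTuple):
    name: str
    apply: Callable[[bytearray, int, int], None]  # data, offset, address


class Reloc(NamedTuple):
    symbol: Symbol
    offset: int    # position of the data to patch
    howto: RelocType


def canonicalize(name, absolute_value, section):
    """Store the symbol relative to the base of its section."""
    return Symbol(name, section, absolute_value - section.address)


def patch_byte(data, offset, address):
    data[offset] = address & 0xFF


def patch_word32(data, offset, address):
    data[offset:offset + 4] = address.to_bytes(4, "big")


BYTE = RelocType("byte", patch_byte)
WORD32 = RelocType("word32", patch_word32)


def relocate(data, relocs, output_base):
    """Apply each record through its own type descriptor."""
    for reloc in relocs:
        base = output_base[reloc.symbol.section.name]
        reloc.howto.apply(data, reloc.offset, base + reloc.symbol.value)
```
@end example

Because each record carries its own descriptor, @code{relocate} needs
no knowledge of which relocation types the output format supports,
which is the property the relocation entry describes for the Oasys byte
relocation written into a 68k COFF file.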