@c This summary of BFD is shared by the BFD and LD docs.
@c Copyright (C) 2012-2020 Free Software Foundation, Inc.

When an object file is opened, BFD subroutines automatically determine
the format of the input object file.  They then build a descriptor in
memory with pointers to routines that will be used to access elements of
the object file's data structures.

As different information from the object files is required,
BFD reads from different sections of the file and processes them.
For example, a very common operation for the linker is processing symbol
tables.  Each BFD back end provides a routine for converting
between the object file's representation of symbols and an internal
canonical format.  When the linker asks for the symbol table of an object
file, it calls through a memory pointer to the routine from the
relevant BFD back end which reads and converts the table into a canonical
form.  The linker then operates upon the canonical form.  When the link is
finished and the linker writes the output file's symbol table,
another BFD back end routine is called to take the newly
created symbol table and convert it into the chosen output format.
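
As a rough illustration of this dispatch, the following C sketch models a
back end as a table of function pointers.  It is a toy and not BFD's real
interface: the names @code{toy_backend}, @code{canon_symbol}, and
@code{toy_open}, and the three-byte file layouts, are all invented for
this example.

@verbatim
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy canonical symbol: every back end converts to this form.  */
struct canon_symbol
{
  const char *name;
  unsigned long value;
};

/* Hypothetical back-end vector, one per supported format.  */
struct toy_backend
{
  const char *format_name;
  /* Return nonzero if the raw bytes look like this format.  */
  int (*recognize) (const unsigned char *data, size_t len);
  /* Convert the file's symbols; return how many were written to OUT.  */
  size_t (*read_symbols) (const unsigned char *data, size_t len,
                          struct canon_symbol *out, size_t max);
};

/* Invented format 1: magic 'B', then a 16-bit big-endian value.  */
static int
recognize_big (const unsigned char *d, size_t n)
{
  return n >= 3 && d[0] == 'B';
}

static size_t
read_big (const unsigned char *d, size_t n,
          struct canon_symbol *out, size_t max)
{
  if (n < 3 || max == 0)
    return 0;
  out[0].name = "entry";
  out[0].value = ((unsigned long) d[1] << 8) | d[2];
  return 1;
}

/* Invented format 2: magic 'L', then a 16-bit little-endian value.  */
static int
recognize_little (const unsigned char *d, size_t n)
{
  return n >= 3 && d[0] == 'L';
}

static size_t
read_little (const unsigned char *d, size_t n,
             struct canon_symbol *out, size_t max)
{
  if (n < 3 || max == 0)
    return 0;
  out[0].name = "entry";
  out[0].value = d[1] | ((unsigned long) d[2] << 8);
  return 1;
}

static const struct toy_backend backends[] =
{
  { "toy-big", recognize_big, read_big },
  { "toy-little", recognize_little, read_little }
};

/* "Open" a file: probe each back end and return the first match,
   or NULL if no back end recognizes the data.  */
const struct toy_backend *
toy_open (const unsigned char *data, size_t len)
{
  size_t i;

  assert (data != NULL);
  for (i = 0; i < sizeof backends / sizeof backends[0]; i++)
    if (backends[i].recognize (data, len))
      return &backends[i];
  return NULL;
}
```
@end verbatim

A caller first passes the raw bytes to @code{toy_open} and then reads
symbols through the returned vector's @code{read_symbols} pointer, so the
same calling code works for either toy format.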

@menu
* BFD information loss::        Information Loss
* Canonical format::            The BFD canonical object-file format
@end menu

@node BFD information loss
@subsection Information Loss

@emph{Information can be lost during output.}  The output formats
supported by BFD do not provide identical facilities, and
information which can be described in one form has nowhere to go in
another format.  One example of this is alignment information in
@code{b.out}.  There is nowhere in an @code{a.out} format file to store
alignment information on the contained data, so when a file is linked
from @code{b.out} and an @code{a.out} image is produced, alignment
information will not propagate to the output file.  (The linker will
still use the alignment information internally, so the link is performed
correctly.)

Another example is COFF section names.  COFF files may contain an
unlimited number of sections, each one with a textual section name.  If
the target of the link is a format which does not have many sections (e.g.,
@code{a.out}) or has sections without names (e.g., the Oasys format), the
link cannot be done simply.  You can circumvent this problem by
describing the desired input-to-output section mapping with the linker command
language.

@emph{Information can be lost during canonicalization.}  The BFD
internal canonical form of the external formats is not exhaustive; there
are structures in input formats for which there is no direct
representation internally.  This means that the BFD back ends
cannot maintain all possible data richness through the transformation
from external to internal and back to external formats.

This limitation is only a problem when an application reads one
format and writes another.  Each BFD back end is responsible for
maintaining as much data as possible, and the internal BFD
canonical form has structures which are opaque to the BFD core,
and exported only to the back ends.  When a file is read in one format,
the canonical form is generated for BFD and the application.  At the
same time, the back end saves away any information which may otherwise
be lost.  If the data is then written back in the same format, the back
end routine will be able to use the canonical form provided by the
BFD core as well as the information it prepared earlier.  Since
there is a great deal of commonality between back ends,
there is no information lost when
linking or copying big endian COFF to little endian COFF, or @code{a.out} to
@code{b.out}.  When a mixture of formats is linked, the information is
only lost from the files whose format differs from the destination.
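
The effect of back-end private data can be sketched in C as follows.
This is a simplified model under stated assumptions, not BFD code: the
types @code{toy_section} and @code{toy_output}, the functions
@code{toy_read} and @code{toy_write}, and the two formats are
hypothetical.  One canonical field (@code{size}) survives any conversion,
while the private field survives only when the output format is the one
that was read.

@verbatim
```c
#include <assert.h>
#include <stddef.h>

/* Two invented formats; only TOY_RICH can store the extra datum.  */
enum toy_format { TOY_PLAIN, TOY_RICH };

struct toy_section
{
  unsigned long size;       /* Canonical: visible to every format.  */
  enum toy_format origin;   /* Back end that read this section.  */
  unsigned private_info;    /* Saved by, and meaningful only to,
                               the originating back end.  */
};

struct toy_output
{
  unsigned long size;
  int has_private;          /* Nonzero if private_info was kept.  */
  unsigned private_info;
};

/* Read: fill in the canonical form and save away private data.  */
struct toy_section
toy_read (enum toy_format fmt, unsigned long size, unsigned extra)
{
  struct toy_section s;

  s.size = size;
  s.origin = fmt;
  s.private_info = (fmt == TOY_RICH) ? extra : 0;
  return s;
}

/* Write: the canonical part always goes out; the private part is
   usable only when the destination matches the origin.  */
struct toy_output
toy_write (const struct toy_section *s, enum toy_format dest)
{
  struct toy_output o;

  assert (s != NULL);
  o.size = s->size;
  o.has_private = (dest == TOY_RICH && s->origin == TOY_RICH);
  o.private_info = o.has_private ? s->private_info : 0;
  return o;
}
```
@end verbatim

Reading a @code{TOY_RICH} section and writing it back as @code{TOY_RICH}
keeps @code{private_info}; writing the same section as @code{TOY_PLAIN}
keeps only @code{size}.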

@node Canonical format
@subsection The BFD canonical object-file format

The greatest potential for loss of information occurs when there is the least
overlap between the information provided by the source format, that
stored by the canonical format, and that needed by the
destination format.  A brief description of the canonical form may help
you understand which kinds of data you can count on preserving across
conversions.
@cindex BFD canonical format
@cindex internal object-file format

@table @emph
@item files
Information stored on a per-file basis includes target machine
architecture, particular implementation format type, a demand pageable
bit, and a write protected bit.  Information like Unix magic numbers is
not stored here---only the magic numbers' meaning, so a @code{ZMAGIC}
file would have both the demand pageable bit and the write protected
text bit set.  The byte order of the target is stored on a per-file
basis, so that big- and little-endian object files may be used with one
another.

@item sections
Each section in the input file contains the name of the section, the
section's original address in the object file, size and alignment
information, various flags, and pointers into other BFD data
structures.

@item symbols
Each symbol contains a pointer to the information for the object file
which originally defined it, its name, its value, and various flag
bits.  When a BFD back end reads in a symbol table, it relocates all
symbols to make them relative to the base of the section where they were
defined.  Doing this ensures that each symbol points to its containing
section.  Each symbol also has a varying amount of hidden private data
for the BFD back end.  Since the symbol points to the original file, the
private data format for that symbol is accessible.  @code{ld} can
operate on a collection of symbols of wildly different formats without
problems.

Normal global and simple local symbols are maintained on output, so an
output file (no matter its format) will retain symbols pointing to
functions and to global, static, and common variables.  Some symbol
information is not worth retaining; in @code{a.out}, type information is
stored in the symbol table as long symbol names.  This information would
be useless to most COFF debuggers; the linker has command-line switches
to allow users to throw it away.

There is one word of type information within the symbol, so if the
format supports symbol type information within symbols (for example, COFF,
Oasys) and the type is simple enough to fit within one word
(nearly everything but aggregates), the information will be preserved.

@item relocation level
Each canonical BFD relocation record contains a pointer to the symbol to
relocate to, the offset of the data to relocate, the section the data
is in, and a pointer to a relocation type descriptor.  Relocation is
performed by passing messages through the relocation type
descriptor and the symbol pointer.  Therefore, relocations can be performed
on output data using a relocation method that is only available in one of the
input formats.  For instance, Oasys provides a byte relocation format.
A relocation record requesting this relocation type would point
indirectly to a routine to perform this, so the relocation may be
performed on a byte being written to a 68k COFF file, even though 68k COFF
has no such relocation type.

@item line numbers
Object formats can contain, for debugging purposes, some form of mapping
between symbols, source line numbers, and addresses in the output file.
These addresses have to be relocated along with the symbol information.
Each symbol with an associated list of line number records points to the
first record of the list.  The head of a line number list consists of a
pointer to the symbol, which allows finding out the address of the
function whose line number is being described.  The rest of the list is
made up of pairs: offsets into the section and line numbers.  Any format
which can simply derive this information can pass it successfully
between formats.
@end table
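
The relocation scheme described above, in which each record points to a
type descriptor that knows how to perform the relocation, can be modelled
in a few lines of C.  This is only a sketch: @code{toy_symbol},
@code{toy_howto}, @code{toy_reloc}, and @code{toy_apply_relocs} are
hypothetical names, and the records are much simpler than BFD's own
(for instance, they omit the section pointer).  The point it shows is
that the output code never needs to know the relocation type; it only
calls through the descriptor.

@verbatim
```c
#include <assert.h>
#include <stddef.h>

struct toy_symbol
{
  const char *name;
  unsigned long value;
};

struct toy_reloc;

/* Hypothetical relocation type descriptor.  */
struct toy_howto
{
  const char *name;
  size_t width;   /* Number of bytes the relocation touches.  */
  void (*apply) (const struct toy_reloc *r, unsigned char *contents);
};

/* Toy relocation record: symbol, offset, and type descriptor.  */
struct toy_reloc
{
  const struct toy_symbol *sym;
  size_t offset;
  const struct toy_howto *howto;
};

/* Byte relocation: add the low 8 bits of the symbol value.  */
static void
apply_byte (const struct toy_reloc *r, unsigned char *c)
{
  c[r->offset] = (unsigned char) (c[r->offset] + r->sym->value);
}

/* 16-bit big-endian relocation: add the symbol value modulo 2^16.  */
static void
apply_word16 (const struct toy_reloc *r, unsigned char *c)
{
  unsigned long v = ((unsigned long) c[r->offset] << 8) | c[r->offset + 1];

  v += r->sym->value;
  c[r->offset] = (unsigned char) ((v >> 8) & 0xff);
  c[r->offset + 1] = (unsigned char) (v & 0xff);
}

const struct toy_howto toy_howto_byte = { "BYTE", 1, apply_byte };
const struct toy_howto toy_howto_word16 = { "WORD16", 2, apply_word16 };

/* Apply every record by calling through its descriptor.  */
void
toy_apply_relocs (const struct toy_reloc *relocs, size_t count,
                  unsigned char *contents, size_t size)
{
  size_t i;

  for (i = 0; i < count; i++)
    {
      assert (relocs[i].offset + relocs[i].howto->width <= size);
      relocs[i].howto->apply (&relocs[i], contents);
    }
}
```
@end verbatim

Because @code{toy_apply_relocs} is independent of any file format, a
record carrying @code{toy_howto_byte} can be applied to output data even
if the output format itself defines no byte relocation.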