@c This summary of BFD is shared by the BFD and LD docs.
When an object file is opened, BFD subroutines automatically determine
the format of the input object file. They then build a descriptor in
memory with pointers to routines that will be used to access elements of
the object file's data structures.

As different information from the object file is required,
BFD reads from the relevant sections of the file and processes them.
For example, a very common operation for the linker is processing symbol
tables. Each BFD back end provides a routine for converting
between the object file's representation of symbols and an internal
canonical format. When the linker asks for the symbol table of an object
file, it calls through a function pointer to the routine from the
relevant BFD back end, which reads the table and converts it into the
canonical form. The linker then operates upon the canonical form. When the
link is finished and the linker writes the output file's symbol table,
another BFD back end routine is called to take the newly
created symbol table and convert it into the chosen output format.

@menu
* BFD information loss:: Information Loss
* Canonical format:: The BFD canonical object-file format
@end menu

@node BFD information loss
@subsection Information Loss

@emph{Information can be lost during output.} The output formats
supported by BFD do not provide identical facilities, and
information which can be described in one format has nowhere to go in
another. One example of this is alignment information in
@code{b.out}. There is nowhere in an @code{a.out} format file to store
alignment information on the contained data, so when a file is linked
from @code{b.out} and an @code{a.out} image is produced, alignment
information will not propagate to the output file. (The linker will
still use the alignment information internally, so the link is performed
correctly.)
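
The back-end dispatch described at the start of this section, together with the alignment loss just described, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not BFD's C interface: @code{CanonSection}, @code{Backend}, @code{convert}, and the ``b.out-like'' and ``a.out-like'' byte layouts are hypothetical toy stand-ins, not the real file formats. Each back end is a record of conversion routines, and the generic code calls whichever routines the chosen back end supplies.

```python
import struct
from collections import namedtuple

# Hypothetical canonical section record (BFD's real C type is asection).
CanonSection = namedtuple("CanonSection", ["size", "alignment_power"])

# Hypothetical back-end descriptor: a name plus the routines that convert
# between one external layout and the canonical form.
Backend = namedtuple("Backend", ["name", "read", "write"])


# "b.out-like" toy layout: little-endian 32-bit size, then an 8-bit
# alignment power (alignment is 2 ** alignment_power bytes).
def bout_read(raw):
    size, align = struct.unpack("<IB", raw)
    return CanonSection(size, align)


def bout_write(sec):
    return struct.pack("<IB", sec.size, sec.alignment_power)


# "a.out-like" toy layout: only a 32-bit size, with no slot for alignment.
def aout_read(raw):
    (size,) = struct.unpack("<I", raw)
    return CanonSection(size, 0)           # alignment unknown: nothing stored


def aout_write(sec):
    return struct.pack("<I", sec.size)     # alignment_power is dropped here


BOUT = Backend("b.out-like", bout_read, bout_write)
AOUT = Backend("a.out-like", aout_read, aout_write)


def convert(raw, src, dst):
    """Read with one back end and write with another, via the canonical form."""
    canonical = src.read(raw)              # external -> canonical
    return dst.write(canonical)            # canonical -> external


original = CanonSection(size=256, alignment_power=3)   # 8-byte aligned
bout_bytes = BOUT.write(original)
print(BOUT.read(convert(bout_bytes, BOUT, BOUT)))  # alignment_power=3 survives
print(AOUT.read(convert(bout_bytes, BOUT, AOUT)))  # alignment_power=0: lost
```

The canonical @code{original} still records an alignment power of 3 after the lossy conversion; only the bytes written by the @code{a.out}-like back end lack it, which mirrors how the linker keeps using alignment internally even though the output file cannot store it.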

Another example is COFF section names. COFF files may contain an
unlimited number of sections, each one with a textual section name. If
the target of the link is a format which does not have many sections (e.g.,
@code{a.out}) or has sections without names (e.g., the Oasys format), the
link cannot be done directly. You can circumvent this problem by
describing the desired input-to-output section mapping with the linker command
language.

@emph{Information can be lost during canonicalization.} The BFD
internal canonical form of the external formats is not exhaustive; there
are structures in input formats for which there is no direct
representation internally. This means that the BFD back ends
cannot maintain all possible data richness through the transformation
from external to internal form and back to external form.

This limitation is only a problem when an application reads one
format and writes another. Each BFD back end is responsible for
maintaining as much data as possible, and the internal BFD
canonical form has structures which are opaque to the BFD core
and exported only to the back ends. When a file is read in one format,
the canonical form is generated for BFD and the application. At the
same time, the back end saves away any information which would otherwise
be lost. If the data is then written back in the same format, the back
end routine will be able to use the canonical form provided by the
BFD core as well as the information it prepared earlier. Since
there is a great deal of commonality between back ends,
there is no information lost when
linking or copying big endian COFF to little endian COFF, or @code{a.out} to
@code{b.out}. When a mixture of formats is linked, the information is
only lost from the files whose format differs from the destination.
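
The private-data mechanism described above can be sketched in Python, again as a hypothetical illustration rather than BFD's real interface. @code{CanonSymbol}, the two back-end classes, the COFF-like ``storage class'' field, and the @code{external} default used for foreign symbols are all assumptions made for the example. Each canonical symbol records which back end produced it and carries an opaque @code{private} slot; a back end reuses that slot only when it writes a symbol it created itself.

```python
# Hypothetical canonical symbol: fields the core understands, plus an
# opaque slot that only the back end named in `origin` interprets.
class CanonSymbol:
    def __init__(self, name, value, origin, private=None):
        self.name = name
        self.value = value
        self.origin = origin      # name of the back end that created it
        self.private = private    # back-end-only data, opaque to the core


# Hypothetical COFF-like back end whose records carry a storage class
# that the canonical form has no field for.
class CoffLikeBackend:
    name = "coff-like"

    def read_symbol(self, record):
        name, value, storage_class = record
        # Save what the canonical form cannot hold in the private slot.
        return CanonSymbol(name, value, self.name, private=storage_class)

    def write_symbol(self, sym):
        if sym.origin == self.name:
            storage_class = sym.private   # same format: reuse the saved data
        else:
            storage_class = "external"    # foreign symbol: assumed default
        return (sym.name, sym.value, storage_class)


# Hypothetical a.out-like back end: records hold only a name and a value.
class AoutLikeBackend:
    name = "aout-like"

    def read_symbol(self, record):
        name, value = record
        return CanonSymbol(name, value, self.name)

    def write_symbol(self, sym):
        return (sym.name, sym.value)


coff, aout = CoffLikeBackend(), AoutLikeBackend()
sym = coff.read_symbol(("counter", 0x40, "static"))
print(coff.write_symbol(sym))          # same format: storage class kept
via_aout = aout.read_symbol(aout.write_symbol(sym))
print(coff.write_symbol(via_aout))     # passed through a.out-like: class lost

# Mixed link: only the symbol from the other format loses its extra data.
mixed = [coff.read_symbol(("buf", 0x80, "static")), aout.read_symbol(("main", 0x0))]
print([coff.write_symbol(s) for s in mixed])
```

The last three lines show the mixed-format case: when symbols from both inputs are written in the COFF-like format, only the symbol that came from the other format falls back to the default.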

@node Canonical format
@subsection The BFD canonical object-file format

The greatest potential for loss of information occurs when there is the least
overlap between the information provided by the source format, that
stored by the canonical format, and that needed by the
destination format. A brief description of the canonical form may help
you understand which kinds of data you can count on preserving across
conversions.
@cindex BFD canonical format
@cindex internal object-file format

@table @emph
@item files
Information stored on a per-file basis includes the target machine
architecture, the particular implementation format type, a demand-pageable
bit, and a write-protected bit. Information like Unix magic numbers is
not stored here; only the magic numbers' meaning is, so a @code{ZMAGIC}
file would have both the demand-pageable bit and the write-protected
text bit set. The byte order of the target is stored on a per-file
basis, so that big- and little-endian object files may be used with one
another.

@item sections
The canonical record for each section in the input file contains the
section's name, its original address in the object file, size and
alignment information, various flags, and pointers into other BFD data
structures.

@item symbols
Each symbol contains a pointer to the information for the object file
that originally defined it, its name, its value, and various flag
bits. When a BFD back end reads in a symbol table, it relocates all
symbols to make them relative to the base of the section where they were
defined. Doing this ensures that each symbol points to its containing
section. Each symbol also has a varying amount of hidden private data
for the BFD back end. Because the symbol points to its original file,
the back end can always tell which format that symbol's private data is
in, so @code{ld} can operate on a collection of symbols of wildly
different formats without problems.

Normal global and simple local symbols are maintained on output, so an
output file (no matter its format) will retain symbols pointing to
functions and to global, static, and common variables. Some symbol
information is not worth retaining; in @code{a.out}, type information is
stored in the symbol table as long symbol names. This information would
be useless to most COFF debuggers; the linker has command-line switches
that allow users to discard it.

There is one word of type information within the symbol, so if the
format supports symbol type information within symbols (for example, COFF,
IEEE, Oasys) and the type is simple enough to fit within one word
(nearly everything but aggregates), the information will be preserved.

@item relocation level
Each canonical BFD relocation record contains a pointer to the symbol to
relocate to, the offset of the data to relocate, the section the data
is in, and a pointer to a relocation type descriptor. Relocation is
performed by passing messages through the relocation type
descriptor and the symbol pointer. Therefore, a relocation can be applied
to output data using a relocation method that is available in only one of
the input formats. For instance, Oasys provides a byte relocation format.
A relocation record requesting this relocation type would point
indirectly to a routine that performs it, so the relocation may be
performed on a byte being written to a 68k COFF file, even though 68k COFF
has no such relocation type.

@item line numbers
Object formats can contain, for debugging purposes, some form of mapping
between symbols, source line numbers, and addresses in the output file.
These addresses have to be relocated along with the symbol information.
Each symbol with an associated list of line number records points to the
first record of the list.
The head of a line number list consists of a
pointer to the symbol, from which the address of the function being
described can be found. The rest of the list is made up of pairs:
offsets into the section and line numbers. Any format that can readily
derive this information (such as COFF, IEEE, and Oasys) can pass it
successfully to another such format.
@end table
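
The relocation scheme described under @emph{relocation level} can be illustrated with a short Python sketch. It only mirrors the idea behind BFD's relocation type descriptors (the C type @code{reloc_howto_type}); @code{RelocType}, @code{Reloc}, @code{perform}, the section and symbol classes, and the two relocation routines are hypothetical. The canonical record holds a symbol, an offset, a section, and a descriptor, and the generic @code{perform} routine simply calls through the descriptor, so a byte relocation can patch output bytes regardless of the format they will eventually be written in.

```python
# Hypothetical relocation type descriptor: the record does not say how to
# patch the data; it points at a descriptor whose routine knows how.
class RelocType:
    def __init__(self, name, apply):
        self.name = name
        self.apply = apply          # routine that patches the output bytes


def apply_byte(contents, offset, value):
    # An 8-bit relocation, standing in for Oasys's byte relocation.
    if not 0 <= value <= 0xFF:
        raise OverflowError("relocated value does not fit in one byte")
    contents[offset] = value


def apply_word32_le(contents, offset, value):
    # A 32-bit little-endian relocation, for comparison.
    contents[offset:offset + 4] = (value & 0xFFFFFFFF).to_bytes(4, "little")


BYTE_RELOC = RelocType("byte-8", apply_byte)
WORD_RELOC = RelocType("word-32", apply_word32_le)


class Section:
    def __init__(self, name, vma, size):
        self.name = name
        self.vma = vma                      # final address after linking
        self.contents = bytearray(size)


class Symbol:
    def __init__(self, name, section, value):
        self.name = name
        self.section = section
        self.value = value                  # relative to the section base

    def address(self):
        return self.section.vma + self.value


# Canonical relocation record: symbol, offset, section, type descriptor.
class Reloc:
    def __init__(self, symbol, offset, section, howto):
        self.symbol = symbol
        self.offset = offset
        self.section = section
        self.howto = howto


def perform(reloc):
    # Generic code only forwards the request through the descriptor; it
    # never needs to know which input format defined the relocation type.
    value = reloc.symbol.address()
    reloc.howto.apply(reloc.section.contents, reloc.offset, value)


data = Section(".data", vma=0x80, size=8)
text = Section(".text", vma=0x1000, size=8)
flag = Symbol("flag", data, 0x10)
perform(Reloc(flag, 2, text, BYTE_RELOC))   # byte at .text+2 becomes 0x90
perform(Reloc(flag, 4, text, WORD_RELOC))   # word at .text+4 becomes 0x90
print(text.contents.hex())
```

Because @code{perform} never inspects what a descriptor does, supporting another relocation type means adding a descriptor, not changing the generic code.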
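
The line number list described under @emph{line numbers} can also be sketched in Python. @code{LineRecord}, @code{build_line_list}, and @code{line_for_address} are hypothetical names, and marking the head record with line 0 is an assumption of this sketch. The head record points back at the function symbol, which yields the function's address; the remaining records are (section offset, line number) pairs. Because the offsets are section-relative, relocating the section only changes its base address, and the line records themselves need no rewriting.

```python
class Section:
    def __init__(self, name, vma):
        self.name = name
        self.vma = vma


class Symbol:
    def __init__(self, name, section, value):
        self.name = name
        self.section = section
        self.value = value          # section-relative, as in the canonical form


# Hypothetical line-number record. The head carries the function symbol
# (line 0 marks it in this sketch); the others carry offset/line pairs.
class LineRecord:
    def __init__(self, line, offset=None, symbol=None):
        self.line = line
        self.offset = offset
        self.symbol = symbol


def build_line_list(symbol, pairs):
    head = LineRecord(0, symbol=symbol)     # head points back at the symbol
    return [head] + [LineRecord(line, offset=off) for off, line in sorted(pairs)]


def line_for_address(records, address):
    symbol = records[0].symbol              # head record gives the function
    start = symbol.section.vma + symbol.value
    if address < start:
        return None                         # before the function begins
    target = address - symbol.section.vma   # absolute address -> section offset
    found = None
    for rec in records[1:]:                 # pairs are sorted by offset
        if rec.offset > target:
            break
        found = rec.line
    return found


text = Section(".text", 0x1000)
main = Symbol("main", text, 0x20)
lines = build_line_list(main, [(0x20, 10), (0x28, 11), (0x34, 13)])
print(line_for_address(lines, 0x102C))      # falls within line 11's range
text.vma = 0x4000                           # relocate the section...
print(line_for_address(lines, 0x4034))      # ...and the lines follow it
```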