\input texinfo
@setfilename ldint.info
@c Copyright (C) 1992-2021 Free Software Foundation, Inc.

@ifnottex
@dircategory Software development
@direntry
* Ld-Internals: (ldint).        The GNU linker internals.
@end direntry
@end ifnottex

@copying
This file documents the internals of the GNU linker ld.

Copyright @copyright{} 1992-2021 Free Software Foundation, Inc.
Contributed by Cygnus Support.

Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with the
Invariant Sections being ``GNU General Public License'' and ``Funding
Free Software'', the Front-Cover texts being (a) (see below), and with
the Back-Cover Texts being (b) (see below).  A copy of the license is
included in the section entitled ``GNU Free Documentation License''.

(a) The FSF's Front-Cover Text is:

     A GNU Manual

(b) The FSF's Back-Cover Text is:

     You have freedom to copy and modify this GNU Manual, like GNU
     software.  Copies published by the Free Software Foundation raise
     funds for GNU development.
@end copying

@iftex
@finalout
@setchapternewpage off
@settitle GNU Linker Internals
@titlepage
@title{A guide to the internals of the GNU linker}
@author Per Bothner, Steve Chamberlain, Ian Lance Taylor, DJ Delorie
@author Cygnus Support
@page

@tex
\def\$#1${{#1}}  % Kluge: collect RCS revision info without $...$
\xdef\manvers{2.10.91}  % For use in headers, footers too
{\parskip=0pt
\hfill Cygnus Support\par
\hfill \manvers\par
\hfill \TeX{}info \texinfoversion\par
}
@end tex

@vskip 0pt plus 1filll
Copyright @copyright{} 1992-2021 Free Software Foundation, Inc.

     Permission is granted to copy, distribute and/or modify this document
     under the terms of the GNU Free Documentation License, Version 1.3
     or any later version published by the Free Software Foundation;
     with no Invariant Sections, with no Front-Cover Texts, and with no
     Back-Cover Texts.  A copy of the license is included in the
     section entitled "GNU Free Documentation License".

@end titlepage
@end iftex

@node Top
@top

This file documents the internals of the GNU linker @code{ld}.  It is a
collection of miscellaneous information with little form at this point.
Mostly, it is a repository into which you can put information about
GNU @code{ld} as you discover it (or as you design changes to @code{ld}).

This document is distributed under the terms of the GNU Free
Documentation License.  A copy of the license is included in the
section entitled ``GNU Free Documentation License''.

@menu
* README::                         The README File
* Emulations::                     How linker emulations are generated
* Emulation Walkthrough::          A Walkthrough of a Typical Emulation
* Architecture Specific::          Some Architecture Specific Notes
* GNU Free Documentation License:: GNU Free Documentation License
@end menu

@node README
@chapter The @file{README} File

Check the @file{README} file; it often has useful information that does not
appear anywhere else in the directory.

@node Emulations
@chapter How linker emulations are generated

Each linker target has an @dfn{emulation}.  The emulation includes the
default linker script, and some emulations also modify certain aspects
of linker behaviour.

Emulations are created during the build process by the shell script
@file{genscripts.sh}.

The @file{genscripts.sh} script starts by reading a file in the
@file{emulparams} directory.  This is a shell script which sets various
shell variables used by @file{genscripts.sh} and the other shell scripts
it invokes.
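
The file names involved follow fixed conventions, which are spelled out
in the rest of this chapter.  An emulation @var{emul} is described by
@file{emulparams/@var{emul}.sh`}.  That file names a @file{scripttempl}
script through @code{SCRIPT_NAME} and an @file{emultempl} script through
@code{TEMPLATE_NAME} (default @samp{generic}).  The generated emulation
source is @file{e@var{emul}.c}.  The following C sketch only illustrates
those conventions; it is not code from the linker.  The structure
@code{emul_paths} and the functions @code{emul_paths_init} and
@code{fmt_path} are hypothetical names.

@smallexample
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical illustration of the file-naming conventions used
   by genscripts.sh; this is not code from the linker itself.  */
struct emul_paths
@{
  char params[128];   /* emulparams/EMUL.sh */
  char script[128];   /* scripttempl/SCRIPT_NAME.sc */
  char templ[128];    /* emultempl/TEMPLATE_NAME.em */
  char source[128];   /* eEMUL.c, the generated emulation source */
@};

/* Format NAME into BUF using FMT; return 0 on success, or -1 if
   the result would not fit in SIZE bytes.  */
static int
fmt_path (char *buf, size_t size, const char *fmt, const char *name)
@{
  int n = snprintf (buf, size, fmt, name);
  return (n < 0 || (size_t) n >= size) ? -1 : 0;
@}

/* TEMPLATE_NAME == NULL means the variable was not set, in which
   case the documented default, "generic", is used.  Returns 0 on
   success, -1 if any name was too long.  */
int
emul_paths_init (struct emul_paths *p, const char *emul,
                 const char *script_name, const char *template_name)
@{
  assert (p != NULL && emul != NULL && script_name != NULL);
  memset (p, 0, sizeof *p);
  if (template_name == NULL)
    template_name = "generic";

  if (fmt_path (p->params, sizeof p->params,
                "emulparams/%s.sh", emul) != 0
      || fmt_path (p->script, sizeof p->script,
                   "scripttempl/%s.sc", script_name) != 0
      || fmt_path (p->templ, sizeof p->templ,
                   "emultempl/%s.em", template_name) != 0
      || fmt_path (p->source, sizeof p->source, "e%s.c", emul) != 0)
    return -1;
  return 0;
@}
@end smallexample

Take the Sun 4 example given later in this chapter.
@code{emul_paths_init (&p, "sun4", "aout", "sunos")} produces
@file{emulparams/sun4.sh}, @file{scripttempl/aout.sc},
@file{emultempl/sunos.em} and @file{esun4.c}.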

The @file{genscripts.sh} script will invoke a shell script in the
@file{scripttempl} directory in order to create default linker scripts
written in the linker command language.  The @file{scripttempl} script
will be invoked between 5 and 9 times (@pxref{linker scripts}), with
different assignments to shell variables, to create different default
scripts.  The choice of script is made based on the command-line options.

After creating the scripts, @file{genscripts.sh} will invoke yet another
shell script, this time in the @file{emultempl} directory.  That shell
script will create the emulation source file, which contains C code.
This C code permits the linker emulation to override various linker
behaviours.  Most targets use the generic emulation code, which is in
@file{emultempl/generic.em}.

To summarize, @file{genscripts.sh} reads three shell scripts: an
emulation parameters script in the @file{emulparams} directory, a linker
script generation script in the @file{scripttempl} directory, and an
emulation source file generation script in the @file{emultempl}
directory.

For example, the Sun 4 linker sets up variables in
@file{emulparams/sun4.sh}, creates linker scripts using
@file{scripttempl/aout.sc}, and creates the emulation code using
@file{emultempl/sunos.em}.

Note that the linker can support several emulations simultaneously,
depending upon how it is configured.  An emulation can be selected with
the @code{-m} option.  The @code{-V} option will list all supported
emulations.

@menu
* emulation parameters::        @file{emulparams} scripts
* linker scripts::              @file{scripttempl} scripts
* linker emulations::           @file{emultempl} scripts
@end menu

@node emulation parameters
@section @file{emulparams} scripts

Each target selects a particular file in the @file{emulparams} directory
by setting the shell variable @code{targ_emul} in @file{configure.tgt}.
This shell variable is used by the @file{configure} script to control
building an emulation source file.

Certain conventions are enforced.  Suppose the @code{targ_emul} variable
is set to @var{emul} in @file{configure.tgt}.  The name of the emulation
shell script will be @file{emulparams/@var{emul}.sh}.  The
@file{Makefile} must have a target named @file{e@var{emul}.c}; this
target must depend upon @file{emulparams/@var{emul}.sh}, as well as the
appropriate scripts in the @file{scripttempl} and @file{emultempl}
directories.  The @file{Makefile} target must invoke @code{GENSCRIPTS}
with two arguments: @var{emul}, and the value of the make variable
@code{tdir_@var{emul}}.  The value of the latter variable will be set by
the @file{configure} script, and is used to set the default target
directory to search.

By convention, the @file{emulparams/@var{emul}.sh} shell script should
only set shell variables.  It may set shell variables which are to be
interpreted by the @file{scripttempl} and the @file{emultempl} scripts.
Certain shell variables are interpreted directly by the
@file{genscripts.sh} script.

Here is a list of shell variables interpreted by @file{genscripts.sh},
as well as some conventional shell variables interpreted by the
@file{scripttempl} and @file{emultempl} scripts.

@table @code
@item SCRIPT_NAME
This is the name of the @file{scripttempl} script to use.  If
@code{SCRIPT_NAME} is set to @var{script}, @file{genscripts.sh} will use
the script @file{scripttempl/@var{script}.sc}.

@item TEMPLATE_NAME
This is the name of the @file{emultempl} script to use.  If
@code{TEMPLATE_NAME} is set to @var{template}, @file{genscripts.sh} will
use the script @file{emultempl/@var{template}.em}.  If this variable is
not set, the default value is @samp{generic}.

@item GENERATE_SHLIB_SCRIPT
If this is set to a nonempty string, @file{genscripts.sh} will invoke
the @file{scripttempl} script an extra time to create a shared library
script.  @xref{linker scripts}.

@item OUTPUT_FORMAT
This is normally set to indicate the BFD output format to use (e.g.,
@samp{"a.out-sunos-big"}).  The @file{scripttempl} script will normally
use it in an @code{OUTPUT_FORMAT} expression in the linker script.

@item ARCH
This is normally set to indicate the architecture to use (e.g.,
@samp{sparc}).  The @file{scripttempl} script will normally use it in an
@code{OUTPUT_ARCH} expression in the linker script.

@item ENTRY
Some @file{scripttempl} scripts use this to set the entry address, in an
@code{ENTRY} expression in the linker script.

@item TEXT_START_ADDR
Some @file{scripttempl} scripts use this to set the start address of the
@samp{.text} section.

@item SEGMENT_SIZE
The @file{genscripts.sh} script uses this to set the default value of
@code{DATA_ALIGNMENT} when running the @file{scripttempl} script.

@item TARGET_PAGE_SIZE
If @code{SEGMENT_SIZE} is not defined, the @file{genscripts.sh} script
uses this to define it.

@item ALIGNMENT
Some @file{scripttempl} scripts set this to a number to pass to
@code{ALIGN} to set the required alignment for the @code{end} symbol.
@end table

@node linker scripts
@section @file{scripttempl} scripts

Each linker target uses a @file{scripttempl} script to generate the
default linker scripts.  The name of the @file{scripttempl} script is
set by the @code{SCRIPT_NAME} variable in the @file{emulparams} script.
If @code{SCRIPT_NAME} is set to @var{script}, @file{genscripts.sh} will
invoke @file{scripttempl/@var{script}.sc}.

The @file{genscripts.sh} script will invoke the @file{scripttempl}
script 5 to 9 times.
Each time it will set the shell variable
@code{LD_FLAG} to a different value.  When the linker is run, the
options used will direct it to select a particular script.  (Script
selection is controlled by the @code{get_script} emulation entry point;
this describes the conventional behaviour.)

The @file{scripttempl} script should just write a linker script, written
in the linker command language, to standard output.  If the emulation
name (the name of the @file{emulparams} file without the @file{.sh}
extension) is @var{emul}, then the output will be directed to
@file{ldscripts/@var{emul}.@var{extension}} in the build directory,
where @var{extension} changes each time the @file{scripttempl} script is
invoked.

Here is the list of values assigned to @code{LD_FLAG}.

@table @code
@item (empty)
The script generated is used by default (when none of the following
cases apply).  The output has an extension of @file{.x}.
@item n
The script generated is used when the linker is invoked with the
@code{-n} option.  The output has an extension of @file{.xn}.
@item N
The script generated is used when the linker is invoked with the
@code{-N} option.  The output has an extension of @file{.xbn}.
@item r
The script generated is used when the linker is invoked with the
@code{-r} option.  The output has an extension of @file{.xr}.
@item u
The script generated is used when the linker is invoked with the
@code{-Ur} option.  The output has an extension of @file{.xu}.
@item shared
The @file{scripttempl} script is only invoked with @code{LD_FLAG} set to
this value if @code{GENERATE_SHLIB_SCRIPT} is defined in the
@file{emulparams} file.  The @file{emultempl} script must arrange to use
this script at the appropriate time, normally when the linker is invoked
with the @code{-shared} option.  The output has an extension of
@file{.xs}.
@item c
The @file{scripttempl} script is only invoked with @code{LD_FLAG} set to
this value if @code{GENERATE_COMBRELOC_SCRIPT} is defined in the
@file{emulparams} file or if @code{SCRIPT_NAME} is @code{elf}.  The
@file{emultempl} script must arrange to use this script at the appropriate
time, normally when the linker is invoked with the @code{-z combreloc}
option.  The output has an extension of @file{.xc}.
@item cshared
The @file{scripttempl} script is only invoked with @code{LD_FLAG} set to
this value if @code{GENERATE_COMBRELOC_SCRIPT} is defined in the
@file{emulparams} file or if @code{SCRIPT_NAME} is @code{elf} and
@code{GENERATE_SHLIB_SCRIPT} is defined in the @file{emulparams} file.
The @file{emultempl} script must arrange to use this script at the
appropriate time, normally when the linker is invoked with the
@code{-shared -z combreloc} options.  The output has an extension of
@file{.xsc}.
@item auto_import
The @file{scripttempl} script is only invoked with @code{LD_FLAG} set to
this value if @code{GENERATE_AUTO_IMPORT_SCRIPT} is defined in the
@file{emulparams} file.  The @file{emultempl} script must arrange to
use this script at the appropriate time, normally when the linker is
invoked with the @code{--enable-auto-import} option.  The output has
an extension of @file{.xa}.
@end table

Besides the shell variables set by the @file{emulparams} script, and the
@code{LD_FLAG} variable, the @file{genscripts.sh} script will set
certain variables for each run of the @file{scripttempl} script.

@table @code
@item RELOCATING
This will be set to a non-empty string when the linker is doing a final
relocation (i.e., for all scripts other than @code{-r} and @code{-Ur}).

@item CONSTRUCTING
This will be set to a non-empty string when the linker is building
global constructor and destructor tables (i.e., for all scripts other
than @code{-r}).

@item DATA_ALIGNMENT
This will be set to an @code{ALIGN} expression when the output should be
page aligned, or to @samp{.} when generating the @code{-N} script.

@item CREATE_SHLIB
This will be set to a non-empty string when generating a @code{-shared}
script.

@item COMBRELOC
When generating @code{-z combreloc} scripts, this will be set to the
name of a temporary file which can be used during script generation.
@end table

The conventional way to write a @file{scripttempl} script is to first
set a few shell variables, and then write out a linker script using
@code{cat} with a here document.  The linker script will use variable
substitutions, based on the above variables and those set in the
@file{emulparams} script, to control its behaviour.

When there are parts of the generated linker script which should only
appear when doing a final relocation, they should be enclosed within a
variable substitution based on @code{RELOCATING}.  For example, on many
targets special symbols such as @code{_end} should be defined when doing
a final link.  Naturally, those symbols should not be defined when doing
a relocatable link using @code{-r}.  The @file{scripttempl} script
could use a construct like this to define those symbols:
@smallexample
  $@{RELOCATING+ _end = .;@}
@end smallexample
This will do the symbol assignment only if the @code{RELOCATING}
variable is defined.

The basic job of the linker script is to put the sections in the correct
order, and at the correct memory addresses.  For some targets, the
linker script may have to do some other operations.

For example, on most MIPS platforms, the linker is responsible for
defining the special symbol @code{_gp}, used to initialize the
@code{$gp} register.  It must be set to the start of the small data
section plus @code{0x8000}.
Naturally, it should only be defined when
doing a final relocation.  This will typically be done like this:
@smallexample
  $@{RELOCATING+ _gp = ALIGN(16) + 0x8000;@}
@end smallexample
This line would appear just before the sections which compose the small
data section (@samp{.sdata}, @samp{.sbss}).  All those sections would be
contiguous in memory.

Many COFF systems build constructor tables in the linker script.  The
compiler will arrange to output the address of each global constructor
in a @samp{.ctors} section, and the address of each global destructor in
a @samp{.dtors} section (this is done by defining
@code{ASM_OUTPUT_CONSTRUCTOR} and @code{ASM_OUTPUT_DESTRUCTOR} in the
@code{gcc} configuration files).  The @code{gcc} runtime support
routines expect the constructor table to be named @code{__CTOR_LIST__}.
They expect it to be a list of words, with the first word being the
number of entries.  There should be a trailing zero word.
(Actually, the count may be -1 if the trailing word is present, and the
trailing word may be omitted if the count is correct, but, as the
@code{gcc} behaviour has changed slightly over the years, it is safest
to provide both.)  Here is a typical way that might be handled in a
@file{scripttempl} file.
@smallexample
  $@{CONSTRUCTING+ __CTOR_LIST__ = .;@}
  $@{CONSTRUCTING+ LONG((__CTOR_END__ - __CTOR_LIST__) / 4 - 2)@}
  $@{CONSTRUCTING+ *(.ctors)@}
  $@{CONSTRUCTING+ LONG(0)@}
  $@{CONSTRUCTING+ __CTOR_END__ = .;@}
  $@{CONSTRUCTING+ __DTOR_LIST__ = .;@}
  $@{CONSTRUCTING+ LONG((__DTOR_END__ - __DTOR_LIST__) / 4 - 2)@}
  $@{CONSTRUCTING+ *(.dtors)@}
  $@{CONSTRUCTING+ LONG(0)@}
  $@{CONSTRUCTING+ __DTOR_END__ = .;@}
@end smallexample
The use of @code{CONSTRUCTING} ensures that these linker script commands
will only appear when the linker is supposed to be building the
constructor and destructor tables.
This example is written for a target
which uses 4-byte pointers.  Dividing the size of the table in bytes by
4 gives the number of words, and subtracting 2 discounts the count word
and the trailing zero word, leaving the number of entries.

Embedded systems often need to set a stack address.  This is normally
best done by using the @code{PROVIDE} construct with a default stack
address.  This permits the user to easily override the stack address
using the @code{--defsym} option.  Here is an example:
@smallexample
  $@{RELOCATING+ PROVIDE (__stack = 0x80000000);@}
@end smallexample
The value of the symbol @code{__stack} would then be used in the startup
code to initialize the stack pointer.

@node linker emulations
@section @file{emultempl} scripts

Each linker target uses an @file{emultempl} script to generate the
emulation code.  The name of the @file{emultempl} script is set by the
@code{TEMPLATE_NAME} variable in the @file{emulparams} script.  If the
@code{TEMPLATE_NAME} variable is not set, the default is
@samp{generic}.  If the value of @code{TEMPLATE_NAME} is @var{template},
@file{genscripts.sh} will use @file{emultempl/@var{template}.em}.

Most targets use the generic @file{emultempl} script,
@file{emultempl/generic.em}.  A different @file{emultempl} script is
only needed if the linker must support unusual actions, such as linking
against shared libraries.

The @file{emultempl} script is normally written as a simple invocation
of @code{cat} with a here document.  The document will use a few
variable substitutions.  Typically each function name uses a
substitution involving @code{EMULATION_NAME}, for ease of debugging when
the linker supports multiple emulations.

Every function and variable in the emitted file should be static.  The
only globally visible object must be named
@code{ld_@var{EMULATION_NAME}_emulation}, where @var{EMULATION_NAME} is
the name of the emulation set in @file{configure.tgt} (this is also the
name of the @file{emulparams} file without the @file{.sh} extension).
The @file{genscripts.sh} script will set the shell variable
@code{EMULATION_NAME} before invoking the @file{emultempl} script.

The @code{ld_@var{EMULATION_NAME}_emulation} variable must be a
@code{struct ld_emulation_xfer_struct}, as defined in @file{ldemul.h}.
It defines a set of function pointers which are invoked by the linker,
as well as strings for the emulation name (normally set from the shell
variable @code{EMULATION_NAME}) and the default BFD target name
(normally set from the shell variable @code{OUTPUT_FORMAT}, which is
normally set by the @file{emulparams} file).

The @file{genscripts.sh} script will set the shell variable
@code{COMPILE_IN} when it invokes the @file{emultempl} script for the
default emulation.  In this case, the @file{emultempl} script should
include the linker scripts directly, and return them from the
@code{get_script} entry point.  When the emulation is not the default,
the @code{get_script} entry point should just return a file name.  See
@file{emultempl/generic.em} for an example of how this is done.

At some point, the linker emulation entry points should be documented.

@node Emulation Walkthrough
@chapter A Walkthrough of a Typical Emulation

This chapter is to help people who are new to the way emulations
interact with the linker, or who are suddenly thrust into the position
of having to work with existing emulations.  It will discuss the files
you need to be aware of.  It will tell you when the given ``hooks'' in
the emulation will be called.  It will, hopefully, give you enough
information about when and how things happen that you'll be able to
get by.  As always, the source is the definitive reference to this.

The starting point for the linker is in @file{ldmain.c} where
@code{main} is defined.
The bulk of the code that's emulation
specific will initially be in @file{emultempl/@var{emulation}.em} but
will end up in @file{e@var{emulation}.c} when the build is done.
Most of the work to select and interface with emulations is in
@file{ldemul.h} and @file{ldemul.c}.  Specifically, @file{ldemul.h}
defines the @code{ld_emulation_xfer_struct} structure your emulation
exports.

Your emulation file exports a symbol
@code{ld_@var{EMULATION_NAME}_emulation}.  If your emulation is
selected (it usually is, since usually there's only one),
@file{ldemul.c} sets the variable @code{ld_emulation} to point to it.
@file{ldemul.c} also defines a number of API functions that interface
to your emulation, like @code{ldemul_after_parse}, which simply calls
your @code{ld_@var{EMULATION_NAME}_emulation.after_parse} function.
For the rest of this section, the functions will be mentioned, but you
should assume the indirect reference to your emulation also.

We will also skip or gloss over parts of the link process that don't
relate to emulations, like setting up internationalization.

After initialization, @code{main} selects an emulation by pre-scanning
the command-line arguments.  It calls @code{ldemul_choose_target} to
choose a target.  If you set @code{choose_target} to
@code{ldemul_default_target}, it picks your @code{target_name} by
default.

@code{main} calls @code{ldemul_before_parse}, then @code{parse_args}.
@code{parse_args} calls @code{ldemul_parse_args} for each argument,
which must update the @code{getopt} globals if it recognizes the
argument.  If the emulation doesn't recognize it, then
@code{parse_args} checks whether it recognizes the argument itself.

Now that the emulation has had access to all its command-line options,
@code{main} calls @code{ldemul_set_symbols}.  This can be used for any
initialization that may be affected by options.
It is also supposed
to set up any variables needed by the emulation script.

@code{main} now calls @code{ldemul_get_script} to get the emulation
script to use (based on arguments, no doubt; @pxref{Emulations}) and
runs it.  While parsing, @code{ldgram.y} may call @code{ldemul_hll} or
@code{ldemul_syslib} to handle the @code{HLL} or @code{SYSLIB}
commands.  It may call @code{ldemul_unrecognized_file} if you asked
the linker to link a file it doesn't recognize.  It will call
@code{ldemul_recognized_file} for each file it does recognize, in case
the emulation wants to handle some files specially.  All the while,
it's loading the files (possibly calling
@code{ldemul_open_dynamic_archive}) and their symbols.  After it's
done reading the script, @code{main} calls @code{ldemul_after_parse}.
Use the after-parse hook to set up anything that depends on things the
script might have set up, like the entry point.

@code{main} next calls @code{lang_process} in @file{ldlang.c}.  This
appears to be the main core of the linking itself, as far as emulation
hooks are concerned (*).  It first opens the output file's BFD, calling
@code{ldemul_set_output_arch}, and calls
@code{ldemul_create_output_section_statements} in case you need to use
other means to find or create object files (e.g., shared libraries
found on a path, or fake stub objects).  Despite the name, nobody
creates output sections here.

(*) In most cases, the BFD library does the bulk of the actual
linking, handling symbol tables, symbol resolution, relocations, and
building the final output file.  See the BFD reference for all the
details.  Your emulation is usually concerned more with managing
things at the file and section level, like ``put this here, add this
section'', etc.

Next, the objects to be linked are opened and BFDs created for them,
and @code{ldemul_after_open} is called.
At this point, you have all
the objects and symbols loaded, but none of the data has been placed
yet.

Next comes the Big Linking Thingy (except for the parts BFD does).
All input sections are mapped to output sections according to the
script.  If an input section is not mapped by any statement in the
script (an orphan section), @code{ldemul_place_orphan} will get called
to figure out where it goes.  Next it figures out the offsets for each
section, calling @code{ldemul_before_allocation} before and
@code{ldemul_after_allocation} after deciding where each input section
ends up in the output sections.

The last part of @code{lang_process} is to figure out all the symbols'
values.  After assigning final values to the symbols,
@code{ldemul_finish} is called, and after that, any undefined symbols
are turned into fatal errors.

OK, back to @code{main}, which calls @code{ldwrite} in
@file{ldwrite.c}.  @code{ldwrite} calls BFD's @code{bfd_final_link},
which does all the relocation fixups and writes the output BFD to disk,
and we're done.
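
The sequence of hook calls described above can be modelled with a toy
emulation.  The following C sketch is hypothetical and deliberately
simplified.  @code{struct toy_emulation}, @code{toy_link},
@code{hook_index} and @code{ld_toy_emulation} are invented names, and
the hooks take no arguments.  Only a subset of the hooks in the real
@code{struct ld_emulation_xfer_struct} is modelled; for example,
@code{get_script}, @code{set_output_arch} and @code{place_orphan} are
left out.  The sketch does follow the @file{emultempl} conventions:
every emulation function is static, and the only global emulation
object is named @code{ld_@var{EMULATION_NAME}_emulation}, with
@var{EMULATION_NAME} here being @samp{toy}.

@smallexample
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, much simplified stand-in for
   struct ld_emulation_xfer_struct: the real structure in ldemul.h
   has more members, and its hooks take arguments.  */
struct toy_emulation
@{
  const char *emulation_name;
  void (*before_parse) (void);
  void (*set_symbols) (void);
  void (*after_parse) (void);
  void (*create_output_section_statements) (void);
  void (*after_open) (void);
  void (*before_allocation) (void);
  void (*after_allocation) (void);
  void (*finish) (void);
@};

/* Names of the hooks, in the order in which they ran.  */
static const char *hook_trace[16];
static int hook_count;

static void
note (const char *hook)
@{
  if (hook_count < 16)
    hook_trace[hook_count++] = hook;
@}

/* Every emulation function is static.  */
static void toy_before_parse (void) @{ note ("before_parse"); @}
static void toy_set_symbols (void) @{ note ("set_symbols"); @}
static void toy_after_parse (void) @{ note ("after_parse"); @}
static void toy_create_output_section_statements (void)
@{ note ("create_output_section_statements"); @}
static void toy_after_open (void) @{ note ("after_open"); @}
static void toy_before_allocation (void) @{ note ("before_allocation"); @}
static void toy_after_allocation (void) @{ note ("after_allocation"); @}
static void toy_finish (void) @{ note ("finish"); @}

/* The only global object, named ld_EMULATION_NAME_emulation.  */
const struct toy_emulation ld_toy_emulation =
@{
  .emulation_name = "toy",
  .before_parse = toy_before_parse,
  .set_symbols = toy_set_symbols,
  .after_parse = toy_after_parse,
  .create_output_section_statements = toy_create_output_section_statements,
  .after_open = toy_after_open,
  .before_allocation = toy_before_allocation,
  .after_allocation = toy_after_allocation,
  .finish = toy_finish,
@};

/* Hypothetical driver: calls the hooks in the order that main ()
   and lang_process () use, and does nothing else.  */
void
toy_link (const struct toy_emulation *emul)
@{
  assert (emul != NULL);
  emul->before_parse ();       /* before parse_args */
  emul->set_symbols ();        /* all options have been seen */
  emul->after_parse ();        /* the script has been read */
  /* The remaining hooks are called from lang_process ().  */
  emul->create_output_section_statements ();
  emul->after_open ();         /* all input BFDs are open */
  emul->before_allocation ();
  emul->after_allocation ();   /* section addresses are valid */
  emul->finish ();             /* symbol values are valid */
@}

/* Position of HOOK in the trace, or -1 if it never ran.  */
int
hook_index (const char *hook)
@{
  for (int i = 0; i < hook_count; i++)
    if (strcmp (hook_trace[i], hook) == 0)
      return i;
  return -1;
@}
@end smallexample

After @code{toy_link (&ld_toy_emulation)}, @code{hook_index} reports
@code{before_parse} at position 0 and @code{finish} at position 7,
matching the order in the summary that follows.  A real emulation would
do useful work in each hook rather than just record that it was called.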

In summary,

@itemize @bullet

@item @code{main()} in @file{ldmain.c}
@item @file{emultempl/@var{EMULATION}.em} has your code
@item @code{ldemul_choose_target} (defaults to your @code{target_name})
@item @code{ldemul_before_parse}
@item parse @code{argv}, calling @code{ldemul_parse_args} for each argument
@item @code{ldemul_set_symbols}
@item @code{ldemul_get_script}
@item parse script

@itemize @bullet
@item may call @code{ldemul_hll} or @code{ldemul_syslib}
@item may call @code{ldemul_open_dynamic_archive}
@end itemize

@item @code{ldemul_after_parse}
@item @code{lang_process()} in @file{ldlang.c}

@itemize @bullet
@item create @code{output_bfd}
@item @code{ldemul_set_output_arch}
@item @code{ldemul_create_output_section_statements}
@item read objects, create input BFDs; all symbols exist, but have no values
@item may call @code{ldemul_unrecognized_file}
@item will call @code{ldemul_recognized_file}
@item @code{ldemul_after_open}
@item map input sections to output sections
@item may call @code{ldemul_place_orphan} for remaining sections
@item @code{ldemul_before_allocation}
@item give input sections offsets into output sections, place output sections
@item @code{ldemul_after_allocation}: section addresses are valid
@item assign values to symbols
@item @code{ldemul_finish}: symbol values are valid
@end itemize

@item output BFD is written to disk

@end itemize

@node Architecture Specific
@chapter Some Architecture Specific Notes

This is the place for notes on the behavior of @code{ld} on
specific platforms.  Currently, only Intel x86 is documented (and
of that, only the auto-import behavior for DLLs).

@menu
* ix86::                        Intel x86
@end menu

@node ix86
@section Intel x86

@table @emph
@code{ld} can create DLLs that operate with various runtimes available
on a common x86 operating system.
These runtimes include native (using
the mingw ``platform''), cygwin, and pw.

@item auto-import from DLLs
@enumerate
@item
With this feature on, DLL clients can import variables from a DLL
without any concern on their side (for example, without any source
code modifications).  Auto-import can be enabled using the
@code{--enable-auto-import} flag, or disabled via the
@code{--disable-auto-import} flag.  Auto-import is disabled by default.

@item
This is done completely within the bounds of the PE specification (to
be fair, there's a minor violation of the spec at one point, but in
practice auto-import works on all known variants of that common x86
operating system).  So, the resulting DLL can be used with any other PE
compiler/linker.

@item
Auto-import is fully compatible with the standard import method, in
which variables are decorated using attribute modifiers.  Libraries of
either type may be mixed together.

@item
Overhead (space): 8 bytes per imported symbol, plus 20 bytes for each
reference to it.  Overhead (load time): negligible.  Overhead
(virtual/physical memory): should be less than the effect of DLL
relocation.
@end enumerate

Motivation

The obvious and only way to get rid of the dllimport insanity is
to make the client access the variable directly in the DLL, bypassing
the extra dereference imposed by ordinary DLL runtime linking.
That is, whenever the client contains something like

@code{mov dll_var,%eax},

the address of @code{dll_var} in the instruction should be relocated to
point into the loaded DLL.  The aim is to make the OS loader do so, and
then make @code{ld} help with that.  The import section of a PE file is
organized as follows: there's a vector of structures, each describing
the imports from a particular DLL.  Each such structure points to two
other parallel vectors: one holding the imported names, and one which
will hold the address of the corresponding imported name.
So, the
solution is to de-vectorize these structures, making import
locations sparse and pointing directly into code.

Implementation

For each reference to a data symbol to be imported from a DLL (that is,
a symbol named @code{<sym>} for which @code{__imp_<sym>} is found in the
import library), an import fixup entry is generated.  That entry is of
type @code{IMAGE_IMPORT_DESCRIPTOR} and is stored in the
@code{.idata$3} subsection.  Each fixup entry contains a pointer to the
symbol's address within the @code{.text} section (marked with a
@code{__fuN_<sym>} symbol, where N is an integer), a pointer to the DLL
name (so the DLL name is referenced by multiple entries), and a pointer
to the symbol name thunk.  The symbol name thunk is a singleton vector
(@code{__nm_th_<symbol>}) pointing to an @code{IMAGE_IMPORT_BY_NAME}
structure (@code{__nm_<symbol>}) directly containing the imported name.
Here comes the ``on the edge'' problem mentioned above: the PE
specification states that the name vector (@code{OriginalFirstThunk})
should run in parallel with the address vector (@code{FirstThunk}),
i.e., that they should have the same number of elements and be
terminated with zero.  We violate this, since @code{FirstThunk} points
directly into machine code.  But in practice, the OS loader is
implemented the sane way: it goes through @code{OriginalFirstThunk} and
puts addresses into @code{FirstThunk}, not anywhere else.  It should be
noted once again that the DLL and symbol name structures are reused
across fixup entries and must be there anyway to support standard
imports, so the sustained overhead is 20 bytes per reference (the size
of one @code{IMAGE_IMPORT_DESCRIPTOR}).  Another question is whether
having several @code{IMAGE_IMPORT_DESCRIPTOR} entries for the same DLL
is possible.  The answer is yes; it is done even by the native
compiler/linker (libth32's functions are in fact resident in the
Windows 9x @file{kernel32.dll}, so if you use it, you have two
@code{IMAGE_IMPORT_DESCRIPTOR} entries for @file{kernel32.dll}).  Yet
another question is whether referencing the same PE structures several
times is valid.
The answer is: why not?  Prohibiting that (detecting the violation)
would require more work on the part of the loader than allowing it.

@end table

@node GNU Free Documentation License
@chapter GNU Free Documentation License

@include fdl.texi

@contents
@bye