1\input texinfo
2@setfilename ldint.info
3@c Copyright (C) 1992-2021 Free Software Foundation, Inc.
4
5@ifnottex
6@dircategory Software development
7@direntry
8* Ld-Internals: (ldint).	The GNU linker internals.
9@end direntry
10@end ifnottex
11
12@copying
13This file documents the internals of the GNU linker ld.
14
15Copyright @copyright{} 1992-2021 Free Software Foundation, Inc.
16Contributed by Cygnus Support.
17
18Permission is granted to copy, distribute and/or modify this document
19under the terms of the GNU Free Documentation License, Version 1.3 or
20any later version published by the Free Software Foundation; with the
21Invariant Sections being ``GNU General Public License'' and ``Funding
22Free Software'', the Front-Cover texts being (a) (see below), and with
23the Back-Cover Texts being (b) (see below).  A copy of the license is
24included in the section entitled ``GNU Free Documentation License''.
25
26(a) The FSF's Front-Cover Text is:
27
28     A GNU Manual
29
30(b) The FSF's Back-Cover Text is:
31
32     You have freedom to copy and modify this GNU Manual, like GNU
33     software.  Copies published by the Free Software Foundation raise
34     funds for GNU development.
35@end copying
36
37@iftex
38@finalout
39@setchapternewpage off
40@settitle GNU Linker Internals
41@titlepage
42@title{A guide to the internals of the GNU linker}
43@author Per Bothner, Steve Chamberlain, Ian Lance Taylor, DJ Delorie
44@author Cygnus Support
45@page
46
47@tex
48\def\$#1${{#1}}  % Kluge: collect RCS revision info without $...$
49\xdef\manvers{2.10.91}  % For use in headers, footers too
50{\parskip=0pt
51\hfill Cygnus Support\par
52\hfill \manvers\par
53\hfill \TeX{}info \texinfoversion\par
54}
55@end tex
56
57@vskip 0pt plus 1filll
58Copyright @copyright{} 1992-2021 Free Software Foundation, Inc.
59
60      Permission is granted to copy, distribute and/or modify this document
61      under the terms of the GNU Free Documentation License, Version 1.3
62      or any later version published by the Free Software Foundation;
63      with no Invariant Sections, with no Front-Cover Texts, and with no
64      Back-Cover Texts.  A copy of the license is included in the
      section entitled ``GNU Free Documentation License''.
66
67@end titlepage
68@end iftex
69
70@node Top
71@top
72
73This file documents the internals of the GNU linker @code{ld}.  It is a
74collection of miscellaneous information with little form at this point.
75Mostly, it is a repository into which you can put information about
76GNU @code{ld} as you discover it (or as you design changes to @code{ld}).
77
78This document is distributed under the terms of the GNU Free
79Documentation License.  A copy of the license is included in the
section entitled ``GNU Free Documentation License''.
81
82@menu
83* README::			The README File
84* Emulations::			How linker emulations are generated
85* Emulation Walkthrough::	A Walkthrough of a Typical Emulation
86* Architecture Specific::	Some Architecture Specific Notes
87* GNU Free Documentation License::  GNU Free Documentation License
88@end menu
89
90@node README
91@chapter The @file{README} File
92
93Check the @file{README} file; it often has useful information that does not
94appear anywhere else in the directory.
95
96@node Emulations
97@chapter How linker emulations are generated
98
99Each linker target has an @dfn{emulation}.  The emulation includes the
100default linker script, and certain emulations also modify certain types
101of linker behaviour.
102
103Emulations are created during the build process by the shell script
104@file{genscripts.sh}.
105
106The @file{genscripts.sh} script starts by reading a file in the
107@file{emulparams} directory.  This is a shell script which sets various
108shell variables used by @file{genscripts.sh} and the other shell scripts
109it invokes.
110
111The @file{genscripts.sh} script will invoke a shell script in the
112@file{scripttempl} directory in order to create default linker scripts
written in the linker command language.  The @file{scripttempl} script
will be invoked at least 5 times (and up to 9, depending on the
emulation parameters), with different assignments to shell variables,
to create different default scripts.  When the linker runs, it chooses
among these scripts based on its command-line options.
117
118After creating the scripts, @file{genscripts.sh} will invoke yet another
119shell script, this time in the @file{emultempl} directory.  That shell
120script will create the emulation source file, which contains C code.
121This C code permits the linker emulation to override various linker
122behaviours.  Most targets use the generic emulation code, which is in
123@file{emultempl/generic.em}.
124
125To summarize, @file{genscripts.sh} reads three shell scripts: an
126emulation parameters script in the @file{emulparams} directory, a linker
127script generation script in the @file{scripttempl} directory, and an
128emulation source file generation script in the @file{emultempl}
129directory.
130
131For example, the Sun 4 linker sets up variables in
132@file{emulparams/sun4.sh}, creates linker scripts using
133@file{scripttempl/aout.sc}, and creates the emulation code using
134@file{emultempl/sunos.em}.
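
As a rough illustration of that flow, the following shell sketch mimics
the first two steps on a toy scale.  It is not taken from
@file{genscripts.sh}: the emulation name @samp{toy}, the format name
@samp{elf32-toy} and the temporary directory layout are all made up for
the example, and the @file{emultempl} step is left out.
@smallexample
#!/bin/sh
# Toy model of the script flow; every file name here is hypothetical.
set -e
work=$(mktemp -d)
mkdir "$work/emulparams" "$work/scripttempl" "$work/ldscripts"

# 1. Emulation parameters: a shell script that only sets variables.
cat > "$work/emulparams/toy.sh" <<'EOF'
SCRIPT_NAME=toy
OUTPUT_FORMAT="elf32-toy"
EOF

# 2. Linker script template: writes a linker script to standard output.
cat > "$work/scripttempl/toy.sc" <<'EOF'
cat <<EOT
OUTPUT_FORMAT("$OUTPUT_FORMAT")
EOT
EOF

# Read the parameters, then run the template named by SCRIPT_NAME.
. "$work/emulparams/toy.sh"
( . "$work/scripttempl/$SCRIPT_NAME.sc" ) > "$work/ldscripts/toy.x"

result=$(cat "$work/ldscripts/toy.x")
echo "$result"
rm -rf "$work"
@end smallexample
Running the template in a subshell is a choice made for this sketch; it
keeps any variables the template sets from leaking into later runs.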
135
136Note that the linker can support several emulations simultaneously,
137depending upon how it is configured.  An emulation can be selected with
138the @code{-m} option.  The @code{-V} option will list all supported
139emulations.
140
141@menu
142* emulation parameters::        @file{emulparams} scripts
143* linker scripts::              @file{scripttempl} scripts
144* linker emulations::           @file{emultempl} scripts
145@end menu
146
147@node emulation parameters
148@section @file{emulparams} scripts
149
150Each target selects a particular file in the @file{emulparams} directory
151by setting the shell variable @code{targ_emul} in @file{configure.tgt}.
152This shell variable is used by the @file{configure} script to control
153building an emulation source file.
154
155Certain conventions are enforced.  Suppose the @code{targ_emul} variable
156is set to @var{emul} in @file{configure.tgt}.  The name of the emulation
157shell script will be @file{emulparams/@var{emul}.sh}.  The
158@file{Makefile} must have a target named @file{e@var{emul}.c}; this
159target must depend upon @file{emulparams/@var{emul}.sh}, as well as the
160appropriate scripts in the @file{scripttempl} and @file{emultempl}
161directories.  The @file{Makefile} target must invoke @code{GENSCRIPTS}
162with two arguments: @var{emul}, and the value of the make variable
163@code{tdir_@var{emul}}.  The value of the latter variable will be set by
164the @file{configure} script, and is used to set the default target
165directory to search.
166
167By convention, the @file{emulparams/@var{emul}.sh} shell script should
168only set shell variables.  It may set shell variables which are to be
169interpreted by the @file{scripttempl} and the @file{emultempl} scripts.
170Certain shell variables are interpreted directly by the
171@file{genscripts.sh} script.
172
173Here is a list of shell variables interpreted by @file{genscripts.sh},
174as well as some conventional shell variables interpreted by the
175@file{scripttempl} and @file{emultempl} scripts.
176
177@table @code
178@item SCRIPT_NAME
179This is the name of the @file{scripttempl} script to use.  If
180@code{SCRIPT_NAME} is set to @var{script}, @file{genscripts.sh} will use
181the script @file{scripttempl/@var{script}.sc}.
182
183@item TEMPLATE_NAME
184This is the name of the @file{emultempl} script to use.  If
185@code{TEMPLATE_NAME} is set to @var{template}, @file{genscripts.sh} will
186use the script @file{emultempl/@var{template}.em}.  If this variable is
187not set, the default value is @samp{generic}.
188
189@item GENERATE_SHLIB_SCRIPT
190If this is set to a nonempty string, @file{genscripts.sh} will invoke
191the @file{scripttempl} script an extra time to create a shared library
script.  @xref{linker scripts}.
193
194@item OUTPUT_FORMAT
This is normally set to indicate the BFD output format to use (e.g.,
@samp{"a.out-sunos-big"}).  The @file{scripttempl} script will normally
197use it in an @code{OUTPUT_FORMAT} expression in the linker script.
198
199@item ARCH
200This is normally set to indicate the architecture to use (e.g.,
201@samp{sparc}).  The @file{scripttempl} script will normally use it in an
202@code{OUTPUT_ARCH} expression in the linker script.
203
204@item ENTRY
205Some @file{scripttempl} scripts use this to set the entry address, in an
206@code{ENTRY} expression in the linker script.
207
208@item TEXT_START_ADDR
209Some @file{scripttempl} scripts use this to set the start address of the
210@samp{.text} section.
211
212@item SEGMENT_SIZE
213The @file{genscripts.sh} script uses this to set the default value of
214@code{DATA_ALIGNMENT} when running the @file{scripttempl} script.
215
216@item TARGET_PAGE_SIZE
217If @code{SEGMENT_SIZE} is not defined, the @file{genscripts.sh} script
218uses this to define it.
219
220@item ALIGNMENT
221Some @file{scripttempl} scripts set this to a number to pass to
222@code{ALIGN} to set the required alignment for the @code{end} symbol.
223@end table
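
The fallback from @code{TARGET_PAGE_SIZE} to @code{SEGMENT_SIZE} to
@code{DATA_ALIGNMENT} can be pictured with a few lines of shell.  This
is only a sketch of the behaviour described in the table, not the code
in @file{genscripts.sh}; the parameter values belong to an imaginary
target, and the exact form of the @code{ALIGN} expression is an
assumption.
@smallexample
# Settings a hypothetical emulparams script might make.
unset SEGMENT_SIZE
SCRIPT_NAME=elf
ARCH=toy
TARGET_PAGE_SIZE=0x1000

# SEGMENT_SIZE falls back to TARGET_PAGE_SIZE when left unset.
if test -z "$SEGMENT_SIZE"; then
  SEGMENT_SIZE=$TARGET_PAGE_SIZE
fi

# The default DATA_ALIGNMENT is then derived from SEGMENT_SIZE.
DATA_ALIGNMENT="ALIGN($SEGMENT_SIZE)"
echo "$DATA_ALIGNMENT"
@end smallexample
An @file{emulparams} script that sets @code{SEGMENT_SIZE} itself would
skip the fallback and get an alignment based on its own value.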
224
225@node linker scripts
226@section @file{scripttempl} scripts
227
228Each linker target uses a @file{scripttempl} script to generate the
229default linker scripts.  The name of the @file{scripttempl} script is
230set by the @code{SCRIPT_NAME} variable in the @file{emulparams} script.
If @code{SCRIPT_NAME} is set to @var{script}, @file{genscripts.sh} will
232invoke @file{scripttempl/@var{script}.sc}.
233
234The @file{genscripts.sh} script will invoke the @file{scripttempl}
235script 5 to 9 times.  Each time it will set the shell variable
236@code{LD_FLAG} to a different value.  When the linker is run, the
237options used will direct it to select a particular script.  (Script
238selection is controlled by the @code{get_script} emulation entry point;
239this describes the conventional behaviour).
240
241The @file{scripttempl} script should just write a linker script, written
242in the linker command language, to standard output.  If the emulation
243name--the name of the @file{emulparams} file without the @file{.sc}
244extension--is @var{emul}, then the output will be directed to
245@file{ldscripts/@var{emul}.@var{extension}} in the build directory,
246where @var{extension} changes each time the @file{scripttempl} script is
247invoked.
248
249Here is the list of values assigned to @code{LD_FLAG}.
250
251@table @code
252@item (empty)
253The script generated is used by default (when none of the following
254cases apply).  The output has an extension of @file{.x}.
255@item n
256The script generated is used when the linker is invoked with the
257@code{-n} option.  The output has an extension of @file{.xn}.
258@item N
259The script generated is used when the linker is invoked with the
260@code{-N} option.  The output has an extension of @file{.xbn}.
261@item r
262The script generated is used when the linker is invoked with the
263@code{-r} option.  The output has an extension of @file{.xr}.
264@item u
265The script generated is used when the linker is invoked with the
266@code{-Ur} option.  The output has an extension of @file{.xu}.
267@item shared
268The @file{scripttempl} script is only invoked with @code{LD_FLAG} set to
269this value if @code{GENERATE_SHLIB_SCRIPT} is defined in the
270@file{emulparams} file.  The @file{emultempl} script must arrange to use
271this script at the appropriate time, normally when the linker is invoked
272with the @code{-shared} option.  The output has an extension of
273@file{.xs}.
274@item c
275The @file{scripttempl} script is only invoked with @code{LD_FLAG} set to
276this value if @code{GENERATE_COMBRELOC_SCRIPT} is defined in the
277@file{emulparams} file or if @code{SCRIPT_NAME} is @code{elf}. The
278@file{emultempl} script must arrange to use this script at the appropriate
279time, normally when the linker is invoked with the @code{-z combreloc}
280option.  The output has an extension of
281@file{.xc}.
@item cshared
The @file{scripttempl} script is only invoked with @code{LD_FLAG} set to
this value if the conditions for @code{c} are met (that is,
@code{GENERATE_COMBRELOC_SCRIPT} is defined in the @file{emulparams}
file or @code{SCRIPT_NAME} is @code{elf}) and
@code{GENERATE_SHLIB_SCRIPT} is also defined in the @file{emulparams}
file.  The @file{emultempl} script must arrange to use this script at
the appropriate time, normally when the linker is invoked with the
@code{-shared -z combreloc} options.  The output has an extension of
@file{.xsc}.
290@item auto_import
291The @file{scripttempl} script is only invoked with @code{LD_FLAG} set to
292this value if @code{GENERATE_AUTO_IMPORT_SCRIPT} is defined in the
293@file{emulparams} file.  The @file{emultempl} script must arrange to
294use this script at the appropriate time, normally when the linker is
295invoked with the @code{--enable-auto-import} option.  The output has
296an extension of @file{.xa}.
297@end table
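
The table can be restated as a small shell loop that maps each
@code{LD_FLAG} value to the file name it produces.  This is a sketch
for orientation only; the emulation name @samp{toy} is hypothetical, and
the loop ignores the conditions under which the optional scripts are
generated.
@smallexample
emul=toy   # hypothetical emulation name
names=
for LD_FLAG in "" n N r u shared c cshared auto_import; do
  case "$LD_FLAG" in
    "")          ext=x   ;;
    n)           ext=xn  ;;
    N)           ext=xbn ;;
    r)           ext=xr  ;;
    u)           ext=xu  ;;
    shared)      ext=xs  ;;
    c)           ext=xc  ;;
    cshared)     ext=xsc ;;
    auto_import) ext=xa  ;;
  esac
  # Collect the output file name for this run of the template.
  names="$names ldscripts/$emul.$ext"
done
echo "$names"
@end smallexample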
298
299Besides the shell variables set by the @file{emulparams} script, and the
300@code{LD_FLAG} variable, the @file{genscripts.sh} script will set
301certain variables for each run of the @file{scripttempl} script.
302
303@table @code
304@item RELOCATING
This will be set to a non-empty string when the linker is doing a final
relocation (i.e., for all scripts other than @code{-r} and @code{-Ur}).

@item CONSTRUCTING
This will be set to a non-empty string when the linker is building
global constructor and destructor tables (i.e., for all scripts other
than @code{-r}).
312
313@item DATA_ALIGNMENT
314This will be set to an @code{ALIGN} expression when the output should be
315page aligned, or to @samp{.} when generating the @code{-N} script.
316
317@item CREATE_SHLIB
318This will be set to a non-empty string when generating a @code{-shared}
319script.
320
321@item COMBRELOC
When generating @code{-z combreloc} scripts, this will be set to a
non-empty string: the name of a temporary file which can be used during
script generation.
324@end table
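
How @code{RELOCATING} and @code{CONSTRUCTING} vary across the five
unconditional scripts can be sketched as follows.  The loop only
restates the rules in the table; the order of the runs and the use of
@code{unset} are assumptions of this sketch, not a description of
@file{genscripts.sh} itself.
@smallexample
summary=
for LD_FLAG in r u "" n N; do
  unset RELOCATING CONSTRUCTING
  case "$LD_FLAG" in
    r) ;;                                   # -r: neither is set
    u) CONSTRUCTING=" " ;;                  # -Ur: constructors only
    *) RELOCATING=" "; CONSTRUCTING=" " ;;  # final links: both
  esac
  state=
  if test -n "$RELOCATING"; then state="$state relocating"; fi
  if test -n "$CONSTRUCTING"; then state="$state constructing"; fi
  echo "LD_FLAG='$LD_FLAG':$state"
  summary="$summary<$LD_FLAG:$state>"
done
@end smallexample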
325
326The conventional way to write a @file{scripttempl} script is to first
327set a few shell variables, and then write out a linker script using
328@code{cat} with a here document.  The linker script will use variable
329substitutions, based on the above variables and those set in the
330@file{emulparams} script, to control its behaviour.
331
332When there are parts of the @file{scripttempl} script which should only
333be run when doing a final relocation, they should be enclosed within a
334variable substitution based on @code{RELOCATING}.  For example, on many
335targets special symbols such as @code{_end} should be defined when doing
336a final link.  Naturally, those symbols should not be defined when doing
337a relocatable link using @code{-r}.  The @file{scripttempl} script
338could use a construct like this to define those symbols:
339@smallexample
340  $@{RELOCATING+ _end = .;@}
341@end smallexample
342This will do the symbol assignment only if the @code{RELOCATING}
343variable is defined.
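
The effect is easy to try outside the linker build.  The sketch below
wraps a made-up template fragment in a shell function (the function name
@code{emit_fragment} is hypothetical) and expands it three times.  Note
that the @samp{+} form tests whether the variable is set, so an empty
but set @code{RELOCATING} still produces the assignment.
@smallexample
# Hypothetical template fragment, wrapped in a function so that it
# can be expanded under different settings.
emit_fragment () @{
cat <<EOF
  .bss : @{ *(.bss) @}
  $@{RELOCATING+ _end = .;@}
EOF
@}

RELOCATING=" "
final=$(emit_fragment)           # contains "_end = .;"
unset RELOCATING
relocatable=$(emit_fragment)     # the _end line expands to nothing
RELOCATING=
empty_but_set=$(emit_fragment)   # still contains "_end = .;"
unset RELOCATING

echo "$final"
@end smallexample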
344
345The basic job of the linker script is to put the sections in the correct
346order, and at the correct memory addresses.  For some targets, the
347linker script may have to do some other operations.
348
349For example, on most MIPS platforms, the linker is responsible for
350defining the special symbol @code{_gp}, used to initialize the
351@code{$gp} register.  It must be set to the start of the small data
352section plus @code{0x8000}.  Naturally, it should only be defined when
353doing a final relocation.  This will typically be done like this:
354@smallexample
355  $@{RELOCATING+ _gp = ALIGN(16) + 0x8000;@}
356@end smallexample
357This line would appear just before the sections which compose the small
358data section (@samp{.sdata}, @samp{.sbss}).  All those sections would be
359contiguous in memory.
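
The offset of @code{0x8000} follows from MIPS addressing: loads and
stores relative to @code{$gp} take a signed 16-bit offset, so placing
@code{_gp} in the middle makes a full 64K window reachable.  The
arithmetic can be checked with a short shell sketch; the start address
is a made-up example value.
@smallexample
start=$(( 0x10000000 ))      # made-up start of the small data area
gp=$(( start + 0x8000 ))
lowest=$(( gp - 0x8000 ))    # most negative 16-bit offset
highest=$(( gp + 0x7fff ))   # most positive 16-bit offset
printf '_gp=0x%x reaches 0x%x..0x%x\n' "$gp" "$lowest" "$highest"
@end smallexample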
360
361Many COFF systems build constructor tables in the linker script.  The
362compiler will arrange to output the address of each global constructor
in a @samp{.ctors} section, and the address of each global destructor in
a @samp{.dtors} section (this is done by defining
365@code{ASM_OUTPUT_CONSTRUCTOR} and @code{ASM_OUTPUT_DESTRUCTOR} in the
366@code{gcc} configuration files).  The @code{gcc} runtime support
367routines expect the constructor table to be named @code{__CTOR_LIST__}.
368They expect it to be a list of words, with the first word being the
369count of the number of entries.  There should be a trailing zero word.
370(Actually, the count may be -1 if the trailing word is present, and the
371trailing word may be omitted if the count is correct, but, as the
372@code{gcc} behaviour has changed slightly over the years, it is safest
373to provide both).  Here is a typical way that might be handled in a
374@file{scripttempl} file.
375@smallexample
376    $@{CONSTRUCTING+ __CTOR_LIST__ = .;@}
377    $@{CONSTRUCTING+ LONG((__CTOR_END__ - __CTOR_LIST__) / 4 - 2)@}
378    $@{CONSTRUCTING+ *(.ctors)@}
379    $@{CONSTRUCTING+ LONG(0)@}
380    $@{CONSTRUCTING+ __CTOR_END__ = .;@}
381    $@{CONSTRUCTING+ __DTOR_LIST__ = .;@}
382    $@{CONSTRUCTING+ LONG((__DTOR_END__ - __DTOR_LIST__) / 4 - 2)@}
383    $@{CONSTRUCTING+ *(.dtors)@}
384    $@{CONSTRUCTING+ LONG(0)@}
385    $@{CONSTRUCTING+ __DTOR_END__ = .;@}
386@end smallexample
387The use of @code{CONSTRUCTING} ensures that these linker script commands
388will only appear when the linker is supposed to be building the
389constructor and destructor tables.  This example is written for a target
390which uses 4 byte pointers.
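
The count expression can be checked by hand.  With @var{n} constructors,
the table holds the count word, @var{n} entries and the trailing zero
word, so it occupies @math{4 (n + 2)} bytes, and dividing by 4 and
subtracting 2 gives back @var{n}.  The same arithmetic as a shell
sketch, with an arbitrary example value for @var{n}:
@smallexample
n=3                               # number of constructors (example)
ptr=4                             # pointer size in bytes
table_bytes=$(( ptr * (n + 2) ))  # count word + n entries + zero word
count=$(( table_bytes / ptr - 2 ))
echo "$table_bytes bytes, count word = $count"
@end smallexample
A target with a different pointer size would need the divisor and the
data commands in the template adjusted to match.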
391
392Embedded systems often need to set a stack address.  This is normally
393best done by using the @code{PROVIDE} construct with a default stack
394address.  This permits the user to easily override the stack address
395using the @code{--defsym} option.  Here is an example:
396@smallexample
397  $@{RELOCATING+ PROVIDE (__stack = 0x80000000);@}
398@end smallexample
399The value of the symbol @code{__stack} would then be used in the startup
400code to initialize the stack pointer.
401
402@node linker emulations
403@section @file{emultempl} scripts
404
405Each linker target uses an @file{emultempl} script to generate the
406emulation code.  The name of the @file{emultempl} script is set by the
407@code{TEMPLATE_NAME} variable in the @file{emulparams} script.  If the
408@code{TEMPLATE_NAME} variable is not set, the default is
409@samp{generic}.  If the value of @code{TEMPLATE_NAME} is @var{template},
410@file{genscripts.sh} will use @file{emultempl/@var{template}.em}.
411
412Most targets use the generic @file{emultempl} script,
413@file{emultempl/generic.em}.  A different @file{emultempl} script is
414only needed if the linker must support unusual actions, such as linking
415against shared libraries.
416
417The @file{emultempl} script is normally written as a simple invocation
418of @code{cat} with a here document.  The document will use a few
variable substitutions.  Typically each function name uses a
420substitution involving @code{EMULATION_NAME}, for ease of debugging when
421the linker supports multiple emulations.
422
423Every function and variable in the emitted file should be static.  The
424only globally visible object must be named
425@code{ld_@var{EMULATION_NAME}_emulation}, where @var{EMULATION_NAME} is
426the name of the emulation set in @file{configure.tgt} (this is also the
427name of the @file{emulparams} file without the @file{.sh} extension).
428The @file{genscripts.sh} script will set the shell variable
429@code{EMULATION_NAME} before invoking the @file{emultempl} script.
430
431The @code{ld_@var{EMULATION_NAME}_emulation} variable must be a
432@code{struct ld_emulation_xfer_struct}, as defined in @file{ldemul.h}.
433It defines a set of function pointers which are invoked by the linker,
as well as strings for the emulation name (normally set from the shell
variable @code{EMULATION_NAME}) and the default BFD target name (normally
set from the shell variable @code{OUTPUT_FORMAT}, which is normally set
by the @file{emulparams} file).
438
439The @file{genscripts.sh} script will set the shell variable
440@code{COMPILE_IN} when it invokes the @file{emultempl} script for the
441default emulation.  In this case, the @file{emultempl} script should
442include the linker scripts directly, and return them from the
@code{get_script} entry point.  When the emulation is not the default,
the @code{get_script} entry point should just return a file name.  See
445@file{emultempl/generic.em} for an example of how this is done.
446
447At some point, the linker emulation entry points should be documented.
448
449@node Emulation Walkthrough
450@chapter A Walkthrough of a Typical Emulation
451
452This chapter is to help people who are new to the way emulations
453interact with the linker, or who are suddenly thrust into the position
454of having to work with existing emulations.  It will discuss the files
455you need to be aware of.  It will tell you when the given "hooks" in
456the emulation will be called.  It will, hopefully, give you enough
457information about when and how things happen that you'll be able to
458get by.  As always, the source is the definitive reference to this.
459
The starting point for the linker is in @file{ldmain.c} where
@code{main} is defined.  The bulk of the code that's emulation
specific will initially be in @file{emultempl/@var{emulation}.em} but
will end up in @file{e@var{emulation}.c} when the build is done.
Most of the work to select and interface with emulations is in
@file{ldemul.h} and @file{ldemul.c}.  Specifically, @file{ldemul.h}
defines the @code{ld_emulation_xfer_struct} structure your emulation
exports.
468
469Your emulation file exports a symbol
470@code{ld_@var{EMULATION_NAME}_emulation}.  If your emulation is
471selected (it usually is, since usually there's only one),
@file{ldemul.c} sets the variable @code{ld_emulation} to point to it.
@file{ldemul.c} also defines a number of API functions that interface
to your emulation, like @code{ldemul_after_parse} which simply calls
your @code{ld_@var{EMULATION_NAME}_emulation.after_parse} function.  For
476the rest of this section, the functions will be mentioned, but you
477should assume the indirect reference to your emulation also.
478
479We will also skip or gloss over parts of the link process that don't
480relate to emulations, like setting up internationalization.
481
482After initialization, @code{main} selects an emulation by pre-scanning
483the command-line arguments.  It calls @code{ldemul_choose_target} to
484choose a target.  If you set @code{choose_target} to
485@code{ldemul_default_target}, it picks your @code{target_name} by
486default.
487
488@code{main} calls @code{ldemul_before_parse}, then @code{parse_args}.
489@code{parse_args} calls @code{ldemul_parse_args} for each arg, which
490must update the @code{getopt} globals if it recognizes the argument.
If the emulation doesn't recognize the argument, @code{parse_args}
checks whether it recognizes the argument itself.
493
494Now that the emulation has had access to all its command-line options,
495@code{main} calls @code{ldemul_set_symbols}.  This can be used for any
496initialization that may be affected by options.  It is also supposed
497to set up any variables needed by the emulation script.
498
499@code{main} now calls @code{ldemul_get_script} to get the emulation
500script to use (based on arguments, no doubt, @pxref{Emulations}) and
501runs it.  While parsing, @code{ldgram.y} may call @code{ldemul_hll} or
502@code{ldemul_syslib} to handle the @code{HLL} or @code{SYSLIB}
503commands.  It may call @code{ldemul_unrecognized_file} if you asked
504the linker to link a file it doesn't recognize.  It will call
505@code{ldemul_recognized_file} for each file it does recognize, in case
506the emulation wants to handle some files specially.  All the while,
507it's loading the files (possibly calling
508@code{ldemul_open_dynamic_archive}) and symbols and stuff.  After it's
509done reading the script, @code{main} calls @code{ldemul_after_parse}.
510Use the after-parse hook to set up anything that depends on stuff the
511script might have set up, like the entry point.
512
513@code{main} next calls @code{lang_process} in @code{ldlang.c}.  This
514appears to be the main core of the linking itself, as far as emulation
515hooks are concerned(*).  It first opens the output file's BFD, calling
516@code{ldemul_set_output_arch}, and calls
517@code{ldemul_create_output_section_statements} in case you need to use
other means to find or create object files (e.g., shared libraries
519found on a path, or fake stub objects).  Despite the name, nobody
520creates output sections here.
521
522(*) In most cases, the BFD library does the bulk of the actual
523linking, handling symbol tables, symbol resolution, relocations, and
524building the final output file.  See the BFD reference for all the
525details.  Your emulation is usually concerned more with managing
526things at the file and section level, like "put this here, add this
527section", etc.
528
529Next, the objects to be linked are opened and BFDs created for them,
530and @code{ldemul_after_open} is called.  At this point, you have all
531the objects and symbols loaded, but none of the data has been placed
532yet.
533
534Next comes the Big Linking Thingy (except for the parts BFD does).
535All input sections are mapped to output sections according to the
536script.  If a section doesn't get mapped by default,
537@code{ldemul_place_orphan} will get called to figure out where it goes.
538Next it figures out the offsets for each section, calling
539@code{ldemul_before_allocation} before and
540@code{ldemul_after_allocation} after deciding where each input section
541ends up in the output sections.
542
543The last part of @code{lang_process} is to figure out all the symbols'
544values.  After assigning final values to the symbols,
545@code{ldemul_finish} is called, and after that, any undefined symbols
546are turned into fatal errors.
547
OK, back to @code{main}, which calls @code{ldwrite} in
@file{ldwrite.c}.  @code{ldwrite} calls BFD's @code{final_link}, which does
all the relocation fixups and writes the output BFD to disk, and we're
done.
552
553In summary,
554
555@itemize @bullet
556
557@item @code{main()} in @file{ldmain.c}
558@item @file{emultempl/@var{EMULATION}.em} has your code
559@item @code{ldemul_choose_target} (defaults to your @code{target_name})
560@item @code{ldemul_before_parse}
561@item Parse argv, calls @code{ldemul_parse_args} for each
562@item @code{ldemul_set_symbols}
563@item @code{ldemul_get_script}
564@item parse script
565
566@itemize @bullet
567@item may call @code{ldemul_hll} or @code{ldemul_syslib}
568@item may call @code{ldemul_open_dynamic_archive}
569@end itemize
570
571@item @code{ldemul_after_parse}
572@item @code{lang_process()} in @file{ldlang.c}
573
574@itemize @bullet
575@item create @code{output_bfd}
576@item @code{ldemul_set_output_arch}
577@item @code{ldemul_create_output_section_statements}
578@item read objects, create input bfds - all symbols exist, but have no values
579@item may call @code{ldemul_unrecognized_file}
580@item will call @code{ldemul_recognized_file}
581@item @code{ldemul_after_open}
582@item map input sections to output sections
583@item may call @code{ldemul_place_orphan} for remaining sections
584@item @code{ldemul_before_allocation}
585@item gives input sections offsets into output sections, places output sections
586@item @code{ldemul_after_allocation} - section addresses valid
587@item assigns values to symbols
588@item @code{ldemul_finish} - symbol values valid
589@end itemize
590
591@item output bfd is written to disk
592
593@end itemize
594
595@node Architecture Specific
596@chapter Some Architecture Specific Notes
597
598This is the place for notes on the behavior of @code{ld} on
599specific platforms.  Currently, only Intel x86 is documented (and
600of that, only the auto-import behavior for DLLs).
601
602@menu
603* ix86::                        Intel x86
604@end menu
605
606@node ix86
607@section Intel x86
608
@code{ld} can create DLLs that operate with various runtimes available
on a common x86 operating system.  These runtimes include native (using
the mingw ``platform''), cygwin, and pw.

@table @emph
@item auto-import from DLLs
615@enumerate
616@item
With this feature on, DLL clients can import variables from a DLL
618without any concern from their side (for example, without any source
619code modifications).  Auto-import can be enabled using the
620@code{--enable-auto-import} flag, or disabled via the
621@code{--disable-auto-import} flag.  Auto-import is disabled by default.
622
623@item
This is done completely within the bounds of the PE specification (to
be fair, there's a minor violation of the spec at one point, but in
practice auto-import works on all known variants of that common x86
operating system).  So, the resulting DLL can be used with any other PE
compiler/linker.
629
630@item
Auto-import is fully compatible with the standard import method, in which
variables are decorated using attribute modifiers.  Libraries of either
type may be mixed together.
634
635@item
636Overhead (space): 8 bytes per imported symbol, plus 20 for each
637reference to it; Overhead (load time): negligible; Overhead
638(virtual/physical memory): should be less than effect of DLL
639relocation.
640@end enumerate
641
642Motivation
643
The obvious and only way to get rid of dllimport insanity is
to make the client access the variable directly in the DLL, bypassing
the extra dereference imposed by ordinary DLL runtime linking.
That is, whenever the client contains something like

@code{mov dll_var,%eax}

the address of @code{dll_var} in the instruction should be relocated to
point into the loaded DLL.  The aim is to make the OS loader do so, and
then make @code{ld} help with that.  The import section of a PE file is
laid out in the following way: there's a vector of structures, each
describing the imports from a particular DLL.  Each such structure
points to two other parallel vectors: one holding the imported names,
and one which will hold the address of the corresponding imported name.
So, the solution is to de-vectorize these structures, making the import
locations sparse and pointing directly into the code.
660
661Implementation
662
For each reference to a data symbol to be imported from a DLL (a symbol
named <sym> belongs to this set if __imp_<sym> is found in the implib),
an import fixup entry is generated.  That entry is of type
IMAGE_IMPORT_DESCRIPTOR and is stored in the .idata$3 subsection.  Each
fixup entry contains a pointer to the symbol's address within the .text
section (marked with a __fuN_<sym> symbol, where N is an integer), a
pointer to the DLL name (so the DLL name is referenced by multiple
entries), and a pointer to the symbol name thunk.  The symbol name thunk
is a singleton vector (__nm_th_<symbol>) pointing to an
IMAGE_IMPORT_BY_NAME structure (__nm_<symbol>) directly containing the
imported name.

Here comes the ``on the edge'' problem mentioned above: the PE
specification states that the name vector (OriginalFirstThunk) should
run in parallel with the address vector (FirstThunk), i.e., that they
should have the same number of elements and be terminated with zero.  We
violate this, since FirstThunk points directly into machine code.  But in
practice, the OS loader is implemented the sane way: it goes through
OriginalFirstThunk and puts addresses into FirstThunk, and does nothing
else.  It should be noted once again that the DLL and symbol name
structures are reused across fixup entries and have to be there
anyway to support the standard import mechanism, so the sustained
overhead is 20 bytes per reference.

Another question is whether having several IMAGE_IMPORT_DESCRIPTORs
for the same DLL is possible.  The answer is yes; it is done even by the
native compiler/linker (libth32's functions are in fact resident in the
windows9x kernel32.dll, so if you use it, you have two
IMAGE_IMPORT_DESCRIPTORs for kernel32.dll).  Yet another question is
whether referencing the same PE structures several times is valid.
The answer is: why not?  Prohibiting that (detecting the violation)
would require more work on behalf of the loader than not doing it.
691
692@end table
693
694@node GNU Free Documentation License
695@chapter GNU Free Documentation License
696
697@include fdl.texi
698
699@contents
700@bye
701