		Arm / Thumb Interworking
		========================

The Cygnus GNU Pro Toolkit for the ARM7T processor supports function
calls between code compiled for the ARM instruction set and code
compiled for the Thumb instruction set.  This document describes how
that interworking support operates and explains the command line
switches that should be used in order to produce working programs.

Note:  The Cygnus GNU Pro Toolkit does not support switching between
compiling for the ARM instruction set and the Thumb instruction set
on anything other than a per-file basis.  There are in fact two
completely separate compilers, one that produces ARM assembler
instructions and one that produces Thumb assembler instructions.  The
two compilers share the same assembler, linker and so on.


1. Explicit interworking support for C and C++ files
====================================================

By default, if a file is compiled without any special command line
switches then the code produced will not support interworking.
Provided that a program is made up entirely from object files and
libraries produced in this way, and these contain either exclusively
ARM instructions or exclusively Thumb instructions, then this will not
matter and a working executable will be created.  If an attempt is
made to link together mixed ARM and Thumb object files and libraries,
then warning messages will be produced by the linker and a non-working
executable will be created.

In order to produce code which does support interworking it should be
compiled with the

	-mthumb-interwork

command line option.  Provided that a program is made up entirely from
object files and libraries built with this command line switch a
working executable will be produced, even if both ARM and Thumb
instructions are used by the various components of the program.  (No
warning messages will be produced by the linker either.)

Note that specifying -mthumb-interwork does result in slightly larger,
slower code being produced.  This is why interworking support must be
specifically enabled by a switch.


2. Explicit interworking support for assembler files
====================================================

If assembler files are to be included in an interworking program
then the following rules must be obeyed:

	* Any externally visible functions must return by using the BX
	instruction.

	* Normal function calls can just use the BL instruction.  The
	linker will automatically insert code to switch between ARM
	and Thumb modes as necessary.

	* Calls via function pointers should use the BX instruction if
	the call is made in ARM mode:

		.code 32
		mov lr, pc
		bx  rX

	This code sequence will not work in Thumb mode however, since
	the mov instruction will not set the bottom bit of the lr
	register.  A branch-and-link to the _call_via_rX functions
	should be used instead:

		.code 16
		bl  _call_via_rX

	where rX is replaced by the name of the register containing
	the function address.

	* All externally visible functions which should be entered in
	Thumb mode must have the .thumb_func pseudo op specified just
	before their entry point, e.g.:

			.code 16
			.global function
			.thumb_func
		function:
			...start of function....

	* All assembler files must be assembled with the switch
	-mthumb-interwork specified on the command line.  (If the file
	is assembled by calling gcc it will automatically pass on the
	-mthumb-interwork switch to the assembler, provided that it
	was specified on the gcc command line in the first place.)


3. Support for old, non-interworking aware code
===============================================

If it is necessary to link together code produced by an older,
non-interworking aware compiler, or code produced by the new compiler
but without the -mthumb-interwork command line switch specified, then
there are two command line switches that can be used to support this.

The switch

	-mcaller-super-interworking

will allow calls via function pointers in Thumb mode to work,
regardless of whether the function pointer points to old,
non-interworking aware code or not.  Specifying this switch does
produce slightly slower code however.

Note:  There is no switch to allow calls via function pointers in ARM
mode to be handled specially.  Calls via function pointers from
interworking aware ARM code to non-interworking aware ARM code work
without any special considerations by the compiler.  Calls via
function pointers from interworking aware ARM code to non-interworking
aware Thumb code however will not work.  (Actually under some
circumstances they may work, but there are no guarantees.)  This is
because only the new compiler is able to produce Thumb code, and this
compiler already has a command line switch to produce interworking
aware code.

The switch

	-mcallee-super-interworking

will allow non-interworking aware ARM or Thumb code to call Thumb
functions, either directly or via function pointers.  Specifying this
switch does produce slightly larger, slower code however.

Note:  There is no switch to allow non-interworking aware ARM or Thumb
code to call ARM functions.  There is no need for any special handling
of calls from non-interworking aware ARM code to interworking aware
ARM functions; they just work normally.  Calls from non-interworking
aware Thumb functions to ARM code, however, will not work.  There is no
option to support this, since it is always possible to recompile the
Thumb code to be interworking aware.

As an alternative to the command line switch
-mcallee-super-interworking, which affects all externally visible
functions in a file, it is possible to specify an attribute or
declspec for individual functions, indicating that that particular
function should support being called by non-interworking aware code.
The function should be defined like this:

	int __attribute__((interfacearm)) function (void)
	{
		... body of function ...
	}

or

	int __declspec(interfacearm) function (void)
	{
		... body of function ...
	}


4. Interworking support in dlltool
==================================

It is possible to create DLLs containing mixed ARM and Thumb code.  It
is also possible to call Thumb code in a DLL from an ARM program and
vice versa.  It is even possible to call ARM DLLs that have been compiled
without interworking support (say by an older version of the compiler),
from Thumb programs and still have things work properly.

   A version of the `dlltool' program which supports the `--interwork'
command line switch is needed, and the following special
considerations apply when building programs and DLLs:

*Use `-mthumb-interwork'*
     When compiling files for a DLL or a program the `-mthumb-interwork'
     command line switch should be specified if calling between ARM and
     Thumb code can happen.  If a program is being compiled and the
     mode of the DLLs that it uses is not known, then it should be
     assumed that interworking might occur and the switch used.

*Use `-m thumb'*
     If the exported functions from a DLL are all Thumb encoded then the
     `-m thumb' command line switch should be given to dlltool when
     building the stubs.  This will make dlltool create Thumb encoded
     stubs, rather than its default of ARM encoded stubs.

     If the DLL consists of both exported Thumb functions and exported
     ARM functions then the `-m thumb' switch should not be used.
     Instead the Thumb functions in the DLL should be compiled with the
     `-mcallee-super-interworking' switch, or with the `interfacearm'
     attribute specified on their prototypes.  In this way they will be
     given ARM encoded prologues, which will work with the ARM encoded
     stubs produced by dlltool.

*Use `-mcaller-super-interworking'*
     If it is possible for Thumb functions in a DLL to call
     non-interworking aware code via a function pointer, then the Thumb
     code must be compiled with the `-mcaller-super-interworking'
     command line switch.  This will force the function pointer calls
     to use the _interwork_call_via_rX stub functions which will
     correctly restore Thumb mode upon return from the called function.

*Link with `libgcc.a'*
     When the DLL is built it may have to be linked with the GCC
     library (`libgcc.a') in order to extract the _call_via_rX functions
     or the _interwork_call_via_rX functions.  This represents a partial
     redundancy since the same functions *may* be present in the
     application itself, but since they only take up 372 bytes this
     should not be too much of a consideration.

*Use `--support-old-code'*
     When linking a program with an old DLL which does not support
     interworking, the `--support-old-code' command line switch to the
     linker should be used.  This causes the linker to generate special
     interworking stubs which can cope with old, non-interworking aware
     ARM code, at the cost of generating bulkier code.  The linker will
     still generate a warning message along the lines of:
       "Warning: input file XXX does not support interworking, whereas YYY does."
     but this can now be ignored because the `--support-old-code' switch
     has been used.


5. How interworking support works
=================================

Switching between the ARM and Thumb instruction sets is accomplished
via the BX instruction which takes as an argument a register name.
Control is transferred to the address held in this register (with the
bottom bit masked out), and if the bottom bit is set, then Thumb
instruction processing is enabled, otherwise ARM instruction
processing is enabled.
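
The behaviour of BX described above can be modelled in a few lines of
C.  This is only an illustrative sketch: the names `bx', `bx_result'
and `isa' are invented for this example and are not part of the
toolkit.

```c
#include <assert.h>
#include <stdint.h>

/* Instruction set selected by a BX instruction (hypothetical names). */
enum isa { ISA_ARM = 0, ISA_THUMB = 1 };

/* Result of executing "bx rX": where execution continues, and in
   which instruction set. */
struct bx_result {
    uint32_t target;   /* address control is transferred to */
    enum isa mode;     /* instruction set enabled after the branch */
};

/* Model of BX: bit 0 of the register selects the mode and is then
   masked out of the branch target. */
static struct bx_result bx(uint32_t reg)
{
    struct bx_result r;

    r.mode = (reg & 1u) ? ISA_THUMB : ISA_ARM;
    r.target = reg & ~UINT32_C(1);
    return r;
}
```

With this model, bx(0x8001) continues at 0x8000 in Thumb mode, while
bx(0x8000) continues at the same address in ARM mode.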

When the -mthumb-interwork command line switch is specified, gcc
arranges for all functions to return to their caller by using the BX
instruction.  Thus provided that the return address has the bottom bit
correctly initialized to indicate the instruction set of the caller,
correct operation will ensue.

When a function is called explicitly (rather than via a function
pointer), the compiler generates a BL instruction to do this.  The
Thumb version of the BL instruction has the special property of
setting the bottom bit of the LR register after it has stored the
return address into it, so that a future BX instruction will correctly
return to the instruction after the BL instruction, in Thumb mode.

The BL instruction does not change modes itself however, so if an ARM
function is calling a Thumb function, or vice versa, it is necessary
to generate some extra instructions to handle this.  This is done in
the linker when it is storing the address of the referenced function
into the BL instruction.  If the BL instruction is an ARM style BL
instruction, but the referenced function is a Thumb function, then the
linker automatically generates a calling stub that converts from ARM
mode to Thumb mode, puts the address of this stub into the BL
instruction, and puts the address of the referenced function into the
stub.  Similarly if the BL instruction is a Thumb BL instruction, and
the referenced function is an ARM function, the linker generates a
stub which converts from Thumb to ARM mode, puts the address of this
stub into the BL instruction, and the address of the referenced
function into the stub.

This is why it is necessary to mark Thumb functions with the
.thumb_func pseudo op when creating assembler files.  This pseudo op
allows the assembler to distinguish between ARM functions and Thumb
functions.  (The Thumb version of GCC automatically generates these
pseudo ops for any Thumb functions that it generates.)

Calls via function pointers work differently.  Whenever the address of
a function is taken, the linker examines the type of the function
being referenced.  If the function is a Thumb function, then it sets
the bottom bit of the address.  Technically this makes the address
incorrect, since it is now one byte into the start of the function,
but this is never a problem because:

	a. with interworking enabled all calls via function pointer
	   are done using the BX instruction and this ignores the
	   bottom bit when computing where to go to.

	b. the linker will always set the bottom bit when the address
	   of the function is taken, so it is never possible to take
	   the address of the function in two different places and
	   then compare them and find that they are not equal.
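
Points (a) and (b) can be illustrated with a small C model.  It is a
sketch only: `function_pointer_value' and `bx_destination' are
hypothetical helper names, and the addresses used with them are
made-up values.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the value the linker stores when the address
   of a function is taken: Thumb functions get bit 0 set. */
static uint32_t function_pointer_value(uint32_t entry, int is_thumb)
{
    assert((entry & 1u) == 0);   /* code is at least halfword aligned */
    return is_thumb ? (entry | 1u) : entry;
}

/* Where a BX through such a pointer actually lands: bit 0 is ignored
   when computing the destination (point a). */
static uint32_t bx_destination(uint32_t pointer)
{
    return pointer & ~UINT32_C(1);
}
```

Taking the address of the same Thumb function twice yields the same
value both times (point b), and branching through either copy still
reaches the true entry point.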

As already mentioned any call via a function pointer will use the BX
instruction (provided that interworking is enabled).  The only problem
with this is computing the return address for the return from the
called function.  For ARM code this can easily be done by the code
sequence:

	mov	lr, pc
	bx	rX

(where rX is the name of the register containing the function
pointer).  This code does not work for the Thumb instruction set,
since the MOV instruction will not set the bottom bit of the LR
register, so that when the called function returns, it will return in
ARM mode not Thumb mode.  Instead the compiler generates this
sequence:

	bl	_call_via_rX

(again where rX is the name of the register containing the function
pointer).  The special _call_via_rX functions look like this:

	.thumb_func
_call_via_r0:
	bx	r0
	nop

The BL instruction ensures that the correct return address is stored
in the LR register and then the BX instruction jumps to the address
stored in the function pointer, switching modes if necessary.
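
The difference between the two call sequences comes down to the value
left in LR.  The following C sketch computes that value for each case.
The function names are invented for this example; it relies on the
architectural rule that reading PC yields the address of the current
instruction plus 8 in ARM state and plus 4 in Thumb state.

```c
#include <assert.h>
#include <stdint.h>

/* ARM state: "mov lr, pc" at address a, followed by "bx rX" at a + 4.
   Reading pc yields a + 8, the instruction after the bx.  Bit 0 is
   clear, so a BX return resumes in ARM mode, as required. */
static uint32_t arm_mov_lr_pc(uint32_t a)
{
    assert((a & 3u) == 0);   /* ARM instructions are word aligned */
    return a + 8;
}

/* Thumb state: "mov lr, pc" at a, followed by "bx rX" at a + 2.
   Reading pc yields a + 4, again the instruction after the bx, but
   bit 0 is still clear, so a BX return would select ARM mode. */
static uint32_t thumb_mov_lr_pc(uint32_t a)
{
    assert((a & 1u) == 0);   /* Thumb instructions are halfword aligned */
    return a + 4;
}

/* Thumb state: "bl _call_via_rX" at a.  The Thumb BL occupies four
   bytes and stores the following address with bit 0 set. */
static uint32_t thumb_bl_lr(uint32_t a)
{
    assert((a & 1u) == 0);
    return (a + 4) | 1u;
}
```

For a call site at the made-up address 0x8002, thumb_mov_lr_pc gives
0x8006 (bottom bit clear) whereas thumb_bl_lr gives 0x8007, which is
the value a returning BX needs in order to stay in Thumb mode.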


6. How caller-super-interworking support works
==============================================

When the -mcaller-super-interworking command line switch is specified
it changes the code produced by the Thumb compiler so that all calls
via function pointers (including virtual function calls) now go via a
different stub function.  The code to call via a function pointer now
looks like this:

	bl _interwork_call_via_r0

Note: The compiler does not insist that r0 be used to hold the
function address.  Any register will do, and there is a suite of stub
functions, one for each possible register.  The stub functions look
like this:

	.code 16
	.thumb_func
_interwork_call_via_r0:
	bx 	pc
	nop

	.code 32
	tst	r0, #1
	stmeqdb	r13!, {lr}
	adreq	lr, _arm_return
	bx	r0

The stub first switches to ARM mode, since it is a lot easier to
perform the necessary operations using ARM instructions.  It then
tests the bottom bit of the register containing the address of the
function to be called.  If this bottom bit is set then the function
being called uses Thumb instructions and the BX instruction to come
will switch back into Thumb mode before calling this function.  (Note
that it does not matter how this called function chooses to return to
its caller, since both the caller and callee are Thumb functions,
and no mode switching is necessary.)  If the function being called is
an ARM mode function however, the stub pushes the return address (with
its bottom bit set) onto the stack, replaces the return address with
the address of a piece of code called '_arm_return' and then
performs a BX instruction to call the function.

The '_arm_return' code looks like this:

	.code 32
_arm_return:
	ldmia 	r13!, {r12}
	bx 	r12
	.code 16

It simply retrieves the return address from the stack, and then
performs a BX operation to return to the caller and switch back into
Thumb mode.
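
The control flow through the stub and '_arm_return' can be traced
with a toy C model.  Everything here is an assumption made for
illustration: the `cpu' structure, the address chosen for
ARM_RETURN_ADDR, and the idea that the old ARM callee returns with
"mov pc, lr".  The stack is reduced to a small array.

```c
#include <assert.h>
#include <stdint.h>

#define ARM_RETURN_ADDR UINT32_C(0x9000)  /* made-up address of _arm_return */
#define STACK_DEPTH 8

struct cpu {
    uint32_t pc;                  /* where execution continues */
    uint32_t lr;                  /* link register */
    uint32_t stack[STACK_DEPTH];  /* simplified stack */
    int depth;                    /* number of words currently pushed */
    int thumb;                    /* 1 = Thumb mode, 0 = ARM mode */
};

/* "bx reg": bit 0 selects the mode and is masked out of the target. */
static void bx(struct cpu *c, uint32_t reg)
{
    c->thumb = (int)(reg & 1u);
    c->pc = reg & ~UINT32_C(1);
}

/* Model of _interwork_call_via_rX.  On entry lr holds the Thumb
   return address (bit 0 set) and fn holds the function pointer. */
static void interwork_call_via(struct cpu *c, uint32_t fn)
{
    c->thumb = 0;                       /* bx pc; nop: switch to ARM mode */
    if ((fn & 1u) == 0) {               /* tst rX, #1: EQ means ARM callee */
        assert(c->depth < STACK_DEPTH);
        c->stack[c->depth++] = c->lr;   /* stmeqdb r13!, {lr} */
        c->lr = ARM_RETURN_ADDR;        /* adreq lr, _arm_return */
    }
    bx(c, fn);                          /* bx rX */
}

/* Assumed return of a non-interworking ARM callee: "mov pc, lr",
   which never changes mode. */
static void old_arm_return(struct cpu *c)
{
    c->pc = c->lr;
}

/* Model of _arm_return: ldmia r13!, {r12}; bx r12. */
static void arm_return(struct cpu *c)
{
    uint32_t r12;

    assert(c->depth > 0);
    r12 = c->stack[--c->depth];
    bx(c, r12);
}
```

Starting in Thumb mode with lr = 0x8005 and calling an ARM function
at 0xA000, the model enters the callee in ARM mode with lr pointing
at '_arm_return'; the callee's plain return lands there, and
arm_return() then resumes the caller at 0x8004 in Thumb mode.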


7. How callee-super-interworking support works
==============================================

When -mcallee-super-interworking is specified on the command line the
Thumb compiler behaves as if every externally visible function that it
compiles has had the (interfacearm) attribute specified for it.  What
this attribute does is to put a special, ARM mode header onto the
function which forces a switch into Thumb mode:

  without __attribute__((interfacearm)):

		.code 16
		.thumb_func
	function:
		... start of function ...

  with __attribute__((interfacearm)):

		.code 32
	function:
		orr	r12, pc, #1
		bx	r12

		.code 16
		.thumb_func
	.real_start_of_function:

		... start of function ...

Note that since the function now expects to be entered in ARM mode, it
no longer has the .thumb_func pseudo op specified for its name.
Instead the pseudo op is attached to a new label .real_start_of_<name>
(where <name> is the name of the function) which indicates the start
of the Thumb code.  This does have the interesting side effect that
if this function is now called from a Thumb mode piece of code
outside of the current file, the linker will generate a calling stub
to switch from Thumb mode into ARM mode, and then this is immediately
overridden by the function's header which switches back into Thumb
mode.

In addition the (interfacearm) attribute also forces the function to
return by using the BX instruction, even if it has not been compiled
with the -mthumb-interwork command line flag, so that the correct mode
will be restored upon exit from the function.
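
The ARM mode header works because of how PC reads in ARM state: it
yields the address of the current instruction plus 8, which is
exactly the address just past the two-instruction header.  The C
sketch below checks that arithmetic; `header_r12' and `real_start'
are names invented for this example.

```c
#include <assert.h>
#include <stdint.h>

#define ARM_INSN_SIZE 4u   /* every ARM instruction is 4 bytes long */

/* Value placed in r12 by "orr r12, pc, #1" when it is the first
   instruction of a function entered in ARM mode at `entry'.
   In ARM state pc reads as the current instruction address plus 8. */
static uint32_t header_r12(uint32_t entry)
{
    assert((entry & 3u) == 0);   /* ARM code is word aligned */
    return (entry + 8u) | 1u;
}

/* Address of the Thumb body: it directly follows the two ARM
   instructions (orr, bx) that make up the header. */
static uint32_t real_start(uint32_t entry)
{
    return entry + 2u * ARM_INSN_SIZE;
}
```

For a function placed at the made-up address 0x8000, r12 becomes
0x8009: the following "bx r12" therefore continues at 0x8008, the
start of the Thumb code, with Thumb mode selected.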


8. Some examples
================

   Given these two test files:

             int arm (void) { return 1 + thumb (); }

             int thumb (void) { return 2 + arm (); }

   The following pieces of assembler are produced by the ARM and Thumb
versions of GCC depending upon the command line options used:

   `-O2':

             .code 32                               .code 16
             .global _arm                           .global _thumb
                                                    .thumb_func
     _arm:                                    _thumb:
             mov     ip, sp
             stmfd   sp!, {fp, ip, lr, pc}          push    {lr}
             sub     fp, ip, #4
             bl      _thumb                         bl      _arm
             add     r0, r0, #1                     add     r0, r0, #2
             ldmea   fp, {fp, sp, pc}               pop     {pc}

   Note how the functions return without using the BX instruction.  If
these files were assembled and linked together they would fail to work
because they do not change mode when returning to their caller.

   `-O2 -mthumb-interwork':

             .code 32                               .code 16
             .global _arm                           .global _thumb
                                                    .thumb_func
     _arm:                                    _thumb:
             mov     ip, sp
             stmfd   sp!, {fp, ip, lr, pc}          push    {lr}
             sub     fp, ip, #4
             bl      _thumb                         bl      _arm
             add     r0, r0, #1                     add     r0, r0, #2
             ldmea   fp, {fp, sp, lr}               pop     {r1}
             bx      lr                             bx      r1

   Now the functions use BX to return to their caller.  They have grown
by 4 and 2 bytes respectively, but they can now successfully be linked
together and be expected to work.  The linker will replace the
destinations of the two BL instructions with the addresses of calling
stubs which convert to the correct mode before jumping to the called
function.

   `-O2 -mcallee-super-interworking':

             .code 32                               .code 32
             .global _arm                           .global _thumb
     _arm:                                    _thumb:
                                                    orr     r12, pc, #1
                                                    bx      r12
             mov     ip, sp                         .code 16
             stmfd   sp!, {fp, ip, lr, pc}          push    {lr}
             sub     fp, ip, #4
             bl      _thumb                         bl      _arm
             add     r0, r0, #1                     add     r0, r0, #2
             ldmea   fp, {fp, sp, lr}               pop     {r1}
             bx      lr                             bx      r1

   The thumb function now has an ARM encoded prologue, and it no longer
has the `.thumb_func' pseudo op attached to it.  The linker will not
generate a calling stub for the call from arm() to thumb(), but it will
still have to generate a stub for the call from thumb() to arm().  Also
note how specifying `-mcallee-super-interworking' automatically
implies `-mthumb-interwork'.
488*404b540aSrobert
489*404b540aSrobert
490*404b540aSrobert9. Some Function Pointer Examples
491*404b540aSrobert=================================
492*404b540aSrobert
493*404b540aSrobert   Given this test file:
494*404b540aSrobert
495*404b540aSrobert     	int func (void) { return 1; }
496*404b540aSrobert
497*404b540aSrobert     	int call (int (* ptr)(void)) { return ptr (); }
498*404b540aSrobert
499*404b540aSrobert   The following varying pieces of assembler are produced by the Thumb
500*404b540aSrobertversion of GCC depending upon the command line options used:
501*404b540aSrobert
502*404b540aSrobert   `-O2':
503*404b540aSrobert     		.code	16
504*404b540aSrobert     		.globl	_func
505*404b540aSrobert     		.thumb_func
506*404b540aSrobert     	_func:
507*404b540aSrobert     		mov	r0, #1
508*404b540aSrobert     		bx	lr
509*404b540aSrobert
510*404b540aSrobert     		.globl	_call
511*404b540aSrobert     		.thumb_func
512*404b540aSrobert     	_call:
513*404b540aSrobert     		push	{lr}
514*404b540aSrobert     		bl	__call_via_r0
515*404b540aSrobert     		pop	{pc}
516*404b540aSrobert
517*404b540aSrobert   Note how the two functions have different exit sequences.  In
518*404b540aSrobertparticular call() uses pop {pc} to return, which would not work if the
519*404b540aSrobertcaller was in ARM mode.  func() however, uses the BX instruction, even
520*404b540aSrobertthough `-mthumb-interwork' has not been specified, as this is the most
521*404b540aSrobertefficient way to exit a function when the return address is held in the
522*404b540aSrobertlink register.
523*404b540aSrobert
524*404b540aSrobert   `-O2 -mthumb-interwork':
525*404b540aSrobert
526*404b540aSrobert     		.code	16
527*404b540aSrobert     		.globl	_func
528*404b540aSrobert     		.thumb_func
529*404b540aSrobert     	_func:
530*404b540aSrobert     		mov	r0, #1
531*404b540aSrobert     		bx	lr
532*404b540aSrobert
533*404b540aSrobert     		.globl	_call
534*404b540aSrobert     		.thumb_func
535*404b540aSrobert     	_call:
536*404b540aSrobert     		push	{lr}
537*404b540aSrobert     		bl	__call_via_r0
538*404b540aSrobert     		pop	{r1}
539*404b540aSrobert     		bx	r1
540*404b540aSrobert
541*404b540aSrobert   This time both functions return by using the BX instruction.  This
542*404b540aSrobertmeans that call() is now two bytes longer and several cycles slower
543*404b540aSrobertthan the previous version.

   `-O2 -mcaller-super-interworking':

     		.code	16
     		.globl	_func
     		.thumb_func
     	_func:
     		mov	r0, #1
     		bx	lr

     		.globl	_call
     		.thumb_func
     	_call:
     		push	{lr}
     		bl	__interwork_call_via_r0
     		pop	{pc}

   This is very similar to the first (non-interworking) version, except
that a different stub is used to call via the function pointer.  This
new stub will work even if the called function is not interworking
aware and tries to return to call() in ARM mode.  Note that the
assembly code for call() is still not interworking aware itself, and so
should not be called from ARM code.

   `-O2 -mcallee-super-interworking':

     		.code	32
     		.globl	_func
     	_func:
     		orr	r12, pc, #1
     		bx	r12

     		.code	16
     		.globl .real_start_of_func
     		.thumb_func
     	.real_start_of_func:
     		mov	r0, #1
     		bx	lr

     		.code	32
     		.globl	_call
     	_call:
     		orr	r12, pc, #1
     		bx	r12

     		.code	16
     		.globl .real_start_of_call
     		.thumb_func
     	.real_start_of_call:
     		push	{lr}
     		bl	__call_via_r0
     		pop	{r1}
     		bx	r1

   Now both functions have an ARM coded prologue, and both functions
return by using the BX instruction.  These functions are therefore
interworking aware and can safely be called from ARM code.  The code
for the call() function is now 10 bytes longer than the original,
non-interworking aware version (18 bytes rather than 8), which is more
than twice the size.

   If a prototype for call() is added to the source code, and this
prototype includes the `interfacearm' attribute:

     	int __attribute__((interfacearm)) call (int (* ptr)(void));

   then this code is produced (with only -O2 specified on the command
line):

     		.code	16
     		.globl	_func
     		.thumb_func
     	_func:
     		mov	r0, #1
     		bx	lr

     		.globl	_call
     		.code	32
     	_call:
     		orr	r12, pc, #1
     		bx	r12

     		.code	16
     		.globl .real_start_of_call
     		.thumb_func
     	.real_start_of_call:
     		push	{lr}
     		bl	__call_via_r0
     		pop	{r1}
     		bx	r1

   So now both call() and func() can be safely called via
non-interworking aware ARM code.  If, when such a file is assembled,
the assembler detects that call() is being called by another function
in the same file, it will automatically adjust the target of the BL
instruction to point to .real_start_of_call.  In this way there is no
need for the linker to generate a Thumb-to-ARM calling stub so that
call() can be entered in ARM mode.


10. How to use dlltool to build ARM/Thumb DLLs
==============================================

   Given a program (`prog.c') like this:

             extern int func_in_dll (void);

             int main (void) { return func_in_dll(); }

   And a DLL source file (`dll.c') like this:

             int func_in_dll (void) { return 1; }

   Here is how to build the DLL and the program for a purely ARM based
environment:

*Step One
     Build a `.def' file describing the DLL:

             ; example.def
             ; This file describes the contents of the DLL
             LIBRARY     example
             HEAPSIZE    0x40000, 0x2000
             EXPORTS
                          func_in_dll  1

*Step Two
     Compile the DLL source code:

            arm-pe-gcc -O2 -c dll.c

*Step Three
     Use `dlltool' to create an exports file and a library file:

            dlltool --def example.def --output-exp example.o --output-lib example.a

*Step Four
     Link together the complete DLL:

            arm-pe-ld dll.o example.o -o example.dll

*Step Five
     Compile the program's source code:

            arm-pe-gcc -O2 -c prog.c

*Step Six
     Link together the program and the DLL's library file:

            arm-pe-gcc prog.o example.a -o prog

   If instead this was a Thumb DLL being called from an ARM program, the
steps would look like this.  (To save space only those steps that are
different from the previous version are shown):

*Step Two
     Compile the DLL source code (using the Thumb compiler):

            thumb-pe-gcc -O2 -c dll.c -mthumb-interwork

*Step Three
     Build the exports and library files (and support interworking):

            dlltool -d example.def -e example.o -l example.a --interwork -m thumb

*Step Five
     Compile the program's source code (and support interworking):

            arm-pe-gcc -O2 -c prog.c -mthumb-interwork

   If instead the DLL was an old ARM DLL which does not support
interworking, and which cannot be rebuilt, then these steps would be
used:

*Step One
     Skip.  If you do not have access to the sources of a DLL, there is
     no point in building a `.def' file for it.

*Step Two
     Skip.  With no DLL sources there is nothing to compile.

*Step Three
     Skip.  Without a `.def' file you cannot use dlltool to build an
     exports file or a library file.

*Step Four
     Skip.  Without a set of DLL object files you cannot build the DLL.
     Besides, it has already been built for you by somebody else.

*Step Five
     Compile the program's source code.  This is the same as in the
     purely ARM case:

            arm-pe-gcc -O2 -c prog.c

*Step Six
     Link together the program and the DLL's library file, passing the
     `--support-old-code' option to the linker:

            arm-pe-gcc prog.o example.a -Wl,--support-old-code -o prog

     Ignore the warning message about the input file not supporting
     interworking, as the `--support-old-code' switch has taken care of
     this.