                Arm / Thumb Interworking
                ========================

The Cygnus GNU Pro Toolkit for the ARM7T processor supports function
calls between code compiled for the ARM instruction set and code
compiled for the Thumb instruction set and vice versa. This document
describes how that interworking support operates and explains the
command line switches that should be used in order to produce working
programs.

Note: The Cygnus GNU Pro Toolkit does not support switching between
compiling for the ARM instruction set and the Thumb instruction set
on anything other than a per file basis. There are in fact two
completely separate compilers, one that produces ARM assembler
instructions and one that produces Thumb assembler instructions. The
two compilers share the same assembler, linker and so on.


1. Explicit interworking support for C and C++ files
====================================================

By default if a file is compiled without any special command line
switches then the code produced will not support interworking.
Provided that a program is made up entirely from object files and
libraries produced in this way and which contain either exclusively
ARM instructions or exclusively Thumb instructions then this will not
matter and a working executable will be created. If an attempt is
made to link together mixed ARM and Thumb object files and libraries,
then warning messages will be produced by the linker and a non-working
executable will be created.

In order to produce code which does support interworking it should be
compiled with the

        -mthumb-interwork

command line option. Provided that a program is made up entirely from
object files and libraries built with this command line switch a
working executable will be produced, even if both ARM and Thumb
instructions are used by the various components of the program. (No
warning messages will be produced by the linker either.)

Note that specifying -mthumb-interwork does result in slightly larger,
slower code being produced. This is why interworking support must be
specifically enabled by a switch.


2. Explicit interworking support for assembler files
====================================================

If assembler files are to be included into an interworking program
then the following rules must be obeyed:

        * Any externally visible functions must return by using the BX
        instruction.

        * Normal function calls can just use the BL instruction. The
        linker will automatically insert code to switch between ARM
        and Thumb modes as necessary.

        * Calls via function pointers should use the BX instruction if
        the call is made in ARM mode:

                .code 32
                mov lr, pc
                bx  rX

        This code sequence will not work in Thumb mode however, since
        the mov instruction will not set the bottom bit of the lr
        register. Instead a branch-and-link to the _call_via_rX
        functions should be used:

                .code 16
                bl  _call_via_rX

        where rX is replaced by the name of the register containing
        the function address.

        * All externally visible functions which should be entered in
        Thumb mode must have the .thumb_func pseudo op specified just
        before their entry point, e.g.:

                        .code 16
                        .global function
                        .thumb_func
                function:
                        ...start of function....

        * All assembler files must be assembled with the switch
        -mthumb-interwork specified on the command line. (If the file
        is assembled by calling gcc it will automatically pass on the
        -mthumb-interwork switch to the assembler, provided that it
        was specified on the gcc command line in the first place.)


3. Support for old, non-interworking aware code.
================================================

If it is necessary to link together code produced by an older,
non-interworking aware compiler, or code produced by the new compiler
but without the -mthumb-interwork command line switch specified, then
there are two command line switches that can be used to support this.

The switch

        -mcaller-super-interworking

will allow calls via function pointers in Thumb mode to work,
regardless of whether the function pointer points to old,
non-interworking aware code or not. Specifying this switch does
produce slightly slower code however.

Note: There is no switch to allow calls via function pointers in ARM
mode to be handled specially. Calls via function pointers from
interworking aware ARM code to non-interworking aware ARM code work
without any special considerations by the compiler. Calls via
function pointers from interworking aware ARM code to non-interworking
aware Thumb code however will not work. (Actually under some
circumstances they may work, but there are no guarantees.) This is
because only the new compiler is able to produce Thumb code, and this
compiler already has a command line switch to produce interworking
aware code.

The switch

        -mcallee-super-interworking

will allow non-interworking aware ARM or Thumb code to call Thumb
functions, either directly or via function pointers. Specifying this
switch does produce slightly larger, slower code however.

Note: There is no switch to allow non-interworking aware ARM or Thumb
code to call ARM functions. There is no need for any special handling
of calls from non-interworking aware ARM code to interworking aware
ARM functions; they just work normally. Calls from non-interworking
aware Thumb functions to ARM code, however, will not work. There is no
option to support this, since it is always possible to recompile the
Thumb code to be interworking aware.

As an alternative to the command line switch
-mcallee-super-interworking, which affects all externally visible
functions in a file, it is possible to specify an attribute or
declspec for individual functions, indicating that that particular
function should support being called by non-interworking aware code.
The function should be defined like this:

        int __attribute__((interfacearm)) function (void)
        {
                ... body of function ...
        }

or

        int __declspec(interfacearm) function (void)
        {
                ... body of function ...
        }



4. Interworking support in dlltool
==================================

It is possible to create DLLs containing mixed ARM and Thumb code. It
is also possible to call Thumb code in a DLL from an ARM program and
vice versa. It is even possible to call ARM DLLs that have been compiled
without interworking support (say by an older version of the compiler),
from Thumb programs and still have things work properly.

   A version of the `dlltool' program which supports the `--interwork'
command line switch is needed, and the following special
considerations apply when building programs and DLLs:

*Use `-mthumb-interwork'*
     When compiling files for a DLL or a program the `-mthumb-interwork'
     command line switch should be specified if calling between ARM and
     Thumb code can happen. If a program is being compiled and the
     mode of the DLLs that it uses is not known, then it should be
     assumed that interworking might occur and the switch used.

*Use `-m thumb'*
     If the exported functions from a DLL are all Thumb encoded then the
     `-m thumb' command line switch should be given to dlltool when
     building the stubs. This will make dlltool create Thumb encoded
     stubs, rather than its default of ARM encoded stubs.

     If the DLL consists of both exported Thumb functions and exported
     ARM functions then the `-m thumb' switch should not be used.
     Instead the Thumb functions in the DLL should be compiled with the
     `-mcallee-super-interworking' switch, or with the `interfacearm'
     attribute specified on their prototypes. In this way they will be
     given ARM encoded prologues, which will work with the ARM encoded
     stubs produced by dlltool.

*Use `-mcaller-super-interworking'*
     If it is possible for Thumb functions in a DLL to call
     non-interworking aware code via a function pointer, then the Thumb
     code must be compiled with the `-mcaller-super-interworking'
     command line switch. This will force the function pointer calls
     to use the _interwork_call_via_rX stub functions which will
     correctly restore Thumb mode upon return from the called function.

*Link with `libgcc.a'*
     When the DLL is built it may have to be linked with the GCC
     library (`libgcc.a') in order to extract the _call_via_rX functions
     or the _interwork_call_via_rX functions. This represents a partial
     redundancy since the same functions *may* be present in the
     application itself, but since they only take up 372 bytes this
     should not be too much of a consideration.

*Use `--support-old-code'*
     When linking a program with an old DLL which does not support
     interworking, the `--support-old-code' command line switch to the
     linker should be used. This causes the linker to generate special
     interworking stubs which can cope with old, non-interworking aware
     ARM code, at the cost of generating bulkier code.
     The linker will still generate a warning message along the lines of:
       "Warning: input file XXX does not support interworking, whereas YYY does."
     but this can now be ignored because the --support-old-code switch
     has been used.



5. How interworking support works
=================================

Switching between the ARM and Thumb instruction sets is accomplished
via the BX instruction which takes as an argument a register name.
Control is transferred to the address held in this register (with the
bottom bit masked out), and if the bottom bit is set, then Thumb
instruction processing is enabled, otherwise ARM instruction
processing is enabled.

When the -mthumb-interwork command line switch is specified, gcc
arranges for all functions to return to their caller by using the BX
instruction. Thus provided that the return address has the bottom bit
correctly initialized to indicate the instruction set of the caller,
correct operation will ensue.

When a function is called explicitly (rather than via a function
pointer), the compiler generates a BL instruction to do this. The
Thumb version of the BL instruction has the special property of
setting the bottom bit of the LR register after it has stored the
return address into it, so that a future BX instruction will correctly
return to the instruction after the BL instruction, in Thumb mode.
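
The behaviour of BX and of the Thumb BL described above can be
modelled in a few lines of C. This is only an illustrative sketch and
not code from the toolkit: the names `struct cpu_state', `bx' and
`thumb_bl_link' are hypothetical, and the model tracks nothing except
the program counter and the current instruction set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the two pieces of state that BX changes. */
struct cpu_state {
    uint32_t pc;    /* address of the next instruction to execute */
    bool     thumb; /* true when executing Thumb instructions */
};

/* Model of "bx rX": bit 0 of the register selects the instruction
   set, and the remaining bits give the branch target. */
static struct cpu_state bx(uint32_t reg)
{
    struct cpu_state next;
    next.thumb = (reg & 1u) != 0;
    next.pc = reg & ~UINT32_C(1);
    return next;
}

/* Model of the value a Thumb BL leaves in LR: the address of the
   instruction after the BL, with bit 0 set to record Thumb state. */
static uint32_t thumb_bl_link(uint32_t address_after_bl)
{
    return address_after_bl | 1u;
}
```

In this model, passing the result of `thumb_bl_link' to `bx' yields the
original address with the Thumb flag set, which is why a callee that
returns with BX resumes its Thumb caller in the correct mode.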

The BL instruction does not change modes itself however, so if an ARM
function is calling a Thumb function, or vice versa, it is necessary
to generate some extra instructions to handle this. This is done in
the linker when it is storing the address of the referenced function
into the BL instruction. If the BL instruction is an ARM style BL
instruction, but the referenced function is a Thumb function, then the
linker automatically generates a calling stub that converts from ARM
mode to Thumb mode, puts the address of this stub into the BL
instruction, and puts the address of the referenced function into the
stub. Similarly if the BL instruction is a Thumb BL instruction, and
the referenced function is an ARM function, the linker generates a
stub which converts from Thumb to ARM mode, puts the address of this
stub into the BL instruction, and the address of the referenced
function into the stub.

This is why it is necessary to mark Thumb functions with the
.thumb_func pseudo op when creating assembler files. This pseudo op
allows the assembler to distinguish between ARM functions and Thumb
functions. (The Thumb version of GCC automatically generates these
pseudo ops for any Thumb functions that it generates.)

Calls via function pointers work differently. Whenever the address of
a function is taken, the linker examines the type of the function
being referenced. If the function is a Thumb function, then it sets
the bottom bit of the address.
Technically this makes the address incorrect, since it is now one
byte into the start of the function, but this is never a problem
because:

   a. with interworking enabled all calls via function pointer
      are done using the BX instruction and this ignores the
      bottom bit when computing where to go to.

   b. the linker will always set the bottom bit when the address
      of the function is taken, so it is never possible to take
      the address of the function in two different places and
      then compare them and find that they are not equal.

As already mentioned any call via a function pointer will use the BX
instruction (provided that interworking is enabled). The only problem
with this is computing the return address for the return from the
called function. For ARM code this can easily be done by the code
sequence:

        mov     lr, pc
        bx      rX

(where rX is the name of the register containing the function
pointer). This code does not work for the Thumb instruction set,
since the MOV instruction will not set the bottom bit of the LR
register, so that when the called function returns, it will return in
ARM mode not Thumb mode. Instead the compiler generates this
sequence:

        bl      _call_via_rX

(again where rX is the name of the register containing the function
pointer).
The special _call_via_rX functions look like this:

        .thumb_func
_call_via_r0:
        bx      r0
        nop

The BL instruction ensures that the correct return address is stored
in the LR register and then the BX instruction jumps to the address
stored in the function pointer, switching modes if necessary.


6. How caller-super-interworking support works
==============================================

When the -mcaller-super-interworking command line switch is specified
it changes the code produced by the Thumb compiler so that all calls
via function pointers (including virtual function calls) now go via a
different stub function. The code to call via a function pointer now
looks like this:

        bl _interwork_call_via_r0

Note: The compiler does not insist that r0 be used to hold the
function address. Any register will do, and there is a suite of stub
functions, one for each possible register. The stub functions look
like this:

        .code 16
        .thumb_func
_interwork_call_via_r0:
        bx      pc
        nop

        .code 32
        tst     r0, #1
        stmeqdb r13!, {lr}
        adreq   lr, _arm_return
        bx      r0

The stub first switches to ARM mode, since it is a lot easier to
perform the necessary operations using ARM instructions.
It then tests the bottom bit of the register containing the address
of the function to be called. If this bottom bit is set then the
function being called uses Thumb instructions and the BX instruction
to come will switch back into Thumb mode before calling this function.
(Note that it does not matter how this called function chooses to
return to its caller, since both the caller and callee are Thumb
functions, and no mode switching is necessary.) If the function being
called is an ARM mode function however, the stub pushes the return
address (with its bottom bit set) onto the stack, replaces the return
address with the address of a piece of code called '_arm_return' and
then performs a BX instruction to call the function.

The '_arm_return' code looks like this:

        .code 32
_arm_return:
        ldmia   r13!, {r12}
        bx      r12
        .code 16


It simply retrieves the return address from the stack, and then
performs a BX operation to return to the caller and switch back into
Thumb mode.


7. How callee-super-interworking support works
==============================================

When -mcallee-super-interworking is specified on the command line the
Thumb compiler behaves as if every externally visible function that it
compiles has had the (interfacearm) attribute specified for it.
What this attribute does is to put a special, ARM mode header onto
the function which forces a switch into Thumb mode:

  without __attribute__((interfacearm)):

                .code 16
                .thumb_func
        function:
                ... start of function ...

  with __attribute__((interfacearm)):

                .code 32
        function:
                orr     r12, pc, #1
                bx      r12

                .code 16
                .thumb_func
        .real_start_of_function:

                ... start of function ...

Note that since the function now expects to be entered in ARM mode, it
no longer has the .thumb_func pseudo op specified for its name.
Instead the pseudo op is attached to a new label .real_start_of_<name>
(where <name> is the name of the function) which indicates the start
of the Thumb code. This does have the interesting side effect that
if this function is now called from a Thumb mode piece of code
outside of the current file, the linker will generate a calling stub
to switch from Thumb mode into ARM mode, and then this is immediately
overridden by the function's header which switches back into Thumb
mode.

In addition the (interfacearm) attribute also forces the function to
return by using the BX instruction, even if it has not been compiled
with the -mthumb-interwork command line flag, so that the correct mode
will be restored upon exit from the function.
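
The `orr r12, pc, #1' in the ARM mode header relies on the fact that,
in ARM state, reading the PC yields the address of the current
instruction plus 8. Since the header consists of two 4-byte ARM
instructions, that value is exactly the address of the Thumb code
which follows it. The arithmetic can be sketched in C as below; the
helper name `header_branch_target' and the macro `ARM_PC_READ_OFFSET'
are assumptions made purely for illustration.

```c
#include <stdint.h>

/* In ARM state a read of the PC yields the address of the current
   instruction plus 8. */
#define ARM_PC_READ_OFFSET 8u

/* Hypothetical helper: given the address of the "orr r12, pc, #1"
   instruction at the start of the header, return the value that
   ends up in r12 and is then passed to "bx r12". */
static uint32_t header_branch_target(uint32_t header_address)
{
    uint32_t pc_as_read = header_address + ARM_PC_READ_OFFSET;
    return pc_as_read | 1u;   /* bit 0 set: enter Thumb state */
}
```

For a header placed at 0x8000 the helper returns 0x8009. BX masks out
the bottom bit to branch to 0x8008, the first Thumb instruction after
the two ARM instructions, and enables Thumb instruction processing.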


8. Some examples
================

   Given these two test files:

             int arm (void) { return 1 + thumb (); }

             int thumb (void) { return 2 + arm (); }

   The following pieces of assembler are produced by the ARM and Thumb
versions of GCC depending upon the command line options used:

   `-O2':
                .code 32                                .code 16
                .global _arm                            .global _thumb
                                                        .thumb_func
        _arm:                                   _thumb:
                mov     ip, sp
                stmfd   sp!, {fp, ip, lr, pc}           push    {lr}
                sub     fp, ip, #4
                bl      _thumb                          bl      _arm
                add     r0, r0, #1                      add     r0, r0, #2
                ldmea   fp, {fp, sp, pc}                pop     {pc}

   Note how the functions return without using the BX instruction. If
these files were assembled and linked together they would fail to work
because they do not change mode when returning to their caller.

   `-O2 -mthumb-interwork':

                .code 32                                .code 16
                .global _arm                            .global _thumb
                                                        .thumb_func
        _arm:                                   _thumb:
                mov     ip, sp
                stmfd   sp!, {fp, ip, lr, pc}           push    {lr}
                sub     fp, ip, #4
                bl      _thumb                          bl      _arm
                add     r0, r0, #1                      add     r0, r0, #2
                ldmea   fp, {fp, sp, lr}                pop     {r1}
                bx      lr                              bx      r1

   Now the functions use BX to return to their caller.
They have grown by 4 and 2 bytes respectively, but they can now
successfully be linked together and be expected to work. The linker
will replace the destinations of the two BL instructions with the
addresses of calling stubs which convert to the correct mode before
jumping to the called function.

   `-O2 -mcallee-super-interworking':

                .code 32                                .code 32
                .global _arm                            .global _thumb
        _arm:                                   _thumb:
                                                        orr     r12, pc, #1
                                                        bx      r12
                mov     ip, sp                          .code 16
                stmfd   sp!, {fp, ip, lr, pc}           push    {lr}
                sub     fp, ip, #4
                bl      _thumb                          bl      _arm
                add     r0, r0, #1                      add     r0, r0, #2
                ldmea   fp, {fp, sp, lr}                pop     {r1}
                bx      lr                              bx      r1

   The thumb function now has an ARM encoded prologue, and it no longer
has the `.thumb_func' pseudo op attached to it. The linker will not
generate a calling stub for the call from arm() to thumb(), but it will
still have to generate a stub for the call from thumb() to arm(). Also
note how specifying `-mcallee-super-interworking' automatically
implies `-mthumb-interwork'.


9. Some Function Pointer Examples
=================================

   Given this test file:

             int func (void) { return 1; }

             int call (int (* ptr)(void)) { return ptr (); }

   The following varying pieces of assembler are produced by the Thumb
version of GCC depending upon the command line options used:

   `-O2':
                .code   16
                .globl  _func
                .thumb_func
        _func:
                mov     r0, #1
                bx      lr

                .globl  _call
                .thumb_func
        _call:
                push    {lr}
                bl      __call_via_r0
                pop     {pc}

   Note how the two functions have different exit sequences. In
particular call() uses pop {pc} to return, which would not work if the
caller was in ARM mode. func() however, uses the BX instruction, even
though `-mthumb-interwork' has not been specified, as this is the most
efficient way to exit a function when the return address is held in the
link register.

   `-O2 -mthumb-interwork':

                .code   16
                .globl  _func
                .thumb_func
        _func:
                mov     r0, #1
                bx      lr

                .globl  _call
                .thumb_func
        _call:
                push    {lr}
                bl      __call_via_r0
                pop     {r1}
                bx      r1

   This time both functions return by using the BX instruction. This
means that call() is now two bytes longer and several cycles slower
than the previous version.

   `-O2 -mcaller-super-interworking':
                .code   16
                .globl  _func
                .thumb_func
        _func:
                mov     r0, #1
                bx      lr

                .globl  _call
                .thumb_func
        _call:
                push    {lr}
                bl      __interwork_call_via_r0
                pop     {pc}

   Very similar to the first (non-interworking) version, except that a
different stub is used to call via the function pointer. This new stub
will work even if the called function is not interworking aware, and
tries to return to call() in ARM mode. Note that the assembly code for
call() is still not interworking aware itself, and so should not be
called from ARM code.
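
   The decision made by the _interwork_call_via_rX stub (described in
section 6) can be summarised by a small C model. This is a sketch
only: the names `struct call_plan', `plan_interwork_call' and
`ARM_RETURN_ADDR' are hypothetical, and the address used for the
_arm_return code is an arbitrary placeholder.

```c
#include <stdbool.h>
#include <stdint.h>

/* Arbitrary placeholder for the address of the _arm_return code. */
#define ARM_RETURN_ADDR UINT32_C(0x00010000)

/* Hypothetical summary of what the stub does before its final BX. */
struct call_plan {
    bool     push_return; /* true if the original LR is saved on the stack */
    uint32_t lr;          /* value held in LR when the callee is entered */
};

/* target: the function pointer held in rX.
   lr:     the return address left by the Thumb BL (bit 0 already set). */
static struct call_plan plan_interwork_call(uint32_t target, uint32_t lr)
{
    struct call_plan plan;
    if (target & 1u) {
        /* Thumb callee: LR is left alone, since any return sequence
           brings control back to the Thumb caller correctly. */
        plan.push_return = false;
        plan.lr = lr;
    } else {
        /* ARM callee: save LR and make the callee return to
           _arm_return, which pops it and uses BX to re-enter Thumb. */
        plan.push_return = true;
        plan.lr = ARM_RETURN_ADDR;
    }
    return plan;
}
```

   The two branches correspond to the conditional `stmeqdb' and `adreq'
instructions in the stub, which only execute when the bottom bit of
the function pointer is clear.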

   `-O2 -mcallee-super-interworking':

                .code   32
                .globl  _func
        _func:
                orr     r12, pc, #1
                bx      r12

                .code   16
                .globl  .real_start_of_func
                .thumb_func
        .real_start_of_func:
                mov     r0, #1
                bx      lr

                .code   32
                .globl  _call
        _call:
                orr     r12, pc, #1
                bx      r12

                .code   16
                .globl  .real_start_of_call
                .thumb_func
        .real_start_of_call:
                push    {lr}
                bl      __call_via_r0
                pop     {r1}
                bx      r1

   Now both functions have an ARM coded prologue, and both functions
return by using the BX instruction. These functions are therefore
interworking aware and can safely be called from ARM code. The code for
the call() function is now 10 bytes longer than the original,
non-interworking aware version, more than doubling its size.

   If a prototype for call() is added to the source code, and this
prototype includes the `interfacearm' attribute:

      int __attribute__((interfacearm)) call (int (* ptr)(void));

   then this code is produced (with only -O2 specified on the command
line):

        .code 16
        .globl  _func
        .thumb_func
    _func:
        mov     r0, #1
        bx      lr

        .globl  _call
        .code 32
    _call:
        orr     r12, pc, #1
        bx      r12

        .code 16
        .globl .real_start_of_call
        .thumb_func
    .real_start_of_call:
        push    {lr}
        bl      __call_via_r0
        pop     {r1}
        bx      r1

   So now both call() and func() can be safely called from
non-interworking-aware ARM code. If, when such a file is assembled,
the assembler detects that call() is being called by another
function in the same file, it will automatically adjust the target of
the BL instruction to point to .real_start_of_call. In this way there
is no need for the linker to generate a Thumb-to-ARM calling stub so
that call() can be entered in ARM mode.


10. How to use dlltool to build ARM/Thumb DLLs
==============================================
   Given a program (`prog.c') like this:

        extern int func_in_dll (void);

        int main (void) { return func_in_dll(); }

   And a DLL source file (`dll.c') like this:

        int func_in_dll (void) { return 1; }

   Here is how to build the DLL and the program for a purely ARM based
environment:

*Step One
     Build a `.def' file describing the DLL:

        ; example.def
        ; This file describes the contents of the DLL
        LIBRARY     example
        HEAPSIZE    0x40000, 0x2000
        EXPORTS
                     func_in_dll  1

*Step Two
     Compile the DLL source code:

       arm-pe-gcc -O2 -c dll.c

*Step Three
     Use `dlltool' to create an exports file and a library file:

       dlltool --def example.def --output-exp example.o --output-lib example.a

*Step Four
     Link together the complete DLL:

       arm-pe-ld dll.o example.o -o example.dll

*Step Five
     Compile the program's source code:

       arm-pe-gcc -O2 -c prog.c

*Step Six
     Link together the program and the DLL's library file:

       arm-pe-gcc prog.o example.a -o prog

   If instead this was a Thumb DLL being called from an ARM program, the
steps would look like this. (To save space only those steps that are
different from the previous version are shown):

*Step Two
     Compile the DLL source code (using the Thumb compiler):

       thumb-pe-gcc -O2 -c dll.c -mthumb-interwork

*Step Three
     Build the exports and library files (and support interworking):

       dlltool -d example.def -z example.o -l example.a --interwork -m thumb

*Step Five
     Compile the program's source code (and support interworking):

       arm-pe-gcc -O2 -c prog.c -mthumb-interwork

   If instead the DLL is an old ARM DLL that does not support
interworking and cannot be rebuilt, then these steps would be
used:

*Step One
     Skip. If you do not have access to the sources of a DLL, there is
     no point in building a `.def' file for it.

*Step Two
     Skip. With no DLL sources there is nothing to compile.

*Step Three
     Skip. Without a `.def' file you cannot use dlltool to build an
     exports file or a library file.

*Step Four
     Skip. Without a set of DLL object files you cannot build the DLL.
     Besides, it has already been built for you by somebody else.

*Step Five
     Compile the program's source code; this is the same as in the
     purely ARM case:

       arm-pe-gcc -O2 -c prog.c

*Step Six
     Link together the program and the DLL's library file, passing the
     `--support-old-code' option to the linker:

       arm-pe-gcc prog.o example.a -Wl,--support-old-code -o prog

     Ignore the warning message about the input file not supporting
     interworking, as the --support-old-code switch has taken care of this.
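
   The three scenarios above can be summarised in one place. The
following sketch only prints the commands for each scenario; it does
not execute them, so it can be read or run without the cross tools
installed. The scenario names `arm', `thumb' and `old' and the function
name `print_steps' are made up for this illustration. The command lines
themselves are copied from the steps above, with the unchanged steps of
the Thumb scenario filled in from the purely ARM one.

```shell
#!/bin/sh
# Dry-run sketch: print the build commands for each DLL scenario.
# The scenario names are hypothetical; the commands come from the steps above.

print_steps () {
    case "$1" in
    arm)    # purely ARM DLL and program
        echo "arm-pe-gcc -O2 -c dll.c"
        echo "dlltool --def example.def --output-exp example.o --output-lib example.a"
        echo "arm-pe-ld dll.o example.o -o example.dll"
        echo "arm-pe-gcc -O2 -c prog.c"
        echo "arm-pe-gcc prog.o example.a -o prog"
        ;;
    thumb)  # Thumb DLL called from an ARM program
        echo "thumb-pe-gcc -O2 -c dll.c -mthumb-interwork"
        echo "dlltool -d example.def -z example.o -l example.a --interwork -m thumb"
        echo "arm-pe-ld dll.o example.o -o example.dll"
        echo "arm-pe-gcc -O2 -c prog.c -mthumb-interwork"
        echo "arm-pe-gcc prog.o example.a -o prog"
        ;;
    old)    # pre-built ARM DLL without interworking support
        echo "arm-pe-gcc -O2 -c prog.c"
        echo "arm-pe-gcc prog.o example.a -Wl,--support-old-code -o prog"
        ;;
    *)
        echo "unknown scenario: $1" >&2
        return 1
        ;;
    esac
}

for scenario in arm thumb old; do
    echo "# scenario: $scenario"
    print_steps "$scenario"
done
```

   Step One (writing `example.def') is a manual step in the first two
scenarios and is therefore not printed.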