@c Copyright 1991, 1992, 1993, 1994, 1995, 1997, 1998, 1999, 2000,
@c 2001, 2003, 2004
@c Free Software Foundation, Inc.
@c This is part of the GAS manual.
@c For copying conditions, see the file as.texinfo.
@ifset GENERIC
@page
@node i386-Dependent
@chapter 80386 Dependent Features
@end ifset
@ifclear GENERIC
@node Machine Dependencies
@chapter 80386 Dependent Features
@end ifclear

@cindex i386 support
@cindex i80386 support
@cindex x86-64 support

The i386 version of @code{@value{AS}} supports both the original Intel 386
architecture, in 16-bit and 32-bit mode, and the AMD x86-64 architecture,
which extends the Intel architecture to 64 bits.

@menu
* i386-Options::           Options
* i386-Syntax::            AT&T Syntax versus Intel Syntax
* i386-Mnemonics::         Instruction Naming
* i386-Regs::              Register Naming
* i386-Prefixes::          Instruction Prefixes
* i386-Memory::            Memory References
* i386-Jumps::             Handling of Jump Instructions
* i386-Float::             Floating Point
* i386-SIMD::              Intel's MMX and AMD's 3DNow! SIMD Operations
* i386-16bit::             Writing 16-bit Code
* i386-Arch::              Specifying an x86 CPU architecture
* i386-Bugs::              AT&T Syntax bugs
* i386-Notes::             Notes
@end menu

@node i386-Options
@section Options

@cindex options for i386
@cindex options for x86-64
@cindex i386 options
@cindex x86-64 options

The i386 version of @code{@value{AS}} has a few machine
dependent options:

@table @code
@cindex @samp{--32} option, i386
@cindex @samp{--32} option, x86-64
@cindex @samp{--64} option, i386
@cindex @samp{--64} option, x86-64
@item --32 | --64
Select the word size, either 32 bits or 64 bits.  Selecting 32-bit
implies Intel i386 architecture, while 64-bit implies AMD x86-64
architecture.

These options are only available with the ELF object file format, and
require that the necessary BFD support has been included (on a 32-bit
platform you have to add @samp{--enable-64-bit-bfd} to configure to enable
64-bit usage and use x86-64 as the target platform).

@item -n
By default, x86 GAS replaces multiple nop instructions used for
alignment within code sections with multi-byte nop instructions such
as @samp{leal 0(%esi,1),%esi}.  This switch disables the optimization.

@cindex @samp{--divide} option, i386
@item --divide
On SVR4-derived platforms, the character @samp{/} is treated as a comment
character, which means that it cannot be used in expressions.  The
@samp{--divide} option turns @samp{/} into a normal character.  This does
not disable @samp{/} at the beginning of a line starting a comment, or
affect using @samp{#} for starting a comment.

@end table

@node i386-Syntax
@section AT&T Syntax versus Intel Syntax

@cindex i386 intel_syntax pseudo op
@cindex intel_syntax pseudo op, i386
@cindex i386 att_syntax pseudo op
@cindex att_syntax pseudo op, i386
@cindex i386 syntax compatibility
@cindex syntax compatibility, i386
@cindex x86-64 intel_syntax pseudo op
@cindex intel_syntax pseudo op, x86-64
@cindex x86-64 att_syntax pseudo op
@cindex att_syntax pseudo op, x86-64
@cindex x86-64 syntax compatibility
@cindex syntax compatibility, x86-64

@code{@value{AS}} now supports assembly using Intel assembler syntax.
@code{.intel_syntax} selects Intel mode, and @code{.att_syntax} switches
back to the usual AT&T mode for compatibility with the output of
@code{@value{GCC}}.  Either of these directives may have an optional
argument, @code{prefix} or @code{noprefix}, specifying whether registers
require a @samp{%} prefix.  AT&T System V/386 assembler syntax is quite
different from Intel syntax.  We mention these differences because
almost all 80386 documents use Intel syntax.
Notable differences
between the two syntaxes are:

@cindex immediate operands, i386
@cindex i386 immediate operands
@cindex register operands, i386
@cindex i386 register operands
@cindex jump/call operands, i386
@cindex i386 jump/call operands
@cindex operand delimiters, i386

@cindex immediate operands, x86-64
@cindex x86-64 immediate operands
@cindex register operands, x86-64
@cindex x86-64 register operands
@cindex jump/call operands, x86-64
@cindex x86-64 jump/call operands
@cindex operand delimiters, x86-64
@itemize @bullet
@item
AT&T immediate operands are preceded by @samp{$}; Intel immediate
operands are undelimited (Intel @samp{push 4} is AT&T @samp{pushl $4}).
AT&T register operands are preceded by @samp{%}; Intel register operands
are undelimited.  AT&T absolute (as opposed to PC relative) jump/call
operands are prefixed by @samp{*}; they are undelimited in Intel syntax.

@cindex i386 source, destination operands
@cindex source, destination operands; i386
@cindex x86-64 source, destination operands
@cindex source, destination operands; x86-64
@item
AT&T and Intel syntax use the opposite order for source and destination
operands.  Intel @samp{add eax, 4} is AT&T @samp{addl $4, %eax}.  The
@samp{source, dest} convention is maintained for compatibility with
previous Unix assemblers.  Note that instructions with more than one
source operand, such as the @samp{enter} instruction, do @emph{not} have
reversed order.  @xref{i386-Bugs}.

@cindex mnemonic suffixes, i386
@cindex sizes operands, i386
@cindex i386 size suffixes
@cindex mnemonic suffixes, x86-64
@cindex sizes operands, x86-64
@cindex x86-64 size suffixes
@item
In AT&T syntax the size of memory operands is determined from the last
character of the instruction mnemonic.
Mnemonic suffixes of @samp{b},
@samp{w}, @samp{l} and @samp{q} specify byte (8-bit), word (16-bit), long
(32-bit) and quadruple word (64-bit) memory references.  Intel syntax accomplishes
this by prefixing memory operands (@emph{not} the instruction mnemonics) with
@samp{byte ptr}, @samp{word ptr}, @samp{dword ptr} and @samp{qword ptr}.  Thus,
Intel @samp{mov al, byte ptr @var{foo}} is @samp{movb @var{foo}, %al} in AT&T
syntax.

@cindex return instructions, i386
@cindex i386 jump, call, return
@cindex return instructions, x86-64
@cindex x86-64 jump, call, return
@item
Immediate form long jumps and calls are
@samp{lcall/ljmp $@var{section}, $@var{offset}} in AT&T syntax; the
Intel syntax is
@samp{call/jmp far @var{section}:@var{offset}}.  Also, the far return
instruction
is @samp{lret $@var{stack-adjust}} in AT&T syntax; Intel syntax is
@samp{ret far @var{stack-adjust}}.

@cindex sections, i386
@cindex i386 sections
@cindex sections, x86-64
@cindex x86-64 sections
@item
The AT&T assembler does not provide support for multiple section
programs.  Unix style systems expect all programs to be single sections.
@end itemize

@node i386-Mnemonics
@section Instruction Naming

@cindex i386 instruction naming
@cindex instruction naming, i386
@cindex x86-64 instruction naming
@cindex instruction naming, x86-64

Instruction mnemonics are suffixed with one character modifiers which
specify the size of operands.  The letters @samp{b}, @samp{w}, @samp{l}
and @samp{q} specify byte, word, long and quadruple word operands.  If
no suffix is specified by an instruction then @code{@value{AS}} tries to
fill in the missing suffix based on the destination register operand
(the last one by convention).  Thus, @samp{mov %ax, %bx} is equivalent
to @samp{movw %ax, %bx}; also, @samp{mov $1, %bx} is equivalent to
@samp{movw $1, %bx}.
Note that this is incompatible with the AT&T Unix
assembler which assumes that a missing mnemonic suffix implies long
operand size.  (This incompatibility does not affect compiler output
since compilers always explicitly specify the mnemonic suffix.)

Almost all instructions have the same names in AT&T and Intel format.
There are a few exceptions.  The sign extend and zero extend
instructions need two sizes to specify them.  They need a size to
sign/zero extend @emph{from} and a size to sign/zero extend @emph{to}.  This
is accomplished by using two instruction mnemonic suffixes in AT&T
syntax.  Base names for sign extend and zero extend are
@samp{movs@dots{}} and @samp{movz@dots{}} in AT&T syntax (@samp{movsx}
and @samp{movzx} in Intel syntax).  The instruction mnemonic suffixes
are tacked on to this base name, the @emph{from} suffix before the
@emph{to} suffix.  Thus, @samp{movsbl %al, %edx} is AT&T syntax for
``move sign extend @emph{from} %al @emph{to} %edx.''  Possible suffixes,
thus, are @samp{bl} (from byte to long), @samp{bw} (from byte to word),
@samp{wl} (from word to long), @samp{bq} (from byte to quadruple word),
@samp{wq} (from word to quadruple word), and @samp{lq} (from long to
quadruple word).
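
As an illustration of the suffix rules above, the following sketch shows a
few AT&T syntax instructions with their sizes spelled out in comments.  It
is a minimal example rather than an exhaustive list.  The last line uses a
64-bit register, so the fragment as a whole is assumed to be assembled for
an x86-64 target (@samp{--64}); drop that line for a 32-bit target.

@smallexample
        mov     %ax, %bx        # no suffix: sized from %bx, same as movw
        movw    $1, %bx         # explicit 16-bit operand size
        movsbl  %al, %edx       # sign-extend the byte %al into the long %edx
        movzwl  %ax, %ecx       # zero-extend the word %ax into the long %ecx
        movslq  %eax, %rdx      # sign-extend the long %eax into the quad %rdx
@end smallexample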

@cindex conversion instructions, i386
@cindex i386 conversion instructions
@cindex conversion instructions, x86-64
@cindex x86-64 conversion instructions
The Intel-syntax conversion instructions

@itemize @bullet
@item
@samp{cbw} --- sign-extend byte in @samp{%al} to word in @samp{%ax},

@item
@samp{cwde} --- sign-extend word in @samp{%ax} to long in @samp{%eax},

@item
@samp{cwd} --- sign-extend word in @samp{%ax} to long in @samp{%dx:%ax},

@item
@samp{cdq} --- sign-extend dword in @samp{%eax} to quad in @samp{%edx:%eax},

@item
@samp{cdqe} --- sign-extend dword in @samp{%eax} to quad in @samp{%rax}
(x86-64 only),

@item
@samp{cqo} --- sign-extend quad in @samp{%rax} to octuple in
@samp{%rdx:%rax} (x86-64 only),
@end itemize

@noindent
are called @samp{cbtw}, @samp{cwtl}, @samp{cwtd}, @samp{cltd}, @samp{cltq}, and
@samp{cqto} in AT&T naming.  @code{@value{AS}} accepts either naming for these
instructions.

@cindex jump instructions, i386
@cindex call instructions, i386
@cindex jump instructions, x86-64
@cindex call instructions, x86-64
Far call/jump instructions are @samp{lcall} and @samp{ljmp} in
AT&T syntax, but are @samp{call far} and @samp{jmp far} in Intel
convention.

@node i386-Regs
@section Register Naming

@cindex i386 registers
@cindex registers, i386
@cindex x86-64 registers
@cindex registers, x86-64
Register operands are always prefixed with @samp{%}.  The 80386 registers
consist of

@itemize @bullet
@item
the 8 32-bit registers @samp{%eax} (the accumulator), @samp{%ebx},
@samp{%ecx}, @samp{%edx}, @samp{%edi}, @samp{%esi}, @samp{%ebp} (the
frame pointer), and @samp{%esp} (the stack pointer).

@item
the 8 16-bit low-ends of these: @samp{%ax}, @samp{%bx}, @samp{%cx},
@samp{%dx}, @samp{%di}, @samp{%si}, @samp{%bp}, and @samp{%sp}.

@item
the 8 8-bit registers: @samp{%ah}, @samp{%al}, @samp{%bh},
@samp{%bl}, @samp{%ch}, @samp{%cl}, @samp{%dh}, and @samp{%dl} (these
are the high-bytes and low-bytes of @samp{%ax}, @samp{%bx},
@samp{%cx}, and @samp{%dx}).

@item
the 6 section registers @samp{%cs} (code section), @samp{%ds}
(data section), @samp{%ss} (stack section), @samp{%es}, @samp{%fs},
and @samp{%gs}.

@item
the 3 processor control registers @samp{%cr0}, @samp{%cr2}, and
@samp{%cr3}.

@item
the 6 debug registers @samp{%db0}, @samp{%db1}, @samp{%db2},
@samp{%db3}, @samp{%db6}, and @samp{%db7}.

@item
the 2 test registers @samp{%tr6} and @samp{%tr7}.

@item
the 8 registers of the floating point register stack: @samp{%st} or
equivalently @samp{%st(0)}, @samp{%st(1)}, @samp{%st(2)}, @samp{%st(3)},
@samp{%st(4)}, @samp{%st(5)}, @samp{%st(6)}, and @samp{%st(7)}.
These registers are overloaded by 8 MMX registers @samp{%mm0},
@samp{%mm1}, @samp{%mm2}, @samp{%mm3}, @samp{%mm4}, @samp{%mm5},
@samp{%mm6} and @samp{%mm7}.

@item
the 8 SSE registers @samp{%xmm0}, @samp{%xmm1}, @samp{%xmm2},
@samp{%xmm3}, @samp{%xmm4}, @samp{%xmm5}, @samp{%xmm6} and @samp{%xmm7}.
@end itemize

The AMD x86-64 architecture extends the register set by:

@itemize @bullet
@item
enhancing the 8 32-bit registers to 64-bit: @samp{%rax} (the
accumulator), @samp{%rbx}, @samp{%rcx}, @samp{%rdx}, @samp{%rdi},
@samp{%rsi}, @samp{%rbp} (the frame pointer), @samp{%rsp} (the stack
pointer)

@item
the 8 extended registers @samp{%r8}--@samp{%r15}.

@item
the 8 32-bit low ends of the extended registers: @samp{%r8d}--@samp{%r15d}

@item
the 8 16-bit low ends of the extended registers: @samp{%r8w}--@samp{%r15w}

@item
the 8 8-bit low ends of the extended registers: @samp{%r8b}--@samp{%r15b}

@item
the 4 8-bit registers: @samp{%sil}, @samp{%dil}, @samp{%bpl}, @samp{%spl}.

@item
the 8 debug registers: @samp{%db8}--@samp{%db15}.

@item
the 8 SSE registers: @samp{%xmm8}--@samp{%xmm15}.
@end itemize

@node i386-Prefixes
@section Instruction Prefixes

@cindex i386 instruction prefixes
@cindex instruction prefixes, i386
@cindex prefixes, i386
Instruction prefixes are used to modify the following instruction.  They
are used to repeat string instructions, to provide section overrides, to
perform bus lock operations, and to change operand and address sizes.
(Most instructions that normally operate on 32-bit operands will use
16-bit operands if the instruction has an ``operand size'' prefix.)
Instruction prefixes are best written on the same line as the instruction
they act upon.  For example, the @samp{scas} (scan string) instruction is
repeated with:

@smallexample
        repne scas %es:(%edi),%al
@end smallexample

You may also place prefixes on the lines immediately preceding the
instruction, but this circumvents checks that @code{@value{AS}} does
with prefixes, and will not work with all prefixes.

Here is a list of instruction prefixes:

@cindex section override prefixes, i386
@itemize @bullet
@item
Section override prefixes @samp{cs}, @samp{ds}, @samp{ss}, @samp{es},
@samp{fs}, @samp{gs}.  These are automatically added by
using the @var{section}:@var{memory-operand} form for memory references.

@cindex size prefixes, i386
@item
Operand/Address size prefixes @samp{data16} and @samp{addr16}
change 32-bit operands/addresses into 16-bit operands/addresses,
while @samp{data32} and @samp{addr32} change 16-bit ones (in a
@code{.code16} section) into 32-bit operands/addresses.  These prefixes
@emph{must} appear on the same line of code as the instruction they
modify.
For example, in a 16-bit @code{.code16} section, you might
write:

@smallexample
        addr32 jmpl *(%ebx)
@end smallexample

@cindex bus lock prefixes, i386
@cindex inhibiting interrupts, i386
@item
The bus lock prefix @samp{lock} inhibits interrupts during execution of
the instruction it precedes.  (This is only valid with certain
instructions; see an 80386 manual for details.)

@cindex coprocessor wait, i386
@item
The wait for coprocessor prefix @samp{wait} waits for the coprocessor to
complete the current instruction.  This should never be needed for the
80386/80387 combination.

@cindex repeat prefixes, i386
@item
The @samp{rep}, @samp{repe}, and @samp{repne} prefixes are added
to string instructions to make them repeat @samp{%ecx} times (@samp{%cx}
times if the current address size is 16 bits).
@cindex REX prefixes, i386
@item
The @samp{rex} family of prefixes is used by x86-64 to encode
extensions to the i386 instruction set.  The @samp{rex} prefix has four
bits: an operand size override (@code{64}), used to change the operand size
from 32-bit to 64-bit, and the X, Y and Z extension bits, used to extend the
register set.

You may write the @samp{rex} prefixes directly.  The @samp{rex64xyz}
prefix emits a @samp{rex} prefix with all the bits set.  By omitting
the @code{64}, @code{x}, @code{y} or @code{z} you may write other
prefixes as well.  Normally, there is no need to write the prefixes
explicitly, since gas will automatically generate them based on the
instruction operands.
@end itemize

@node i386-Memory
@section Memory References

@cindex i386 memory references
@cindex memory references, i386
@cindex x86-64 memory references
@cindex memory references, x86-64
An Intel syntax indirect memory reference of the form

@smallexample
@var{section}:[@var{base} + @var{index}*@var{scale} + @var{disp}]
@end smallexample

@noindent
is translated into the AT&T syntax

@smallexample
@var{section}:@var{disp}(@var{base}, @var{index}, @var{scale})
@end smallexample

@noindent
where @var{base} and @var{index} are the optional 32-bit base and
index registers, @var{disp} is the optional displacement, and
@var{scale}, taking the values 1, 2, 4, and 8, multiplies @var{index}
to calculate the address of the operand.  If no @var{scale} is
specified, @var{scale} is taken to be 1.  @var{section} specifies the
optional section register for the memory operand, and may override the
default section register (see an 80386 manual for section register
defaults).  Note that section overrides in AT&T syntax @emph{must}
be preceded by a @samp{%}.  If you specify a section override which
coincides with the default section register, @code{@value{AS}} does @emph{not}
output any section register override prefixes to assemble the given
instruction.  Thus, section overrides can be specified to emphasize which
section register is used for a given memory operand.

Here are some examples of Intel and AT&T style memory references:

@table @asis
@item AT&T: @samp{-4(%ebp)}, Intel: @samp{[ebp - 4]}
@var{base} is @samp{%ebp}; @var{disp} is @samp{-4}.  @var{section} is
missing, and the default section is used (@samp{%ss} for addressing with
@samp{%ebp} as the base register).  @var{index}, @var{scale} are both missing.

@item AT&T: @samp{foo(,%eax,4)}, Intel: @samp{[foo + eax*4]}
@var{index} is @samp{%eax} (scaled by a @var{scale} 4); @var{disp} is
@samp{foo}.  All other fields are missing.  The section register here
defaults to @samp{%ds}.

@item AT&T: @samp{foo(,1)}; Intel @samp{[foo]}
This uses the value pointed to by @samp{foo} as a memory operand.
Note that @var{base} and @var{index} are both missing, but there is only
@emph{one} @samp{,}.  This is a syntactic exception.

@item AT&T: @samp{%gs:foo}; Intel @samp{gs:foo}
This selects the contents of the variable @samp{foo} with section
register @var{section} being @samp{%gs}.
@end table

Absolute (as opposed to PC relative) call and jump operands must be
prefixed with @samp{*}.  If no @samp{*} is specified, @code{@value{AS}}
always chooses PC relative addressing for jump/call labels.

Any instruction that has a memory operand, but no register operand,
@emph{must} specify its size (byte, word, long, or quadruple) with an
instruction mnemonic suffix (@samp{b}, @samp{w}, @samp{l} or @samp{q},
respectively).

The x86-64 architecture adds RIP (instruction pointer relative)
addressing.  This addressing mode is specified by using @samp{rip} as a
base register.  Only constant offsets are valid.  For example:

@table @asis
@item AT&T: @samp{1234(%rip)}, Intel: @samp{[rip + 1234]}
Points to the address 1234 bytes past the end of the current
instruction.

@item AT&T: @samp{symbol(%rip)}, Intel: @samp{[rip + symbol]}
Points to @code{symbol} in a RIP relative way; this is shorter than
the default absolute addressing.
@end table

Other addressing modes remain unchanged in the x86-64 architecture, except
that the registers used are 64-bit instead of 32-bit.
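
The memory reference forms described in this section can be put together
in a short sketch.  The symbols @samp{foo}, @samp{counter} and @samp{done}
are hypothetical names chosen for the illustration, and the comments give
the Intel syntax spelling where the text above provides one.  The first
fragment is assumed to be assembled for a 32-bit target:

@smallexample
        movl    -4(%ebp), %eax          # Intel: [ebp - 4]
        movl    foo(,%eax,4), %edx      # Intel: [foo + eax*4]
        movl    %gs:foo, %ecx           # Intel: gs:foo
        incl    (%ebx)                  # no register operand: suffix required
        jmp     *%eax                   # absolute jump to the address in %eax
        jmp     done                    # PC relative jump to a label
done:
@end smallexample

@noindent
The second fragment is assumed to be assembled for an x86-64 target and
shows RIP relative addressing:

@smallexample
        movq    counter(%rip), %rax     # load the quad word stored at counter
        leaq    counter(%rip), %rdi     # form the address of counter
@end smallexample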

@node i386-Jumps
@section Handling of Jump Instructions

@cindex jump optimization, i386
@cindex i386 jump optimization
@cindex jump optimization, x86-64
@cindex x86-64 jump optimization
Jump instructions are always optimized to use the smallest possible
displacements.  This is accomplished by using byte (8-bit) displacement
jumps whenever the target is sufficiently close.  If a byte displacement
is insufficient a long displacement is used.  We do not support
word (16-bit) displacement jumps in 32-bit mode (i.e. prefixing the jump
instruction with the @samp{data16} instruction prefix), since the 80386
insists upon masking @samp{%eip} to 16 bits after the word displacement
is added.  (See also @ref{i386-Arch}.)

Note that the @samp{jcxz}, @samp{jecxz}, @samp{loop}, @samp{loopz},
@samp{loope}, @samp{loopnz} and @samp{loopne} instructions only come in byte
displacements, so that if you use these instructions (@code{@value{GCC}} does
not use them) you may get an error message (and incorrect code).  The AT&T
80386 assembler tries to get around this problem by expanding @samp{jcxz foo}
to

@smallexample
         jcxz cx_zero
         jmp cx_nonzero
cx_zero: jmp foo
cx_nonzero:
@end smallexample

@node i386-Float
@section Floating Point

@cindex i386 floating point
@cindex floating point, i386
@cindex x86-64 floating point
@cindex floating point, x86-64
All 80387 floating point types except packed BCD are supported.
(BCD support may be added without much difficulty.)  These data
types are 16-, 32-, and 64-bit integers, and single (32-bit),
double (64-bit), and extended (80-bit) precision floating point.
Each supported type has an instruction mnemonic suffix and a constructor
associated with it.  Instruction mnemonic suffixes specify the operand's
data type.  Constructors build these data types into memory.

@cindex @code{float} directive, i386
@cindex @code{single} directive, i386
@cindex @code{double} directive, i386
@cindex @code{tfloat} directive, i386
@cindex @code{float} directive, x86-64
@cindex @code{single} directive, x86-64
@cindex @code{double} directive, x86-64
@cindex @code{tfloat} directive, x86-64
@itemize @bullet
@item
Floating point constructors are @samp{.float} or @samp{.single},
@samp{.double}, and @samp{.tfloat} for 32-, 64-, and 80-bit formats.
These correspond to instruction mnemonic suffixes @samp{s}, @samp{l},
and @samp{t}.  @samp{t} stands for 80-bit (ten byte) real.  The 80387
only supports this format via the @samp{fldt} (load 80-bit real to stack
top) and @samp{fstpt} (store 80-bit real and pop stack) instructions.

@cindex @code{word} directive, i386
@cindex @code{long} directive, i386
@cindex @code{int} directive, i386
@cindex @code{quad} directive, i386
@cindex @code{word} directive, x86-64
@cindex @code{long} directive, x86-64
@cindex @code{int} directive, x86-64
@cindex @code{quad} directive, x86-64
@item
Integer constructors are @samp{.word}, @samp{.long} or @samp{.int}, and
@samp{.quad} for the 16-, 32-, and 64-bit integer formats.  The
corresponding instruction mnemonic suffixes are @samp{s} (single),
@samp{l} (long), and @samp{q} (quad).  As with the 80-bit real format,
the 64-bit @samp{q} format is only present in the @samp{fildq} (load
quad integer to stack top) and @samp{fistpq} (store quad integer and pop
stack) instructions.
@end itemize

Register to register operations should not use instruction mnemonic suffixes.
@samp{fstl %st, %st(1)} will give a warning, and be assembled as if you
wrote @samp{fst %st, %st(1)}, since all register to register operations
use 80-bit floating point operands.
(Contrast this with @samp{fstl %st, mem},
which converts @samp{%st} from 80-bit to 64-bit floating point format,
then stores the result in the 8 byte location @samp{mem}.)

@node i386-SIMD
@section Intel's MMX and AMD's 3DNow! SIMD Operations

@cindex MMX, i386
@cindex 3DNow!, i386
@cindex SIMD, i386
@cindex MMX, x86-64
@cindex 3DNow!, x86-64
@cindex SIMD, x86-64

@code{@value{AS}} supports Intel's MMX instruction set (SIMD
instructions for integer data), available on Intel's Pentium MMX
processors and Pentium II processors, AMD's K6 and K6-2 processors,
Cyrix' M2 processor, and probably others.  It also supports AMD's 3DNow!
instruction set (SIMD instructions for 32-bit floating point data)
available on AMD's K6-2 processor and possibly others in the future.

Currently, @code{@value{AS}} does not support Intel's floating point
SIMD, Katmai (KNI).

The eight 64-bit MMX operands, also used by 3DNow!, are called @samp{%mm0},
@samp{%mm1}, @dots{} @samp{%mm7}.  They contain eight 8-bit integers, four
16-bit integers, two 32-bit integers, one 64-bit integer, or two 32-bit
floating point values.  The MMX registers cannot be used at the same time
as the floating point stack.

See Intel and AMD documentation, keeping in mind that the operand order in
instructions is reversed from the Intel syntax.

@node i386-16bit
@section Writing 16-bit Code

@cindex i386 16-bit code
@cindex 16-bit code, i386
@cindex real-mode code, i386
@cindex @code{code16gcc} directive, i386
@cindex @code{code16} directive, i386
@cindex @code{code32} directive, i386
@cindex @code{code64} directive, i386
@cindex @code{code64} directive, x86-64
While @code{@value{AS}} normally writes only ``pure'' 32-bit i386 code
or 64-bit x86-64 code depending on the default configuration,
it also supports writing code to run in real mode or in 16-bit protected
mode code segments.
To do this, put a @samp{.code16} or
@samp{.code16gcc} directive before the assembly language instructions to
be run in 16-bit mode.  You can switch @code{@value{AS}} back to writing
normal 32-bit code with the @samp{.code32} directive.

@samp{.code16gcc} provides experimental support for generating 16-bit
code from gcc, and differs from @samp{.code16} in that @samp{call},
@samp{ret}, @samp{enter}, @samp{leave}, @samp{push}, @samp{pop},
@samp{pusha}, @samp{popa}, @samp{pushf}, and @samp{popf} instructions
default to 32-bit size.  This is so that the stack pointer is
manipulated in the same way over function calls, allowing access to
function parameters at the same stack offsets as in 32-bit mode.
@samp{.code16gcc} also automatically adds address size prefixes where
necessary to use the 32-bit addressing modes that gcc generates.

The code which @code{@value{AS}} generates in 16-bit mode will not
necessarily run on a 16-bit pre-80386 processor.  To write code that
runs on such a processor, you must refrain from using @emph{any} 32-bit
constructs which require @code{@value{AS}} to output address or operand
size prefixes.

Note that writing 16-bit code instructions by explicitly specifying a
prefix or an instruction mnemonic suffix within a 32-bit code section
generates different machine instructions than those generated for a
16-bit code segment.  In a 32-bit code section, the following code
generates the machine opcode bytes @samp{66 6a 04}, which pushes the
value @samp{4} onto the stack, decrementing @samp{%esp} by 2.

@smallexample
        pushw $4
@end smallexample

The same code in a 16-bit code section would generate the machine
opcode bytes @samp{6a 04} (i.e. without the operand size prefix), which
is correct since the processor default operand size is assumed to be 16
bits in a 16-bit code section.
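
The effect of the mode directives on operand size prefixes can be sketched
by assembling the same two pushes in both modes.  The byte sequences in the
comments follow from the rule described above (the @samp{66} prefix appears
whenever the operand size differs from the default of the current mode);
treat them as an illustration and check them against a listing produced by
your own assembler.

@smallexample
        .code16
        pushw   $4              # 6a 04: 16 bits is the default size here
        pushl   $4              # 66 6a 04: operand size prefix needed
        .code32
        pushw   $4              # 66 6a 04: operand size prefix needed
        pushl   $4              # 6a 04: 32 bits is the default size here
@end smallexample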

@node i386-Bugs
@section AT&T Syntax bugs

The UnixWare assembler, and probably other AT&T derived ix86 Unix
assemblers, generate floating point instructions with reversed source
and destination registers in certain cases.  Unfortunately, gcc and
possibly many other programs use this reversed syntax, so we're stuck
with it.

For example

@smallexample
        fsub %st,%st(3)
@end smallexample
@noindent
results in @samp{%st(3)} being updated to @samp{%st - %st(3)} rather
than the expected @samp{%st(3) - %st}.  This happens with all the
non-commutative arithmetic floating point operations with two register
operands where the source register is @samp{%st} and the destination
register is @samp{%st(i)}.

@node i386-Arch
@section Specifying CPU Architecture

@cindex arch directive, i386
@cindex i386 arch directive
@cindex arch directive, x86-64
@cindex x86-64 arch directive

@code{@value{AS}} may be told to assemble for a particular CPU
(sub-)architecture with the @code{.arch @var{cpu_type}} directive.  This
directive enables a warning when gas detects an instruction that is not
supported on the CPU specified.  The choices for @var{cpu_type} are:

@multitable @columnfractions .20 .20 .20 .20
@item @samp{i8086} @tab @samp{i186} @tab @samp{i286} @tab @samp{i386}
@item @samp{i486} @tab @samp{i586} @tab @samp{i686} @tab @samp{pentium}
@item @samp{pentiumpro} @tab @samp{pentiumii} @tab @samp{pentiumiii} @tab @samp{pentium4}
@item @samp{k6} @tab @samp{athlon} @tab @samp{sledgehammer}
@item @samp{.mmx} @tab @samp{.sse} @tab @samp{.sse2} @tab @samp{.sse3}
@item @samp{.3dnow}
@end multitable

Apart from the warning, there are only two other effects on
@code{@value{AS}} operation.  Firstly, if you specify a CPU other than
@samp{i486}, then shift by one instructions such as @samp{sarl $1, %eax}
will automatically use a two byte opcode sequence.
The larger three
byte opcode sequence is used on the 486 (and when no architecture is
specified) because it executes faster on the 486.  Note that you can
explicitly request the two byte opcode by writing @samp{sarl %eax}.
Secondly, if you specify @samp{i8086}, @samp{i186}, or @samp{i286},
@emph{and} @samp{.code16} or @samp{.code16gcc} then byte offset
conditional jumps will be promoted when necessary to a two instruction
sequence consisting of a conditional jump of the opposite sense around
an unconditional jump to the target.

Following the CPU architecture (but not a sub-architecture, which are those
starting with a dot), you may specify @samp{jumps} or @samp{nojumps} to
control automatic promotion of conditional jumps.  @samp{jumps} is the
default, and enables jump promotion: all external jumps will be of the long
variety, and file-local jumps will be promoted as necessary
(@pxref{i386-Jumps}).  @samp{nojumps} leaves external conditional jumps as
byte offset jumps, and warns about file-local conditional jumps that
@code{@value{AS}} promotes.
Unconditional jumps are treated as for @samp{jumps}.

For example

@smallexample
 .arch i8086,nojumps
@end smallexample

@node i386-Notes
@section Notes

@cindex i386 @code{mul}, @code{imul} instructions
@cindex @code{mul} instruction, i386
@cindex @code{imul} instruction, i386
@cindex @code{mul} instruction, x86-64
@cindex @code{imul} instruction, x86-64
There is some trickery concerning the @samp{mul} and @samp{imul}
instructions that deserves mention.  The 16-, 32-, 64- and 128-bit expanding
multiplies (base opcode @samp{0xf6}; extension 4 for @samp{mul} and 5
for @samp{imul}) can be output only in the one operand form.  Thus,
@samp{imul %ebx, %eax} does @emph{not} select the expanding multiply;
the expanding multiply would clobber the @samp{%edx} register, and this
would confuse @code{@value{GCC}} output.
Use @samp{imul %ebx} to get the
64-bit product in @samp{%edx:%eax}.

We have added a two operand form of @samp{imul} when the first operand
is an immediate mode expression and the second operand is a register.
This is just a shorthand, so that multiplying @samp{%eax} by 69, for
example, can be done with @samp{imul $69, %eax} rather than
@samp{imul $69, %eax, %eax}.
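
The forms of @samp{imul} discussed above can be compared side by side.
This is a minimal sketch for a 32-bit target; the comments describe the
result of each form and which registers it writes.

@smallexample
        imul    %ebx            # one operand: %edx:%eax = %eax * %ebx
        imul    %ebx, %eax      # two operands: %eax = %eax * %ebx
        imul    $69, %eax       # shorthand for the three operand form below
        imul    $69, %eax, %eax # three operands: %eax = %eax * 69
@end smallexample

@noindent
Only the first line is an expanding multiply, so it is the only one that
clobbers @samp{%edx}.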