Target/X86/README-FPStack.txt

//===---------------------------------------------------------------------===//
// Random ideas for the X86 backend: FP stack related stuff
//===---------------------------------------------------------------------===//

//===---------------------------------------------------------------------===//

Some targets (e.g. Athlons) prefer ffreep to fstp ST(0):
http://gcc.gnu.org/ml/gcc-patches/2004-04/msg00659.html

//===---------------------------------------------------------------------===//

This should use fiadd on chips where it is profitable:
double foo(double P, int *I) { return P+*I; }

We have fiadd patterns now, but the following two patterns have the same cost
and complexity. We need a way to specify that the latter is more profitable.

def FpADD32m  : FpI<(ops RFP:$dst, RFP:$src1, f32mem:$src2), OneArgFPRW,
                    [(set RFP:$dst, (fadd RFP:$src1,
                                     (extloadf64f32 addr:$src2)))]>;
                // ST(0) = ST(0) + [mem32]

def FpIADD32m : FpI<(ops RFP:$dst, RFP:$src1, i32mem:$src2), OneArgFPRW,
                    [(set RFP:$dst, (fadd RFP:$src1,
                                     (X86fild addr:$src2, i32)))]>;
                // ST(0) = ST(0) + [mem32int]

//===---------------------------------------------------------------------===//

The FP stackifier should handle simple permutations to reduce the number of
shuffle instructions, e.g. turning:

fld P	->		fld Q
fld Q			fld P
fxch

or:

fxch	->		fucomi
fucomi			jl X
jg X

Ideas:
http://gcc.gnu.org/ml/gcc-patches/2004-11/msg02410.html
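
To see why the two rewrites above are safe, here is a toy C model of the
register stack. It is only a sketch: FPStack and every helper name are made up
for this note, fucomi is modelled as a plain comparison of ST(0) against
ST(1), and jg/jl are read the way the listings above use them.

```c
#include <assert.h>

/* Toy model of the x87 register stack; st[0] plays the role of ST(0).
   All names here are hypothetical and exist only for this illustration. */
typedef struct {
    double st[8];
    int depth;
} FPStack;

/* fld: push a value, so the old ST(0) becomes ST(1). */
static void fld(FPStack *s, double v) {
    assert(s->depth < 8);
    for (int i = s->depth; i > 0; --i)
        s->st[i] = s->st[i - 1];
    s->st[0] = v;
    s->depth++;
}

/* fxch: swap ST(0) and ST(1). */
static void fxch(FPStack *s) {
    assert(s->depth >= 2);
    double t = s->st[0];
    s->st[0] = s->st[1];
    s->st[1] = t;
}

/* First rewrite, left-hand side: fld P; fld Q; fxch. */
static FPStack load_then_swap(double P, double Q) {
    FPStack s = { {0}, 0 };
    fld(&s, P);
    fld(&s, Q);
    fxch(&s);
    return s;
}

/* First rewrite, right-hand side: fld Q; fld P. */
static FPStack load_swapped(double P, double Q) {
    FPStack s = { {0}, 0 };
    fld(&s, Q);
    fld(&s, P);
    return s;
}

/* Second rewrite, left-hand side: fxch; fucomi; jg X.
   Returns 1 when the branch to X would be taken. */
static int branch_swap_then_greater(FPStack s) {
    fxch(&s);
    return s.st[0] > s.st[1];
}

/* Second rewrite, right-hand side: fucomi; jl X. */
static int branch_less(FPStack s) {
    return s.st[0] < s.st[1];
}
```

Both pairs agree on every input in this model. Note that the second rewrite
leaves ST(0) and ST(1) in the opposite order afterwards, so a real
implementation would have to account for that in the code that follows.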

//===---------------------------------------------------------------------===//

Add a target-specific hook to the DAG combiner to handle SINT_TO_FP and
FP_TO_SINT when the source operand is already in memory.
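
Two small test cases of the kind this item has in mind, as a sketch. The
function names are made up for this note; the point is only that the value
being converted comes straight from a load.

```c
/* SINT_TO_FP whose integer source is loaded from memory. On the x87,
   fild reads its integer operand from memory anyway. */
double sint_to_fp_from_mem(const int *p) { return (double)*p; }

/* FP_TO_SINT whose floating-point source is loaded from memory. */
int fp_to_sint_from_mem(const double *p) { return (int)*p; }
```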

//===---------------------------------------------------------------------===//

Open-code rint, floor, ceil, trunc:
http://gcc.gnu.org/ml/gcc-patches/2004-08/msg02006.html
http://gcc.gnu.org/ml/gcc-patches/2004-08/msg02011.html

Open-code the sincos[f] libcall.

//===---------------------------------------------------------------------===//

None of the FPStack instructions are handled in
X86RegisterInfo::foldMemoryOperand, which prevents the spiller from
folding spill code into the instructions.

//===---------------------------------------------------------------------===//

Currently the x86 codegen isn't very good at mixing SSE and FPStack
code:

unsigned int foo(double x) { return x; }

foo:
	subl $20, %esp
	movsd 24(%esp), %xmm0
	movsd %xmm0, 8(%esp)
	fldl 8(%esp)
	fisttpll (%esp)
	movl (%esp), %eax
	addl $20, %esp
	ret

This just requires being smarter when custom expanding fptoui.
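
As a rough C model: the listing above truncates to a 64-bit signed integer
and keeps the low 32 bits. One possible alternative, which is an assumption
about what "smarter" could mean and not something this note specifies, uses
only 32-bit signed truncation and never leaves SSE. The function names are
made up, and both are only meaningful for inputs in [0, 2^32).

```c
#include <stdint.h>

/* Model of the listing above: truncate to a 64-bit signed integer
   (fisttpll) and keep the low 32 bits (movl). */
uint32_t fptoui_via_i64(double x) {
    return (uint32_t)(int64_t)x;
}

/* Hypothetical alternative built on 32-bit signed truncation, the kind of
   conversion cvttsd2si provides. Values >= 2^31 are biased down by 2^31
   before converting, and the top bit is restored afterwards. */
uint32_t fptoui_via_i32(double x) {
    const double two31 = 2147483648.0;
    if (x < two31)
        return (uint32_t)(int32_t)x;
    return ((uint32_t)(int32_t)(x - two31)) ^ 0x80000000u;
}
```

The two functions agree across the whole unsigned 32-bit range; the second
trades the x87 round trip for a compare and a subtract.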

//===---------------------------------------------------------------------===//