Code Generation Notes for MSA
=============================

Intrinsics are lowered to SelectionDAG nodes where possible in order to enable
optimisation, reduce the size of the ISel matcher, and reduce repetition in
the implementation. In a small number of cases, this can cause different
(semantically equivalent) instructions to be used in place of the requested
instruction, even when no optimisation has taken place.

Instructions
============

This section describes any quirks of instruction selection for MSA. For
example, two instructions might be equally valid for some given IR and one is
chosen in preference to the other.

bclri.b:
        It is not possible to emit bclri.b since andi.b covers exactly the
        same cases. andi.b should use fractionally less power than bclri.b in
        most hardware implementations so it is used in preference to bclri.b.
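
        As a rough illustration (a behavioural sketch in Python, not LLVM or
        hardware code), clearing bit m of every byte gives the same result as
        ANDing every byte with the immediate ~(1 << m) & 0xFF. The function
        names are hypothetical and a vector is modelled as a list of 16 byte
        values.

```python
def bclri_b(ws, m):
    """Model of bclri.b: clear bit m (0..7) in every byte of ws."""
    return [b & ~(1 << m) & 0xFF for b in ws]


def andi_b(ws, imm8):
    """Model of andi.b: AND every byte of ws with an 8-bit immediate."""
    return [b & imm8 for b in ws]


def andi_imm_for_bclri(m):
    """Immediate that makes andi.b behave like bclri.b with bit index m."""
    return ~(1 << m) & 0xFF


# A sample 16-byte vector: 0, 17, 34, ..., 255.
SAMPLE = list(range(0, 256, 17))

# Check the equivalence for every possible bit index.
for m in range(8):
    assert bclri_b(SAMPLE, m) == andi_b(SAMPLE, andi_imm_for_bclri(m))
```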

vshf.w:
        It is not possible to emit vshf.w when the shuffle description is
        constant since shf.w covers exactly the same cases. shf.w is used
        instead. It is also impossible for the shuffle description to be
        unknown at compile-time due to the definition of shufflevector in
        LLVM IR.

vshf.[bhwd]:
        When the shuffle description describes a splat operation, splat.[bhwd]
        instructions will be selected instead of vshf.[bhwd]. Unlike the ilv*
        and pck* instructions, this is matched from MipsISD::VSHF instead of
        a special-case MipsISD node.

ilvl.d, pckev.d:
        It is not possible to emit ilvl.d or pckev.d since ilvev.d covers the
        same shuffle. ilvev.d will be emitted instead.

ilvr.d, ilvod.d, pckod.d:
        It is not possible to emit ilvr.d or pckod.d since ilvod.d covers the
        same shuffle. ilvod.d will be emitted instead.

splat.[bhwd]:
        The intrinsic will work as expected. However, unlike other intrinsics
        it lowers directly to MipsISD::VSHF instead of using common IR.

splati.w:
        It is not possible to emit splati.w since shf.w covers the same cases.
        shf.w will be emitted instead.
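
        The overlap can be sketched in Python, assuming the usual reading of
        shf.w in which result element i takes the source element selected by
        bits 2i+1..2i of the 8-bit immediate. Under that assumption, a splat
        of element n corresponds to repeating n in all four 2-bit fields, that
        is, the immediate n * 0x55. The helper names are hypothetical.

```python
def shf_w(ws, imm8):
    """Model of shf.w on one 4-element vector of words.

    Result element i is the source element selected by the 2-bit field
    at bits 2i+1..2i of imm8.
    """
    return [ws[(imm8 >> (2 * i)) & 3] for i in range(4)]


def splati_w(ws, n):
    """Model of splati.w: replicate element n of ws into every element."""
    return [ws[n]] * 4


def shf_imm_for_splat(n):
    """Immediate with n repeated in all four 2-bit fields (n * 0b01010101)."""
    return n * 0x55


VEC = [0x11111111, 0x22222222, 0x33333333, 0x44444444]

# Every splati.w element index has a matching shf.w immediate.
for n in range(4):
    assert shf_w(VEC, shf_imm_for_splat(n)) == splati_w(VEC, n)
```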

copy_s.w:
        On MIPS32, the copy_u.d intrinsic will emit this instruction instead of
        copy_u.w. This is semantically equivalent since the general-purpose
        register file is 32 bits wide.

binsri.[bhwd], binsli.[bhwd]:
        These two operations are equivalent to each other with the operands
        swapped and the condition inverted. The compiler may use either one as
        appropriate.
        Furthermore, the compiler may use bsel.[bhwd] for some masks that do
        not survive the legalization process (this is a bug and will be fixed).
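
        One way to read the operand-swap equivalence is sketched below for a
        single element in Python. It assumes that binsli copies the m+1 most
        significant bits of ws into wd and that binsri copies the m+1 least
        significant bits, leaving the remaining bits of wd unchanged. The
        function names and the width parameter are hypothetical, and the
        sketch only covers the bit-level result, not which register is tied
        to the destination.

```python
def binsli(wd, ws, m, width=8):
    """Copy the m+1 most significant bits of ws into wd (one element)."""
    full = (1 << width) - 1
    mask = (full << (width - (m + 1))) & full
    return (ws & mask) | (wd & ~mask & full)


def binsri(wd, ws, m, width=8):
    """Copy the m+1 least significant bits of ws into wd (one element)."""
    full = (1 << width) - 1
    mask = (1 << (m + 1)) - 1
    return (ws & mask) | (wd & ~mask & full)


WIDTH = 8
A, B = 0b10110100, 0b01101011

# Inserting the top m+1 bits of B into A gives the same bits as inserting
# the complementary low bits of A into B. The case m == WIDTH - 1 copies
# the whole element and has no binsri counterpart, so it is skipped.
for m in range(WIDTH - 1):
    assert binsli(A, B, m, WIDTH) == binsri(B, A, WIDTH - 2 - m, WIDTH)
```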

bmnz.v, bmz.v, bsel.v:
        These three operations differ only in the operand that is tied to the
        result and the order of the operands.
        It is (currently) not possible to emit bmz.v or bsel.v since bmnz.v is
        the same operation and will be emitted instead.
        In the future, the compiler may choose between these three
        instructions according to register allocation.
        These three operations can be very confusing so here is a mapping
        between the instructions and the vselect node in one place:
                bmz.v  wd, ws, wt/i8 -> (vselect wt/i8, wd, ws)
                bmnz.v wd, ws, wt/i8 -> (vselect wt/i8, ws, wd)
                bsel.v wd, ws, wt/i8 -> (vselect wd, wt/i8, ws)
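
        The mapping above can be checked with a small Python model. This is
        only a sketch: it treats vselect as a bitwise select in which set
        mask bits pick the first value operand, models each 128-bit register
        as a Python integer, and uses hypothetical function names.

```python
MASK128 = (1 << 128) - 1


def vselect(cond, if_set, if_clear):
    """Bitwise select: take if_set where cond bits are 1, else if_clear."""
    return ((if_set & cond) | (if_clear & ~cond)) & MASK128


def bmz_v(wd, ws, wt):
    """bmz.v: copy bits of ws into wd where the wt bits are 0."""
    return vselect(wt, wd, ws)


def bmnz_v(wd, ws, wt):
    """bmnz.v: copy bits of ws into wd where the wt bits are 1."""
    return vselect(wt, ws, wd)


def bsel_v(wd, ws, wt):
    """bsel.v: wd holds the mask; pick wt where it is 1, ws where it is 0."""
    return vselect(wd, wt, ws)


X = 0x0123456789ABCDEF0123456789ABCDEF
Y = 0xFEDCBA9876543210FEDCBA9876543210
M = 0xFFFF0000FFFF0000F0F0F0F00F0F0F0F

# The same select, written three ways by permuting the operands.
expected = vselect(M, X, Y)
assert bmnz_v(Y, X, M) == expected
assert bmz_v(X, Y, M) == expected
assert bsel_v(M, Y, X) == expected
```

        In each call only the role of wd changes: it is the fallback value
        for bmnz.v, the selected value for bmz.v, and the mask for bsel.v.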

bmnzi.b, bmzi.b:
        Like their non-immediate counterparts, bmnzi.b and bmzi.b are the same
        operation with the operands swapped. bmnzi.b will (currently) be emitted
        for both cases.

bseli.b:
        Unlike the non-immediate versions, bseli.b is distinguishable from
        bmnzi.b and bmzi.b and can be emitted.