Code Generation Notes for MSA
=============================

Intrinsics are lowered to SelectionDAG nodes where possible in order to enable
optimisation, reduce the size of the ISel matcher, and reduce repetition in
the implementation. In a small number of cases, this can cause different
(semantically equivalent) instructions to be used in place of the requested
instruction, even when no optimisation has taken place.

Instructions
============

This section describes any quirks of instruction selection for MSA. For
example, two instructions might be equally valid for some given IR and one is
chosen in preference to the other.

bclri.b:
        It is not possible to emit bclri.b since andi.b covers exactly the
        same cases. andi.b should use fractionally less power than bclri.b in
        most hardware implementations so it is used in preference to bclri.b.

vshf.w:
        It is not possible to emit vshf.w when the shuffle description is
        constant since shf.w covers exactly the same cases. shf.w is used
        instead. It is also impossible for the shuffle description to be
        unknown at compile-time due to the definition of shufflevector in
        LLVM IR.
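
The bclri.b entry above says that andi.b covers exactly the same cases. A
minimal sketch of why, assuming that bclri.b clears one chosen bit in every
byte and that andi.b ANDs every byte with an 8-bit immediate. The Python
function names are hypothetical models written for this note, not LLVM or
MSA APIs, and each one acts on a single byte element:

```python
# Hypothetical scalar models of two MSA byte operations. Each function acts
# on one 8-bit element, because both instructions operate byte by byte.

def bclri_b(byte: int, bit: int) -> int:
    """Model of bclri.b: clear bit `bit` (0..7) of an 8-bit element."""
    return byte & ~(1 << bit) & 0xFF


def andi_b(byte: int, imm8: int) -> int:
    """Model of andi.b: AND an 8-bit element with an 8-bit immediate."""
    return byte & imm8 & 0xFF


def andi_mask_for_bclri(bit: int) -> int:
    """Immediate that makes andi.b clear exactly the bit bclri.b would."""
    return ~(1 << bit) & 0xFF


def equivalent_everywhere() -> bool:
    """Compare both models over every byte value and every bit index."""
    return all(
        bclri_b(b, m) == andi_b(b, andi_mask_for_bclri(m))
        for b in range(256)
        for m in range(8)
    )
```

Because there are only 256 byte values and 8 bit positions, the comparison
is exhaustive for the modelled behaviour.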

vshf.[bhwd]:
        When the shuffle description describes a splat operation, splat.[bhwd]
        instructions will be selected instead of vshf.[bhwd]. Unlike the ilv*
        and pck* instructions, this is matched from MipsISD::VSHF instead of
        a special-case MipsISD node.

ilvl.d, pckev.d:
        It is not possible to emit ilvl.d or pckev.d since ilvev.d covers the
        same shuffle. ilvev.d will be emitted instead.

ilvr.d, ilvod.d, pckod.d:
        It is not possible to emit ilvr.d or pckod.d since ilvod.d covers the
        same shuffle. ilvod.d will be emitted instead.

splat.[bhwd]:
        The intrinsic will work as expected. However, unlike other intrinsics
        it lowers directly to MipsISD::VSHF instead of using common IR.

splati.w:
        It is not possible to emit splati.w since shf.w covers the same cases.
        shf.w will be emitted instead.

copy_s.w:
        On MIPS32, the copy_u.d intrinsic will emit this instruction instead of
        copy_u.w. This is semantically equivalent since the general-purpose
        register file is 32 bits wide.

binsri.[bhwd], binsli.[bhwd]:
        These two operations are equivalent to each other with the operands
        swapped and condition inverted. The compiler may use either one as
        appropriate.
        Furthermore, the compiler may use bsel.[bhwd] for some masks that do
        not survive the legalization process (this is a bug and will be fixed).

bmnz.v, bmz.v, bsel.v:
        These three operations differ only in the operand that is tied to the
        result and the order of the operands.
        It is (currently) not possible to emit bmz.v or bsel.v since bmnz.v is
        the same operation and will be emitted instead.
        In future, the compiler may choose between these three instructions
        according to register allocation.
        These three operations can be very confusing so here is a mapping
        between the instructions and the vselect node in one place:
                bmz.v  wd, ws, wt/i8 -> (vselect wt/i8, wd, ws)
                bmnz.v wd, ws, wt/i8 -> (vselect wt/i8, ws, wd)
                bsel.v wd, ws, wt/i8 -> (vselect wd, wt/i8, ws)

bmnzi.b, bmzi.b:
        Like their non-immediate counterparts, bmnzi.b and bmzi.b are the same
        operation with the operands swapped. bmnzi.b will (currently) be emitted
        for both cases.

bseli.b:
        Unlike the non-immediate versions, bseli.b is distinguishable from
        bmnzi.b and bmzi.b and can be emitted.
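
The mapping between bmz.v, bmnz.v, bsel.v and the vselect node given above
can be checked with a small bit-level model. This is only a sketch: the
bitwise definitions below are one reading of the MSA instruction
descriptions and should be confirmed against the architecture manual, and
every Python name here is hypothetical. Vectors are modelled as 128-bit
integers, and vselect(cond, a, b) takes each bit from a where cond is set
and from b otherwise:

```python
# Hypothetical bit-level models; a 128-bit MSA register is a Python int.
MASK128 = (1 << 128) - 1


def vselect(cond: int, if_set: int, if_clear: int) -> int:
    """Per-bit select: take if_set where cond is 1, if_clear where it is 0."""
    return ((cond & if_set) | (~cond & if_clear)) & MASK128


def bmnz_v(wd: int, ws: int, wt: int) -> int:
    """Assumed bmnz.v: copy bits of ws into wd where wt is non-zero."""
    return ((ws & wt) | (wd & ~wt)) & MASK128


def bmz_v(wd: int, ws: int, wt: int) -> int:
    """Assumed bmz.v: copy bits of ws into wd where wt is zero."""
    return ((ws & ~wt) | (wd & wt)) & MASK128


def bsel_v(wd: int, ws: int, wt: int) -> int:
    """Assumed bsel.v: wd is the mask; pick wt where set, ws where clear."""
    return ((ws & ~wd) | (wt & wd)) & MASK128


def mapping_holds(wd: int, ws: int, wt: int) -> bool:
    """Check the three instruction-to-vselect rows for one set of inputs."""
    return (
        bmz_v(wd, ws, wt) == vselect(wt, wd, ws)
        and bmnz_v(wd, ws, wt) == vselect(wt, ws, wd)
        and bsel_v(wd, ws, wt) == vselect(wd, wt, ws)
    )
```

Under these assumed definitions, swapping wd and ws turns bmz.v into bmnz.v
with the mask inverted, which is consistent with a single instruction being
enough for the compiler to emit.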