========================================
Machine IR (MIR) Format Reference Manual
========================================

.. contents::
   :local:

.. warning::
   This is a work in progress.

Introduction
============

This document is a reference manual for the Machine IR (MIR) serialization
format. MIR is a human readable serialization format that is used to represent
LLVM's :ref:`machine specific intermediate representation
<machine code representation>`.

The MIR serialization format is designed to be used for testing the code
generation passes in LLVM.

Overview
========

The MIR serialization format uses a YAML container. YAML is a standard
data serialization language, and the full YAML language spec can be read at
`yaml.org
<http://www.yaml.org/spec/1.2/spec.html#Introduction>`_.

A MIR file is split up into a series of `YAML documents`_. The first document
can contain an optional embedded LLVM IR module, and the rest of the documents
contain the serialized machine functions.

.. _YAML documents: http://www.yaml.org/spec/1.2/spec.html#id2800132

MIR Testing Guide
=================

You can use the MIR format for testing in two different ways:

- You can write MIR tests that invoke a single code generation pass using the
  ``-run-pass`` option in llc.

- You can use llc's ``-stop-after`` option with existing or new LLVM assembly
  tests and check the MIR output of a specific code generation pass.

Testing Individual Code Generation Passes
-----------------------------------------

The ``-run-pass`` option in llc allows you to create MIR tests that invoke just
a single code generation pass. When this option is used, llc will parse an
input MIR file, run the specified code generation pass(es), and output the
resulting MIR code.

You can generate an input MIR file for the test by using the ``-stop-after`` or
``-stop-before`` option in llc.
For example, if you would like to write a test
for the post register allocation pseudo instruction expansion pass, you can
specify the machine copy propagation pass in the ``-stop-after`` option, as it
runs just before the pass that we are trying to test:

   ``llc -stop-after=machine-cp bug-trigger.ll > test.mir``

If the same pass is run multiple times, a run index can be included
after the name with a comma:

   ``llc -stop-after=dead-mi-elimination,1 bug-trigger.ll > test.mir``

After generating the input MIR file, you'll have to add a run line that uses
the ``-run-pass`` option to it. In order to test the post register allocation
pseudo instruction expansion pass on X86-64, a run line like the one shown
below can be used:

   ``# RUN: llc -o - %s -mtriple=x86_64-- -run-pass=postrapseudos | FileCheck %s``

The MIR files are target dependent, so they have to be placed in the target
specific test directories (``test/CodeGen/TARGETNAME``). They also need to
specify a target triple or a target architecture either in the run line or in
the embedded LLVM IR module.

Simplifying MIR files
^^^^^^^^^^^^^^^^^^^^^

The MIR code coming out of ``-stop-after``/``-stop-before`` is very verbose;
tests are more accessible and future proof when simplified:

- Use the ``-simplify-mir`` option with llc.

- Machine function attributes often have default values or the test works just
  as well with default values. Typical candidates for this are: `alignment:`,
  `exposesReturnsTwice`, `legalized`, `regBankSelected`, `selected`.
  The whole `frameInfo` section is often unnecessary if there is no special
  frame usage in the function. `tracksRegLiveness` on the other hand is often
  necessary for some passes that care about block livein lists.

- The (global) `liveins:` list is typically only interesting for early
  instruction selection passes and can be removed when testing later passes.
  The per-block `liveins:` lists, on the other hand, are necessary if
  `tracksRegLiveness` is true.

- Branch probability data in block `successors:` lists can be dropped if the
  test doesn't depend on it. Example:
  `successors: %bb.1(0x40000000), %bb.2(0x40000000)` can be replaced with
  `successors: %bb.1, %bb.2`.

- MIR code contains a whole IR module. This is necessary because there are
  no equivalents in MIR for global variables, references to external functions,
  function attributes, metadata, or debug info. Instead, some MIR data
  references the IR constructs. You can often remove them if the test doesn't
  depend on them.

- Alias Analysis is performed on IR values. These are referenced by memory
  operands in MIR. Example: `:: (load 8 from %ir.foobar, !alias.scope !9)`.
  If the test doesn't depend on (good) alias analysis, the references can be
  dropped: `:: (load 8)`.

- MIR blocks can reference IR blocks for debug printing, profile information
  or debug locations. Example: `bb.42.myblock` in MIR references the IR block
  `myblock`. It is usually possible to drop the `.myblock` reference and simply
  use `bb.42`.

- If there are no memory operands or blocks referencing the IR then the
  IR function can be replaced by a parameterless dummy function like
  `define void @func() { ret void }`.

- It is possible to drop the whole IR section of the MIR file if it only
  contains dummy functions (see above). The .mir loader will create the
  IR functions automatically in this case.

.. _limitations:

Limitations
-----------

Currently the MIR format has several limitations in terms of which state it
can serialize:

- The target-specific state in the target-specific ``MachineFunctionInfo``
  subclasses isn't serialized at the moment.

- The target-specific ``MachineConstantPoolValue`` subclasses (in the ARM and
  SystemZ backends) aren't serialized at the moment.

- The ``MCSymbol`` machine operands don't support temporary or local symbols.

- A lot of the state in ``MachineModuleInfo`` isn't serialized; only the CFI
  instructions and the variable debug information from MMI are serialized
  right now.

These limitations impose restrictions on what you can test with the MIR format.
For now, tests that depend on the state of temporary or local ``MCSymbol``
operands or on the exception handling state in MMI can't use the MIR format.
Likewise, tests that depend on the state of the target specific
``MachineFunctionInfo`` or ``MachineConstantPoolValue`` subclasses can't use
the MIR format at the moment.

High Level Structure
====================

.. _embedded-module:

Embedded Module
---------------

When the first YAML document contains a `YAML block literal string`_, the MIR
parser will treat this string as an LLVM assembly language string that
represents an embedded LLVM IR module.
Here is an example of a YAML document that contains an LLVM module:

.. code-block:: llvm

   define i32 @inc(i32* %x) {
   entry:
     %0 = load i32, i32* %x
     %1 = add i32 %0, 1
     store i32 %1, i32* %x
     ret i32 %1
   }

.. _YAML block literal string: http://www.yaml.org/spec/1.2/spec.html#id2795688

Machine Functions
-----------------

The remaining YAML documents contain the machine functions. This is an example
of such a YAML document:

.. code-block:: text

   ---
   name:            inc
   tracksRegLiveness: true
   liveins:
     - { reg: '$rdi' }
   callSites:
     - { bb: 0, offset: 3, fwdArgRegs:
         - { arg: 0, reg: '$edi' } }
   body: |
     bb.0.entry:
       liveins: $rdi

       $eax = MOV32rm $rdi, 1, _, 0, _
       $eax = INC32r killed $eax, implicit-def dead $eflags
       MOV32mr killed $rdi, 1, _, 0, _, $eax
       CALL64pcrel32 @foo <regmask...>
       RETQ $eax
   ...

The document above consists of attributes that represent the various
properties and data structures in a machine function.

The attribute ``name`` is required, and its value should be identical to the
name of a function that this machine function is based on.

The attribute ``body`` is a `YAML block literal string`_. Its value represents
the function's machine basic blocks and their machine instructions.

The attribute ``callSites`` is a representation of call site information which
keeps track of call instructions and registers used to transfer call arguments.

Machine Instructions Format Reference
=====================================

The machine basic blocks and their instructions are represented using a custom,
human readable serialization language. This language is used in the
`YAML block literal string`_ that corresponds to the machine function's body.

A source string that uses this language contains a list of machine basic
blocks, which are described in the section below.

Machine Basic Blocks
--------------------

A machine basic block is defined in a single block definition source construct
that contains the block's ID.
The example below defines two blocks that have an ID of zero and one:

.. code-block:: text

   bb.0:
     <instructions>
   bb.1:
     <instructions>

A machine basic block can also have a name. It should be specified after the ID
in the block's definition:

.. code-block:: text

   bb.0.entry:       ; This block's name is "entry"
     <instructions>

The block's name should be identical to the name of the IR block that this
machine block is based on.

.. _block-references:

Block References
^^^^^^^^^^^^^^^^

The machine basic blocks are identified by their ID numbers. Individual
blocks are referenced using the following syntax:

.. code-block:: text

   %bb.<id>

Example:

.. code-block:: llvm

   %bb.0

The following syntax is also supported, but the former syntax is preferred for
block references:

.. code-block:: text

   %bb.<id>[.<name>]

Example:

.. code-block:: llvm

   %bb.1.then

Successors
^^^^^^^^^^

The machine basic block's successors have to be specified before any of the
instructions:

.. code-block:: text

   bb.0.entry:
     successors: %bb.1.then, %bb.2.else
     <instructions>
   bb.1.then:
     <instructions>
   bb.2.else:
     <instructions>

The branch weights can be specified in brackets after the successor blocks.
The example below defines a block that has two successors with branch weights
of 32 and 16:

.. code-block:: text

   bb.0.entry:
     successors: %bb.1.then(32), %bb.2.else(16)

.. _bb-liveins:

Live In Registers
^^^^^^^^^^^^^^^^^

The machine basic block's live in registers have to be specified before any of
the instructions:

.. code-block:: text

   bb.0.entry:
     liveins: $edi, $esi

The list of live in registers and successors can be empty. The language also
allows multiple live in register and successor lists; they are combined into
one list by the parser.

Miscellaneous Attributes
^^^^^^^^^^^^^^^^^^^^^^^^

The attributes ``IsAddressTaken``, ``IsLandingPad`` and ``Alignment`` can be
specified in brackets after the block's definition:

.. code-block:: text

   bb.0.entry (address-taken):
     <instructions>
   bb.2.else (align 4):
     <instructions>
   bb.3(landing-pad, align 4):
     <instructions>

.. TODO: Describe the way the reference to an unnamed LLVM IR block can be
   preserved.

``Alignment`` is specified in bytes, and must be a power of two.

.. _mir-instructions:

Machine Instructions
--------------------

A machine instruction is composed of a name,
:ref:`machine operands <machine-operands>`,
:ref:`instruction flags <instruction-flags>`, and machine memory operands.

The instruction's name is usually specified before the operands. The example
below shows an instance of the X86 ``RETQ`` instruction with a single machine
operand:

.. code-block:: text

   RETQ $eax

However, if the machine instruction has one or more explicitly defined register
operands, the instruction's name has to be specified after them. The example
below shows an instance of the AArch64 ``LDPXpost`` instruction with three
defined register operands:

.. code-block:: text

   $sp, $fp, $lr = LDPXpost $sp, 2

The instruction names are serialized using the exact definitions from the
target's ``*InstrInfo.td`` files, and they are case sensitive. This means that
similar instruction names like ``TSTri`` and ``tSTRi`` represent different
machine instructions.

.. _instruction-flags:

Instruction Flags
^^^^^^^^^^^^^^^^^

The flag ``frame-setup`` or ``frame-destroy`` can be specified before the
instruction's name:

.. code-block:: text

   $fp = frame-setup ADDXri $sp, 0, 0

.. code-block:: text

   $x21, $x20 = frame-destroy LDPXi $sp

.. _registers:

Bundled Instructions
^^^^^^^^^^^^^^^^^^^^

The syntax for bundled instructions is the following:

.. code-block:: text

   BUNDLE implicit-def $r0, implicit-def $r1, implicit $r2 {
     $r0 = SOME_OP $r2
     $r1 = ANOTHER_OP internal $r0
   }

The first instruction is often a bundle header. The instructions between ``{``
and ``}`` are bundled with the first instruction.

.. _mir-registers:

Registers
---------

Registers are one of the key primitives in the machine instructions
serialization language. They are primarily used in the
:ref:`register machine operands <register-operands>`,
but they can also be used in a number of other places, like the
:ref:`basic block's live in list <bb-liveins>`.

The physical registers are identified by their name and by the '$' prefix sigil.
They use the following syntax:

.. code-block:: text

   $<name>

The example below shows three X86 physical registers:

.. code-block:: text

   $eax
   $r15
   $eflags

The virtual registers are identified by their ID number and by the '%' sigil.
They use the following syntax:

.. code-block:: text

   %<id>

Example:

.. code-block:: text

   %0

The null registers are represented using an underscore ('``_``'). They can also be
represented using a '``$noreg``' named register, although the former syntax
is preferred.

.. _machine-operands:

Machine Operands
----------------

There are seventeen different kinds of machine operands, and all of them can be
serialized.

Immediate Operands
^^^^^^^^^^^^^^^^^^

The immediate machine operands are untyped, 64-bit signed integers. The
example below shows an instance of the X86 ``MOV32ri`` instruction that has an
immediate machine operand ``-42``:

.. code-block:: text

   $eax = MOV32ri -42

An immediate operand is also used to represent a subregister index when the
machine instruction has one of the following opcodes:

- ``EXTRACT_SUBREG``

- ``INSERT_SUBREG``

- ``REG_SEQUENCE``

- ``SUBREG_TO_REG``

In that case, the machine operand is printed according to the target.

For example:

In AArch64RegisterInfo.td:

.. code-block:: text

   def sub_32 : SubRegIndex<32>;

If the third operand is an immediate with the value ``15`` (target-dependent
value), based on the instruction's opcode and the operand's index the operand
will be printed as ``%subreg.sub_32``:

.. code-block:: text

   %1:gpr64 = SUBREG_TO_REG 0, %0, %subreg.sub_32

For integers wider than 64 bits, we use a special machine operand,
``MO_CImmediate``, which stores the immediate in a ``ConstantInt`` using an
``APInt`` (LLVM's arbitrary precision integers).

.. TODO: Describe the FPIMM immediate operands.

.. _register-operands:

Register Operands
^^^^^^^^^^^^^^^^^

The :ref:`register <registers>` primitive is used to represent the register
machine operands. The register operands can also have optional
:ref:`register flags <register-flags>`,
:ref:`a subregister index <subregister-indices>`,
and a reference to the tied register operand.
The full syntax of a register operand is shown below:

.. code-block:: text

   [<flags>] <register> [ :<subregister-idx-name> ] [ (tied-def <tied-op>) ]

This example shows an instance of the X86 ``XOR32rr`` instruction that has
5 register operands with different register flags:

.. code-block:: text

   dead $eax = XOR32rr undef $eax, undef $eax, implicit-def dead $eflags, implicit-def $al

.. _register-flags:

Register Flags
~~~~~~~~~~~~~~

The table below shows all of the possible register flags along with the
corresponding internal ``llvm::RegState`` representation:

.. list-table::
   :header-rows: 1

   * - Flag
     - Internal Value

   * - ``implicit``
     - ``RegState::Implicit``

   * - ``implicit-def``
     - ``RegState::ImplicitDefine``

   * - ``def``
     - ``RegState::Define``

   * - ``dead``
     - ``RegState::Dead``

   * - ``killed``
     - ``RegState::Kill``

   * - ``undef``
     - ``RegState::Undef``

   * - ``internal``
     - ``RegState::InternalRead``

   * - ``early-clobber``
     - ``RegState::EarlyClobber``

   * - ``debug-use``
     - ``RegState::Debug``

   * - ``renamable``
     - ``RegState::Renamable``

.. _subregister-indices:

Subregister Indices
~~~~~~~~~~~~~~~~~~~

The register machine operands can reference a portion of a register by using
the subregister indices. The example below shows an instance of the ``COPY``
pseudo instruction that uses the X86 ``sub_8bit`` subregister index to copy 8
lower bits from the 32-bit virtual register 0 to the 8-bit virtual register 1:

.. code-block:: text

   %1 = COPY %0:sub_8bit

The names of the subregister indices are target specific, and are typically
defined in the target's ``*RegisterInfo.td`` file.

Constant Pool Indices
^^^^^^^^^^^^^^^^^^^^^

A constant pool index (CPI) operand is printed using its index in the
function's ``MachineConstantPool`` and an offset.

For example, a CPI with the index 1 and offset 8:

.. code-block:: text

   %1:gr64 = MOV64ri %const.1 + 8

For a CPI with the index 0 and offset -12:

.. code-block:: text

   %1:gr64 = MOV64ri %const.0 - 12

A constant pool entry is bound to a LLVM IR ``Constant`` or a target-specific
``MachineConstantPoolValue``.
When serializing all the function's constants the
following format is used:

.. code-block:: text

   constants:
     - id:               <index>
       value:            <value>
       alignment:        <alignment>
       isTargetSpecific: <target-specific>

where:
  - ``<index>`` is a 32-bit unsigned integer;
  - ``<value>`` is a `LLVM IR Constant
    <https://www.llvm.org/docs/LangRef.html#constants>`_;
  - ``<alignment>`` is a 32-bit unsigned integer specified in bytes, and must be
    a power of two;
  - ``<target-specific>`` is either true or false.

Example:

.. code-block:: text

   constants:
     - id:               0
       value:            'double 3.250000e+00'
       alignment:        8
     - id:               1
       value:            'g-(LPC0+8)'
       alignment:        4
       isTargetSpecific: true

Global Value Operands
^^^^^^^^^^^^^^^^^^^^^

The global value machine operands reference the global values from the
:ref:`embedded LLVM IR module <embedded-module>`.
The example below shows an instance of the X86 ``MOV64rm`` instruction that has
a global value operand named ``G``:

.. code-block:: text

   $rax = MOV64rm $rip, 1, _, @G, _

The named global values are represented using an identifier with the '@' prefix.
If the identifier doesn't match the regular expression
`[-a-zA-Z$._][-a-zA-Z$._0-9]*`, then this identifier must be quoted.

The unnamed global values are represented using an unsigned numeric value with
the '@' prefix, like in the following examples: ``@0``, ``@989``.

Target-dependent Index Operands
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A target index operand is a target-specific index and an offset. The
target-specific index is printed using target-specific names and a positive or
negative offset.

For example, the ``amdgpu-constdata-start`` is associated with the index ``0``
in the AMDGPU backend. So if we have a target index operand with the index 0
and the offset 8:

.. code-block:: text

   $sgpr2 = S_ADD_U32 _, target-index(amdgpu-constdata-start) + 8, implicit-def _, implicit-def _

Jump-table Index Operands
^^^^^^^^^^^^^^^^^^^^^^^^^

A jump-table index operand with the index 0 is printed as follows:

.. code-block:: text

   tBR_JTr killed $r0, %jump-table.0

A machine jump-table entry contains a list of ``MachineBasicBlocks``. When
serializing all the function's jump-table entries, the following format is
used:

.. code-block:: text

   jumpTable:
     kind:             <kind>
     entries:
       - id:             <index>
         blocks:         [ <bbreference>, <bbreference>, ... ]

where ``<kind>`` describes how the jump table is represented and emitted
(plain address, relocations, PIC, etc.), each ``<index>`` is a 32-bit unsigned
integer, and ``blocks`` contains a list of
:ref:`machine basic block references <block-references>`.

Example:

.. code-block:: text

   jumpTable:
     kind:             inline
     entries:
       - id:             0
         blocks:         [ '%bb.3', '%bb.9', '%bb.4.d3' ]
       - id:             1
         blocks:         [ '%bb.7', '%bb.7', '%bb.4.d3', '%bb.5' ]

External Symbol Operands
^^^^^^^^^^^^^^^^^^^^^^^^^

An external symbol operand is represented using an identifier with the ``&``
prefix. The identifier is surrounded with ""'s and escaped if it has any
special non-printable characters in it.

Example:

.. code-block:: text

   CALL64pcrel32 &__stack_chk_fail, csr_64, implicit $rsp, implicit-def $rsp

MCSymbol Operands
^^^^^^^^^^^^^^^^^

A MCSymbol operand holds a pointer to a ``MCSymbol``. For the limitations
of this operand in MIR, see :ref:`limitations <limitations>`.

The syntax is:

.. code-block:: text

   EH_LABEL <mcsymbol Ltmp1>

CFIIndex Operands
^^^^^^^^^^^^^^^^^

A CFI Index operand holds an index into a per-function side-table,
``MachineFunction::getFrameInstructions()``, which references all the frame
instructions in a ``MachineFunction``. A ``CFI_INSTRUCTION`` may look like it
contains multiple operands, but the only operand it contains is the CFI Index.
The other operands are tracked by the ``MCCFIInstruction`` object.

The syntax is:

.. code-block:: text

   CFI_INSTRUCTION offset $w30, -16

which may be emitted later in the MC layer as:

.. code-block:: text

   .cfi_offset w30, -16

IntrinsicID Operands
^^^^^^^^^^^^^^^^^^^^

An Intrinsic ID operand contains a generic intrinsic ID or a target-specific ID.

The syntax for the ``returnaddress`` intrinsic is:

.. code-block:: text

   $x0 = COPY intrinsic(@llvm.returnaddress)

Predicate Operands
^^^^^^^^^^^^^^^^^^

A Predicate operand contains an IR predicate from ``CmpInst::Predicate``, like
``ICMP_EQ``, etc.

For an int eq predicate ``ICMP_EQ``, the syntax is:

.. code-block:: text

   %2:gpr(s32) = G_ICMP intpred(eq), %0, %1

.. TODO: Describe the parser's default behaviour when optional YAML attributes
   are missing.
.. TODO: Describe the syntax for virtual register YAML definitions.
.. TODO: Describe the machine function's YAML flag attributes.
.. TODO: Describe the syntax for the register mask machine operands.
.. TODO: Describe the frame information YAML mapping.
.. TODO: Describe the syntax of the stack object machine operands and their
   YAML definitions.
.. TODO: Describe the syntax of the block address machine operands.
.. TODO: Describe the syntax of the metadata machine operands, and the
   instructions debug location attribute.
.. TODO: Describe the syntax of the register live out machine operands.
.. TODO: Describe the syntax of the machine memory operands.

Comments
^^^^^^^^

Machine operands can have C/C++ style comments, which are annotations enclosed
between ``/*`` and ``*/`` to improve readability of e.g. immediate operands.
In the example below, ARM instructions EOR and BCC and immediate operands
``14`` and ``0`` have been annotated with their condition codes (CC)
definitions, i.e. the ``always`` and ``eq`` condition codes:

.. code-block:: text

   dead renamable $r2, $cpsr = tEOR killed renamable $r2, renamable $r1, 14 /* CC::always */, $noreg
   t2Bcc %bb.4, 0 /* CC:eq */, killed $cpsr

As these annotations are comments, they are ignored by the MI parser.
Comments can be added or customized by overriding InstrInfo's hook
``createMIROperandComment()``.

Debug-Info constructs
---------------------

Most of the debugging information in a MIR file is to be found in the metadata
of the embedded module. Within a machine function, that metadata is referred to
by various constructs to describe source locations and variable locations.

Source locations
^^^^^^^^^^^^^^^^

Every MIR instruction may optionally have a trailing reference to a
``DILocation`` metadata node, after all operands and symbols, but before
memory operands:

.. code-block:: text

   $rbp = MOV64rr $rdi, debug-location !12

The source location attachment is synonymous with the ``!dbg`` metadata
attachment in LLVM-IR. The absence of a source location attachment will be
represented by an empty ``DebugLoc`` object in the machine instruction.

Fixed variable locations
^^^^^^^^^^^^^^^^^^^^^^^^

There are several ways of specifying variable locations. The simplest is
describing a variable that is permanently located on the stack.
In the stack
or fixedStack attribute of the machine function, the variable, scope, and
any qualifying location modifier are provided:

.. code-block:: text

   - { id: 0, name: offset.addr, offset: -24, size: 8, alignment: 8, stack-id: default,
       debug-info-variable: '!1', debug-info-expression: '!DIExpression()',
       debug-info-location: '!2' }

Where:

- ``debug-info-variable`` identifies a DILocalVariable metadata node,

- ``debug-info-expression`` adds qualifiers to the variable location,

- ``debug-info-location`` identifies a DILocation metadata node.

These metadata attributes correspond to the operands of a ``llvm.dbg.declare``
IR intrinsic, see the :ref:`source level debugging<format_common_intrinsics>`
documentation.

Varying variable locations
^^^^^^^^^^^^^^^^^^^^^^^^^^

Variables that are not always on the stack or change location are specified
with the ``DBG_VALUE`` meta machine instruction. It is synonymous with the
``llvm.dbg.value`` IR intrinsic, and is written:

.. code-block:: text

   DBG_VALUE $rax, $noreg, !123, !DIExpression(), debug-location !456

The four operands are, respectively:

1. A machine location such as a register, immediate, or frame index.

2. Either ``$noreg``, or the immediate value zero if an extra level of
   indirection is to be added to the first operand.

3. A ``DILocalVariable`` metadata node.

4. An expression qualifying the variable location, either inline or as a
   metadata node reference.

The source location identifies the ``DILocation`` for the scope of the
variable. The second operand (``IsIndirect``) is deprecated and to be deleted.
All additional qualifiers for the variable location should be made through the
expression metadata.
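
As a hedged illustration of that last point, the sketch below describes a
variable that lives in memory at an offset from a register purely through the
expression, leaving the second operand as ``$noreg``. The register, the offset
and the metadata node numbers are hypothetical and are not taken from a real
test:

.. code-block:: text

   DBG_VALUE $rsp, $noreg, !123, !DIExpression(DW_OP_plus_uconst, 8, DW_OP_deref), debug-location !456

Here ``DW_OP_plus_uconst, 8`` adds the offset to the register value, and
``DW_OP_deref`` supplies the level of indirection that would otherwise have
been requested through the deprecated second operand.
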

Instruction referencing locations
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This experimental feature aims to separate the specification of variable
*values* from the program point where a variable takes on that value. Changes
in variable value occur in the same manner as ``DBG_VALUE`` meta instructions
but using ``DBG_INSTR_REF``. Variable values are identified by a pair of
instruction number and operand number. Consider the example below:

.. code-block:: text

   $rbp = MOV64ri 0, debug-instr-number 1, debug-location !12
   DBG_INSTR_REF 1, 0, !123, !DIExpression(), debug-location !456

Instruction numbers are directly attached to machine instructions with an
optional ``debug-instr-number`` attachment, before the optional
``debug-location`` attachment. The value defined in ``$rbp`` in the code
above would be identified by the pair ``<1, 0>``.

The first two operands of the ``DBG_INSTR_REF`` above record the instruction
and operand number ``<1, 0>``, identifying the value defined by the ``MOV64ri``.
The additional operands to ``DBG_INSTR_REF`` are identical to ``DBG_VALUE``,
and the position of the ``DBG_INSTR_REF`` records where the variable takes on
the designated value in the same way.

More information about how these constructs are used will appear on the source
level debugging page in due course; see also :doc:`SourceLevelDebugging` and
:doc:`HowToUpdateDebugInfo`.
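
To illustrate the separation between where a value is defined and where the
variable takes it on, the sketch below places the ``DBG_INSTR_REF`` after an
unrelated instruction rather than directly after the numbered ``MOV64ri``.
The intervening instruction and all metadata node numbers are hypothetical,
and the syntax simply follows the example earlier in this section:

.. code-block:: text

   $rbp = MOV64ri 0, debug-instr-number 1, debug-location !12
   $rax = MOV64rr $rdi, debug-location !12
   DBG_INSTR_REF 1, 0, !123, !DIExpression(), debug-location !456

The pair ``<1, 0>`` still identifies the value defined by the ``MOV64ri``,
while the position of the ``DBG_INSTR_REF`` marks the point at which the
variable is considered to hold that value.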