================================
Source Level Debugging with LLVM
================================

.. contents::
   :local:

Introduction
============

This document is the central repository for all information pertaining to debug
information in LLVM. It describes the :ref:`actual format that the LLVM debug
information takes <format>`, which is useful for those interested in creating
front-ends or dealing directly with the information. Further, this document
provides specific examples of what debug information for C/C++ looks like.

Philosophy behind LLVM debugging information
--------------------------------------------

The idea of the LLVM debugging information is to capture how the important
pieces of the source-language's Abstract Syntax Tree map onto LLVM code.
Several design aspects have shaped the solution that appears here. The
important ones are:

* Debugging information should have very little impact on the rest of the
  compiler. No transformations, analyses, or code generators should need to
  be modified because of debugging information.

* LLVM optimizations should interact in :ref:`well-defined and easily described
  ways <intro_debugopt>` with the debugging information.

* Because LLVM is designed to support arbitrary programming languages,
  LLVM-to-LLVM tools should not need to know anything about the semantics of
  the source-level-language.

* Source-level languages are often **widely** different from one another.
  LLVM should not put any restrictions on the flavor of the source-language,
  and the debugging information should work with any language.

* With code generator support, it should be possible to use an LLVM compiler
  to compile a program to native machine code and standard debugging
  formats. This allows compatibility with traditional machine-code level
  debuggers, like GDB or DBX.

The approach used by the LLVM implementation is to use a small set of
:ref:`intrinsic functions <format_common_intrinsics>` to define a mapping
between LLVM program objects and the source-level objects. The description of
the source-level program is maintained in LLVM metadata in an
:ref:`implementation-defined format <ccxx_frontend>` (the C/C++ front-end
currently uses working draft 7 of the `DWARF 3 standard
<http://www.eagercon.com/dwarf/dwarf3std.htm>`_).

When a program is being debugged, a debugger interacts with the user and turns
the stored debug information into source-language specific information. As
such, a debugger must be aware of the source-language, and is thus tied to a
specific language or family of languages.

Debug information consumers
---------------------------

The role of debug information is to provide meta information normally stripped
away during the compilation process. This meta information provides an LLVM
user a relationship between generated code and the original program source
code.

Currently, debug information is consumed by DwarfDebug to produce dwarf
information used by the gdb debugger. Other targets could use the same
information to produce stabs or other debug forms.

It would also be reasonable to use debug information to feed profiling tools
for analysis of generated code, or, tools for reconstructing the original
source from generated code.

TODO - expound a bit more.

.. _intro_debugopt:

Debugging optimized code
------------------------

An extremely high priority of LLVM debugging information is to make it interact
well with optimizations and analysis.
In particular, the LLVM debug
information provides the following guarantees:

* LLVM debug information **always provides information to accurately read
  the source-level state of the program**, regardless of which LLVM
  optimizations have been run, and without any modification to the
  optimizations themselves. However, some optimizations may impact the
  ability to modify the current state of the program with a debugger, such
  as setting program variables, or calling functions that have been
  deleted.

* As desired, LLVM optimizations can be upgraded to be aware of the LLVM
  debugging information, allowing them to update the debugging information
  as they perform aggressive optimizations. This means that, with effort,
  the LLVM optimizers could optimize debug code just as well as non-debug
  code.

* LLVM debug information does not prevent optimizations from
  happening (for example inlining, basic block reordering/merging/cleanup,
  tail duplication, etc).

* LLVM debug information is automatically optimized along with the rest of
  the program, using existing facilities. For example, duplicate
  information is automatically merged by the linker, and unused information
  is automatically removed.

Basically, the debug information allows you to compile a program with
"``-O0 -g``" and get full debug information, allowing you to arbitrarily modify
the program as it executes from a debugger. Compiling a program with
"``-O3 -g``" gives you full debug information that is always available and
accurate for reading (e.g., you get accurate stack traces despite tail call
elimination and inlining), but you might lose the ability to modify the program
and call functions which were optimized out of the program, or inlined away
completely.

The :ref:`LLVM test suite <test-suite-quickstart>` provides a framework to test
the optimizer's handling of debugging information. It can be run like this:

.. code-block:: bash

  % cd llvm/projects/test-suite/MultiSource/Benchmarks  # or some other level
  % make TEST=dbgopt

This tests the impact of debugging information on optimization passes. If
debugging information influences optimization passes then it will be reported
as a failure. See :doc:`TestingGuide` for more information on LLVM test
infrastructure and how to run various tests.

.. _format:

Debugging information format
============================

LLVM debugging information has been carefully designed to make it possible for
the optimizer to optimize the program and debugging information without
necessarily having to know anything about debugging information. In
particular, the use of metadata avoids duplicated debugging information from
the beginning, and the global dead code elimination pass automatically deletes
debugging information for a function if it decides to delete the function.

To do this, most of the debugging information (descriptors for types,
variables, functions, source files, etc) is inserted by the language front-end
in the form of LLVM metadata.

Debug information is designed to be agnostic about the target debugger and
debugging information representation (e.g. DWARF/Stabs/etc). It uses a generic
pass to decode the information that represents variables, types, functions,
namespaces, etc: this allows for arbitrary source-language semantics and
type-systems to be used, as long as there is a module written for the target
debugger to interpret the information.

To provide basic functionality, the LLVM debugger does have to make some
assumptions about the source-level language being debugged, though it keeps
these to a minimum. The only common features that the LLVM debugger assumes
exist are :ref:`source files <format_files>`, and :ref:`program objects
<format_global_variables>`.
These abstract objects are used by a debugger to
form stack traces, show information about local variables, etc.

This section of the documentation first describes the representation aspects
common to any source-language. :ref:`ccxx_frontend` describes the data layout
conventions used by the C and C++ front-ends.

Debug information descriptors
-----------------------------

In consideration of the complexity and volume of debug information, LLVM
provides a specification for well formed debug descriptors.

Consumers of LLVM debug information expect the descriptors for program objects
to start in a canonical format, but the descriptors can include additional
information appended at the end that is source-language specific. All debugging
information objects start with a tag to indicate what type of object it is.
The source-language is allowed to define its own objects, by using unreserved
tag numbers. We recommend using tags in the range 0x1000 through 0x2000
(there is a defined ``enum DW_TAG_user_base = 0x1000``).

The fields of debug descriptors used internally by LLVM are restricted to only
the simple data types ``i32``, ``i1``, ``float``, ``double``, ``mdstring`` and
``mdnode``.

.. code-block:: llvm

  !1 = metadata !{
    i32,   ;; A tag
    ...
  }

Most of the string and integer fields in descriptors are packed into a single,
null-separated ``mdstring``. The first field of the header is always an
``i32`` containing the DWARF tag value identifying the content of the
descriptor.

For clarity of definition in this document, these header fields are described
below split inside an imaginary ``DIHeader`` construct. This is invalid
assembly syntax. In valid IR, these fields are stringified and concatenated,
separated by ``\00``.

The details of the various descriptors follow.

Compile unit descriptors
^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: llvm

  !0 = metadata !{
    DIHeader(
      i32,       ;; Tag = 17 (DW_TAG_compile_unit)
      i32,       ;; DWARF language identifier (ex. DW_LANG_C89)
      mdstring,  ;; Producer (ex. "4.0.1 LLVM (LLVM research group)")
      i1,        ;; True if this is optimized.
      mdstring,  ;; Flags
      i32,       ;; Runtime version
      mdstring,  ;; Split debug filename
      i32        ;; Debug info emission kind (1 = Full Debug Info, 2 = Line Tables Only)
    ),
    metadata,    ;; Source directory (including trailing slash) & file pair
    metadata,    ;; List of enum types
    metadata,    ;; List of retained types
    metadata,    ;; List of subprograms
    metadata,    ;; List of global variables
    metadata     ;; List of imported entities
  }

These descriptors contain a source language ID for the file (we use the DWARF
3.0 ID numbers, such as ``DW_LANG_C89``, ``DW_LANG_C_plus_plus``,
``DW_LANG_Cobol74``, etc), a reference to a metadata node containing a pair of
strings for the source file name and the working directory, as well as an
identifier string for the compiler that produced it.

Compile unit descriptors provide the root context for objects declared in a
specific compilation unit. File descriptors are defined using this context.
These descriptors are collected by a named metadata ``!llvm.dbg.cu``. They
keep track of subprograms, global variables, type information, and imported
entities (declarations and namespaces).

.. _format_files:

File descriptors
^^^^^^^^^^^^^^^^

.. code-block:: llvm

  !0 = metadata !{
    DIHeader(
      i32        ;; Tag = 41 (DW_TAG_file_type)
    ),
    metadata     ;; Source directory (including trailing slash) & file pair
  }

These descriptors contain information for a file. Global variables and top
level functions would be defined using this context. File descriptors also
provide context for source line correspondence.
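
As noted under "Debug information descriptors", the fields shown inside
``DIHeader(...)`` are stored in valid IR as one string whose fields are
separated by ``\00``. The following is a minimal sketch of recovering the
individual fields from such a string once it has been parsed (at which point
the separator is a single NUL byte). ``splitHeader`` is a hypothetical helper
written for this illustration; it is not part of the LLVM API.

.. code-block:: c++

  #include <string>
  #include <vector>

  // Hypothetical helper (not an LLVM API): split a descriptor header string
  // into its fields. Fields are separated by a NUL byte; empty fields are
  // preserved so that field positions stay meaningful.
  std::vector<std::string> splitHeader(const std::string &Header) {
    std::vector<std::string> Fields;
    std::string::size_type Start = 0;
    while (true) {
      std::string::size_type End = Header.find('\0', Start);
      if (End == std::string::npos) {
        // Last (or only) field: everything up to the end of the string.
        Fields.push_back(Header.substr(Start));
        break;
      }
      Fields.push_back(Header.substr(Start, End - Start));
      Start = End + 1;
    }
    return Fields;
  }

For a file descriptor the header holds only the tag, so the result has a single
field; for a compile unit it would have one entry per ``DIHeader`` field in the
order listed above.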

Each input file is encoded as a separate file descriptor in LLVM debugging
information output.

.. _format_global_variables:

Global variable descriptors
^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: llvm

  !1 = metadata !{
    DIHeader(
      i32,       ;; Tag = 52 (DW_TAG_variable)
      mdstring,  ;; Name
      mdstring,  ;; Display name (fully qualified C++ name)
      mdstring,  ;; MIPS linkage name (for C++)
      i32,       ;; Line number where defined
      i1,        ;; True if the global is local to compile unit (static)
      i1         ;; True if the global is defined in the compile unit (not extern)
    ),
    metadata,    ;; Reference to context descriptor
    metadata,    ;; Reference to file where defined
    metadata,    ;; Reference to type descriptor
    {}*,         ;; Reference to the global variable
    metadata     ;; The static member declaration, if any
  }

These descriptors provide debug information about global variables. They
provide details such as name, type and where the variable is defined. All
global variables are collected inside the named metadata ``!llvm.dbg.cu``.

.. _format_subprograms:

Subprogram descriptors
^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: llvm

  !2 = metadata !{
    DIHeader(
      i32,       ;; Tag = 46 (DW_TAG_subprogram)
      mdstring,  ;; Name
      mdstring,  ;; Display name (fully qualified C++ name)
      mdstring,  ;; MIPS linkage name (for C++)
      i32,       ;; Line number where defined
      i1,        ;; True if the global is local to compile unit (static)
      i1,        ;; True if the global is defined in the compile unit (not extern)
      i32,       ;; Virtuality, e.g. dwarf::DW_VIRTUALITY__virtual
      i32,       ;; Index into a virtual function
      i32,       ;; Flags - Artificial, Private, Protected, Explicit, Prototyped.
      i1,        ;; isOptimized
      i32        ;; Line number where the scope of the subprogram begins
    ),
    metadata,    ;; Source directory (including trailing slash) & file pair
    metadata,    ;; Reference to context descriptor
    metadata,    ;; Reference to type descriptor
    metadata,    ;; indicates which base type contains the vtable pointer for the
                 ;; derived class
    {}*,         ;; Reference to the LLVM function
    metadata,    ;; Lists function template parameters
    metadata,    ;; Function declaration descriptor
    metadata     ;; List of function variables
  }

These descriptors provide debug information about functions, methods and
subprograms. They provide details such as name, return types and the source
location where the subprogram is defined.

Block descriptors
^^^^^^^^^^^^^^^^^

.. code-block:: llvm

  !3 = metadata !{
    DIHeader(
      i32,       ;; Tag = 11 (DW_TAG_lexical_block)
      i32,       ;; Line number
      i32,       ;; Column number
      i32        ;; Unique ID to identify blocks from a template function
    ),
    metadata,    ;; Source directory (including trailing slash) & file pair
    metadata     ;; Reference to context descriptor
  }

This descriptor provides debug information about nested blocks within a
subprogram. The line number and column number are used to distinguish two
lexical blocks at the same depth.

.. code-block:: llvm

  !3 = metadata !{
    DIHeader(
      i32,       ;; Tag = 11 (DW_TAG_lexical_block)
      i32        ;; DWARF path discriminator value
    ),
    metadata,    ;; Source directory (including trailing slash) & file pair
    metadata     ;; Reference to the scope we're annotating with a file change
  }

This descriptor provides a wrapper around a lexical scope to handle file
changes in the middle of a lexical block.

.. _format_basic_type:

Basic type descriptors
^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: llvm

  !4 = metadata !{
    DIHeader(
      i32,       ;; Tag = 36 (DW_TAG_base_type)
      mdstring,  ;; Name (may be "" for anonymous types)
      i32,       ;; Line number where defined (may be 0)
      i64,       ;; Size in bits
      i64,       ;; Alignment in bits
      i64,       ;; Offset in bits
      i32,       ;; Flags
      i32        ;; DWARF type encoding
    ),
    metadata,    ;; Source directory (including trailing slash) & file pair (may be null)
    metadata     ;; Reference to context
  }

These descriptors define primitive types used in the code, for example ``int``,
``bool`` and ``float``. The context provides the scope of the type, which is
usually the top level. Since basic types are not usually user defined, the
context and line number can be left as NULL and 0. The size, alignment and
offset are expressed in bits and can be 64 bit values. The alignment is used
to round the offset when embedded in a :ref:`composite type
<format_composite_type>` (for example, to keep doubles on 64 bit boundaries).
The offset is the bit offset if embedded in a :ref:`composite type
<format_composite_type>`.

The type encoding provides the details of the type. The values are typically
one of the following:

.. code-block:: llvm

  DW_ATE_address       = 1
  DW_ATE_boolean       = 2
  DW_ATE_float         = 4
  DW_ATE_signed        = 5
  DW_ATE_signed_char   = 6
  DW_ATE_unsigned      = 7
  DW_ATE_unsigned_char = 8

.. _format_derived_type:

Derived type descriptors
^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: llvm

  !5 = metadata !{
    DIHeader(
      i32,       ;; Tag (see below)
      mdstring,  ;; Name (may be "" for anonymous types)
      i32,       ;; Line number where defined (may be 0)
      i64,       ;; Size in bits
      i64,       ;; Alignment in bits
      i64,       ;; Offset in bits
      i32        ;; Flags to encode attributes, e.g. private
    ),
    metadata,    ;; Source directory (including trailing slash) & file pair (may be null)
    metadata,    ;; Reference to context
    metadata,    ;; Reference to type derived from
    metadata     ;; (optional) Objective C property node
  }

These descriptors are used to define types derived from other types. The value
of the tag varies depending on the meaning. The following are possible tag
values:

.. code-block:: llvm

  DW_TAG_formal_parameter   = 5
  DW_TAG_member             = 13
  DW_TAG_pointer_type       = 15
  DW_TAG_reference_type     = 16
  DW_TAG_typedef            = 22
  DW_TAG_ptr_to_member_type = 31
  DW_TAG_const_type         = 38
  DW_TAG_volatile_type      = 53
  DW_TAG_restrict_type      = 55

``DW_TAG_member`` is used to define a member of a :ref:`composite type
<format_composite_type>` or :ref:`subprogram <format_subprograms>`. The type
of the member is the :ref:`derived type <format_derived_type>`.
``DW_TAG_formal_parameter`` is used to define a member which is a formal
argument of a subprogram.

``DW_TAG_typedef`` is used to provide a name for the derived type.

``DW_TAG_pointer_type``, ``DW_TAG_reference_type``, ``DW_TAG_const_type``,
``DW_TAG_volatile_type`` and ``DW_TAG_restrict_type`` are used to qualify the
:ref:`derived type <format_derived_type>`.

:ref:`Derived type <format_derived_type>` location can be determined from the
context and line number. The size, alignment and offset are expressed in bits
and can be 64 bit values. The alignment is used to round the offset when
embedded in a :ref:`composite type <format_composite_type>` (for example, to
keep doubles on 64 bit boundaries). The offset is the bit offset if embedded
in a :ref:`composite type <format_composite_type>`.

Note that the ``void *`` type is expressed as a type derived from NULL.

.. _format_composite_type:

Composite type descriptors
^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: llvm

  !6 = metadata !{
    DIHeader(
      i32,       ;; Tag (see below)
      mdstring,  ;; Name (may be "" for anonymous types)
      i32,       ;; Line number where defined (may be 0)
      i64,       ;; Size in bits
      i64,       ;; Alignment in bits
      i64,       ;; Offset in bits
      i32,       ;; Flags
      i32        ;; Runtime languages
    ),
    metadata,    ;; Source directory (including trailing slash) & file pair (may be null)
    metadata,    ;; Reference to context
    metadata,    ;; Reference to type derived from
    metadata,    ;; Reference to array of member descriptors
    metadata,    ;; Base type containing the vtable pointer for this type
    metadata,    ;; Template parameters
    mdstring     ;; A unique identifier for type uniquing purpose (may be null)
  }

These descriptors are used to define types that are composed of 0 or more
elements. The value of the tag varies depending on the meaning. The following
are possible tag values:

.. code-block:: llvm

  DW_TAG_array_type       = 1
  DW_TAG_enumeration_type = 4
  DW_TAG_structure_type   = 19
  DW_TAG_union_type       = 23
  DW_TAG_subroutine_type  = 21
  DW_TAG_inheritance      = 28

The vector flag indicates that an array type is a native packed vector.

The members of array types (tag = ``DW_TAG_array_type``) are
:ref:`subrange descriptors <format_subrange>`, each
representing the range of subscripts at that level of indexing.

The members of enumeration types (tag = ``DW_TAG_enumeration_type``) are
:ref:`enumerator descriptors <format_enumerator>`, each representing the
definition of an enumeration value for the set. All enumeration type
descriptors are collected inside the named metadata ``!llvm.dbg.cu``.

The members of structure (tag = ``DW_TAG_structure_type``) or union (tag =
``DW_TAG_union_type``) types are any one of the :ref:`basic
<format_basic_type>`, :ref:`derived <format_derived_type>` or :ref:`composite
<format_composite_type>` type descriptors, each representing a field member of
the structure or union.

For C++ classes (tag = ``DW_TAG_structure_type``), member descriptors provide
information about base classes, static members and member functions. If a
member is a :ref:`derived type descriptor <format_derived_type>` and has a tag
of ``DW_TAG_inheritance``, then the type represents a base class. If the member
is a :ref:`global variable descriptor <format_global_variables>` then it
represents a static member. And, if the member is a :ref:`subprogram
descriptor <format_subprograms>` then it represents a member function. For
static members and member functions, ``getName()`` returns the member's linkage
or C++ mangled name, and ``getDisplayName()`` returns the simplified version of
the name.

The first member of subroutine (tag = ``DW_TAG_subroutine_type``) type elements
is the return type for the subroutine. The remaining elements are the formal
arguments to the subroutine.

:ref:`Composite type <format_composite_type>` location can be determined from
the context and line number. The size, alignment and offset are expressed in
bits and can be 64 bit values. The alignment is used to round the offset when
embedded in a :ref:`composite type <format_composite_type>` (as an example, to
keep doubles on 64 bit boundaries). The offset is the bit offset if
embedded in a :ref:`composite type <format_composite_type>`.

.. _format_subrange:

Subrange descriptors
^^^^^^^^^^^^^^^^^^^^

.. code-block:: llvm

  !42 = metadata !{
    DIHeader(
      i32,       ;; Tag = 33 (DW_TAG_subrange_type)
      i64,       ;; Low value
      i64        ;; High value
    )
  }

These descriptors are used to define ranges of array subscripts for an array
:ref:`composite type <format_composite_type>`. The low value defines the lower
bound, typically zero for C/C++. The high value is the upper bound. Values
are 64 bit. ``High - Low + 1`` is the size of the array. If ``Low > High``
the array bounds are not included in generated debugging information.

.. _format_enumerator:

Enumerator descriptors
^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: llvm

  !6 = metadata !{
    DIHeader(
      i32,       ;; Tag = 40 (DW_TAG_enumerator)
      mdstring,  ;; Name
      i64        ;; Value
    )
  }

These descriptors are used to define members of an enumeration :ref:`composite
type <format_composite_type>`; each associates a name with a value.

Local variables
^^^^^^^^^^^^^^^

.. code-block:: llvm

  !7 = metadata !{
    DIHeader(
      i32,       ;; Tag (see below)
      mdstring,  ;; Name
      i32,       ;; 24 bit - Line number where defined
                 ;; 8 bit - Argument number. 1 indicates 1st argument.
      i32        ;; flags
    ),
    metadata,    ;; Context
    metadata,    ;; Reference to file where defined
    metadata,    ;; Reference to the type descriptor
    metadata     ;; (optional) Reference to inline location
  }

These descriptors are used to define variables local to a subprogram. The
value of the tag depends on the usage of the variable:

.. code-block:: llvm

  DW_TAG_auto_variable = 256
  DW_TAG_arg_variable  = 257

An auto variable is any variable declared in the body of the function. An
argument variable is any variable that appears as a formal argument to the
function.

The context is either the subprogram or block where the variable is defined.
Name is the source variable name.
Context and line indicate where the variable
was defined. Type descriptor defines the declared type of the variable.

Complex Expressions
^^^^^^^^^^^^^^^^^^^

.. code-block:: llvm

  !8 = metadata !{
    i32,   ;; DW_TAG_expression
    ...
  }

Complex expressions describe variable storage locations in terms of
prefix-notated DWARF expressions. Currently the only supported
operators are ``DW_OP_plus``, ``DW_OP_deref``, and ``DW_OP_piece``.

The ``DW_OP_piece`` operator is used for (typically larger aggregate)
variables that are fragmented across several locations. It takes two
i32 arguments, an offset and a size in bytes, to describe which piece
of the variable is at this location.


.. _format_common_intrinsics:

Debugger intrinsic functions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

LLVM uses several intrinsic functions (name prefixed with "``llvm.dbg``") to
provide debug information at various points in generated code.

``llvm.dbg.declare``
^^^^^^^^^^^^^^^^^^^^

.. code-block:: llvm

  void @llvm.dbg.declare(metadata, metadata)

This intrinsic provides information about a local element (e.g., variable).
The first argument is metadata holding the alloca for the variable. The second
argument is metadata containing a description of the variable.

``llvm.dbg.value``
^^^^^^^^^^^^^^^^^^

.. code-block:: llvm

  void @llvm.dbg.value(metadata, i64, metadata)

This intrinsic provides information when a user source variable is set to a new
value. The first argument is the new value (wrapped as metadata). The second
argument is the offset in the user source variable where the new value is
written. The third argument is metadata containing a description of the user
source variable.
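
Returning to the local variable descriptors that these intrinsics refer to: the
third header field packs a 24 bit line number and an 8 bit argument number into
one ``i32``. The sketch below shows one way to pack and unpack such a field.
It assumes that the line number occupies the low 24 bits and the argument
number the high 8 bits; this document does not state the bit order, so treat
that layout, and the helper names, as illustrative assumptions rather than the
definitive encoding.

.. code-block:: c++

  #include <cstdint>

  // Illustrative only. Assumed layout: bits 0-23 hold the line number,
  // bits 24-31 hold the argument number (0 means "not an argument").
  struct LineAndArg {
    uint32_t Line;
    uint32_t Arg;
  };

  // Combine a line number and an argument number into one 32 bit field.
  // Values that do not fit in their bit range are truncated.
  inline uint32_t packLineAndArg(uint32_t Line, uint32_t Arg) {
    return (Line & 0x00FFFFFFu) | ((Arg & 0xFFu) << 24);
  }

  // Recover the two components from a packed field.
  inline LineAndArg unpackLineAndArg(uint32_t Packed) {
    return {Packed & 0x00FFFFFFu, Packed >> 24};
  }

Under this assumption, an auto variable declared on line 5 would carry the
plain value 5, while the first formal argument declared on line 7 would carry
``(1 << 24) | 7``.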

Object lifetimes and scoping
============================

In many languages, the local variables in functions can have their lifetimes or
scopes limited to a subset of a function. In the C family of languages, for
example, variables are only live (readable and writable) within the source
block that they are defined in. In functional languages, values are only
readable after they have been defined. Though this is a very obvious concept,
it is non-trivial to model in LLVM, because it has no notion of scoping in this
sense, and does not want to be tied to a language's scoping rules.

In order to handle this, the LLVM debug format uses the metadata attached to
llvm instructions to encode line number and scoping information. Consider the
following C fragment, for example:

.. code-block:: c

  1.  void foo() {
  2.    int X = 21;
  3.    int Y = 22;
  4.    {
  5.      int Z = 23;
  6.      Z = X;
  7.    }
  8.    X = Y;
  9.  }

Compiled to LLVM, this function would be represented like this:

.. code-block:: llvm

  define void @foo() #0 {
  entry:
    %X = alloca i32, align 4
    %Y = alloca i32, align 4
    %Z = alloca i32, align 4
    call void @llvm.dbg.declare(metadata !{i32* %X}, metadata !10), !dbg !12
      ; [debug line = 2:7] [debug variable = X]
    store i32 21, i32* %X, align 4, !dbg !12
    call void @llvm.dbg.declare(metadata !{i32* %Y}, metadata !13), !dbg !14
      ; [debug line = 3:7] [debug variable = Y]
    store i32 22, i32* %Y, align 4, !dbg !14
    call void @llvm.dbg.declare(metadata !{i32* %Z}, metadata !15), !dbg !17
      ; [debug line = 5:9] [debug variable = Z]
    store i32 23, i32* %Z, align 4, !dbg !17
    %0 = load i32* %X, align 4, !dbg !18
      ; [debug line = 6:5]
    store i32 %0, i32* %Z, align 4, !dbg !18
    %1 = load i32* %Y, align 4, !dbg !19
      ; [debug line = 8:3]
    store i32 %1, i32* %X, align 4, !dbg !19
    ret void, !dbg !20
  }

  ; Function Attrs: nounwind readnone
  declare void @llvm.dbg.declare(metadata, metadata) #1

  attributes #0 = { nounwind ssp uwtable "less-precise-fpmad"="false"
    "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf"
    "no-infs-fp-math"="false" "no-nans-fp-math"="false"
    "stack-protector-buffer-size"="8" "unsafe-fp-math"="false"
    "use-soft-float"="false" }
  attributes #1 = { nounwind readnone }

  !llvm.dbg.cu = !{!0}
  !llvm.module.flags = !{!8}
  !llvm.ident = !{!9}

  !0 = metadata !{i32 786449, metadata !1, i32 12,
                  metadata !"clang version 3.4 (trunk 193128) (llvm/trunk 193139)",
                  i1 false, metadata !"", i32 0, metadata !2, metadata !2, metadata !3,
                  metadata !2, metadata !2, metadata !""} ; [ DW_TAG_compile_unit ] \
                    [/private/tmp/foo.c] \
                    [DW_LANG_C99]
  !1 = metadata !{metadata !"t.c", metadata !"/private/tmp"}
  !2 = metadata !{i32 0}
  !3 = metadata !{metadata !4}
  !4 = metadata !{i32 786478, metadata !1, metadata !5, metadata !"foo",
                  metadata !"foo", metadata !"", i32 1, metadata !6,
                  i1 false, i1 true, i32 0, i32 0, null, i32 0, i1 false,
                  void ()* @foo, null, null, metadata !2, i32 1}
                  ; [ DW_TAG_subprogram ] [line 1] [def] [foo]
  !5 = metadata !{i32 786473, metadata !1} ; [ DW_TAG_file_type ] \
                    [/private/tmp/t.c]
  !6 = metadata !{i32 786453, i32 0, null, metadata !"", i32 0, i64 0, i64 0,
                  i64 0, i32 0, null, metadata !7, i32 0, null, null, null}
                  ; [ DW_TAG_subroutine_type ] \
                    [line 0, size 0, align 0, offset 0] [from ]
  !7 = metadata !{null}
  !8 = metadata !{i32 2, metadata !"Dwarf Version", i32 2}
  !9 = metadata !{metadata !"clang version 3.4 (trunk 193128) (llvm/trunk 193139)"}
  !10 = metadata !{i32 786688, metadata !4, metadata !"X", metadata !5, i32 2,
                   metadata !11, i32 0, i32 0} ; [ DW_TAG_auto_variable ] [X] \
                     [line 2]
  !11 = metadata !{i32 786468, null, null, metadata !"int", i32 0, i64 32,
                   i64 32, i64 0, i32 0, i32 5} ; [ DW_TAG_base_type ] [int] \
                     [line 0, size 32, align 32, offset 0, enc DW_ATE_signed]
  !12 = metadata !{i32 2, i32 0, metadata !4, null}
  !13 = metadata !{i32 786688, metadata !4, metadata !"Y", metadata !5, i32 3,
                   metadata !11, i32 0, i32 0} ; [ DW_TAG_auto_variable ] [Y] \
                     [line 3]
  !14 = metadata !{i32 3, i32 0, metadata !4, null}
  !15 = metadata !{i32 786688, metadata !16, metadata !"Z", metadata !5, i32 5,
                   metadata !11, i32 0, i32 0} ; [ DW_TAG_auto_variable ] [Z] \
                     [line 5]
  !16 = metadata !{i32 786443, metadata !1, metadata !4, i32 4, i32 0, i32 0} \
                   ; [ DW_TAG_lexical_block ] [/private/tmp/t.c]
  !17 = metadata !{i32 5, i32 0, metadata !16, null}
  !18 = metadata !{i32 6, i32 0, metadata !16, null}
  !19 = metadata !{i32 8, i32 0, metadata !4, null} ; [ DW_TAG_imported_declaration ]
  !20 = metadata !{i32 9, i32 0, metadata !4, null}

This example illustrates a few important details about LLVM debugging
information.
In particular, it shows how the ``llvm.dbg.declare`` intrinsic and
location information, which are attached to an instruction, are applied
together to allow a debugger to analyze the relationship between statements,
variable definitions, and the code used to implement the function.

.. code-block:: llvm

  call void @llvm.dbg.declare(metadata !{i32* %X}, metadata !10), !dbg !12
    ; [debug line = 2:7] [debug variable = X]

The first ``llvm.dbg.declare`` intrinsic encodes debugging information for the
variable ``X``. The metadata ``!dbg !12`` attached to the intrinsic provides
scope information for the variable ``X``.

.. code-block:: llvm

  !12 = metadata !{i32 2, i32 0, metadata !4, null}
  !4 = metadata !{i32 786478, metadata !1, metadata !5, metadata !"foo",
                  metadata !"foo", metadata !"", i32 1, metadata !6,
                  i1 false, i1 true, i32 0, i32 0, null, i32 0, i1 false,
                  void ()* @foo, null, null, metadata !2, i32 1}
                    ; [ DW_TAG_subprogram ] [line 1] [def] [foo]

Here ``!12`` is metadata providing location information. It has four fields:
line number, column number, scope, and original scope. The original scope
represents the inline location if this instruction is inlined inside a caller,
and is null otherwise. In this example, scope is encoded by ``!4``, a
:ref:`subprogram descriptor <format_subprograms>`. This way the location
information attached to the intrinsic indicates that the variable ``X`` is
declared at line number 2 at a function level scope in function ``foo``.

Now let's take another example.

.. code-block:: llvm

  call void @llvm.dbg.declare(metadata !{i32* %Z}, metadata !15), !dbg !17
    ; [debug line = 5:9] [debug variable = Z]

The third ``llvm.dbg.declare`` intrinsic encodes debugging information for
variable ``Z``. The metadata ``!dbg !17`` attached to the intrinsic provides
scope information for the variable ``Z``.

.. code-block:: llvm

  !16 = metadata !{i32 786443, metadata !1, metadata !4, i32 4, i32 0, i32 0} \
                   ; [ DW_TAG_lexical_block ] [/private/tmp/t.c]
  !17 = metadata !{i32 5, i32 0, metadata !16, null}

Here ``!17`` indicates that ``Z`` is declared at line number 5 and
column number 0 inside of lexical scope ``!16``. The lexical scope itself
resides inside of subprogram ``!4`` described above.

The scope information attached to each instruction provides a straightforward
way to find instructions covered by a scope.

.. _ccxx_frontend:

C/C++ front-end specific debug information
==========================================

The C and C++ front-ends represent information about the program in a format
that is effectively identical to `DWARF 3.0
<http://www.eagercon.com/dwarf/dwarf3std.htm>`_ in terms of information
content. This allows code generators to trivially support native debuggers by
generating standard dwarf information, and contains enough information for
non-dwarf targets to translate it as needed.

This section describes the forms used to represent C and C++ programs. Other
languages could pattern themselves after this (which itself is tuned to
representing programs in the same way that DWARF 3 does), or they could choose
to provide completely different forms if they don't fit into the DWARF model.
As support for debugging information gets added to the various LLVM
source-language front-ends, the information used should be documented here.

The following sections provide examples of a few C/C++ constructs and the debug
information that would best describe those constructs. The canonical
references are the ``DIDescriptor`` classes defined in
``include/llvm/IR/DebugInfo.h`` and the implementations of the helper functions
in ``lib/IR/DIBuilder.cpp``.
857 858C/C++ source file information 859----------------------------- 860 861``llvm::Instruction`` provides easy access to metadata attached with an 862instruction. One can extract line number information encoded in LLVM IR using 863``Instruction::getMetadata()`` and ``DILocation::getLineNumber()``. 864 865.. code-block:: c++ 866 867 if (MDNode *N = I->getMetadata("dbg")) { // Here I is an LLVM instruction 868 DILocation Loc(N); // DILocation is in DebugInfo.h 869 unsigned Line = Loc.getLineNumber(); 870 StringRef File = Loc.getFilename(); 871 StringRef Dir = Loc.getDirectory(); 872 } 873 874C/C++ global variable information 875--------------------------------- 876 877Given an integer global variable declared as follows: 878 879.. code-block:: c 880 881 int MyGlobal = 100; 882 883a C/C++ front-end would generate the following descriptors: 884 885.. code-block:: llvm 886 887 ;; 888 ;; Define the global itself. 889 ;; 890 @MyGlobal = global i32 100, align 4 891 ... 892 ;; 893 ;; List of debug info of globals 894 ;; 895 !llvm.dbg.cu = !{!0} 896 897 ;; Define the compile unit. 898 !0 = metadata !{ 899 ; Header( 900 ; i32 17, ;; Tag 901 ; i32 0, ;; Context 902 ; i32 4, ;; Language 903 ; metadata !"clang version 3.6.0 ", ;; Producer 904 ; i1 false, ;; "isOptimized"? 905 ; metadata !"", ;; Flags 906 ; i32 0, ;; Runtime Version 907 ; "", ;; Split debug filename 908 ; 1 ;; Full debug info 909 ; ) 910 metadata !"0x11\0012\00clang version 3.6.0 \000\00\000\00\001", 911 metadata !1, ;; File 912 metadata !2, ;; Enum Types 913 metadata !2, ;; Retained Types 914 metadata !2, ;; Subprograms 915 metadata !3, ;; Global Variables 916 metadata !2 ;; Imported entities 917 } ; [ DW_TAG_compile_unit ] 918 919 ;; The file/directory pair. 920 !1 = metadata !{ 921 metadata !"foo.c", ;; Filename 922 metadata !"/Users/dexonsmith/data/llvm/debug-info" ;; Directory 923 } 924 925 ;; An empty array. 
926 !2 = metadata !{} 927 928 ;; The Array of Global Variables 929 !3 = metadata !{ 930 metadata !4 931 } 932 933 ;; 934 ;; Define the global variable itself. 935 ;; 936 !4 = metadata !{ 937 ; Header( 938 ; i32 52, ;; Tag 939 ; metadata !"MyGlobal", ;; Name 940 ; metadata !"MyGlobal", ;; Display Name 941 ; metadata !"", ;; Linkage Name 942 ; i32 1, ;; Line 943 ; i32 0, ;; IsLocalToUnit 944 ; i32 1 ;; IsDefinition 945 ; ) 946 metadata !"0x34\00MyGlobal\00MyGlobal\00\001\000\001", 947 null, ;; Unused 948 metadata !5, ;; File 949 metadata !6, ;; Type 950 i32* @MyGlobal, ;; LLVM-IR Value 951 null ;; Static member declaration 952 } ; [ DW_TAG_variable ] 953 954 ;; 955 ;; Define the file 956 ;; 957 !5 = metadata !{ 958 ; Header( 959 ; i32 41 ;; Tag 960 ; ) 961 metadata !"0x29", 962 metadata !1 ;; File/directory pair 963 } ; [ DW_TAG_file_type ] 964 965 ;; 966 ;; Define the type 967 ;; 968 !6 = metadata !{ 969 ; Header( 970 ; i32 36, ;; Tag 971 ; metadata !"int", ;; Name 972 ; i32 0, ;; Line 973 ; i64 32, ;; Size in Bits 974 ; i64 32, ;; Align in Bits 975 ; i64 0, ;; Offset 976 ; i32 0, ;; Flags 977 ; i32 5 ;; Encoding 978 ; ) 979 metadata !"0x24\00int\000\0032\0032\000\000\005", 980 null, ;; Unused 981 null ;; Unused 982 } ; [ DW_TAG_base_type ] 983 984C/C++ function information 985-------------------------- 986 987Given a function declared as follows: 988 989.. code-block:: c 990 991 int main(int argc, char *argv[]) { 992 return 0; 993 } 994 995a C/C++ front-end would generate the following descriptors: 996 997.. code-block:: llvm 998 999 ;; 1000 ;; Define the anchor for subprograms. 1001 ;; 1002 !6 = metadata !{ 1003 ; Header( 1004 ; i32 46, ;; Tag 1005 ; metadata !"main", ;; Name 1006 ; metadata !"main", ;; Display name 1007 ; metadata !"", ;; Linkage name 1008 ; i32 1, ;; Line number 1009 ; i1 false, ;; Is local 1010 ; i1 true, ;; Is definition 1011 ; i32 0, ;; Virtuality attribute, e.g. 
pure virtual function 1012 ; i32 0, ;; Index into virtual table for C++ methods 1013 ; i32 256, ;; Flags 1014 ; i1 0, ;; True if this function is optimized 1015 ; 1 ;; Line number of the opening '{' of the function 1016 ; ) 1017 metadata !"0x2e\00main\00main\00\001\000\001\000\000\00256\000\001", 1018 metadata !1, ;; File 1019 metadata !5, ;; Context 1020 metadata !6, ;; Type 1021 null, ;; Containing type 1022 i32 (i32, i8**)* @main, ;; Pointer to llvm::Function 1023 null, ;; Function template parameters 1024 null, ;; Function declaration 1025 metadata !2 ;; List of function variables (emitted when optimizing) 1026 } 1027 1028 ;; 1029 ;; Define the subprogram itself. 1030 ;; 1031 define i32 @main(i32 %argc, i8** %argv) { 1032 ... 1033 } 1034 1035Debugging information format 1036============================ 1037 1038Debugging Information Extension for Objective C Properties 1039---------------------------------------------------------- 1040 1041Introduction 1042^^^^^^^^^^^^ 1043 1044Objective C provides a simpler way to declare and define accessor methods using 1045declared properties. The language provides features to declare a property and 1046to let compiler synthesize accessor methods. 1047 1048The debugger lets developer inspect Objective C interfaces and their instance 1049variables and class variables. However, the debugger does not know anything 1050about the properties defined in Objective C interfaces. The debugger consumes 1051information generated by compiler in DWARF format. The format does not support 1052encoding of Objective C properties. This proposal describes DWARF extensions to 1053encode Objective C properties, which the debugger can use to let developers 1054inspect Objective C properties. 1055 1056Proposal 1057^^^^^^^^ 1058 1059Objective C properties exist separately from class members. A property can be 1060defined only by "setter" and "getter" selectors, and be calculated anew on each 1061access. 
Or a property can just be a direct access to some declared ivar.
Finally it can have an ivar "automatically synthesized" for it by the compiler,
in which case the property can be referred to in user code directly using the
standard C dereference syntax as well as through the property "dot" syntax, but
there is no entry in the ``@interface`` declaration corresponding to this ivar.

To facilitate debugging these properties, we will add a new DWARF TAG into the
``DW_TAG_structure_type`` definition for the class to hold the description of a
given property, and a set of DWARF attributes that provide said description.
The property tag will also contain the name and declared type of the property.

If there is a related ivar, there will also be a DWARF property attribute placed
in the ``DW_TAG_member`` DIE for that ivar referring back to the property TAG
for that property.  And in the case where the compiler synthesizes the ivar
directly, the compiler is expected to generate a ``DW_TAG_member`` for that
ivar (with the ``DW_AT_artificial`` set to 1), whose name will be the name used
to access this ivar directly in code, and with the property attribute pointing
back to the property it is backing.

The following examples will serve as illustration for our discussion:

.. code-block:: objc

  @interface I1 {
    int n2;
  }

  @property int p1;
  @property int p2;
  @end

  @implementation I1
  @synthesize p1;
  @synthesize p2 = n2;
  @end

This produces the following DWARF (this is a "pseudo dwarfdump" output):

.. code-block:: none

  0x00000100:  TAG_structure_type [7] *
                 AT_APPLE_runtime_class( 0x10 )
                 AT_name( "I1" )
                 AT_decl_file( "Objc_Property.m" )
                 AT_decl_line( 3 )

  0x00000110:    TAG_APPLE_property
                   AT_name ( "p1" )
                   AT_type ( {0x00000150} ( int ) )

  0x00000120:    TAG_APPLE_property
                   AT_name ( "p2" )
                   AT_type ( {0x00000150} ( int ) )

  0x00000130:    TAG_member [8]
                   AT_name( "_p1" )
                   AT_APPLE_property ( {0x00000110} "p1" )
                   AT_type( {0x00000150} ( int ) )
                   AT_artificial ( 0x1 )

  0x00000140:    TAG_member [8]
                   AT_name( "n2" )
                   AT_APPLE_property ( {0x00000120} "p2" )
                   AT_type( {0x00000150} ( int ) )

  0x00000150:  AT_type( ( int ) )

Note, the current convention is that the name of the ivar for an
auto-synthesized property is the name of the property from which it derives
with an underscore prepended, as is shown in the example.  But we actually
don't need to know this convention, since we are given the name of the ivar
directly.

Also, it is common practice in ObjC to have different property declarations in
the ``@interface`` and ``@implementation`` - e.g. to provide a read-only
property in the interface, and a read-write property in the implementation.  In
that case, the compiler should emit whichever property declaration will be in
force in the current translation unit.

Developers can decorate a property with attributes which are encoded using
``DW_AT_APPLE_property_attribute``.

.. code-block:: objc

  @property (readonly, nonatomic) int pr;

.. code-block:: none

  TAG_APPLE_property [8]
    AT_name( "pr" )
    AT_type ( {0x00000147} (int) )
    AT_APPLE_property_attribute (DW_APPLE_PROPERTY_readonly, DW_APPLE_PROPERTY_nonatomic)

The setter and getter method names are attached to the property using
``DW_AT_APPLE_property_setter`` and ``DW_AT_APPLE_property_getter`` attributes.

.. code-block:: objc

  @interface I1
  @property (setter=myOwnP3Setter:) int p3;
  -(void)myOwnP3Setter:(int)a;
  @end

  @implementation I1
  @synthesize p3;
  -(void)myOwnP3Setter:(int)a{ }
  @end

The DWARF for this would be:

.. code-block:: none

  0x000003bd:  TAG_structure_type [7] *
                 AT_APPLE_runtime_class( 0x10 )
                 AT_name( "I1" )
                 AT_decl_file( "Objc_Property.m" )
                 AT_decl_line( 3 )

  0x000003cd:    TAG_APPLE_property
                   AT_name ( "p3" )
                   AT_APPLE_property_setter ( "myOwnP3Setter:" )
                   AT_type( {0x00000147} ( int ) )

  0x000003f3:    TAG_member [8]
                   AT_name( "_p3" )
                   AT_type ( {0x00000147} ( int ) )
                   AT_APPLE_property ( {0x000003cd} )
                   AT_artificial ( 0x1 )

New DWARF Tags
^^^^^^^^^^^^^^

+-----------------------+--------+
| TAG                   | Value  |
+=======================+========+
| DW_TAG_APPLE_property | 0x4200 |
+-----------------------+--------+

New DWARF Attributes
^^^^^^^^^^^^^^^^^^^^

+--------------------------------+--------+-----------+
| Attribute                      | Value  | Classes   |
+================================+========+===========+
| DW_AT_APPLE_property           | 0x3fed | Reference |
+--------------------------------+--------+-----------+
| DW_AT_APPLE_property_getter    | 0x3fe9 | String    |
+--------------------------------+--------+-----------+
| DW_AT_APPLE_property_setter    | 0x3fea | String    |
+--------------------------------+--------+-----------+
| DW_AT_APPLE_property_attribute | 0x3feb | Constant  |
+--------------------------------+--------+-----------+

New DWARF Constants
^^^^^^^^^^^^^^^^^^^

+--------------------------------------+-------+
| Name                                 | Value |
+======================================+=======+
| DW_APPLE_PROPERTY_readonly           | 0x01  |
+--------------------------------------+-------+
| DW_APPLE_PROPERTY_getter             | 0x02  |
+--------------------------------------+-------+
| DW_APPLE_PROPERTY_assign             | 0x04  |
+--------------------------------------+-------+
| DW_APPLE_PROPERTY_readwrite          | 0x08  |
+--------------------------------------+-------+
| DW_APPLE_PROPERTY_retain             | 0x10  |
+--------------------------------------+-------+
| DW_APPLE_PROPERTY_copy               | 0x20  |
+--------------------------------------+-------+
| DW_APPLE_PROPERTY_nonatomic          | 0x40  |
+--------------------------------------+-------+
| DW_APPLE_PROPERTY_setter             | 0x80  |
+--------------------------------------+-------+
| DW_APPLE_PROPERTY_atomic             | 0x100 |
+--------------------------------------+-------+
| DW_APPLE_PROPERTY_weak               | 0x200 |
+--------------------------------------+-------+
| DW_APPLE_PROPERTY_strong             | 0x400 |
+--------------------------------------+-------+
| DW_APPLE_PROPERTY_unsafe_unretained  | 0x800 |
+--------------------------------------+-------+

Name Accelerator Tables
-----------------------

Introduction
^^^^^^^^^^^^

The "``.debug_pubnames``" and "``.debug_pubtypes``" formats are not what a
debugger needs.  The "``pub``" in the section name indicates that the entries
in the table are publicly visible names only.  This means no static or hidden
functions show up in the "``.debug_pubnames``".  No static variables or private
class variables are in the "``.debug_pubtypes``".  Many compilers add different
things to these tables, so we can't rely upon the contents between gcc, icc, or
clang.

The typical query given by users tends not to match up with the contents of
these tables.  For example, the DWARF spec states that "In the case of the name
of a function member or static data member of a C++ structure, class or union,
the name presented in the "``.debug_pubnames``" section is not the simple name
given by the ``DW_AT_name`` attribute of the referenced debugging information
entry, but rather the fully qualified name of the data or function member."
So the only names in these tables for complex C++ entries are fully
qualified names.  Debugger users tend not to enter their search strings as
"``a::b::c(int,const Foo&) const``", but rather as "``c``", "``b::c``", or
"``a::b::c``".  So the name entered in the name table must be demangled in
order to chop it up appropriately and additional names must be manually entered
into the table to make it effective as a name lookup table for debuggers to
use.

All debuggers currently ignore the "``.debug_pubnames``" table as a result of
its inconsistent and useless public-only name content making it a waste of
space in the object file.  These tables, when they are written to disk, are not
sorted in any way, leaving every debugger to do its own parsing and sorting.
These tables also include an inlined copy of the string values in the table
itself making the tables much larger than they need to be on disk, especially
for large C++ programs.

Can't we just fix the sections by adding all of the names we need to this
table? No, because that is not what the tables are defined to contain and we
won't know the difference between the old bad tables and the new good tables.
At best we could make our own renamed sections that contain all of the data we
need.

These tables are also insufficient for what a debugger like LLDB needs.  LLDB
uses clang for its expression parsing where LLDB acts as a PCH.  LLDB is then
often asked to look for type "``foo``" or namespace "``bar``", or list items in
namespace "``baz``".  Namespaces are not included in the pubnames or pubtypes
tables.  Since clang asks a lot of questions when it is parsing an expression,
we need to be very fast when looking up names, as it happens a lot.  Having new
accelerator tables that are optimized for very quick lookups will benefit this
type of debugging experience greatly.

We would like to generate name lookup tables that can be mapped into memory
from disk, and used as is, with little or no up-front parsing.  We would also
be able to control the exact content of these different tables so they contain
exactly what we need.  The Name Accelerator Tables were designed to fix these
issues.  In order to solve these issues we need to:

* Have a format that can be mapped into memory from disk and used as is
* Make lookups very fast
* Have an extensible table format so these tables can be made by many producers
* Contain all of the names needed for typical lookups out of the box
* Have strict rules for the contents of tables

Table size is important and the accelerator table format should allow the reuse
of strings from common string tables so the strings for the names are not
duplicated.  We also want to make sure the table is ready to be used as-is by
simply mapping the table into memory with minimal header parsing.

The name lookups need to be fast and optimized for the kinds of lookups that
debuggers tend to do.  Optimally we would like to touch as few parts of the
mapped table as possible when doing a name lookup and be able to quickly find
the name entry we are looking for, or discover there are no matches.  In the
case of debuggers we optimized for lookups that fail most of the time.

Each table that is defined should have strict, documented rules on exactly what
it contains so clients can rely on the content.

Hash Tables
^^^^^^^^^^^

Standard Hash Tables
""""""""""""""""""""

Typical hash tables have a header, buckets, and each bucket points to the
bucket contents:

.. code-block:: none

  .------------.
  |  HEADER    |
  |------------|
  |  BUCKETS   |
  |------------|
  |  DATA      |
  `------------'

The BUCKETS are an array of offsets to DATA for each hash:

.. code-block:: none

  .------------.
  | 0x00001000 | BUCKETS[0]
  | 0x00002000 | BUCKETS[1]
  | 0x00002200 | BUCKETS[2]
  | 0x000034f0 | BUCKETS[3]
  |            | ...
  | 0xXXXXXXXX | BUCKETS[n_buckets]
  '------------'

So for ``bucket[3]`` in the example above, we have an offset into the table
0x000034f0 which points to a chain of entries for the bucket.  Each entry in
the chain must contain a next pointer, the full 32 bit hash value, the string
itself, and the data for the current string value.

.. code-block:: none

              .------------.
  0x000034f0: | 0x00003500 | next pointer
              | 0x12345678 | 32 bit hash
              | "erase"    | string value
              | data[n]    | HashData for this bucket
              |------------|
  0x00003500: | 0x00003550 | next pointer
              | 0x29273623 | 32 bit hash
              | "dump"     | string value
              | data[n]    | HashData for this bucket
              |------------|
  0x00003550: | 0x00000000 | next pointer
              | 0x82638293 | 32 bit hash
              | "main"     | string value
              | data[n]    | HashData for this bucket
              `------------'

The problem with this layout for debuggers is that we need to optimize for the
negative lookup case where the symbol we're searching for is not present.  So
if we were to look up "``printf``" in the table above, we would make a 32 bit
hash for "``printf``", and it might match ``bucket[3]``.
We would need to go to the 1382offset 0x000034f0 and start looking to see if our 32 bit hash matches. To do 1383so, we need to read the next pointer, then read the hash, compare it, and skip 1384to the next bucket. Each time we are skipping many bytes in memory and 1385touching new cache pages just to do the compare on the full 32 bit hash. All 1386of these accesses then tell us that we didn't have a match. 1387 1388Name Hash Tables 1389"""""""""""""""" 1390 1391To solve the issues mentioned above we have structured the hash tables a bit 1392differently: a header, buckets, an array of all unique 32 bit hash values, 1393followed by an array of hash value data offsets, one for each hash value, then 1394the data for all hash values: 1395 1396.. code-block:: none 1397 1398 .-------------. 1399 | HEADER | 1400 |-------------| 1401 | BUCKETS | 1402 |-------------| 1403 | HASHES | 1404 |-------------| 1405 | OFFSETS | 1406 |-------------| 1407 | DATA | 1408 `-------------' 1409 1410The ``BUCKETS`` in the name tables are an index into the ``HASHES`` array. By 1411making all of the full 32 bit hash values contiguous in memory, we allow 1412ourselves to efficiently check for a match while touching as little memory as 1413possible. Most often checking the 32 bit hash values is as far as the lookup 1414goes. If it does match, it usually is a match with no collisions. So for a 1415table with "``n_buckets``" buckets, and "``n_hashes``" unique 32 bit hash 1416values, we can clarify the contents of the ``BUCKETS``, ``HASHES`` and 1417``OFFSETS`` as: 1418 1419.. code-block:: none 1420 1421 .-------------------------. 
1422 | HEADER.magic | uint32_t 1423 | HEADER.version | uint16_t 1424 | HEADER.hash_function | uint16_t 1425 | HEADER.bucket_count | uint32_t 1426 | HEADER.hashes_count | uint32_t 1427 | HEADER.header_data_len | uint32_t 1428 | HEADER_DATA | HeaderData 1429 |-------------------------| 1430 | BUCKETS | uint32_t[n_buckets] // 32 bit hash indexes 1431 |-------------------------| 1432 | HASHES | uint32_t[n_hashes] // 32 bit hash values 1433 |-------------------------| 1434 | OFFSETS | uint32_t[n_hashes] // 32 bit offsets to hash value data 1435 |-------------------------| 1436 | ALL HASH DATA | 1437 `-------------------------' 1438 1439So taking the exact same data from the standard hash example above we end up 1440with: 1441 1442.. code-block:: none 1443 1444 .------------. 1445 | HEADER | 1446 |------------| 1447 | 0 | BUCKETS[0] 1448 | 2 | BUCKETS[1] 1449 | 5 | BUCKETS[2] 1450 | 6 | BUCKETS[3] 1451 | | ... 1452 | ... | BUCKETS[n_buckets] 1453 |------------| 1454 | 0x........ | HASHES[0] 1455 | 0x........ | HASHES[1] 1456 | 0x........ | HASHES[2] 1457 | 0x........ | HASHES[3] 1458 | 0x........ | HASHES[4] 1459 | 0x........ | HASHES[5] 1460 | 0x12345678 | HASHES[6] hash for BUCKETS[3] 1461 | 0x29273623 | HASHES[7] hash for BUCKETS[3] 1462 | 0x82638293 | HASHES[8] hash for BUCKETS[3] 1463 | 0x........ | HASHES[9] 1464 | 0x........ | HASHES[10] 1465 | 0x........ | HASHES[11] 1466 | 0x........ | HASHES[12] 1467 | 0x........ | HASHES[13] 1468 | 0x........ | HASHES[n_hashes] 1469 |------------| 1470 | 0x........ | OFFSETS[0] 1471 | 0x........ | OFFSETS[1] 1472 | 0x........ | OFFSETS[2] 1473 | 0x........ | OFFSETS[3] 1474 | 0x........ | OFFSETS[4] 1475 | 0x........ | OFFSETS[5] 1476 | 0x000034f0 | OFFSETS[6] offset for BUCKETS[3] 1477 | 0x00003500 | OFFSETS[7] offset for BUCKETS[3] 1478 | 0x00003550 | OFFSETS[8] offset for BUCKETS[3] 1479 | 0x........ | OFFSETS[9] 1480 | 0x........ | OFFSETS[10] 1481 | 0x........ | OFFSETS[11] 1482 | 0x........ 
| OFFSETS[12] 1483 | 0x........ | OFFSETS[13] 1484 | 0x........ | OFFSETS[n_hashes] 1485 |------------| 1486 | | 1487 | | 1488 | | 1489 | | 1490 | | 1491 |------------| 1492 0x000034f0: | 0x00001203 | .debug_str ("erase") 1493 | 0x00000004 | A 32 bit array count - number of HashData with name "erase" 1494 | 0x........ | HashData[0] 1495 | 0x........ | HashData[1] 1496 | 0x........ | HashData[2] 1497 | 0x........ | HashData[3] 1498 | 0x00000000 | String offset into .debug_str (terminate data for hash) 1499 |------------| 1500 0x00003500: | 0x00001203 | String offset into .debug_str ("collision") 1501 | 0x00000002 | A 32 bit array count - number of HashData with name "collision" 1502 | 0x........ | HashData[0] 1503 | 0x........ | HashData[1] 1504 | 0x00001203 | String offset into .debug_str ("dump") 1505 | 0x00000003 | A 32 bit array count - number of HashData with name "dump" 1506 | 0x........ | HashData[0] 1507 | 0x........ | HashData[1] 1508 | 0x........ | HashData[2] 1509 | 0x00000000 | String offset into .debug_str (terminate data for hash) 1510 |------------| 1511 0x00003550: | 0x00001203 | String offset into .debug_str ("main") 1512 | 0x00000009 | A 32 bit array count - number of HashData with name "main" 1513 | 0x........ | HashData[0] 1514 | 0x........ | HashData[1] 1515 | 0x........ | HashData[2] 1516 | 0x........ | HashData[3] 1517 | 0x........ | HashData[4] 1518 | 0x........ | HashData[5] 1519 | 0x........ | HashData[6] 1520 | 0x........ | HashData[7] 1521 | 0x........ | HashData[8] 1522 | 0x00000000 | String offset into .debug_str (terminate data for hash) 1523 `------------' 1524 1525So we still have all of the same data, we just organize it more efficiently for 1526debugger lookup. If we repeat the same "``printf``" lookup from above, we 1527would hash "``printf``" and find it matches ``BUCKETS[3]`` by taking the 32 bit 1528hash value and modulo it by ``n_buckets``. ``BUCKETS[3]`` contains "6" which 1529is the index into the ``HASHES`` table. 
We would then compare any consecutive
32 bit hash values in the ``HASHES`` array as long as the hashes would be in
``BUCKETS[3]``.  We do this by verifying that each subsequent hash value modulo
``n_buckets`` is still 3.  In the case of a failed lookup we would access the
memory for ``BUCKETS[3]``, and then compare a few consecutive 32 bit hashes
before we know that we have no match.  We don't end up marching through
multiple words of memory and we really keep the number of processor data cache
lines being accessed as small as possible.

The string hash that is used for these lookup tables is the Daniel J.
Bernstein hash which is also used in the ELF ``GNU_HASH`` sections.  It is a
very good hash for all kinds of names in programs with very few hash
collisions.

Empty buckets are designated by using an invalid hash index of ``UINT32_MAX``.

Details
^^^^^^^

These name hash tables are designed to be generic where specializations of the
table get to define additional data that goes into the header ("``HeaderData``"),
how the string value is stored ("``KeyType``") and the content of the data for each
hash value.

Header Layout
"""""""""""""

The header has a fixed part, and the specialized part.  The exact format of the
header is:

.. code-block:: c

  struct Header
  {
    uint32_t   magic;           // 'HASH' magic value to allow endian detection
    uint16_t   version;         // Version number
    uint16_t   hash_function;   // The hash function enumeration that was used
    uint32_t   bucket_count;    // The number of buckets in this hash table
    uint32_t   hashes_count;    // The total number of unique hash values and hash data offsets in this table
    uint32_t   header_data_len; // The bytes to skip to get to the hash indexes (buckets) for correct alignment
                                // Specifically the length of the following HeaderData field - this does not
                                // include the size of the preceding fields
    HeaderData header_data;     // Implementation specific header data
  };

The header starts with a 32 bit "``magic``" value which must be ``'HASH'``
encoded as an ASCII integer.  This allows the detection of the start of the
hash table and also allows the table's byte order to be determined so the table
can be correctly extracted.  The "``magic``" value is followed by a 16 bit
``version`` number which allows the table to be revised and modified in the
future.  The current version number is 1. ``hash_function`` is a ``uint16_t``
enumeration that specifies which hash function was used to produce this table.
The current values for the hash function enumerations include:

.. code-block:: c

  enum HashFunctionType
  {
    eHashFunctionDJB = 0u, // Daniel J Bernstein hash function
  };

``bucket_count`` is a 32 bit unsigned integer that represents how many buckets
are in the ``BUCKETS`` array.  ``hashes_count`` is the number of unique 32 bit
hash values that are in the ``HASHES`` array, and is also the number of offsets
contained in the ``OFFSETS`` array.  ``header_data_len`` specifies the size
in bytes of the ``HeaderData`` that is filled in by specialized versions of
this table.
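
As a minimal sketch (not the LLVM or LLDB implementation), the C code below
shows the Bernstein hash in its commonly published form, ``h = h * 33 + c``
seeded with 5381, together with the bucket walk described under "Name Hash
Tables".  The seed, the function names ``djb_hash`` and ``find_hash_index``,
and the choice to pass the arrays as separate pointers are assumptions made for
illustration; this document does not specify them.

.. code-block:: c

  #include <stddef.h>
  #include <stdint.h>

  /* Bernstein string hash: h = h * 33 + c, seeded with 5381 (assumed seed). */
  static uint32_t djb_hash(const char *name)
  {
    uint32_t h = 5381u;
    for (const unsigned char *p = (const unsigned char *)name; *p != '\0'; ++p)
      h = (h << 5) + h + *p;
    return h;
  }

  /*
   * Hypothetical helper: return the index into hashes[] (and offsets[]) of
   * the entry whose 32 bit hash equals `hash`, or UINT32_MAX if none exists.
   */
  static uint32_t find_hash_index(const uint32_t *buckets, uint32_t bucket_count,
                                  const uint32_t *hashes, uint32_t hashes_count,
                                  uint32_t hash)
  {
    if (bucket_count == 0)
      return UINT32_MAX;

    uint32_t bucket = hash % bucket_count;
    uint32_t i = buckets[bucket];
    if (i == UINT32_MAX)          /* empty bucket */
      return UINT32_MAX;

    /* Walk consecutive hashes while they still belong to this bucket. */
    for (; i < hashes_count; ++i) {
      if (hashes[i] % bucket_count != bucket)
        break;                    /* ran into the next bucket's hashes */
      if (hashes[i] == hash)
        return i;
    }
    return UINT32_MAX;
  }

A caller would hash the name, call ``find_hash_index``, and on success use the
matching entry of the ``OFFSETS`` array to locate the hash data.  The stored
string still has to be compared there, because two different names can share a
32 bit hash.
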

Fixed Lookup
""""""""""""

The header is followed by the buckets, hashes, offsets, and hash value data.

.. code-block:: c

  struct FixedTable
  {
    uint32_t buckets[Header.bucket_count];  // An array of hash indexes into the "hashes[]" array below
    uint32_t hashes [Header.hashes_count];  // Every unique 32 bit hash for the entire table is in this table
    uint32_t offsets[Header.hashes_count];  // An offset that corresponds to each item in the "hashes[]" array above
  };

``buckets`` is an array of 32 bit indexes into the ``hashes`` array.  The
``hashes`` array contains all of the 32 bit hash values for all names in the
hash table.  Each hash in the ``hashes`` table has an offset in the ``offsets``
array that points to the data for the hash value.

This table setup makes it very easy to repurpose these tables to contain
different data, while keeping the lookup mechanism the same for all tables.
This layout also makes it possible to save the table to disk and map it in
later and do very efficient name lookups with little or no parsing.

DWARF lookup tables can be implemented in a variety of ways and can store a lot
of information for each name.  We want to make the DWARF tables extensible and
able to store the data efficiently so we have used some of the DWARF features
that enable efficient data storage to define exactly what kind of data we store
for each name.

The ``HeaderData`` contains a definition of the contents of each HashData chunk.
We might want to store an offset to all of the debug information entries (DIEs)
for each name.  To keep things extensible, we create a list of items, or
Atoms, that are contained in the data for each name.  First comes the type of
the data in each atom:

.. code-block:: c

  enum AtomType
  {
    eAtomTypeNULL       = 0u,
    eAtomTypeDIEOffset  = 1u,   // DIE offset, check form for encoding
    eAtomTypeCUOffset   = 2u,   // DIE offset of the compile unit header that contains the item in question
    eAtomTypeTag        = 3u,   // DW_TAG_xxx value, should be encoded as DW_FORM_data1 (if no tags exceed 255) or DW_FORM_data2
    eAtomTypeNameFlags  = 4u,   // Flags from enum NameFlags
    eAtomTypeTypeFlags  = 5u,   // Flags from enum TypeFlags
  };

The enumeration values and their meanings are:

.. code-block:: none

  eAtomTypeNULL       - a termination atom that specifies the end of the atom list
  eAtomTypeDIEOffset  - an offset into the .debug_info section for the DWARF DIE for this name
  eAtomTypeCUOffset   - an offset into the .debug_info section for the CU that contains the DIE
  eAtomTypeTag        - The DW_TAG_XXX enumeration value so you don't have to parse the DWARF to see what it is
  eAtomTypeNameFlags  - Flags for functions and global variables (isFunction, isInlined, isExternal...)
  eAtomTypeTypeFlags  - Flags for types (isCXXClass, isObjCClass, ...)

Then each atom specifies its type and how the data for that atom type is
encoded:

.. code-block:: c

  struct Atom
  {
    uint16_t type;  // AtomType enum value
    uint16_t form;  // DWARF DW_FORM_XXX defines
  };

The ``form`` type above is from the DWARF specification and defines the exact
encoding of the data for the Atom type.  See the DWARF specification for the
``DW_FORM_`` definitions.

.. code-block:: c

  struct HeaderData
  {
    uint32_t die_offset_base;
    uint32_t atom_count;
    Atom     atoms[atom_count];
  };

``HeaderData`` defines the base DIE offset that should be added to any atoms
that are encoded using ``DW_FORM_ref1``, ``DW_FORM_ref2``,
``DW_FORM_ref4``, ``DW_FORM_ref8`` or ``DW_FORM_ref_udata``.
It also defines
what is contained in each ``HashData`` object -- ``Atom.form`` tells us how large
each field will be in the ``HashData`` and the ``Atom.type`` tells us how this data
should be interpreted.

For the current implementations of the "``.apple_names``" (all functions +
globals), the "``.apple_types``" (names of all types that are defined), and
the "``.apple_namespaces``" (all namespaces), we currently set the ``Atom``
array to be:

.. code-block:: c

  HeaderData.atom_count = 1;
  HeaderData.atoms[0].type = eAtomTypeDIEOffset;
  HeaderData.atoms[0].form = DW_FORM_data4;

This defines the contents to be the DIE offset (``eAtomTypeDIEOffset``) that is
encoded as a 32 bit value (``DW_FORM_data4``).  This allows a single name to
have multiple matching DIEs in a single file, which could come up with an
inlined function for instance.  Future tables could include more information
about the DIE such as flags indicating if the DIE is a function, method, block,
or inlined.

The KeyType for the DWARF table is a 32 bit string table offset into the
"``.debug_str``" table.  The "``.debug_str``" is the string table for the DWARF
which may already contain copies of all of the strings.  This helps make sure,
with help from the compiler, that we reuse the strings between all of the DWARF
sections and keeps the hash table size down.  Another benefit to having the
compiler generate all strings as ``DW_FORM_strp`` in the debug info is that
DWARF parsing can be made much faster.

After a lookup is made, we get an offset into the hash data.  The hash data
needs to be able to deal with 32 bit hash collisions, so the chunk of data
at the offset in the hash data consists of a triple:

.. code-block:: c

  uint32_t str_offset
  uint32_t hash_data_count
  HashData[hash_data_count]

If ``str_offset`` is zero, then the bucket contents are done.
99.9% of the
hash data chunks contain a single item (no 32 bit hash collision):

.. code-block:: none

  .------------.
  | 0x00001023 | uint32_t KeyType (.debug_str[0x00001023] => "main")
  | 0x00000004 | uint32_t HashData count
  | 0x........ | uint32_t HashData[0] DIE offset
  | 0x........ | uint32_t HashData[1] DIE offset
  | 0x........ | uint32_t HashData[2] DIE offset
  | 0x........ | uint32_t HashData[3] DIE offset
  | 0x00000000 | uint32_t KeyType (end of hash chain)
  `------------'

If there are collisions, you will have multiple valid string offsets:

.. code-block:: none

  .------------.
  | 0x00001023 | uint32_t KeyType (.debug_str[0x00001023] => "main")
  | 0x00000004 | uint32_t HashData count
  | 0x........ | uint32_t HashData[0] DIE offset
  | 0x........ | uint32_t HashData[1] DIE offset
  | 0x........ | uint32_t HashData[2] DIE offset
  | 0x........ | uint32_t HashData[3] DIE offset
  | 0x00002023 | uint32_t KeyType (.debug_str[0x00002023] => "print")
  | 0x00000002 | uint32_t HashData count
  | 0x........ | uint32_t HashData[0] DIE offset
  | 0x........ | uint32_t HashData[1] DIE offset
  | 0x00000000 | uint32_t KeyType (end of hash chain)
  `------------'

Current testing with real world C++ binaries has shown that there is around one
32 bit hash collision per 100,000 name entries.

Contents
^^^^^^^^

As we said, we want to strictly define exactly what is included in the
different tables.  For DWARF, we have 3 tables: "``.apple_names``",
"``.apple_types``", and "``.apple_namespaces``".

"``.apple_names``" sections should contain an entry for each DWARF DIE whose
``DW_TAG`` is a ``DW_TAG_label``, ``DW_TAG_inlined_subroutine``, or
``DW_TAG_subprogram`` that has address attributes: ``DW_AT_low_pc``,
``DW_AT_high_pc``, ``DW_AT_ranges`` or ``DW_AT_entry_pc``.
It also contains
``DW_TAG_variable`` DIEs that have a ``DW_OP_addr`` in the location (global and
static variables).  All global and static variables should be included,
including those scoped within functions and classes.  For example, using the
following code:

.. code-block:: c

  static int var = 0;

  void f ()
  {
    static int var = 0;
  }

Both of the static ``var`` variables would be included in the table.  All
functions should emit both their full names and their basenames.  For C or C++,
the full name is the mangled name (if available) which is usually in the
``DW_AT_MIPS_linkage_name`` attribute, and the ``DW_AT_name`` contains the
function basename.  If global or static variables have a mangled name in a
``DW_AT_MIPS_linkage_name`` attribute, this should be emitted along with the
simple name found in the ``DW_AT_name`` attribute.

"``.apple_types``" sections should contain an entry for each DWARF DIE whose
tag is one of:

* DW_TAG_array_type
* DW_TAG_class_type
* DW_TAG_enumeration_type
* DW_TAG_pointer_type
* DW_TAG_reference_type
* DW_TAG_string_type
* DW_TAG_structure_type
* DW_TAG_subroutine_type
* DW_TAG_typedef
* DW_TAG_union_type
* DW_TAG_ptr_to_member_type
* DW_TAG_set_type
* DW_TAG_subrange_type
* DW_TAG_base_type
* DW_TAG_const_type
* DW_TAG_constant
* DW_TAG_file_type
* DW_TAG_namelist
* DW_TAG_packed_type
* DW_TAG_volatile_type
* DW_TAG_restrict_type
* DW_TAG_interface_type
* DW_TAG_unspecified_type
* DW_TAG_shared_type

Only entries with a ``DW_AT_name`` attribute are included, and the entry must
not be a forward declaration (``DW_AT_declaration`` attribute with a non-zero
value).  For example, using the following code:

.. code-block:: c

  int main ()
  {
    int *b = 0;
    return *b;
  }

We get a few type DIEs:

.. code-block:: none

  0x00000067:     TAG_base_type [5]
                  AT_encoding( DW_ATE_signed )
                  AT_name( "int" )
                  AT_byte_size( 0x04 )

  0x0000006e:     TAG_pointer_type [6]
                  AT_type( {0x00000067} ( int ) )
                  AT_byte_size( 0x08 )

The ``DW_TAG_pointer_type`` is not included because it does not have a
``DW_AT_name``.

"``.apple_namespaces``" section should contain all ``DW_TAG_namespace`` DIEs.
If we run into a namespace that has no name, this is an anonymous namespace, and
the name should be output as "``(anonymous namespace)``" (without the quotes).
Why?  This matches the output of ``abi::__cxa_demangle()``, the function in the
standard C++ library that demangles mangled names.


Language Extensions and File Format Changes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Objective-C Extensions
""""""""""""""""""""""

"``.apple_objc``" section should contain all ``DW_TAG_subprogram`` DIEs for an
Objective-C class.  The name used in the hash table is the name of the
Objective-C class itself.  If the Objective-C class has a category, then an
entry is made for both the class name without the category, and for the class
name with the category.  So if we have a DIE at offset 0x1234 with a name of
method "``-[NSString(my_additions) stringWithSpecialString:]``", we would add
an entry for "``NSString``" that points to DIE 0x1234, and an entry for
"``NSString(my_additions)``" that points to 0x1234.  This allows us to quickly
track down all Objective-C methods for an Objective-C class when evaluating
expressions.  It is needed because of the dynamic nature of Objective-C where
anyone can add methods to a class.
The DWARF for Objective-C methods is also
emitted differently from that for C++ classes: the methods are not usually
contained in the class definition but are scattered across one or more
compile units.  Categories can also be defined in different shared libraries.
So we need to be able to quickly find all of the methods and class functions
given the Objective-C class name, or quickly find all methods and class
functions for a class + category name.  This table does not contain any
selector names; it just maps Objective-C class names (or class names +
category) to all of the methods and class functions.  The selectors are added
as function basenames in the "``.apple_names``" section.

In the "``.apple_names``" section for Objective-C functions, the full name is
the entire function name with the brackets ("``-[NSString
stringWithCString:]``") and the basename is the selector only
("``stringWithCString:``").

Mach-O Changes
""""""""""""""

The section names given for the Apple hash tables are for non-Mach-O files.  For
Mach-O files, the sections should be contained in the ``__DWARF`` segment with
names as follows:

* "``.apple_names``" -> "``__apple_names``"
* "``.apple_types``" -> "``__apple_types``"
* "``.apple_namespaces``" -> "``__apple_namespac``" (16 character limit)
* "``.apple_objc``" -> "``__apple_objc``"

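
The renaming rule in the list above can be sketched in a few lines of C.  This
is only an illustration, not code taken from LLVM: the helper
``macho_section_name`` and the constant ``MACHO_SECTNAME_MAX`` are hypothetical
names introduced here, and the sketch assumes the rule is simply "replace the
leading ``.`` with ``__`` and truncate to 16 characters", which is what the
four mappings suggest.

.. code-block:: c

  #include <assert.h>
  #include <stddef.h>
  #include <string.h>

  /* Mach-O section names are limited to 16 characters. */
  enum { MACHO_SECTNAME_MAX = 16 };

  /*
   * Hypothetical helper (not an LLVM API): derive the Mach-O section name
   * from a non-Mach-O name such as ".apple_names".  The leading '.' is
   * replaced by "__" and the result is truncated to 16 characters.
   * "out" must have room for 16 characters plus the terminating NUL.
   */
  static void macho_section_name(const char *name,
                                 char out[MACHO_SECTNAME_MAX + 1])
  {
    /* Assumption: every input name starts with a single '.'. */
    assert(name != NULL && name[0] == '.');

    /* Copy at most 14 characters after the "__" prefix. */
    size_t len = strlen(name + 1);
    if (len > MACHO_SECTNAME_MAX - 2)
      len = MACHO_SECTNAME_MAX - 2;

    out[0] = '_';
    out[1] = '_';
    memcpy(out + 2, name + 1, len);
    out[2 + len] = '\0';
  }

Under these assumptions, passing "``.apple_namespaces``" yields
"``__apple_namespac``", while the three shorter names fit within the limit and
are only re-prefixed.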