1349cc55cSDimitry AndricHierarchical Trace Representation (HTR)
2349cc55cSDimitry Andric======================================
3349cc55cSDimitry AndricThe humongous amount of data processor traces like the ones obtained with Intel PT contain is not digestible to humans in its raw form. Given this, it is useful to summarize these massive traces by extracting useful information. Hierarchical Trace Representation (HTR) is the way lldb represents a summarized trace internally. HTR efficiently stores trace data and allows the trace data to be transformed in a way akin to compiler passes.
4349cc55cSDimitry Andric
5349cc55cSDimitry AndricConcepts
6349cc55cSDimitry Andric--------
7349cc55cSDimitry Andric**Block:** One or more contiguous units of the trace. At minimum, the unit of a trace is the load address of an instruction.
8349cc55cSDimitry Andric
9349cc55cSDimitry Andric**Block Metadata:** Metadata associated with each *block*. For processor traces, some metadata examples are the number of instructions in the block or information on what functions are called in the block.
10349cc55cSDimitry Andric
11349cc55cSDimitry Andric**Layer:** The representation of trace data between passes. For Intel PT there are two types of layers:
12349cc55cSDimitry Andric
13349cc55cSDimitry Andric **Instruction Layer:** Composed of the load addresses of the instructions in the trace. In an effort to save space,
14349cc55cSDimitry Andric metadata is only stored for instructions that are of interest, not every instruction in the trace. HTR contains a
15349cc55cSDimitry Andric single instruction layer.
16349cc55cSDimitry Andric
17349cc55cSDimitry Andric **Block Layer:** Composed of blocks - a block in *layer n* refers to a sequence of blocks in *layer n - 1*. A block in
18349cc55cSDimitry Andric *layer 1* refers to a sequence of instructions in *layer 0* (the instruction layer). Metadata is stored for each block in
19349cc55cSDimitry Andric a block layer. HTR contains one or more block layers.
20349cc55cSDimitry Andric
21349cc55cSDimitry Andric**Pass:** A transformation applied to a *layer* that generates a new *layer* that is a more summarized, consolidated representation of the trace data.
22349cc55cSDimitry AndricA pass merges instructions/blocks based on its specific purpose - for example, a pass designed to summarize a processor trace by function calls would merge all the blocks of a function into a single block representing the entire function.
23349cc55cSDimitry Andric
24*5f757f3fSDimitry AndricThe image below illustrates the transformation of a trace's representation (HTR)
25349cc55cSDimitry Andric
26349cc55cSDimitry Andric.. image:: media/htr-example.png
27349cc55cSDimitry Andric
28349cc55cSDimitry Andric
29349cc55cSDimitry AndricPasses
30349cc55cSDimitry Andric------
31349cc55cSDimitry AndricA *pass* is applied to a *layer* to extract useful information (summarization) and compress the trace representation into a new *layer*. The idea is to have a series of passes where each pass specializes in extracting certain information about the trace. Some examples of potential passes include: identifying functions, identifying loops, or a more general purpose such as identifying long sequences of instructions that are repeated (i.e. Basic Super Block). Below you will find a description of each pass currently implemented in lldb.
32349cc55cSDimitry Andric
33349cc55cSDimitry Andric**Basic Super Block Reduction**
34349cc55cSDimitry Andric
35349cc55cSDimitry AndricA “basic super block” is the longest sequence of blocks that always occur in the same order. (The concept is akin to “Basic Block'' in compiler theory, but refers to dynamic occurrences rather than CFG nodes).
36349cc55cSDimitry Andric
37349cc55cSDimitry AndricThe image below shows the "basic super blocks" of the sequence. Each unique "basic super block" is marked with a different color
38349cc55cSDimitry Andric
39349cc55cSDimitry Andric.. image:: media/basic_super_block_pass.png
40349cc55cSDimitry Andric
41349cc55cSDimitry Andric*Procedure to find all super blocks:*
42349cc55cSDimitry Andric
43349cc55cSDimitry Andric- For each block, compute the number of distinct predecessor and successor blocks.
44349cc55cSDimitry Andric
45349cc55cSDimitry Andric - **Predecessor** - the block that occurs directly before (to the left of) the current block
46349cc55cSDimitry Andric - **Successor** - the block that occurs directly after (to the right of) the current block
47349cc55cSDimitry Andric
48349cc55cSDimitry Andric- A block with more than one distinct successor is always the start of a super block, the super block will continue until the next block with more than one distinct predecessor or successor.
49