This is a description of the profile.proto format.

# Overview

Profile.proto is a data representation for profile data. It is independent of
the type of data being collected and the sampling process used to collect that
data. On disk, it is represented as a gzip-compressed protocol buffer, described
at src/proto/profile.proto
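
As a rough illustration of the on-disk representation, the following Python
sketch reads a profile file and returns the serialized Profile message. The
function name `read_profile_bytes` is hypothetical, and accepting uncompressed
input is a convenience assumed here, not something this description requires.
Decoding the returned bytes would additionally need bindings generated from
profile.proto, which are not shown.

```python
import gzip

# Every gzip stream starts with these two magic bytes.
GZIP_MAGIC = b"\x1f\x8b"


def read_profile_bytes(path):
    """Return the serialized Profile message stored at `path` (hypothetical helper)."""
    with open(path, "rb") as f:
        data = f.read()
    # On disk the profile is gzip-compressed. Passing uncompressed input
    # through unchanged is an assumption made for convenience.
    if data[:2] == GZIP_MAGIC:
        data = gzip.decompress(data)
    return data
```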

A profile in this context refers to a collection of samples, each one
representing measurements performed at a certain point in the life of a job. A
sample associates a set of measurement values with a list of locations, commonly
representing the program call stack when the sample was taken.

Tools such as pprof analyze these samples and display this information in
multiple forms, such as identifying hottest locations, building graphical call
graphs or trees, etc.

# General structure of a profile

A profile is represented by a Profile message, which contains the following
fields:

* *sample*: A profile sample, with the values measured and the associated call
  stack as a list of location ids. Samples with identical call stacks can be
  merged by adding their respective values, element by element.
* *location*: A unique place in the program, commonly mapped to a single
  instruction address. It has a unique nonzero id, to be referenced from the
  samples. It contains source information in the form of lines, and a mapping id
  that points to a binary.
* *function*: A program function as defined in the program source. It has a
  unique nonzero id, referenced from the location lines. It contains a
  human-readable name for the function (e.g. a C++ demangled name), a system
  name (e.g. a C++ mangled name), the name of the corresponding source file, and
  other function attributes.
* *mapping*: A binary that is part of the program during the profile
  collection. It has a unique nonzero id, referenced from the locations. It
  includes details on how the binary was mapped during program execution. By
  convention the main program binary is the first mapping, followed by any
  shared libraries.
* *string_table*: All strings in the profile are represented as indices into
  this repeated field. The first string is empty, so index == 0 always
  represents the empty string.
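
The merge rule for samples with identical call stacks can be sketched in Python
as follows. The `Sample` class is a simplified stand-in for the message, and
`merge_samples` is a hypothetical helper. The sketch keys only on the list of
location ids and ignores any other sample attributes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    # Simplified stand-in for the Sample message (illustrative only).
    location_id: List[int]
    value: List[int]


def merge_samples(samples):
    """Merge samples that share a call stack by adding values element by element."""
    merged = {}
    for s in samples:
        key = tuple(s.location_id)
        if key in merged:
            merged[key] = [a + b for a, b in zip(merged[key], s.value)]
        else:
            merged[key] = list(s.value)
    # dict preserves insertion order, so first-seen stacks come first.
    return [Sample(list(k), v) for k, v in merged.items()]


# Index 0 of the string table is always the empty string.
string_table = ["", "main", "main.cc"]
assert string_table[0] == ""
```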

# Measurement values

Measurement values are represented as 64-bit integers. The profile contains an
explicit description of each value represented, using a ValueType message, with
two fields:

* *Type*: A human-readable description of the type semantics. For example “cpu”
  to represent CPU time, “wall” or “time” for wallclock time, or “memory” for
  bytes allocated.
* *Unit*: A human-readable name of the unit represented by the 64-bit integer
  values. For example, it could be “nanoseconds” or “milliseconds” for a time
  value, or “bytes” or “megabytes” for a memory size. If this is just
  representing a number of events, the recommended unit name is “count”.

A profile can represent multiple measurements per sample, but all samples must
have the same number and type of measurements. The actual values are stored in
the Sample.value fields, each one described by the corresponding
Profile.sample_type field.

Some profiles have a uniform period that describes the granularity of the data
collection. For example, a CPU profile may have a period of 100 ms, or a memory
allocation profile may have a period of 512 KB. Profiles can optionally describe
such a value in the Profile.period and Profile.period_type fields. The profile
period is meant for human consumption and does not affect the interpretation of
the profiling data.

By convention, the first value on all profiles is the number of samples
collected at this call stack, with unit “count”. Because the profile does not
describe the sampling process beyond the optional period, it must include
unsampled values for all measurements. For example, a CPU profile could have
value[0] == samples, and value[1] == time in milliseconds.
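
The positional pairing between Sample.value and Profile.sample_type can be
sketched as below. The helper `labeled_values` is hypothetical, and plain
`(type, unit)` string tuples stand in for ValueType messages, which in a real
profile would hold string table indices.

```python
def labeled_values(sample_types, sample_values):
    """Pair each Sample.value entry with its (type, unit) description."""
    # All samples must carry exactly one value per declared sample type.
    if len(sample_values) != len(sample_types):
        raise ValueError(
            "expected %d values, got %d" % (len(sample_types), len(sample_values))
        )
    return [(t, u, v) for (t, u), v in zip(sample_types, sample_values)]


# A CPU profile following the convention: value[0] is the sample count,
# value[1] is the time in milliseconds.
cpu_types = [("samples", "count"), ("cpu", "milliseconds")]
```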

## Locations, functions and mappings

Each sample lists the id of each location where the sample was collected, in
bottom-up order. Each location has an explicit unique nonzero integer id,
independent of its position in the profile, and holds additional information to
identify the corresponding source.

The profile source is expected to perform any adjustment required to the
locations in order to point to the calls in the stack. For example, if the
profile source extracts the call stack by walking back over the program stack,
it must adjust the instruction addresses to point to the actual call
instruction, instead of the instruction that each call will return to.
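
One common way to perform this adjustment, assumed here purely for
illustration, is to subtract one from every return address so that it falls
inside the preceding call instruction. The helper name and the leaf-first
ordering of the input list are assumptions of this sketch, not requirements of
the format.

```python
def adjust_return_addresses(stack):
    """Return a copy of `stack` with caller addresses moved into their call instructions.

    Assumes stack[0] is the address being executed when the sample was
    taken (the leaf) and stack[1:] are return addresses of its callers.
    """
    if not stack:
        return []
    # The leaf is an actual program counter, so it is left untouched.
    # Subtracting 1 is a heuristic: it lands somewhere inside the call
    # instruction, which is enough for symbolization by address range.
    return [stack[0]] + [addr - 1 for addr in stack[1:]]
```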

Sources usually generate profiles that fall into these two categories:

* *Unsymbolized profiles*: These only contain instruction addresses, and are to
  be symbolized by a separate tool. It is critical for each location to point to
  a valid mapping, which will provide the information required for
  symbolization. These are used for profiles of compiled languages, such as C++
  and Go.

* *Symbolized profiles*: These contain all the symbol information available for
  the profile. Mappings and instruction addresses are optional for symbolized
  locations. These are used for profiles of interpreted or jitted languages,
  such as Java or Python. Also, the profile format allows the generation of
  mixed profiles, with symbolized and unsymbolized locations.

The symbol information is represented in the repeated line field of the
Location message. A location has multiple lines if it reflects multiple program
sources, for example if representing inlined call stacks. Lines reference
functions by their unique nonzero id, and give the source line number within
the source file listed by the function. A function contains the source
attributes for a function, including its name, source file, etc. Functions
include both a user and a system form of the name, for example to include C++
demangled and mangled names. For profiles where only a single name exists, both
should be set to the same string.
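
The way a location expands into source frames can be sketched as follows. The
classes are simplified stand-ins for the Location, Line and Function messages,
with names and file names stored as string table indices. The helper
`frames_for_location` and the example data are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Function:
    id: int
    name: int         # string table index of the human-readable name
    system_name: int  # string table index of the system (e.g. mangled) name
    filename: int     # string table index of the source file


@dataclass
class Line:
    function_id: int
    line: int         # line number within the function's source file


@dataclass
class Location:
    id: int
    line: List[Line]  # more than one entry when calls were inlined


def frames_for_location(location, functions_by_id, string_table):
    """Return (function name, file name, line number) for each line of a location."""
    frames = []
    for ln in location.line:
        fn = functions_by_id[ln.function_id]
        frames.append((string_table[fn.name], string_table[fn.filename], ln.line))
    return frames
```

A location with two lines, as produced by an inlined call, yields two frames
in the order the lines are stored.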

Mappings are also referenced from locations by their unique nonzero id, and
include all information needed to symbolize addresses within the mapping. They
include similar information to the Linux /proc/self/maps file. Locations
associated with a mapping should have addresses that land between the mapping
start and limit. Also, if available, mappings should include a build id to
uniquely identify the version of the binary being used.
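
A minimal sketch of the address check follows. The `Mapping` class is a
simplified stand-in with illustrative field names, `find_mapping` is a
hypothetical helper, and treating the limit as exclusive is an assumption of
this sketch.

```python
from dataclasses import dataclass


@dataclass
class Mapping:
    id: int
    memory_start: int
    memory_limit: int
    build_id: str = ""  # optional; identifies the version of the binary


def find_mapping(mappings, address):
    """Return the first mapping whose address range contains `address`, or None."""
    for m in mappings:
        # Assumption: the range is half-open, [memory_start, memory_limit).
        if m.memory_start <= address < m.memory_limit:
            return m
    return None
```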

## Labels

Samples optionally contain labels, which are annotations to discriminate samples
with identical locations. For example, a label can be used on a malloc profile
to indicate allocation size, so two samples on the same call stack with sizes
2MB and 4MB do not get merged into a single sample with two allocations and a
size of 6MB.

Labels can be string-based or numeric. They are represented by the Label
message, with a key identifying the label and either a string or numeric
value. For numeric labels, the measurement unit can be specified in the profile.
If no unit is specified and the key is "request" or "alignment",
then the units are assumed to be "bytes". Otherwise, when no unit is specified,
the key will be used as the measurement unit of the numeric value. All labels
with the same key should have the same unit.
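
The unit fallback for numeric labels can be written out as a small Python
sketch. The function name `numeric_label_unit` is hypothetical, and an empty
string is assumed to mean that no unit was specified.

```python
def numeric_label_unit(key, unit=""):
    """Return the measurement unit that applies to a numeric label."""
    if unit:
        # An explicit unit in the profile always wins.
        return unit
    if key in ("request", "alignment"):
        # These two keys default to bytes when no unit is given.
        return "bytes"
    # Otherwise the key itself doubles as the unit.
    return key
```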

## Keep and drop expressions

Some profile sources may have knowledge of locations that are uninteresting or
irrelevant. However, if symbolization is needed in order to identify these
locations, the profile source may not be able to remove them when the profile is
generated. The profile format provides a mechanism to identify these frames by
name, through regular expressions.

These expressions must match the function name in its entirety. Frames that
match Profile.drop\_frames will be dropped from the profile, along with any
frames below them. Frames that match Profile.keep\_frames will be kept, even if
they match drop\_frames.
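
A simplified sketch of how a tool might apply these expressions to a single
call stack is shown below. The helper `prune_stack` is hypothetical. The sketch
assumes the stack is a list of function names with the leaf first, and it reads
"below" as "closer to the leaf". Both are assumptions of the sketch, and real
tools may apply additional rules.

```python
import re


def prune_stack(stack, drop_frames="", keep_frames=""):
    """Return `stack` with dropped frames, and the frames below them, removed.

    `stack` lists function names with the leaf first (an assumption).
    """
    if not drop_frames:
        return list(stack)
    drop = re.compile(drop_frames)
    keep = re.compile(keep_frames) if keep_frames else None
    # Scan from the root (last element) toward the leaf.
    for i in range(len(stack) - 1, -1, -1):
        name = stack[i]
        # fullmatch enforces that the expression matches the entire name.
        if drop.fullmatch(name) and not (keep and keep.fullmatch(name)):
            # Drop this frame and everything closer to the leaf.
            return list(stack[i + 1:])
    return list(stack)
```

For instance, with the stack `["memcpy", "malloc_hook", "alloc", "main"]` and a
drop expression of `malloc_hook|memcpy`, only `["alloc", "main"]` remains.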