1# Native heap profiler
2
3NOTE: **heapprofd requires Android 10 or higher**
4
Heapprofd is a tool that tracks native heap allocations and deallocations of an
Android process within a given time period. The resulting profile can be used to
attribute memory usage to particular callstacks, supporting a mix of both
native and Java code. The tool can be used by Android platform and app
developers to investigate memory issues.
10
11On debug Android builds, you can profile all apps and most system services.
12On "user" builds, you can only use it on apps with the debuggable or
13profileable manifest flag.
14
15## Quickstart
16
17See the [Memory Guide](/docs/case-studies/memory.md#heapprofd) for getting
18started with heapprofd.
19
20## UI
21
22Dumps from heapprofd are shown as flamegraphs in the UI after clicking on the
23diamond. Each diamond corresponds to a snapshot of the allocations and
24callstacks collected at that point in time.
25
26![heapprofd snapshots in the UI tracks](/docs/images/profile-diamond.png)
27
28![heapprofd flamegraph](/docs/images/native-flamegraph.png)
29
30## SQL
31
32Information about callstacks is written to the following tables:
33
34* [`stack_profile_mapping`](/docs/analysis/sql-tables.autogen#stack_profile_mapping)
35* [`stack_profile_frame`](/docs/analysis/sql-tables.autogen#stack_profile_frame)
36* [`stack_profile_callsite`](/docs/analysis/sql-tables.autogen#stack_profile_callsite)
37
38The allocations themselves are written to
39[`heap_profile_allocation`](/docs/analysis/sql-tables.autogen#heap_profile_allocation).
40
41Offline symbolization data is stored in
42[`stack_profile_symbol`](/docs/analysis/sql-tables.autogen#stack_profile_symbol).
43
44See [Example Queries](#heapprofd-example-queries) for example SQL queries.
45
46## Recording
47
48Heapprofd can be configured and started in three ways.
49
50#### Manual configuration
51
52This requires manually setting the
53[HeapprofdConfig](/docs/reference/trace-config-proto.autogen#HeapprofdConfig)
54section of the trace config. The only benefit of doing so is that in this way
55heap profiling can be enabled alongside any other tracing data sources.
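
As a rough sketch, the snippet below renders a minimal text-format trace config containing a heapprofd data source. The field names are the ones used in the Linux example later on this page. The helper name `heapprofd_config_text` and its default values are assumptions made for illustration, and the HeapprofdConfig reference remains the authority on which fields exist.

```python
def heapprofd_config_text(process_cmdline, sampling_interval_bytes=4096,
                          shmem_size_bytes=8388608, duration_ms=10000):
    """Render a minimal text-format trace config (hypothetical helper)."""
    return f"""buffers {{
  size_kb: 32768
}}

data_sources {{
  config {{
    name: "android.heapprofd"
    heapprofd_config {{
      shmem_size_bytes: {shmem_size_bytes}
      sampling_interval_bytes: {sampling_interval_bytes}
      process_cmdline: "{process_cmdline}"
    }}
  }}
}}

duration_ms: {duration_ms}
"""


# The process name here is only an example value.
print(heapprofd_config_text("com.example.myapp"))
```

Additional `data_sources` blocks for other data sources can be appended to the same config, which is the main reason to configure heapprofd manually.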
56
57#### Using the tools/heap_profile script (recommended)
58
You can use the `tools/heap_profile` script. If you are having trouble,
make sure you are using the
61[latest version](
62https://raw.githubusercontent.com/google/perfetto/master/tools/heap_profile).
63
You can target processes either by name (`-n com.example.myapp`) or by PID
(`-p 1234`). In the first case, the heap profile will be initiated both on
already-running processes that match the package name and on new processes
launched after the profiling session is started.
68For the full arguments list see the
69[heap_profile cmdline reference page](/docs/reference/heap_profile-cli).
70
71#### Using the Recording page of Perfetto UI
72
73You can also use the [Perfetto UI](https://ui.perfetto.dev/#!/record?p=memory)
74to record heapprofd profiles. Tick "Heap profiling" in the trace configuration,
75enter the processes you want to target, click "Add Device" to pair your phone,
76and record profiles straight from your browser. This is also possible on
77Windows.
78
79## Viewing the data
80
The resulting profile proto contains four views on the data:

* **space**: how many bytes were allocated but not freed at this callstack at
  the moment the dump was created.
* **alloc\_space**: how many bytes were allocated at this callstack, including
  bytes that had already been freed at the moment of the dump.
* **objects**: how many allocations without matching frees were done at this
  callstack.
* **alloc\_objects**: how many allocations (including ones with matching frees)
  were done at this callstack.
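
The relationship between the four views can be sketched in a few lines of Python. This is an illustration only: it assumes input rows shaped like those described in the Example SQL Queries section (positive `count` and `size` for allocations, negative values for frees), and the function name `profile_views` is hypothetical.

```python
from collections import defaultdict


def profile_views(rows):
    """Compute the four views per callsite.

    rows: iterable of (callsite_id, count, size). Frees are assumed to be
    reported as rows with negative count and size.
    """
    views = defaultdict(lambda: {"space": 0, "alloc_space": 0,
                                 "objects": 0, "alloc_objects": 0})
    for callsite_id, count, size in rows:
        v = views[callsite_id]
        # space/objects: allocations minus frees (what is still live).
        v["space"] += size
        v["objects"] += count
        # alloc_space/alloc_objects: allocations only, frees ignored.
        if count > 0:
            v["alloc_space"] += size
            v["alloc_objects"] += count
    return dict(views)


rows = [(1, 4, 4096), (1, -1, -1024), (2, 2, 512)]
for callsite_id, v in profile_views(rows).items():
    print(callsite_id, v)
```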
91
92_(Googlers: You can also open the gzipped protos using http://pprof/)_
93
94TIP: you might want to put `libart.so` as a "Hide regex" when profiling apps.
95
You can use the [Perfetto UI](https://ui.perfetto.dev) to visualize heap dumps.
Upload the `raw-trace` file in your output directory. You will see all heap
dumps as diamonds on the timeline; click any of them to get a flamegraph.
99
100Alternatively [Speedscope](https://speedscope.app) can be used to visualize
101the gzipped protos, but will only show the space view.
102
103TIP: Click Left Heavy on the top left for a good visualization.
104
105## Sampling interval
106
Heapprofd samples heap allocations by hooking calls to malloc/free and C++'s
operator new/delete. Given a sampling interval of n bytes, one allocation is
sampled, on average, every n bytes allocated. This reduces the
performance impact on the target process. The default sampling interval
is 4096 bytes.
112
113The easiest way to reason about this is to imagine the memory allocations as a
114stream of one byte allocations. From this stream, every byte has a 1/n
115probability of being selected as a sample, and the corresponding callstack
116gets attributed the complete n bytes. For more accuracy, allocations larger than
117the sampling interval bypass the sampling logic and are recorded with their true
118size.
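
The mental model above can be simulated in a few lines of Python. This is a sketch of the simplified per-byte model described in this section, not of heapprofd's actual implementation; the function names and the fixed seed are assumptions for illustration.

```python
import random


def sampled_bytes(size, interval, rng):
    """Bytes attributed to one allocation under the per-byte model."""
    if size > interval:
        # Allocations larger than the sampling interval bypass sampling
        # and are recorded with their true size.
        return size
    # Treat the allocation as `size` one-byte allocations, each selected
    # with probability 1/interval. Every selected byte attributes the
    # complete `interval` bytes to the callstack.
    hits = sum(1 for _ in range(size) if rng.random() < 1.0 / interval)
    return hits * interval


def estimate_total(sizes, interval, seed=0):
    """Sum of attributed bytes over a stream of allocation sizes."""
    rng = random.Random(seed)
    return sum(sampled_bytes(s, interval, rng) for s in sizes)


sizes = [64] * 20000  # 1,280,000 bytes actually allocated
print("actual:", sum(sizes), "estimated:", estimate_total(sizes, 4096))
```

Most small allocations are attributed zero bytes and a few are attributed the full interval, so the estimate is only accurate in aggregate. This is why a larger sampling interval trades accuracy for lower overhead.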
119
120## Startup profiling
121
When specifying a target process name (as opposed to the PID), new processes
123matching that name are profiled from their startup. The resulting profile will
124contain all allocations done between the start of the process and the end
125of the profiling session.
126
127On Android, Java apps are usually not exec()-ed from scratch, but fork()-ed from
128the [zygote], which then specializes into the desired app. If the app's name
129matches a name specified in the profiling session, profiling will be enabled as
130part of the zygote specialization. The resulting profile contains all
131allocations done between that point in zygote specialization and the end of the
132profiling session. Some allocations done early in the specialization process are
133not accounted for.
134
135At the trace proto level, the resulting [ProfilePacket] will have the
136`from_startup` field set to true in the corresponding `ProcessHeapSamples`
137message. This is not surfaced in the converted pprof compatible proto.
138
139[ProfilePacket]: /docs/reference/trace-packet-proto.autogen#ProfilePacket
140[zygote]: https://developer.android.com/topic/performance/memory-overview#SharingRAM
141
142## Runtime profiling
143
144When a profiling session is started, all matching processes (by name or PID)
145are enumerated and are signalled to request profiling. Profiling isn't actually
146enabled until a few hundred milliseconds after the next allocation that is
147done by the application. If the application is idle when profiling is
148requested, and then does a burst of allocations, these may be missed.
149
150The resulting profile will contain all allocations done between when profiling
151is enabled, and the end of the profiling session.
152
153The resulting [ProfilePacket] will have `from_startup` set to false in the
154corresponding `ProcessHeapSamples` message. This does not get surfaced in the
155converted pprof compatible proto.
156
157## Concurrent profiling sessions
158
159If multiple sessions name the same target process (either by name or PID),
160only the first relevant session will profile the process. The other sessions
161will report that the process had already been profiled when converting to
162the pprof compatible proto.
163
164If you see this message but do not expect any other sessions, run
165
166```shell
167adb shell killall perfetto
168```
169
170to stop any concurrent sessions that may be running.
171
The resulting [ProfilePacket] will have `rejected_concurrent` set to true in
the otherwise empty corresponding `ProcessHeapSamples` message. This does not
get surfaced in the converted pprof compatible proto.
175
176## {#heapprofd-targets} Target processes
177
Depending on the build of Android that heapprofd is run on, some processes
are not eligible to be profiled.
180
181On _user_ (i.e. production, non-rootable) builds, only Java applications with
182either the profileable or the debuggable manifest flag set can be profiled.
183Profiling requests for non-profileable/debuggable processes will result in an
184empty profile.
185
On userdebug builds, all processes except for a small set of critical
services can be profiled. To find the set of disallowed targets, look for
`never_profile_heap` in [heapprofd.te](
https://cs.android.com/android/platform/superproject/+/master:system/sepolicy/private/heapprofd.te?q=never_profile_heap).
This restriction can be lifted by disabling SELinux, either by running
`adb shell su root setenforce 0` or by passing `--disable-selinux` to the
`heap_profile` script.
193
194<center>
195
196|                         | userdebug setenforce 0 | userdebug | user |
197|-------------------------|:----------------------:|:---------:|:----:|
198| critical native service |            Y           |     N     |  N   |
199| native service          |            Y           |     Y     |  N   |
200| app                     |            Y           |     Y     |  N   |
201| profileable app         |            Y           |     Y     |  Y   |
202| debuggable app          |            Y           |     Y     |  Y   |
203
204</center>
205
206To mark an app as profileable, put `<profileable android:shell="true"/>` into
207the `<application>` section of the app manifest.
208
209```xml
210<manifest ...>
211    <application>
212        <profileable android:shell="true"/>
213        ...
214    </application>
215</manifest>
216```
217
218## DEDUPED frames
219
220If the name of a Java method includes `[DEDUPED]`, this means that multiple
221methods share the same code. ART only stores the name of a single one in its
222metadata, which is displayed here. This is not necessarily the one that was
223called.
224
225## Triggering heap snapshots on demand
226
Heap snapshots are recorded into the trace either at regular time intervals, if
using the `continuous_dump_config` field, or at the end of the session.
229
230You can also trigger a snapshot of all currently profiled processes by running
231`adb shell killall -USR1 heapprofd`. This can be useful in lab tests for
232recording the current memory usage of the target in a specific state.
233
234This dump will show up in addition to the dump at the end of the profile that is
235always produced. You can create multiple of these dumps, and they will be
236enumerated in the output directory.
237
238## Symbolization
239
NOTE: Symbolization is currently only available on Linux and macOS.
241
242### Set up llvm-symbolizer
243
244You only need to do this once.
245
246To use symbolization, your system must have llvm-symbolizer installed and
247accessible from `$PATH` as `llvm-symbolizer`. On Debian, you can install it
248using `sudo apt install llvm-9`.
249This will create `/usr/bin/llvm-symbolizer-9`. Symlink that to somewhere in
250your `$PATH` as `llvm-symbolizer`.
251
252For instance, `ln -s /usr/bin/llvm-symbolizer-9 ~/bin/llvm-symbolizer`, and
253add `~/bin` to your path (or run the commands below with `PATH=~/bin:$PATH`
254prefixed).
255
256### Symbolize your profile
257
258If the profiled binary or libraries do not have symbol names, you can
259symbolize profiles offline. Even if they do, you might want to symbolize in
260order to get inlined function and line number information. All tools
261(traceconv, trace_processor_shell, the heap_profile script) support specifying
262the `PERFETTO_BINARY_PATH` as an environment variable.
263
264```
265PERFETTO_BINARY_PATH=somedir tools/heap_profile --name ${NAME}
266```
267
You can persist symbols for a trace by running
`PERFETTO_BINARY_PATH=somedir tools/traceconv symbolize raw-trace > symbols`.
You can then concatenate the symbols to the trace
(`cat raw-trace symbols > symbolized-trace`) and the symbols will be part of
`symbolized-trace`. The `tools/heap_profile` script will also generate this
file in your output directory, if `PERFETTO_BINARY_PATH` is used.
274
275The symbol file is the first with matching Build ID in the following order:
276
2771. absolute path of library file relative to binary path.
2782. absolute path of library file relative to binary path, but with base.apk!
279    removed from filename.
2803. basename of library file relative to binary path.
2814. basename of library file relative to binary path, but with base.apk!
282    removed from filename.
2835. in the subdirectory .build-id: the first two hex digits of the build-id
284    as subdirectory, then the rest of the hex digits, with ".debug" appended.
285    See
286    https://fedoraproject.org/wiki/RolandMcGrath/BuildID#Find_files_by_build_ID
287
288For example, "/system/lib/base.apk!foo.so" with build id abcd1234,
289is looked for at:
290
2911. $PERFETTO_BINARY_PATH/system/lib/base.apk!foo.so
2922. $PERFETTO_BINARY_PATH/system/lib/foo.so
2933. $PERFETTO_BINARY_PATH/base.apk!foo.so
2944. $PERFETTO_BINARY_PATH/foo.so
2955. $PERFETTO_BINARY_PATH/.build-id/ab/cd1234.debug
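
The lookup order can be written down as a short Python sketch. It only enumerates the candidate paths in the order listed above. It does not open the files or compare Build IDs, and the function name `symbol_candidates` is hypothetical rather than part of any Perfetto tool.

```python
import posixpath


def symbol_candidates(binary_path, library, build_id):
    """List the paths searched for `library`, in lookup order."""
    without_apk = library.replace("base.apk!", "")
    return [
        # 1. Absolute path of the library, relative to the binary path.
        posixpath.join(binary_path, library.lstrip("/")),
        # 2. Same, with "base.apk!" removed from the filename.
        posixpath.join(binary_path, without_apk.lstrip("/")),
        # 3. Basename of the library, relative to the binary path.
        posixpath.join(binary_path, posixpath.basename(library)),
        # 4. Same, with "base.apk!" removed from the filename.
        posixpath.join(binary_path, posixpath.basename(without_apk)),
        # 5. .build-id/<first two hex digits>/<remaining digits>.debug
        posixpath.join(binary_path, ".build-id", build_id[:2],
                       build_id[2:] + ".debug"),
    ]


for path in symbol_candidates("/sym", "/system/lib/base.apk!foo.so", "abcd1234"):
    print(path)
```

With `/sym` standing in for `$PERFETTO_BINARY_PATH`, this prints the same five locations as the example above.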
296
297Alternatively, you can set the `PERFETTO_SYMBOLIZER_MODE` environment variable
298to `index`, and the symbolizer will recursively search the given directory for
299an ELF file with the given build id. This way, you will not have to worry
300about correct filenames.
301
302## Troubleshooting
303
304### Buffer overrun
305
306If the rate of allocations is too high for heapprofd to keep up, the profiling
307session will end early due to a buffer overrun. If the buffer overrun is
308caused by a transient spike in allocations, increasing the shared memory buffer
309size (passing `--shmem-size` to `tools/heap_profile`) can resolve the issue.
310Otherwise the sampling interval can be increased (at the expense of lower
311accuracy in the resulting profile) by passing `--interval=16000` or higher.
312
313### Profile is empty
314
315Check whether your target process is eligible to be profiled by consulting
316[Target processes](#heapprofd-targets) above.
317
318Also check the [Known Issues](#known-issues).
319
320### Implausible callstacks
321
If you see a callstack that seems impossible from looking at the code, make
sure no [DEDUPED frames](#deduped-frames) are involved.
324
325Also, if your code is linked using _Identical Code Folding_
326(ICF), i.e. passing `-Wl,--icf=...` to the linker, most trivial functions, often
327constructors and destructors, can be aliased to binary-equivalent operators
328of completely unrelated classes.
329
330### Symbolization: Could not find library
331
332When symbolizing a profile, you might come across messages like this:
333
334```bash
335Could not find /data/app/invalid.app-wFgo3GRaod02wSvPZQ==/lib/arm64/somelib.so
336(Build ID: 44b7138abd5957b8d0a56ce86216d478).
337```
338
Check whether your library (in this example somelib.so) exists in
`PERFETTO_BINARY_PATH`. Then compare the Build ID to the one in your
symbol file, which you can get by running
`readelf -n /path/in/binary/path/somelib.so`. If it does not match, the
symbol file is a different version from the one on the device, and cannot
be used for symbolization.
If it does match, try moving somelib.so to the root of `PERFETTO_BINARY_PATH`
and try again.
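
If you need to check many libraries, the comparison can be scripted. The following Python sketch extracts the Build ID from `readelf -n` output and compares it to the one in the error message. It assumes `readelf` is on your `PATH` and prints a `Build ID: <hex>` line, as GNU binutils does; the helper names are hypothetical.

```python
import re
import subprocess


def build_id_from_readelf(text):
    """Return the lowercase Build ID found in `readelf -n` output, or None."""
    match = re.search(r"Build ID:\s*([0-9a-fA-F]+)", text)
    return match.group(1).lower() if match else None


def build_id_matches(text, expected):
    """True if the Build ID in `text` equals the one from the error message."""
    return build_id_from_readelf(text) == expected.lower()


def readelf_notes(path):
    """Run `readelf -n` on an ELF file and return its output as text."""
    result = subprocess.run(["readelf", "-n", path], capture_output=True,
                            text=True, check=True)
    return result.stdout
```

For example, `build_id_matches(readelf_notes("somedir/somelib.so"), "44b7138abd5957b8d0a56ce86216d478")` checks a local copy of the library from the message above.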
347
### Only one frame shown

If you only see a single frame for functions in a specific library, make sure
350that the library has unwind information. We need one of
351
352* `.gnu_debugdata`
353* `.eh_frame` (+ preferably `.eh_frame_hdr`)
354* `.debug_frame`.
355
356Frame-pointer unwinding is *not supported*.
357
358To check if an ELF file has any of those, run
359
360```console
361$ readelf -S file.so | grep "gnu_debugdata\|eh_frame\|debug_frame"
362  [12] .eh_frame_hdr     PROGBITS         000000000000c2b0  0000c2b0
363  [13] .eh_frame         PROGBITS         0000000000011000  00011000
364  [24] .gnu_debugdata    PROGBITS         0000000000000000  000f7292
365```
366
367If this does not show one or more of the sections, change your build system
368to not strip them.
369
370## (non-Android) Linux support
371
372NOTE: This is experimental and only for ad-hoc investigations.
373
374You can use a standalone library to profile memory allocations on Linux.
375First [build Perfetto](/docs/contributing/build-instructions.md)
376
377```
378tools/build_all_configs.py
379ninja -C out/linux_clang_release
380```
381
382Then, run traced
383
384```
385out/linux_clang_release/traced
386```
387
388Start the profile (e.g. targeting trace_processor_shell)
389
390```
391out/linux_clang_release/perfetto \
392  -c - --txt \
393  -o ~/heapprofd-trace \
394<<EOF
395
396buffers {
397  size_kb: 32768
398}
399
400data_sources {
401  config {
402    name: "android.heapprofd"
403    heapprofd_config {
404      shmem_size_bytes: 8388608
405      sampling_interval_bytes: 4096
406      block_client: true
407      process_cmdline: "trace_processor_shell"
408      dump_at_max: true
409    }
410  }
411}
412
413duration_ms: 604800000
414write_into_file: true
415flush_timeout_ms: 30000
416flush_period_ms: 604800000
417
418EOF
419```
420
421Finally, run your target (e.g. trace_processor_shell) with LD_PRELOAD
422
423```
424LD_PRELOAD=out/linux_clang_release/libheapprofd_preload.so out/linux_clang_release/trace_processor_shell <trace>
425```
426
427Then, Ctrl-C the Perfetto invocation and upload ~/heapprofd-trace to the
428[Perfetto UI](https://ui.perfetto.dev).
429
430## Known Issues
431
432### Android 11
433
434* 32-bit programs cannot be targeted on 64-bit devices.
435* Setting `sampling_interval_bytes` to 0 crashes the target process.
436  This is an invalid config that should be rejected instead.
437* For startup profiles, some frame names might be missing. This will be
438  resolved in Android 12.
439
440### Android 10
441
442* On ARM32, the bottom-most frame is always `ERROR 2`. This is harmless and
443  the callstacks are still complete.
444* x86 platforms are not supported. This includes the Android _Cuttlefish_
445  emulator.
* If heapprofd is run standalone (by running `heapprofd` in a root shell, rather
  than through init), `/dev/socket/heapprofd` gets assigned an incorrect SELinux
  domain. You will not be able to profile any processes unless you disable
  SELinux enforcement.
  Run `restorecon /dev/socket/heapprofd` in a root shell to resolve.
* Using `vfork(2)` or `clone(2)` with `CLONE_VM` and allocating / freeing
  memory in the child process will prematurely end the profile.
  `java.lang.Runtime.exec` does this, so calling it will prematurely end
  the profile. Note that this is in violation of the POSIX standard.
455* 32-bit programs cannot be targeted on 64-bit devices.
456* Setting `sampling_interval_bytes` to 0 crashes the target process.
457  This is an invalid config that should be rejected instead.
458* Function names in libraries with load bias might be incorrect. Use
459  [offline symbolization](#symbolization) to resolve this issue.
460* For startup profiles, some frame names might be missing. This will be
461  resolved in Android 12.
462
463## Heapprofd vs malloc_info() vs RSS
464
465When using heapprofd and interpreting results, it is important to know the
466precise meaning of the different memory metrics that can be obtained from the
467operating system.
468
469**heapprofd** gives you the number of bytes the target program
470requested from the default C/C++ allocator. If you are profiling a Java app from
471startup, allocations that happen early in the application's initialization will
472not be visible to heapprofd. Native services that do not fork from the Zygote
473are not affected by this.
474
**malloc\_info** is a libc function that gives you information about the
allocator. This can be triggered on userdebug builds by using
`am dumpheap -m <PID> /data/local/tmp/heap.txt`. This will in general be more
than the memory seen by heapprofd because, depending on the allocator, not all
memory is immediately freed. In particular, jemalloc retains some freed memory
in thread caches.
481
482**Heap RSS** is the amount of memory requested from the operating system by the
483allocator. This is larger than the previous two numbers because memory can only
484be obtained in page size chunks, and fragmentation causes some of that memory to
485be wasted. This can be obtained by running `adb shell dumpsys meminfo <PID>` and
486looking at the "Private Dirty" column.
RSS can also end up being smaller than the other two if the device kernel uses
memory compression (ZRAM, enabled by default on recent versions of Android) and
the memory of the process gets swapped out onto ZRAM.
490
491|                     | heapprofd         | malloc\_info | RSS |
492|---------------------|:-----------------:|:------------:|:---:|
493| from native startup |          x        |      x       |  x  |
494| after zygote init   |          x        |      x       |  x  |
495| before zygote init  |                   |      x       |  x  |
496| thread caches       |                   |      x       |  x  |
497| fragmentation       |                   |              |  x  |
498
If you observe high RSS or malloc\_info metrics but heapprofd does not match,
you might be hitting some pathological fragmentation problem in the allocator.
501
502## Convert to pprof
503
504You can use [traceconv](/docs/quickstart/traceconv.md) to convert the heap dumps
505in a trace into the [pprof](https://github.com/google/pprof) format. These can
506then be viewed using the pprof CLI or a UI (e.g. Speedscope, or Google-internal
507pprof/).
508
509```bash
510tools/traceconv profile /tmp/profile
511```
512
513This will create a directory in `/tmp/` containing the heap dumps. Run:
514
515```bash
516gzip /tmp/heap_profile-XXXXXX/*.pb
517```
518
519to get gzipped protos, which tools handling pprof profile protos expect.
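
Where the `gzip` command is not available, the same step can be done with Python's standard library. This is a sketch, with the function name `gzip_profiles` chosen for illustration. Like `gzip`, it replaces each `.pb` file with a `.pb.gz` file.

```python
import glob
import gzip
import os
import shutil


def gzip_profiles(directory):
    """Gzip every *.pb file in `directory` and return the new file names."""
    written = []
    for path in sorted(glob.glob(os.path.join(directory, "*.pb"))):
        # Stream the file through gzip rather than loading it into memory.
        with open(path, "rb") as src, gzip.open(path + ".gz", "wb") as dst:
            shutil.copyfileobj(src, dst)
        os.remove(path)  # mirror gzip's default of removing the original
        written.append(path + ".gz")
    return written
```

Call it with the directory that traceconv created, for example `gzip_profiles("/tmp/heap_profile-XXXXXX")` with the actual directory name substituted.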
520
521## {#heapprofd-example-queries} Example SQL Queries
522
We can get the callstacks that allocated memory using an SQL query in the
Trace Processor. For each frame, we get one row for the number of allocated
bytes, where `count` and `size` are positive, and, if any of them were already
freed, another row with negative `count` and `size`. The sum of those gets us
the `space` view.
528
529```sql
530select a.callsite_id, a.ts, a.upid, f.name, f.rel_pc, m.build_id, m.name as mapping_name,
531        sum(a.size) as space_size, sum(a.count) as space_count
532      from heap_profile_allocation a join
533           stack_profile_callsite c ON (a.callsite_id = c.id) join
534           stack_profile_frame f ON (c.frame_id = f.id) join
535           stack_profile_mapping m ON (f.mapping = m.id)
536      group by 1, 2, 3, 4, 5, 6, 7 order by space_size desc;
537```
538
539| callsite_id | ts | upid | name | rel_pc | build_id | mapping_name | space_size | space_count |
540|-------------|----|------|-------|-----------|------|--------|----------|------|
541|6660|5|1| malloc |244716| 8126fd.. | /apex/com.android.runtime/lib64/bionic/libc.so |106496|4|
542|192 |5|1| malloc |244716| 8126fd.. | /apex/com.android.runtime/lib64/bionic/libc.so |26624 |1|
543|1421|5|1| malloc |244716| 8126fd.. | /apex/com.android.runtime/lib64/bionic/libc.so |26624 |1|
544|1537|5|1| malloc |244716| 8126fd.. | /apex/com.android.runtime/lib64/bionic/libc.so |26624 |1|
545|8843|5|1| malloc |244716| 8126fd.. | /apex/com.android.runtime/lib64/bionic/libc.so |26424 |1|
546|8618|5|1| malloc |244716| 8126fd.. | /apex/com.android.runtime/lib64/bionic/libc.so |24576 |4|
547|3750|5|1| malloc |244716| 8126fd.. | /apex/com.android.runtime/lib64/bionic/libc.so |12288 |1|
548|2820|5|1| malloc |244716| 8126fd.. | /apex/com.android.runtime/lib64/bionic/libc.so |8192  |2|
549|3788|5|1| malloc |244716| 8126fd.. | /apex/com.android.runtime/lib64/bionic/libc.so |8192  |2|
550
551We can see all the functions are "malloc" and "realloc", which is not terribly
552informative. Usually we are interested in the _cumulative_ bytes allocated in
553a function (otherwise, we will always only see malloc / realloc). Chasing the
554parent_id of a callsite (not shown in this table) recursively is very hard in
555SQL.
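
Outside of SQL, the same roll-up is straightforward. The Python sketch below walks `parent_id` links to accumulate each callsite's bytes into all of its ancestors. It assumes you have already exported the callsite ids, their parents, and the per-callsite sizes into dictionaries; the names used are hypothetical, and it makes no claim about how Trace Processor computes this internally.

```python
def cumulative_sizes(parents, self_sizes):
    """Roll per-callsite sizes up to every ancestor callsite.

    parents: callsite id -> parent callsite id (None for a root).
    self_sizes: callsite id -> bytes attributed directly to that callsite.
    """
    totals = {callsite_id: 0 for callsite_id in parents}
    for callsite_id, size in self_sizes.items():
        node = callsite_id
        # Add this callsite's bytes to itself and to each ancestor in turn.
        while node is not None:
            totals[node] += size
            node = parents[node]
    return totals


# Callsite 1 is the root; 2 and 4 are its children; 3 is a child of 2.
parents = {1: None, 2: 1, 3: 2, 4: 1}
self_sizes = {3: 100, 4: 50}
print(cumulative_sizes(parents, self_sizes))
```

Note that this aggregates per callsite, so the same function appearing in different callstacks is counted separately for each.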
556
557There is an **experimental** table that surfaces this information. The **API is
558subject to change**.
559
560```sql
561select name, map_name, cumulative_size
562       from experimental_flamegraph(8300973884377,1,'native')
563       order by abs(cumulative_size) desc;
564```
565
566| name | map_name | cumulative_size |
567|------|----------|----------------|
568|__start_thread|/apex/com.android.runtime/lib64/bionic/libc.so|392608|
569|_ZL15__pthread_startPv|/apex/com.android.runtime/lib64/bionic/libc.so|392608|
570|_ZN13thread_data_t10trampolineEPKS|/system/lib64/libutils.so|199496|
571|_ZN7android14AndroidRuntime15javaThreadShellEPv|/system/lib64/libandroid_runtime.so|199496|
572|_ZN7android6Thread11_threadLoopEPv|/system/lib64/libutils.so|199496|
573|_ZN3art6Thread14CreateCallbackEPv|/apex/com.android.art/lib64/libart.so|193112|
574|_ZN3art35InvokeVirtualOrInterface...|/apex/com.android.art/lib64/libart.so|193112|
575|_ZN3art9ArtMethod6InvokeEPNS_6ThreadEPjjPNS_6JValueEPKc|/apex/com.android.art/lib64/libart.so|193112|
576|art_quick_invoke_stub|/apex/com.android.art/lib64/libart.so|193112|
577