.. _imc:

OPAL/Skiboot In-Memory Collection (IMC) interface Documentation
===============================================================

Overview:
---------

In-Memory Collection (IMC) is a performance monitoring infrastructure
for counters that, once started, can be read from memory at any time by
an operating system. Such counters include those for the Nest and Core
units, enabling continuous monitoring of resource utilisation on the chip.
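
Because the counters live in ordinary memory once started, sampling one
amounts to reading a word at a known offset and comparing two reads. The
sketch below only illustrates that idea and is not taken from any OS
implementation: it assumes each counter is a 64-bit big-endian value, and the
buffers, offsets and helper names (``read_counter``, ``counter_delta``) are
hypothetical.

```python
import struct

COUNTER_SIZE = 8  # assumption: each counter occupies one 64-bit word


def read_counter(region: bytes, offset: int) -> int:
    """Return the 64-bit big-endian counter stored at `offset` in `region`."""
    (value,) = struct.unpack_from(">Q", region, offset)
    return value


def counter_delta(earlier: int, later: int) -> int:
    """Events counted between two reads, tolerating a 64-bit wrap."""
    return (later - earlier) % (1 << 64)


# Two hypothetical snapshots of the same counter memory, taken at
# different times. Each snapshot holds two counters back to back.
first = struct.pack(">QQ", 1_000, 50)
second = struct.pack(">QQ", 1_750, 80)

# Sample the second counter in each snapshot and compute the difference.
delta = counter_delta(read_counter(first, COUNTER_SIZE),
                      read_counter(second, COUNTER_SIZE))
print(delta)
```

Taking the difference modulo 2^64 keeps the result correct even if the
counter wraps between the two reads.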

The API is agnostic as to how these counters are implemented. For the
Nest units, microcode running on an on-chip microcontroller gathers the
data; for the Core units, the core logic itself gathers the data. In both
cases the data is periodically written to the memory locations.

Nest (On-Chip, Off-Core) unit:
------------------------------

Nest units have dedicated hardware counters which can be programmed
to monitor various chip resources such as memory bandwidth,
xlink bandwidth, alink bandwidth, PCI, NVlink and so on. These Nest
unit PMU counters can be programmed in-band via SCOM. Alternatively,
programming of these counters and the periodic movement of counter data
to memory are offloaded to a hardware engine that is part of the OCC
(On-Chip Controller).

Microcode starts running in the OCC complex at system boot, initializes
these Nest unit PMUs and periodically accumulates the Nest PMU counter
values to memory. The list of events supported by the microcode is
packaged as a DTS and stored in the IMA_CATALOG partition.

Core unit:
----------

Core IMC PMU counters are handled in the core-imc unit. Each core has
4 Core Performance Monitoring Counters (CPMCs) which are used by Core-IMC logic.
Two of these are dedicated to counting core cycles and instructions.
The 2 remaining CPMCs have to multiplex 128 events each.

Core IMC hardware does not support interrupts; it periodically (based on
the sampling duration) fetches the counter data and accumulates it to main
memory. The memory used to accumulate counter data is referenced by "PDBAR"
(a per-core SCOM) and "LDBAR" (a per-thread SPR).

Trace mode of IMC:
------------------

POWER9 supports two modes for IMC: Accumulation mode and
Trace mode. In Accumulation mode, event counts are accumulated in system
memory. The hypervisor/kernel then reads the posted counts periodically, or
when requested. In IMC Trace mode, the 64-bit trace SCOM value is initialized
with the event information. The CPMC*SEL and CPMC_LOAD fields in the trace SCOM
specify the event to be monitored and the sampling duration. On each overflow
in the CPMC*SEL, hardware snapshots the program counter along with event counts
and writes them into the memory pointed to by LDBAR. LDBAR has bits to indicate
whether hardware is configured for accumulation or trace mode.
Currently the event monitored in trace mode is fixed as cycles.
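
The role of LDBAR described above, carrying both the buffer location and the
mode selection in one register value, can be sketched as follows. This is an
illustration under stated assumptions, not the hardware definition: the bit
positions, the address mask and the helper names (``make_ldbar``,
``ldbar_mode``, ``ldbar_buffer``) are placeholders, and the real layout has to
come from the hardware documentation.

```python
# Placeholder layout for illustration only; the real bit positions are
# defined by the hardware documentation, not by this sketch.
MODE_ACCUMULATION = 1 << 63      # assumed: accumulation mode selected
MODE_TRACE = 1 << 62             # assumed: trace mode selected
ADDR_MASK = 0x0003FFFFFFFFE000   # assumed: bits that hold the buffer address


def make_ldbar(buffer_addr: int, trace: bool) -> int:
    """Combine a buffer address and a mode bit into one register value."""
    if buffer_addr & ~ADDR_MASK:
        raise ValueError("buffer address does not fit the assumed address field")
    mode = MODE_TRACE if trace else MODE_ACCUMULATION
    return mode | buffer_addr


def ldbar_mode(value: int) -> str:
    """Decode which mode a register value selects."""
    if value & MODE_TRACE:
        return "trace"
    if value & MODE_ACCUMULATION:
        return "accumulation"
    return "disabled"


def ldbar_buffer(value: int) -> int:
    """Extract the buffer address from a register value."""
    return value & ADDR_MASK


trace_value = make_ldbar(0x2000_0000, trace=True)
print(hex(trace_value), ldbar_mode(trace_value))
```

Keeping the mode bits outside the address field lets the same register both
locate the per-thread buffer and tell the hardware how to fill it.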

PMI handling is avoided, since IMC trace mode snapshots the
program counter and writes it to memory. This also provides a way for
the operating system to do instruction sampling in real time without
PMI (Performance Monitoring Interrupt) processing overhead.

**Example:**

Performance data using 'perf top' with and without trace-imc event:


*PMI count when the `perf top` command is executed without the trace-imc event.*
::

     # cat /proc/interrupts  (a snippet from the output)
     9944      1072        804        804       1644        804       1306
     804        804        804        804        804        804        804
     804        804       1961       1602        804        804       1258
     [-----------------------------------------------------------------]
     803        803        803        803        803        803        803
     803        803        803        803        804        804        804
     804        804        804        804        804        804        803
     803        803        803        803        803       1306        803
     803   Performance monitoring interrupts


*PMI count when the `perf top` command is executed with the trace-imc event
(executed right after 'perf top' without the trace-imc event).*
::

   # perf top -e trace_imc/trace_cycles/
   12.50%  [kernel]          [k] arch_cpu_idle
   11.81%  [kernel]          [k] __next_timer_interrupt
   11.22%  [kernel]          [k] rcu_idle_enter
   10.25%  [kernel]          [k] find_next_bit
    7.91%  [kernel]          [k] do_idle
    7.69%  [kernel]          [k] rcu_dynticks_eqs_exit
    5.20%  [kernel]          [k] tick_nohz_idle_stop_tick
        [-----------------------]

   # cat /proc/interrupts (a snippet from the output)

   9944      1072        804        804       1644        804       1306
   804        804        804        804        804        804        804
   804        804       1961       1602        804        804       1258
   [-----------------------------------------------------------------]
   803        803        803        803        803        803        803
   803        803        803        804        804        804        804
   804        804        804        804        804        804        803
   803        803        803        803        803       1306        803
   803   Performance monitoring interrupts

Here the PMI count remains the same.

OPAL APIs:
----------

The OPAL API is simple: a call to init a counter type, and calls to
start and stop collection. The memory locations are described in the
device tree.

See :ref:`opal-imc-counters` and :ref:`device-tree/imc`
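
The init/start/stop sequence described above can be modelled as a small state
machine. The sketch below is a toy model, not a binding to the OPAL calls: the
class and method names are hypothetical, and the rule that a counter type must
be initialised before it is started or stopped is an assumption made for the
illustration.

```python
from enum import Enum


class CounterType(Enum):
    # Hypothetical labels for the counter domains this document describes.
    NEST = "nest"
    CORE = "core"
    TRACE = "trace"


class ImcCounterModel:
    """Toy model: one init call per counter type, then start/stop calls."""

    def __init__(self):
        self._state = {}  # CounterType -> "stopped" or "running"

    def init(self, ctype: CounterType) -> None:
        # Initialising a counter type leaves it ready but not collecting.
        self._state[ctype] = "stopped"

    def _require_init(self, ctype: CounterType) -> None:
        # Assumption of this sketch: start/stop are only valid after init.
        if ctype not in self._state:
            raise RuntimeError(f"{ctype.name} counters were never initialised")

    def start(self, ctype: CounterType) -> None:
        self._require_init(ctype)
        self._state[ctype] = "running"

    def stop(self, ctype: CounterType) -> None:
        self._require_init(ctype)
        self._state[ctype] = "stopped"

    def is_running(self, ctype: CounterType) -> bool:
        return self._state.get(ctype) == "running"


model = ImcCounterModel()
model.init(CounterType.CORE)
model.start(CounterType.CORE)
print(model.is_running(CounterType.CORE))
model.stop(CounterType.CORE)
```

While a counter type is running, the operating system reads the collected
data from the memory locations that the device tree describes.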