History log of /linux/tools/perf/builtin-stat.c (Results 1 – 25 of 646)
Revision Date Author Comments
# f08cc258 07-Sep-2024 Ian Rogers <irogers@google.com>

perf evsel: Add accessor for tool_event

Currently tool events use a dedicated variable within the evsel. Later
changes will move this to the unused struct perf_event_attr config for
these events. Ad

perf evsel: Add accessor for tool_event

Currently tool events use a dedicated variable within the evsel. Later
changes will move this to the unused struct perf_event_attr config for
these events. Add an accessor to allow the later change to be well
typed and avoid changing all uses.

Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20240907050830.6752-4-irogers@google.com
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Yang Jihong <yangjihong@bytedance.com>
Cc: Dominique Martinet <asmadeus@codewreck.org>
Cc: Clément Le Goffic <clement.legoffic@foss.st.com>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Howard Chu <howardchu95@gmail.com>
Cc: Ze Gao <zegao2021@gmail.com>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Junhao He <hejunhao3@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: James Clark <james.clark@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Benjamin Gray <bgray@linux.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Xu Yang <xu.yang_2@nxp.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Veronika Molnarova <vmolnaro@redhat.com>
Cc: Dr. David Alan Gilbert <linux@treblig.org>
Cc: linux-kernel@vger.kernel.org
Cc: linux-perf-users@vger.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

show more ...


# d546e3ac 20-Jul-2024 Weilin Wang <weilin.wang@intel.com>

perf stat: Add command line option for enabling TPEBS recording

With this command line option, TPEBS recording is turned off in 'perf
stat' on default. It will only be turned on when this option is

perf stat: Add command line option for enabling TPEBS recording

With this command line option, TPEBS recording is turned off in 'perf
stat' on default. It will only be turned on when this option is given in
'perf stat' command.

Example with --record-tpebs:

perf stat -M tma_split_loads -C1-4 --record-tpebs sleep 1

[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0.044 MB - ]

Performance counter stats for 'CPU(s) 1-4':

53,259,156,071 cpu_core/TOPDOWN.SLOTS/ # 1.6 % tma_split_loads (50.00%)
15,867,565,250 cpu_core/topdown-retiring/ (50.00%)
15,655,580,731 cpu_core/topdown-mem-bound/ (50.00%)
11,738,022,218 cpu_core/topdown-bad-spec/ (50.00%)
6,151,265,424 cpu_core/topdown-fe-bound/ (50.00%)
20,445,917,581 cpu_core/topdown-be-bound/ (50.00%)
6,925,098,013 cpu_core/L1D_PEND_MISS.PENDING/ (50.00%)
3,838,653,421 cpu_core/MEMORY_ACTIVITY.STALLS_L1D_MISS/ (50.00%)
4,797,059,783 cpu_core/EXE_ACTIVITY.BOUND_ON_LOADS/ (50.00%)
11,931,916,714 cpu_core/CPU_CLK_UNHALTED.THREAD/ (50.00%)
102,576,164 cpu_core/MEM_LOAD_COMPLETED.L1_MISS_ANY/ (50.00%)
64,071,854 cpu_core/MEM_INST_RETIRED.SPLIT_LOADS/ (50.00%)
3 cpu_core/MEM_INST_RETIRED.SPLIT_LOADS/R

1.003049679 seconds time elapsed

Example without --record-tpebs:

perf stat -M tma_contested_accesses -C1 sleep 1

Performance counter stats for 'CPU(s) 1':

50,203,891 cpu_core/TOPDOWN.SLOTS/ # 0.0 % tma_contested_accesses (63.60%)
10,040,777 cpu_core/topdown-retiring/ (63.60%)
6,890,729 cpu_core/topdown-mem-bound/ (63.60%)
2,756,463 cpu_core/topdown-bad-spec/ (63.60%)
10,828,288 cpu_core/topdown-fe-bound/ (63.60%)
28,350,432 cpu_core/topdown-be-bound/ (63.60%)
98 cpu_core/OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HITM/ (63.70%)
577,520 cpu_core/MEMORY_ACTIVITY.STALLS_L2_MISS/ (54.62%)
313,339 cpu_core/MEMORY_ACTIVITY.STALLS_L3_MISS/ (54.62%)
14,155 cpu_core/MEM_LOAD_RETIRED.L1_MISS/ (45.54%)
0 cpu_core/OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HIT_WITH_FWD/ (36.30%)
8,468,077 cpu_core/CPU_CLK_UNHALTED.THREAD/ (45.38%)
198 cpu_core/MEM_LOAD_L3_HIT_RETIRED.XSNP_MISS/ (45.38%)
8,324 cpu_core/MEM_LOAD_RETIRED.FB_HIT/ (45.38%)
3,388,031,520 TSC
23,226,785 cpu_core/CPU_CLK_UNHALTED.REF_TSC/ (54.46%)
80 cpu_core/MEM_LOAD_L3_HIT_RETIRED.XSNP_FWD/ (54.46%)
0 cpu_core/MEM_LOAD_L3_HIT_RETIRED.XSNP_FWD/R
0 cpu_core/MEM_LOAD_L3_HIT_RETIRED.XSNP_MISS/R
1,006,816,667 ns duration_time

1.002537737 seconds time elapsed

Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Weilin Wang <weilin.wang@intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Samantha Alt <samantha.alt@intel.com>
Link: https://lore.kernel.org/r/20240720062102.444578-7-weilin.wang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

show more ...


# 8db5cabc 20-Jul-2024 Weilin Wang <weilin.wang@intel.com>

perf stat: Fork and launch 'perf record' when 'perf stat' needs to get retire latency value for a metric.

When retire_latency value is used in a metric formula, evsel would fork
a 'perf record' proc

perf stat: Fork and launch 'perf record' when 'perf stat' needs to get retire latency value for a metric.

When retire_latency value is used in a metric formula, evsel would fork
a 'perf record' process with "-e" and "-W" options. 'perf record' will
collect required retire_latency values in parallel while 'perf stat' is
collecting counting values.

At the point of time that 'perf stat' stops counting, evsel would stop
'perf record' by sending sigterm signal to 'perf record' process.
Sampled data will be processed to get retire latency value. Another
thread is required to synchronize between 'perf stat' and 'perf record'
when we pass data through pipe.

Retire_latency evsel is not opened for 'perf stat' so that there is no
counter wasted on it. This commit includes code suggested by Namhyung to
adjust reading size for groups that include retire_latency evsels.

In current :R parsing implementation, the parser would recognize events
with retire_latency modifier and insert them into the evlist like a
normal event. Ideally, we need to avoid counting these events.

In this commit, at the time when a retire_latency evsel is read, set the
retire latency value processed from the sampled data to count value.
This sampled retire latency value will be used for metric calculation
and final event count print out. No special metric calculation and event
print out code required for retire_latency events.

Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Weilin Wang <weilin.wang@intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Samantha Alt <samantha.alt@intel.com>
Link: https://lore.kernel.org/r/20240720062102.444578-4-weilin.wang@intel.com
[ Squashed the 3rd and 4th commit in the series to keep it building patch by patch ]
[ Constified the 'struct perf_tool' pointer in process_sample_event() ]
[ Use perf_tool__init(&tool, false) to address a segfault I reported and Ian/Weilin diagnosed ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

show more ...


# 071b117e 12-Aug-2024 Ian Rogers <irogers@google.com>

perf stat: Use perf_tool__init()

Use perf_tool__init() so that more uses of 'struct perf_tool' can be const
and not relying on perf_tool__fill_defaults().

Signed-off-by: Ian Rogers <irogers@google.

perf stat: Use perf_tool__init()

Use perf_tool__init() so that more uses of 'struct perf_tool' can be const
and not relying on perf_tool__fill_defaults().

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20240812204720.631678-17-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

show more ...


# 30f29bae 12-Aug-2024 Ian Rogers <irogers@google.com>

perf tool: Constify tool pointers

The tool pointer (to a struct largely of function pointers) is passed
around but is unchanged except at initialization. Change parameter and
variable types to be co

perf tool: Constify tool pointers

The tool pointer (to a struct largely of function pointers) is passed
around but is unchanged except at initialization. Change parameter and
variable types to be const to lower the possibilities of what could
happen with a tool.

Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Leo Yan <leo.yan@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linux.dev>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nick Terrell <terrelln@fb.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20240812204720.631678-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

show more ...


# 966854e7 03-Jul-2024 Namhyung Kim <namhyung@kernel.org>

perf bpf-filter: Pass 'target' to perf_bpf_filter__prepare()

This is needed to prepare target-specific actions in the later patch.
We want to reuse the pinned BPF program and map for regular users t

perf bpf-filter: Pass 'target' to perf_bpf_filter__prepare()

This is needed to prepare target-specific actions in the later patch.
We want to reuse the pinned BPF program and map for regular users to
profile their own processes.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20240703223035.2024586-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

show more ...


# 6828d692 03-May-2024 Ian Rogers <irogers@google.com>

perf evsel: Refactor tool events

Tool events unnecessarily open a dummy perf event which is useless
even with `perf record` which will still open a dummy event. Change
the behavior of tool events so

perf evsel: Refactor tool events

Tool events unnecessarily open a dummy perf event which is useless
even with `perf record` which will still open a dummy event. Change
the behavior of tool events so:

- duration_time - call `rdclock` on open and then report the count as
a delta since the start in evsel__read_counter. This moves code out
of builtin-stat making it more general purpose.

- user_time/system_time - open the fd as either `/proc/pid/stat` or
`/proc/stat` for cases like system wide. evsel__read_counter will
read the appropriate field out of the procfs file. These values
were previously supplied by wait4, if the procfs read fails then
the wait4 values are used, assuming the process/thread terminated.
By reading user_time and system_time this way, interval mode, per
PID and per CPU can be supported although there are restrictions
given what the files provide (e.g. per PID can't be combined with
per CPU).

Opening any of the tool events for `perf record` is changed to return
invalid.

Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Weilin Wang <weilin.wang@intel.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: James Clark <james.clark@arm.com>
Cc: Dmitrii Dolgov <9erthalion6@gmail.com>
Cc: Ze Gao <zegao2021@gmail.com>
Cc: Song Liu <song@kernel.org>
Cc: Leo Yan <leo.yan@linux.dev>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240503232849.17752-1-irogers@google.com

show more ...


# f5803651 05-Jun-2024 Ian Rogers <irogers@google.com>

perf stat: Choose the most disaggregate command line option

When multiple aggregation options are passed to perf stat the behavior
isn't clear. Consider "perf stat -A --per-socket .." and "perf stat

perf stat: Choose the most disaggregate command line option

When multiple aggregation options are passed to perf stat the behavior
isn't clear. Consider "perf stat -A --per-socket .." and "perf stat
--per-socket -A ..", the first won't aggregate at all while the second
will do per-socket aggregation, even though the same options were
passed.

Rather than set an enum value, gather the options in a struct and
process them from most to least aggregate. This ensures the least
aggregate option always applies, so no aggregation if "-A" is passed.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Stephane Eranian <eranian@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240605063828.195700-2-irogers@google.com

show more ...


# 0dddd91a 05-Jun-2024 Ian Rogers <irogers@google.com>

perf stat: Make options local

Reduce the scope of stat_options to cmd_stat, and pass as an argument
to __cmd_record. This is done to make more localized changes to the
options in later patches. A si

perf stat: Make options local

Reduce the scope of stat_options to cmd_stat, and pass as an argument
to __cmd_record. This is done to make more localized changes to the
options in later patches. A side-effect of the change is to reduce the
size of a stripped PIE perf binary by 5952 bytes. The savings come
mainly in the dynamic relocation section.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Stephane Eranian <eranian@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240605063828.195700-1-irogers@google.com

show more ...


# a8cd4766 07-May-2024 Ian Rogers <irogers@google.com>

perf cpumap: Remove refcnt from 'struct cpu_aggr_map'

It is assigned a value of 1 and never incremented. Remove and replace
puts with delete.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adri

perf cpumap: Remove refcnt from 'struct cpu_aggr_map'

It is assigned a value of 1 and never incremented. Remove and replace
puts with delete.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ben Gainey <ben.gainey@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Li Dong <lidong@vivo.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Paran Lee <p4ranlee@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20240507183545.1236093-5-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

show more ...


# 03f23570 12-Apr-2024 Weilin Wang <weilin.wang@intel.com>

perf stat: Add new field in stat_config to enable hardware aware grouping

Hardware counter and event information could be used to help creating event
groups that better utilize hardware counters and

perf stat: Add new field in stat_config to enable hardware aware grouping

Hardware counter and event information could be used to help creating event
groups that better utilize hardware counters and improve multiplexing.

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Weilin Wang <weilin.wang@intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Samantha Alt <samantha.alt@intel.com>
Link: https://lore.kernel.org/r/20240412210756.309828-2-weilin.wang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

show more ...


# 954ac1b4 02-Feb-2024 Ian Rogers <irogers@google.com>

perf stat: Remove duplicate cpus_map_matched function

Use libperf's perf_cpu_map__equal() that performs the same function.

Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Ian Rogers <

perf stat: Remove duplicate cpus_map_matched function

Use libperf's perf_cpu_map__equal() that performs the same function.

Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: Andrew Jones <ajones@ventanamicro.com>
Cc: André Almeida <andrealmeid@igalia.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Atish Patra <atishp@rivosinc.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paran Lee <p4ranlee@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Steinar H. Gunderson <sesse@google.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Yang Li <yang.lee@linux.alibaba.com>
Cc: Yanteng Si <siyanteng@loongson.cn>
Link: https://lore.kernel.org/r/20240202234057.2085863-8-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

show more ...


# 3e5deb70 02-Feb-2024 Ian Rogers <irogers@google.com>

perf cpumap: Clean up use of perf_cpu_map__has_any_cpu_or_is_empty

Most uses of what was perf_cpu_map__empty but is now
perf_cpu_map__has_any_cpu_or_is_empty want to do something with the
CPU map if

perf cpumap: Clean up use of perf_cpu_map__has_any_cpu_or_is_empty

Most uses of what was perf_cpu_map__empty but is now
perf_cpu_map__has_any_cpu_or_is_empty want to do something with the
CPU map if it contains CPUs. Replace uses of
perf_cpu_map__has_any_cpu_or_is_empty with other helpers so that CPUs
within the map can be handled.

Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: Andrew Jones <ajones@ventanamicro.com>
Cc: André Almeida <andrealmeid@igalia.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Atish Patra <atishp@rivosinc.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paran Lee <p4ranlee@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Steinar H. Gunderson <sesse@google.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Yang Li <yang.lee@linux.alibaba.com>
Cc: Yanteng Si <siyanteng@loongson.cn>
Link: https://lore.kernel.org/r/20240202234057.2085863-6-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

show more ...


# cbc917a1 08-Feb-2024 Yicong Yang <yangyicong@hisilicon.com>

perf stat: Support per-cluster aggregation

Some platforms have 'cluster' topology and CPUs in the cluster will
share resources like L3 Cache Tag (for HiSilicon Kunpeng SoC) or L2
cache (for Intel Ja

perf stat: Support per-cluster aggregation

Some platforms have 'cluster' topology and CPUs in the cluster will
share resources like L3 Cache Tag (for HiSilicon Kunpeng SoC) or L2
cache (for Intel Jacobsville). Currently parsing and building cluster
topology have been supported since [1].

perf stat has already supported aggregation for other topologies like
die or socket, etc. It'll be useful to aggregate per-cluster to find
problems like L3T bandwidth contention.

This patch add support for "--per-cluster" option for per-cluster
aggregation. Also update the docs and related test. The output will
be like:

[root@localhost tmp]# perf stat -a -e LLC-load --per-cluster -- sleep 5

Performance counter stats for 'system wide':

S56-D0-CLS158 4 1,321,521,570 LLC-load
S56-D0-CLS594 4 794,211,453 LLC-load
S56-D0-CLS1030 4 41,623 LLC-load
S56-D0-CLS1466 4 41,646 LLC-load
S56-D0-CLS1902 4 16,863 LLC-load
S56-D0-CLS2338 4 15,721 LLC-load
S56-D0-CLS2774 4 22,671 LLC-load
[...]

On a legacy system without cluster or cluster support, the output will
be look like:
[root@localhost perf]# perf stat -a -e cycles --per-cluster -- sleep 1

Performance counter stats for 'system wide':

S56-D0-CLS0 64 18,011,485 cycles
S7182-D0-CLS0 64 16,548,835 cycles

Note that this patch doesn't mix the cluster information in the outputs
of --per-core to avoid breaking any tools/scripts using it.

Note that perf recently supports "--per-cache" aggregation, but it's not
the same with the cluster although cluster CPUs may share some cache
resources. For example on my machine all clusters within a die share the
same L3 cache:
$ cat /sys/devices/system/cpu/cpu0/cache/index3/shared_cpu_list
0-31
$ cat /sys/devices/system/cpu/cpu0/topology/cluster_cpus_list
0-3

[1] commit c5e22feffdd7 ("topology: Represent clusters of CPUs within a die")

Tested-by: Jie Zhan <zhanjie9@hisilicon.com>
Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Cc: james.clark@arm.com
Cc: 21cnbao@gmail.com
Cc: prime.zeng@hisilicon.com
Cc: Jonathan.Cameron@huawei.com
Cc: fanghao11@huawei.com
Cc: linuxarm@huawei.com
Cc: tim.c.chen@intel.com
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240208024026.2691-1-yangyicong@huawei.com

show more ...


# 6f33e6fa 14-Dec-2023 Ian Rogers <irogers@google.com>

perf stat: Combine the -A/--no-aggr and --no-merge options

The -A or --no-aggr option disables aggregation of core events:

$ perf stat -A -e cycles,data_total -a true

Performance counter stat

perf stat: Combine the -A/--no-aggr and --no-merge options

The -A or --no-aggr option disables aggregation of core events:

$ perf stat -A -e cycles,data_total -a true

Performance counter stats for 'system wide':

CPU0 1,287,665 cycles
CPU1 1,831,681 cycles
CPU2 27,345,998 cycles
CPU3 1,964,799 cycles
CPU4 236,174 cycles
CPU5 3,302,825 cycles
CPU6 9,201,446 cycles
CPU7 1,403,043 cycles
CPU0 110.90 MiB data_total

0.008961761 seconds time elapsed

The --no-merge option disables the aggregation of uncore events:

$ perf stat --no-merge -e cycles,data_total -a true

Performance counter stats for 'system wide':

38,482,778 cycles
15.04 MiB data_total [uncore_imc_free_running_1]
15.00 MiB data_total [uncore_imc_free_running_0]

0.005915155 seconds time elapsed

Having two options confuses users who generally don't appreciate the
difference in PMUs. Keep all the options but make it so they all
disable aggregation both of core and uncore events:

$ perf stat -A -e cycles,data_total -a true

Performance counter stats for 'system wide':

CPU0 85,878 cycles
CPU1 88,179 cycles
CPU2 60,872 cycles
CPU3 3,265,567 cycles
CPU4 82,357 cycles
CPU5 83,383 cycles
CPU6 84,156 cycles
CPU7 220,803 cycles
CPU0 2.38 MiB data_total [uncore_imc_free_running_0]
CPU0 2.38 MiB data_total [uncore_imc_free_running_1]

0.001397205 seconds time elapsed

Update the relevant 'perf stat' man page information.

Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Kaige Ye <ye@kaige.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20231214060256.2094017-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

show more ...


# 923ca62a 29-Nov-2023 Ian Rogers <irogers@google.com>

libperf cpumap: Rename perf_cpu_map__empty() to perf_cpu_map__has_any_cpu_or_is_empty()

The name perf_cpu_map_empty is misleading as true is also returned
when the map contains an "any" CPU (aka dum

libperf cpumap: Rename perf_cpu_map__empty() to perf_cpu_map__has_any_cpu_or_is_empty()

The name perf_cpu_map_empty is misleading as true is also returned
when the map contains an "any" CPU (aka dummy) map.

Rename to perf_cpu_map__has_any_cpu_or_is_empty(), later changes will
(re)introduce perf_cpu_map__empty() and perf_cpu_map__has_any_cpu().

Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: Andrew Jones <ajones@ventanamicro.com>
Cc: André Almeida <andrealmeid@igalia.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Atish Patra <atishp@rivosinc.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Paran Lee <p4ranlee@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Steinar H. Gunderson <sesse@google.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Yang Li <yang.lee@linux.alibaba.com>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: bpf@vger.kernel.org
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20231129060211.1890454-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

show more ...


# 8596ba32 29-Nov-2023 Ian Rogers <irogers@google.com>

perf stat: Fix help message for --metric-no-threshold option

Copy-paste error led to help message for metric-no-threshold repeating
that of metric-no-merge.

Fixes: 1fd09e299bdd434b ("perf metric: A

perf stat: Fix help message for --metric-no-threshold option

Copy-paste error led to help message for metric-no-threshold repeating
that of metric-no-merge.

Fixes: 1fd09e299bdd434b ("perf metric: Add --metric-no-threshold option")
Reported-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20231129223540.2247030-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

show more ...


# 0713ab3b 06-Dec-2023 Ian Rogers <irogers@google.com>

perf stat: Exit perf stat if parse groups fails

Metrics were added by a callback but commit a4b8cfcabb1d90ec ("perf
stat: Delay metric parsing") postponed this to allow optimizations based
on the CP

perf stat: Exit perf stat if parse groups fails

Metrics were added by a callback but commit a4b8cfcabb1d90ec ("perf
stat: Delay metric parsing") postponed this to allow optimizations based
on the CPU configuration.

In doing so it stopped errors in metric parsing from causing 'perf stat'
termination.

This change adds the termination for bad metric names back in.

Fixes: a4b8cfcabb1d90ec ("perf stat: Delay metric parsing")
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Closes: https://lore.kernel.org/lkml/ZXByT1K6enTh2EHT@kernel.org/
Link: https://lore.kernel.org/r/20231206183533.972028-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

show more ...


# eb2eac0c 21-Nov-2023 Ian Rogers <irogers@google.com>

perf evsel: Fallback to "task-clock" when not system wide

When the "cycles" event isn't available evsel will fallback to the
"cpu-clock" software event.

"task-clock" is similar to "cpu-clock" but o

perf evsel: Fallback to "task-clock" when not system wide

When the "cycles" event isn't available evsel will fallback to the
"cpu-clock" software event.

"task-clock" is similar to "cpu-clock" but only runs when the process is
running.

Falling back to "cpu-clock" when not system wide leads to confusion, by
falling back to "task-clock" it is hoped the confusion is less.

Pass the target to determine if "task-clock" is more appropriate.

Update a nearby comment and debug string for the change.

Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ajay Kaher <akaher@vmware.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Makhalov <amakhalov@vmware.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Link: https://lore.kernel.org/r/20231121000420.368075-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

show more ...


# a84fbf20 06-Sep-2023 Ian Rogers <irogers@google.com>

perf stat: Fix aggr mode initialization

Generating metrics llc_code_read_mpi_demand_plus_prefetch,
llc_data_read_mpi_demand_plus_prefetch,
llc_miss_local_memory_bandwidth_read,
llc_miss_local_memory

perf stat: Fix aggr mode initialization

Generating metrics llc_code_read_mpi_demand_plus_prefetch,
llc_data_read_mpi_demand_plus_prefetch,
llc_miss_local_memory_bandwidth_read,
llc_miss_local_memory_bandwidth_write,
nllc_miss_remote_memory_bandwidth_read, memory_bandwidth_read,
memory_bandwidth_write, uncore_frequency, upi_data_transmit_bw,
C2_Pkg_Residency, C3_Core_Residency, C3_Pkg_Residency,
C6_Core_Residency, C6_Pkg_Residency, C7_Core_Residency,
C7_Pkg_Residency, UNCORE_FREQ and tma_info_system_socket_clks would
trigger an address sanitizer heap-buffer-overflows on a SkylakeX.

```
==2567752==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x5020003ed098 at pc 0x5621a816654e bp 0x7fffb55d4da0 sp 0x7fffb55d4d98
READ of size 4 at 0x5020003eee78 thread T0
#0 0x558265d6654d in aggr_cpu_id__is_empty tools/perf/util/cpumap.c:694:12
#1 0x558265c914da in perf_stat__get_aggr tools/perf/builtin-stat.c:1490:6
#2 0x558265c914da in perf_stat__get_global_cached tools/perf/builtin-stat.c:1530:9
#3 0x558265e53290 in should_skip_zero_counter tools/perf/util/stat-display.c:947:31
#4 0x558265e53290 in print_counter_aggrdata tools/perf/util/stat-display.c:985:18
#5 0x558265e51931 in print_counter tools/perf/util/stat-display.c:1110:3
#6 0x558265e51931 in evlist__print_counters tools/perf/util/stat-display.c:1571:5
#7 0x558265c8ec87 in print_counters tools/perf/builtin-stat.c:981:2
#8 0x558265c8cc71 in cmd_stat tools/perf/builtin-stat.c:2837:3
#9 0x558265bb9bd4 in run_builtin tools/perf/perf.c:323:11
#10 0x558265bb98eb in handle_internal_command tools/perf/perf.c:377:8
#11 0x558265bb9389 in run_argv tools/perf/perf.c:421:2
#12 0x558265bb9389 in main tools/perf/perf.c:537:3
```

The issue was the use of testing a cpumap with NULL rather than using
empty, as a map containing the dummy value isn't NULL and the -1
results in an empty aggr map being allocated which legitimately
overflows when any member is accessed.

Fixes: 8a96f454f5668572 ("perf stat: Avoid SEGV if core.cpus isn't set")
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230906003912.3317462-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

show more ...


# db1f5f10 14-Jun-2023 Yang Jihong <yangjihong1@huawei.com>

perf stat: Add missing newline in pr_err messages

The newline is missing for error messages in add_default_attributes()

Before:

# perf stat --topdown
Topdown requested but the topdown metric g

perf stat: Add missing newline in pr_err messages

The newline is missing for error messages in add_default_attributes()

Before:

# perf stat --topdown
Topdown requested but the topdown metric groups aren't present.
(See perf list the metric groups have names like TopdownL1)#

After:

# perf stat --topdown
Topdown requested but the topdown metric groups aren't present.
(See perf list the metric groups have names like TopdownL1)
#

In addition, perf_stat_init_aggr_mode() and perf_stat_init_aggr_mode_file()
have the same problem, fixed by the way.

Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Acked-by: Ian Rogers <irogers@google.com>
Reviewed-by: James Clark <james.clark@arm.com>
Link: https://lore.kernel.org/r/20230614021505.59856-1-yangjihong1@huawei.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>

show more ...


# dada1a1f 16-Jun-2023 Namhyung Kim <namhyung@kernel.org>

perf stat: Show average value on multiple runs

When -r option is used, perf stat runs the command multiple times and
update stats in the evsel->stats.res_stats for global aggregation. But
the value

perf stat: Show average value on multiple runs

When -r option is used, perf stat runs the command multiple times and
update stats in the evsel->stats.res_stats for global aggregation. But
the value is never used and the value it prints at the end is just the
value from the last run. I think we should print the average number of
multiple runs.

Add evlist__copy_res_stats() to update the aggr counter (for display)
using the values in the evsel->stats.res_stats.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230616073211.1057936-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

show more ...


# ed4090a2 16-Jun-2023 Namhyung Kim <namhyung@kernel.org>

perf stat: Reset aggr stats for each run

When it runs multiple times with -r option, it missed to reset the
aggregation counters and the values were added up. The aggregation
count has the values t

perf stat: Reset aggr stats for each run

When it runs multiple times with -r option, it missed to reset the
aggregation counters and the values were added up. The aggregation
count has the values to be printed in the end. It should reset the
counters at the beginning of each run. But the current code does that
only when -I/--interval-print option is given.

Fixes: 91f85f98da7ab8c3 ("perf stat: Display event stats using aggr counts")
Reported-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230616073211.1057936-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

show more ...


# 6a80d794 16-Jun-2023 Kan Liang <kan.liang@linux.intel.com>

perf stat: New metricgroup output for the default mode

In the default mode, the current output of the metricgroup include both
events and metrics, which is not necessary and just makes the output
ha

perf stat: New metricgroup output for the default mode

In the default mode, the current output of the metricgroup include both
events and metrics, which is not necessary and just makes the output
hard to read. Since different ARCHs (even different generations in the
same ARCH) may use different events. The output also vary on different
platforms.

For a metricgroup, only outputting the value of each metric is good
enough.

Add a new field default_metricgroup in evsel to indicate an event of the
default metricgroup. For those events, printout() should print the
metricgroup name rather than each event.

Add perf_stat__skip_metric_event() to skip the evsel in the Default
metricgroup, if it's not running or not the metric event.

Add print_metricgroup_header_t to pass the functions which print the
display name of each metricgroup in the Default metricgroup. Support all
three output methods.

Factor out perf_stat__print_shadow_stats_metricgroup() to print out each
metrics.

On SPR:

Before:

./perf_old stat sleep 1

Performance counter stats for 'sleep 1':

0.54 msec task-clock:u # 0.001 CPUs utilized
0 context-switches:u # 0.000 /sec
0 cpu-migrations:u # 0.000 /sec
68 page-faults:u # 125.445 K/sec
540,970 cycles:u # 0.998 GHz
556,325 instructions:u # 1.03 insn per cycle
123,602 branches:u # 228.018 M/sec
6,889 branch-misses:u # 5.57% of all branches
3,245,820 TOPDOWN.SLOTS:u # 18.4 % tma_backend_bound
# 17.2 % tma_retiring
# 23.1 % tma_bad_speculation
# 41.4 % tma_frontend_bound
564,859 topdown-retiring:u
1,370,999 topdown-fe-bound:u
603,271 topdown-be-bound:u
744,874 topdown-bad-spec:u
12,661 INT_MISC.UOP_DROPPING:u # 23.357 M/sec

1.001798215 seconds time elapsed

0.000193000 seconds user
0.001700000 seconds sys

After:

$ ./perf stat sleep 1

Performance counter stats for 'sleep 1':

0.51 msec task-clock:u # 0.001 CPUs utilized
0 context-switches:u # 0.000 /sec
0 cpu-migrations:u # 0.000 /sec
68 page-faults:u # 132.683 K/sec
545,228 cycles:u # 1.064 GHz
555,509 instructions:u # 1.02 insn per cycle
123,574 branches:u # 241.120 M/sec
6,957 branch-misses:u # 5.63% of all branches
TopdownL1 # 17.5 % tma_backend_bound
# 22.6 % tma_bad_speculation
# 42.7 % tma_frontend_bound
# 17.1 % tma_retiring
TopdownL2 # 21.8 % tma_branch_mispredicts
# 11.5 % tma_core_bound
# 13.4 % tma_fetch_bandwidth
# 29.3 % tma_fetch_latency
# 2.7 % tma_heavy_operations
# 14.5 % tma_light_operations
# 0.8 % tma_machine_clears
# 6.1 % tma_memory_bound

1.001712086 seconds time elapsed

0.000151000 seconds user
0.001618000 seconds sys

Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ahmad Yasin <ahmad.yasin@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20230616031420.3751973-3-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

show more ...


# b0a9e8f8 15-Jun-2023 Kan Liang <kan.liang@linux.intel.com>

perf stat,jevents: Introduce Default tags for the default mode

Introduce a new metricgroup, Default, to tag all the metric groups which
will be collected in the default mode.

Add a new field, Defau

perf stat,jevents: Introduce Default tags for the default mode

Introduce a new metricgroup, Default, to tag all the metric groups which
will be collected in the default mode.

Add a new field, DefaultMetricgroupName, in the JSON file to indicate
the real metric group name. It will be printed in the default output
to replace the event names.

There is nothing changed for the output format.

On SPR, both TopdownL1 and TopdownL2 are displayed in the default
output.

On ARM, Intel ICL and later platforms (before SPR), only TopdownL1 is
displayed in the default output.

Suggested-by: Stephane Eranian <eranian@google.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ahmad Yasin <ahmad.yasin@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230615135315.3662428-4-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

show more ...


12345678910>>...26