#
f08cc258 |
| 07-Sep-2024 |
Ian Rogers <irogers@google.com> |
perf evsel: Add accessor for tool_event
Currently tool events use a dedicated variable within the evsel. Later changes will move this to the unused struct perf_event_attr config for these events. Ad
perf evsel: Add accessor for tool_event
Currently tool events use a dedicated variable within the evsel. Later changes will move this to the unused struct perf_event_attr config for these events. Add an accessor to allow the later change to be well typed and avoid changing all uses.
Signed-off-by: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20240907050830.6752-4-irogers@google.com Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Yang Jihong <yangjihong@bytedance.com> Cc: Dominique Martinet <asmadeus@codewreck.org> Cc: Clément Le Goffic <clement.legoffic@foss.st.com> Cc: Colin Ian King <colin.i.king@gmail.com> Cc: Howard Chu <howardchu95@gmail.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Junhao He <hejunhao3@huawei.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Weilin Wang <weilin.wang@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Jing Zhang <renyu.zj@linux.alibaba.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Benjamin Gray <bgray@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Cc: Xu Yang <xu.yang_2@nxp.com> Cc: John Garry <john.g.garry@oracle.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Veronika Molnarova <vmolnaro@redhat.com> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: linux-kernel@vger.kernel.org Cc: linux-perf-users@vger.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
show more ...
|
#
d546e3ac |
| 20-Jul-2024 |
Weilin Wang <weilin.wang@intel.com> |
perf stat: Add command line option for enabling TPEBS recording
With this command line option, TPEBS recording is turned off in 'perf stat' on default. It will only be turned on when this option is
perf stat: Add command line option for enabling TPEBS recording
With this command line option, TPEBS recording is turned off in 'perf stat' on default. It will only be turned on when this option is given in 'perf stat' command.
Example with --record-tpebs:
perf stat -M tma_split_loads -C1-4 --record-tpebs sleep 1
[ perf record: Woken up 2 times to write data ] [ perf record: Captured and wrote 0.044 MB - ]
Performance counter stats for 'CPU(s) 1-4':
53,259,156,071 cpu_core/TOPDOWN.SLOTS/ # 1.6 % tma_split_loads (50.00%) 15,867,565,250 cpu_core/topdown-retiring/ (50.00%) 15,655,580,731 cpu_core/topdown-mem-bound/ (50.00%) 11,738,022,218 cpu_core/topdown-bad-spec/ (50.00%) 6,151,265,424 cpu_core/topdown-fe-bound/ (50.00%) 20,445,917,581 cpu_core/topdown-be-bound/ (50.00%) 6,925,098,013 cpu_core/L1D_PEND_MISS.PENDING/ (50.00%) 3,838,653,421 cpu_core/MEMORY_ACTIVITY.STALLS_L1D_MISS/ (50.00%) 4,797,059,783 cpu_core/EXE_ACTIVITY.BOUND_ON_LOADS/ (50.00%) 11,931,916,714 cpu_core/CPU_CLK_UNHALTED.THREAD/ (50.00%) 102,576,164 cpu_core/MEM_LOAD_COMPLETED.L1_MISS_ANY/ (50.00%) 64,071,854 cpu_core/MEM_INST_RETIRED.SPLIT_LOADS/ (50.00%) 3 cpu_core/MEM_INST_RETIRED.SPLIT_LOADS/R
1.003049679 seconds time elapsed
Example without --record-tpebs:
perf stat -M tma_contested_accesses -C1 sleep 1
Performance counter stats for 'CPU(s) 1':
50,203,891 cpu_core/TOPDOWN.SLOTS/ # 0.0 % tma_contested_accesses (63.60%) 10,040,777 cpu_core/topdown-retiring/ (63.60%) 6,890,729 cpu_core/topdown-mem-bound/ (63.60%) 2,756,463 cpu_core/topdown-bad-spec/ (63.60%) 10,828,288 cpu_core/topdown-fe-bound/ (63.60%) 28,350,432 cpu_core/topdown-be-bound/ (63.60%) 98 cpu_core/OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HITM/ (63.70%) 577,520 cpu_core/MEMORY_ACTIVITY.STALLS_L2_MISS/ (54.62%) 313,339 cpu_core/MEMORY_ACTIVITY.STALLS_L3_MISS/ (54.62%) 14,155 cpu_core/MEM_LOAD_RETIRED.L1_MISS/ (45.54%) 0 cpu_core/OCR.DEMAND_DATA_RD.L3_HIT.SNOOP_HIT_WITH_FWD/ (36.30%) 8,468,077 cpu_core/CPU_CLK_UNHALTED.THREAD/ (45.38%) 198 cpu_core/MEM_LOAD_L3_HIT_RETIRED.XSNP_MISS/ (45.38%) 8,324 cpu_core/MEM_LOAD_RETIRED.FB_HIT/ (45.38%) 3,388,031,520 TSC 23,226,785 cpu_core/CPU_CLK_UNHALTED.REF_TSC/ (54.46%) 80 cpu_core/MEM_LOAD_L3_HIT_RETIRED.XSNP_FWD/ (54.46%) 0 cpu_core/MEM_LOAD_L3_HIT_RETIRED.XSNP_FWD/R 0 cpu_core/MEM_LOAD_L3_HIT_RETIRED.XSNP_MISS/R 1,006,816,667 ns duration_time
1.002537737 seconds time elapsed
Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Weilin Wang <weilin.wang@intel.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Samantha Alt <samantha.alt@intel.com> Link: https://lore.kernel.org/r/20240720062102.444578-7-weilin.wang@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
show more ...
|
#
8db5cabc |
| 20-Jul-2024 |
Weilin Wang <weilin.wang@intel.com> |
perf stat: Fork and launch 'perf record' when 'perf stat' needs to get retire latency value for a metric.
When retire_latency value is used in a metric formula, evsel would fork a 'perf record' proc
perf stat: Fork and launch 'perf record' when 'perf stat' needs to get retire latency value for a metric.
When retire_latency value is used in a metric formula, evsel would fork a 'perf record' process with "-e" and "-W" options. 'perf record' will collect required retire_latency values in parallel while 'perf stat' is collecting counting values.
At the point of time that 'perf stat' stops counting, evsel would stop 'perf record' by sending sigterm signal to 'perf record' process. Sampled data will be processed to get retire latency value. Another thread is required to synchronize between 'perf stat' and 'perf record' when we pass data through pipe.
Retire_latency evsel is not opened for 'perf stat' so that there is no counter wasted on it. This commit includes code suggested by Namhyung to adjust reading size for groups that include retire_latency evsels.
In current :R parsing implementation, the parser would recognize events with retire_latency modifier and insert them into the evlist like a normal event. Ideally, we need to avoid counting these events.
In this commit, at the time when a retire_latency evsel is read, set the retire latency value processed from the sampled data to count value. This sampled retire latency value will be used for metric calculation and final event count print out. No special metric calculation and event print out code required for retire_latency events.
Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Weilin Wang <weilin.wang@intel.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Samantha Alt <samantha.alt@intel.com> Link: https://lore.kernel.org/r/20240720062102.444578-4-weilin.wang@intel.com [ Squashed the 3rd and 4th commit in the series to keep it building patch by patch ] [ Constified the 'struct perf_tool' pointer in process_sample_event() ] [ Use perf_tool__init(&tool, false) to address a segfault I reported and Ian/Weilin diagnosed ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
show more ...
|
#
071b117e |
| 12-Aug-2024 |
Ian Rogers <irogers@google.com> |
perf stat: Use perf_tool__init()
Use perf_tool__init() so that more uses of 'struct perf_tool' can be const and not relying on perf_tool__fill_defaults().
Signed-off-by: Ian Rogers <irogers@google.
perf stat: Use perf_tool__init()
Use perf_tool__init() so that more uses of 'struct perf_tool' can be const and not relying on perf_tool__fill_defaults().
Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Jonathan Cameron <jonathan.cameron@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20240812204720.631678-17-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
show more ...
|
#
30f29bae |
| 12-Aug-2024 |
Ian Rogers <irogers@google.com> |
perf tool: Constify tool pointers
The tool pointer (to a struct largely of function pointers) is passed around but is unchanged except at initialization. Change parameter and variable types to be co
perf tool: Constify tool pointers
The tool pointer (to a struct largely of function pointers) is passed around but is unchanged except at initialization. Change parameter and variable types to be const to lower the possibilities of what could happen with a tool.
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Leo Yan <leo.yan@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Jonathan Cameron <jonathan.cameron@huawei.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Nick Terrell <terrelln@fb.com> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20240812204720.631678-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
show more ...
|
#
966854e7 |
| 03-Jul-2024 |
Namhyung Kim <namhyung@kernel.org> |
perf bpf-filter: Pass 'target' to perf_bpf_filter__prepare()
This is needed to prepare target-specific actions in the later patch. We want to reuse the pinned BPF program and map for regular users t
perf bpf-filter: Pass 'target' to perf_bpf_filter__prepare()
This is needed to prepare target-specific actions in the later patch. We want to reuse the pinned BPF program and map for regular users to profile their own processes.
Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: KP Singh <kpsingh@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20240703223035.2024586-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
show more ...
|
#
6828d692 |
| 03-May-2024 |
Ian Rogers <irogers@google.com> |
perf evsel: Refactor tool events
Tool events unnecessarily open a dummy perf event which is useless even with `perf record` which will still open a dummy event. Change the behavior of tool events so
perf evsel: Refactor tool events
Tool events unnecessarily open a dummy perf event which is useless even with `perf record` which will still open a dummy event. Change the behavior of tool events so:
- duration_time - call `rdclock` on open and then report the count as a delta since the start in evsel__read_counter. This moves code out of builtin-stat making it more general purpose.
- user_time/system_time - open the fd as either `/proc/pid/stat` or `/proc/stat` for cases like system wide. evsel__read_counter will read the appropriate field out of the procfs file. These values were previously supplied by wait4, if the procfs read fails then the wait4 values are used, assuming the process/thread terminated. By reading user_time and system_time this way, interval mode, per PID and per CPU can be supported although there are restrictions given what the files provide (e.g. per PID can't be combined with per CPU).
Opening any of the tool events for `perf record` is changed to return invalid.
Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Weilin Wang <weilin.wang@intel.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: James Clark <james.clark@arm.com> Cc: Dmitrii Dolgov <9erthalion6@gmail.com> Cc: Ze Gao <zegao2021@gmail.com> Cc: Song Liu <song@kernel.org> Cc: Leo Yan <leo.yan@linux.dev> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240503232849.17752-1-irogers@google.com
show more ...
|
#
f5803651 |
| 05-Jun-2024 |
Ian Rogers <irogers@google.com> |
perf stat: Choose the most disaggregate command line option
When multiple aggregation options are passed to perf stat the behavior isn't clear. Consider "perf stat -A --per-socket .." and "perf stat
perf stat: Choose the most disaggregate command line option
When multiple aggregation options are passed to perf stat the behavior isn't clear. Consider "perf stat -A --per-socket .." and "perf stat --per-socket -A ..", the first won't aggregate at all while the second will do per-socket aggregation, even though the same options were passed.
Rather than set an enum value, gather the options in a struct and process them from most to least aggregate. This ensures the least aggregate option always applies, so no aggregation if "-A" is passed.
Signed-off-by: Ian Rogers <irogers@google.com> Cc: Stephane Eranian <eranian@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240605063828.195700-2-irogers@google.com
show more ...
|
#
0dddd91a |
| 05-Jun-2024 |
Ian Rogers <irogers@google.com> |
perf stat: Make options local
Reduce the scope of stat_options to cmd_stat, and pass as an argument to __cmd_record. This is done to make more localized changes to the options in later patches. A si
perf stat: Make options local
Reduce the scope of stat_options to cmd_stat, and pass as an argument to __cmd_record. This is done to make more localized changes to the options in later patches. A side-effect of the change is to reduce the size of a stripped PIE perf binary by 5952 bytes. The savings come mainly in the dynamic relocation section.
Signed-off-by: Ian Rogers <irogers@google.com> Cc: Stephane Eranian <eranian@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240605063828.195700-1-irogers@google.com
show more ...
|
#
a8cd4766 |
| 07-May-2024 |
Ian Rogers <irogers@google.com> |
perf cpumap: Remove refcnt from 'struct cpu_aggr_map'
It is assigned a value of 1 and never incremented. Remove and replace puts with delete.
Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adri
perf cpumap: Remove refcnt from 'struct cpu_aggr_map'
It is assigned a value of 1 and never incremented. Remove and replace puts with delete.
Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ben Gainey <ben.gainey@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Li Dong <lidong@vivo.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Paran Lee <p4ranlee@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sun Haiyong <sunhaiyong@loongson.cn> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: Yicong Yang <yangyicong@hisilicon.com> Link: https://lore.kernel.org/r/20240507183545.1236093-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
show more ...
|
#
03f23570 |
| 12-Apr-2024 |
Weilin Wang <weilin.wang@intel.com> |
perf stat: Add new field in stat_config to enable hardware aware grouping
Hardware counter and event information could be used to help creating event groups that better utilize hardware counters and
perf stat: Add new field in stat_config to enable hardware aware grouping
Hardware counter and event information could be used to help creating event groups that better utilize hardware counters and improve multiplexing.
Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Weilin Wang <weilin.wang@intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Caleb Biggers <caleb.biggers@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Perry Taylor <perry.taylor@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Samantha Alt <samantha.alt@intel.com> Link: https://lore.kernel.org/r/20240412210756.309828-2-weilin.wang@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
show more ...
|
#
954ac1b4 |
| 02-Feb-2024 |
Ian Rogers <irogers@google.com> |
perf stat: Remove duplicate cpus_map_matched function
Use libperf's perf_cpu_map__equal() that performs the same function.
Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Ian Rogers <
perf stat: Remove duplicate cpus_map_matched function
Use libperf's perf_cpu_map__equal() that performs the same function.
Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Andrew Jones <ajones@ventanamicro.com> Cc: André Almeida <andrealmeid@igalia.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Darren Hart <dvhart@infradead.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paran Lee <p4ranlee@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Yang Li <yang.lee@linux.alibaba.com> Cc: Yanteng Si <siyanteng@loongson.cn> Link: https://lore.kernel.org/r/20240202234057.2085863-8-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
show more ...
|
#
3e5deb70 |
| 02-Feb-2024 |
Ian Rogers <irogers@google.com> |
perf cpumap: Clean up use of perf_cpu_map__has_any_cpu_or_is_empty
Most uses of what was perf_cpu_map__empty but is now perf_cpu_map__has_any_cpu_or_is_empty want to do something with the CPU map if
perf cpumap: Clean up use of perf_cpu_map__has_any_cpu_or_is_empty
Most uses of what was perf_cpu_map__empty but is now perf_cpu_map__has_any_cpu_or_is_empty want to do something with the CPU map if it contains CPUs. Replace uses of perf_cpu_map__has_any_cpu_or_is_empty with other helpers so that CPUs within the map can be handled.
Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Andrew Jones <ajones@ventanamicro.com> Cc: André Almeida <andrealmeid@igalia.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Darren Hart <dvhart@infradead.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paran Lee <p4ranlee@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Yang Li <yang.lee@linux.alibaba.com> Cc: Yanteng Si <siyanteng@loongson.cn> Link: https://lore.kernel.org/r/20240202234057.2085863-6-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
show more ...
|
#
cbc917a1 |
| 08-Feb-2024 |
Yicong Yang <yangyicong@hisilicon.com> |
perf stat: Support per-cluster aggregation
Some platforms have 'cluster' topology and CPUs in the cluster will share resources like L3 Cache Tag (for HiSilicon Kunpeng SoC) or L2 cache (for Intel Ja
perf stat: Support per-cluster aggregation
Some platforms have 'cluster' topology and CPUs in the cluster will share resources like L3 Cache Tag (for HiSilicon Kunpeng SoC) or L2 cache (for Intel Jacobsville). Currently parsing and building cluster topology have been supported since [1].
perf stat has already supported aggregation for other topologies like die or socket, etc. It'll be useful to aggregate per-cluster to find problems like L3T bandwidth contention.
This patch add support for "--per-cluster" option for per-cluster aggregation. Also update the docs and related test. The output will be like:
[root@localhost tmp]# perf stat -a -e LLC-load --per-cluster -- sleep 5
Performance counter stats for 'system wide':
S56-D0-CLS158 4 1,321,521,570 LLC-load S56-D0-CLS594 4 794,211,453 LLC-load S56-D0-CLS1030 4 41,623 LLC-load S56-D0-CLS1466 4 41,646 LLC-load S56-D0-CLS1902 4 16,863 LLC-load S56-D0-CLS2338 4 15,721 LLC-load S56-D0-CLS2774 4 22,671 LLC-load [...]
On a legacy system without cluster or cluster support, the output will be look like: [root@localhost perf]# perf stat -a -e cycles --per-cluster -- sleep 1
Performance counter stats for 'system wide':
S56-D0-CLS0 64 18,011,485 cycles S7182-D0-CLS0 64 16,548,835 cycles
Note that this patch doesn't mix the cluster information in the outputs of --per-core to avoid breaking any tools/scripts using it.
Note that perf recently supports "--per-cache" aggregation, but it's not the same with the cluster although cluster CPUs may share some cache resources. For example on my machine all clusters within a die share the same L3 cache: $ cat /sys/devices/system/cpu/cpu0/cache/index3/shared_cpu_list 0-31 $ cat /sys/devices/system/cpu/cpu0/topology/cluster_cpus_list 0-3
[1] commit c5e22feffdd7 ("topology: Represent clusters of CPUs within a die")
Tested-by: Jie Zhan <zhanjie9@hisilicon.com> Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Cc: james.clark@arm.com Cc: 21cnbao@gmail.com Cc: prime.zeng@hisilicon.com Cc: Jonathan.Cameron@huawei.com Cc: fanghao11@huawei.com Cc: linuxarm@huawei.com Cc: tim.c.chen@intel.com Cc: linux-arm-kernel@lists.infradead.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240208024026.2691-1-yangyicong@huawei.com
show more ...
|
#
6f33e6fa |
| 14-Dec-2023 |
Ian Rogers <irogers@google.com> |
perf stat: Combine the -A/--no-aggr and --no-merge options
The -A or --no-aggr option disables aggregation of core events:
$ perf stat -A -e cycles,data_total -a true
Performance counter stat
perf stat: Combine the -A/--no-aggr and --no-merge options
The -A or --no-aggr option disables aggregation of core events:
$ perf stat -A -e cycles,data_total -a true
Performance counter stats for 'system wide':
CPU0 1,287,665 cycles CPU1 1,831,681 cycles CPU2 27,345,998 cycles CPU3 1,964,799 cycles CPU4 236,174 cycles CPU5 3,302,825 cycles CPU6 9,201,446 cycles CPU7 1,403,043 cycles CPU0 110.90 MiB data_total
0.008961761 seconds time elapsed
The --no-merge option disables the aggregation of uncore events:
$ perf stat --no-merge -e cycles,data_total -a true
Performance counter stats for 'system wide':
38,482,778 cycles 15.04 MiB data_total [uncore_imc_free_running_1] 15.00 MiB data_total [uncore_imc_free_running_0]
0.005915155 seconds time elapsed
Having two options confuses users who generally don't appreciate the difference in PMUs. Keep all the options but make it so they all disable aggregation both of core and uncore events:
$ perf stat -A -e cycles,data_total -a true
Performance counter stats for 'system wide':
CPU0 85,878 cycles CPU1 88,179 cycles CPU2 60,872 cycles CPU3 3,265,567 cycles CPU4 82,357 cycles CPU5 83,383 cycles CPU6 84,156 cycles CPU7 220,803 cycles CPU0 2.38 MiB data_total [uncore_imc_free_running_0] CPU0 2.38 MiB data_total [uncore_imc_free_running_1]
0.001397205 seconds time elapsed
Update the relevant 'perf stat' man page information.
Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kaige Ye <ye@kaige.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20231214060256.2094017-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
show more ...
|
#
923ca62a |
| 29-Nov-2023 |
Ian Rogers <irogers@google.com> |
libperf cpumap: Rename perf_cpu_map__empty() to perf_cpu_map__has_any_cpu_or_is_empty()
The name perf_cpu_map_empty is misleading as true is also returned when the map contains an "any" CPU (aka dum
libperf cpumap: Rename perf_cpu_map__empty() to perf_cpu_map__has_any_cpu_or_is_empty()
The name perf_cpu_map_empty is misleading as true is also returned when the map contains an "any" CPU (aka dummy) map.
Rename to perf_cpu_map__has_any_cpu_or_is_empty(), later changes will (re)introduce perf_cpu_map__empty() and perf_cpu_map__has_any_cpu().
Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Andrew Jones <ajones@ventanamicro.com> Cc: André Almeida <andrealmeid@igalia.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Atish Patra <atishp@rivosinc.com> Cc: Changbin Du <changbin.du@huawei.com> Cc: Darren Hart <dvhart@infradead.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Paran Lee <p4ranlee@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Steinar H. Gunderson <sesse@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will@kernel.org> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Yang Li <yang.lee@linux.alibaba.com> Cc: Yanteng Si <siyanteng@loongson.cn> Cc: bpf@vger.kernel.org Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/r/20231129060211.1890454-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
show more ...
|
#
8596ba32 |
| 29-Nov-2023 |
Ian Rogers <irogers@google.com> |
perf stat: Fix help message for --metric-no-threshold option
Copy-paste error led to help message for metric-no-threshold repeating that of metric-no-merge.
Fixes: 1fd09e299bdd434b ("perf metric: A
perf stat: Fix help message for --metric-no-threshold option
Copy-paste error led to help message for metric-no-threshold repeating that of metric-no-merge.
Fixes: 1fd09e299bdd434b ("perf metric: Add --metric-no-threshold option") Reported-by: Stephane Eranian <eranian@google.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20231129223540.2247030-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
show more ...
|
#
0713ab3b |
| 06-Dec-2023 |
Ian Rogers <irogers@google.com> |
perf stat: Exit perf stat if parse groups fails
Metrics were added by a callback but commit a4b8cfcabb1d90ec ("perf stat: Delay metric parsing") postponed this to allow optimizations based on the CP
perf stat: Exit perf stat if parse groups fails
Metrics were added by a callback but commit a4b8cfcabb1d90ec ("perf stat: Delay metric parsing") postponed this to allow optimizations based on the CPU configuration.
In doing so it stopped errors in metric parsing from causing 'perf stat' termination.
This change adds the termination for bad metric names back in.
Fixes: a4b8cfcabb1d90ec ("perf stat: Delay metric parsing") Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Closes: https://lore.kernel.org/lkml/ZXByT1K6enTh2EHT@kernel.org/ Link: https://lore.kernel.org/r/20231206183533.972028-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
show more ...
|
#
eb2eac0c |
| 21-Nov-2023 |
Ian Rogers <irogers@google.com> |
perf evsel: Fallback to "task-clock" when not system wide
When the "cycles" event isn't available evsel will fallback to the "cpu-clock" software event.
"task-clock" is similar to "cpu-clock" but o
perf evsel: Fallback to "task-clock" when not system wide
When the "cycles" event isn't available evsel will fallback to the "cpu-clock" software event.
"task-clock" is similar to "cpu-clock" but only runs when the process is running.
Falling back to "cpu-clock" when not system wide leads to confusion, by falling back to "task-clock" it is hoped the confusion is less.
Pass the target to determine if "task-clock" is more appropriate.
Update a nearby comment and debug string for the change.
Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ajay Kaher <akaher@vmware.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Makhalov <amakhalov@vmware.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/r/20231121000420.368075-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
show more ...
|
#
a84fbf20 |
| 06-Sep-2023 |
Ian Rogers <irogers@google.com> |
perf stat: Fix aggr mode initialization
Generating metrics llc_code_read_mpi_demand_plus_prefetch, llc_data_read_mpi_demand_plus_prefetch, llc_miss_local_memory_bandwidth_read, llc_miss_local_memory
perf stat: Fix aggr mode initialization
Generating metrics llc_code_read_mpi_demand_plus_prefetch, llc_data_read_mpi_demand_plus_prefetch, llc_miss_local_memory_bandwidth_read, llc_miss_local_memory_bandwidth_write, nllc_miss_remote_memory_bandwidth_read, memory_bandwidth_read, memory_bandwidth_write, uncore_frequency, upi_data_transmit_bw, C2_Pkg_Residency, C3_Core_Residency, C3_Pkg_Residency, C6_Core_Residency, C6_Pkg_Residency, C7_Core_Residency, C7_Pkg_Residency, UNCORE_FREQ and tma_info_system_socket_clks would trigger an address sanitizer heap-buffer-overflows on a SkylakeX.
``` ==2567752==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x5020003ed098 at pc 0x5621a816654e bp 0x7fffb55d4da0 sp 0x7fffb55d4d98 READ of size 4 at 0x5020003eee78 thread T0 #0 0x558265d6654d in aggr_cpu_id__is_empty tools/perf/util/cpumap.c:694:12 #1 0x558265c914da in perf_stat__get_aggr tools/perf/builtin-stat.c:1490:6 #2 0x558265c914da in perf_stat__get_global_cached tools/perf/builtin-stat.c:1530:9 #3 0x558265e53290 in should_skip_zero_counter tools/perf/util/stat-display.c:947:31 #4 0x558265e53290 in print_counter_aggrdata tools/perf/util/stat-display.c:985:18 #5 0x558265e51931 in print_counter tools/perf/util/stat-display.c:1110:3 #6 0x558265e51931 in evlist__print_counters tools/perf/util/stat-display.c:1571:5 #7 0x558265c8ec87 in print_counters tools/perf/builtin-stat.c:981:2 #8 0x558265c8cc71 in cmd_stat tools/perf/builtin-stat.c:2837:3 #9 0x558265bb9bd4 in run_builtin tools/perf/perf.c:323:11 #10 0x558265bb98eb in handle_internal_command tools/perf/perf.c:377:8 #11 0x558265bb9389 in run_argv tools/perf/perf.c:421:2 #12 0x558265bb9389 in main tools/perf/perf.c:537:3 ```
The issue was the use of testing a cpumap with NULL rather than using empty, as a map containing the dummy value isn't NULL and the -1 results in an empty aggr map being allocated which legitimately overflows when any member is accessed.
Fixes: 8a96f454f5668572 ("perf stat: Avoid SEGV if core.cpus isn't set") Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20230906003912.3317462-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
show more ...
|
#
db1f5f10 |
| 14-Jun-2023 |
Yang Jihong <yangjihong1@huawei.com> |
perf stat: Add missing newline in pr_err messages
The newline is missing for error messages in add_default_attributes()
Before:
# perf stat --topdown Topdown requested but the topdown metric g
perf stat: Add missing newline in pr_err messages
The newline is missing for error messages in add_default_attributes()
Before:
# perf stat --topdown Topdown requested but the topdown metric groups aren't present. (See perf list the metric groups have names like TopdownL1)#
After:
# perf stat --topdown Topdown requested but the topdown metric groups aren't present. (See perf list the metric groups have names like TopdownL1) #
In addition, perf_stat_init_aggr_mode() and perf_stat_init_aggr_mode_file() have the same problem, fixed by the way.
Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Acked-by: Ian Rogers <irogers@google.com> Reviewed-by: James Clark <james.clark@arm.com> Link: https://lore.kernel.org/r/20230614021505.59856-1-yangjihong1@huawei.com Signed-off-by: Namhyung Kim <namhyung@kernel.org>
show more ...
|
#
dada1a1f |
| 16-Jun-2023 |
Namhyung Kim <namhyung@kernel.org> |
perf stat: Show average value on multiple runs
When -r option is used, perf stat runs the command multiple times and update stats in the evsel->stats.res_stats for global aggregation. But the value
perf stat: Show average value on multiple runs
When -r option is used, perf stat runs the command multiple times and update stats in the evsel->stats.res_stats for global aggregation. But the value is never used and the value it prints at the end is just the value from the last run. I think we should print the average number of multiple runs.
Add evlist__copy_res_stats() to update the aggr counter (for display) using the values in the evsel->stats.res_stats.
Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20230616073211.1057936-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
show more ...
|
#
ed4090a2 |
| 16-Jun-2023 |
Namhyung Kim <namhyung@kernel.org> |
perf stat: Reset aggr stats for each run
When it runs multiple times with -r option, it missed to reset the aggregation counters and the values were added up. The aggregation count has the values t
perf stat: Reset aggr stats for each run
When it runs multiple times with -r option, it missed to reset the aggregation counters and the values were added up. The aggregation count has the values to be printed in the end. It should reset the counters at the beginning of each run. But the current code does that only when -I/--interval-print option is given.
Fixes: 91f85f98da7ab8c3 ("perf stat: Display event stats using aggr counts") Reported-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20230616073211.1057936-1-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
show more ...
|
#
6a80d794 |
| 16-Jun-2023 |
Kan Liang <kan.liang@linux.intel.com> |
perf stat: New metricgroup output for the default mode
In the default mode, the current output of the metricgroup include both events and metrics, which is not necessary and just makes the output ha
perf stat: New metricgroup output for the default mode
In the default mode, the current output of the metricgroup include both events and metrics, which is not necessary and just makes the output hard to read. Since different ARCHs (even different generations in the same ARCH) may use different events. The output also vary on different platforms.
For a metricgroup, only outputting the value of each metric is good enough.
Add a new field default_metricgroup in evsel to indicate an event of the default metricgroup. For those events, printout() should print the metricgroup name rather than each event.
Add perf_stat__skip_metric_event() to skip the evsel in the Default metricgroup, if it's not running or not the metric event.
Add print_metricgroup_header_t to pass the functions which print the display name of each metricgroup in the Default metricgroup. Support all three output methods.
Factor out perf_stat__print_shadow_stats_metricgroup() to print out each metrics.
On SPR:
Before:
./perf_old stat sleep 1
Performance counter stats for 'sleep 1':
0.54 msec task-clock:u # 0.001 CPUs utilized 0 context-switches:u # 0.000 /sec 0 cpu-migrations:u # 0.000 /sec 68 page-faults:u # 125.445 K/sec 540,970 cycles:u # 0.998 GHz 556,325 instructions:u # 1.03 insn per cycle 123,602 branches:u # 228.018 M/sec 6,889 branch-misses:u # 5.57% of all branches 3,245,820 TOPDOWN.SLOTS:u # 18.4 % tma_backend_bound # 17.2 % tma_retiring # 23.1 % tma_bad_speculation # 41.4 % tma_frontend_bound 564,859 topdown-retiring:u 1,370,999 topdown-fe-bound:u 603,271 topdown-be-bound:u 744,874 topdown-bad-spec:u 12,661 INT_MISC.UOP_DROPPING:u # 23.357 M/sec
1.001798215 seconds time elapsed
0.000193000 seconds user 0.001700000 seconds sys
After:
$ ./perf stat sleep 1
Performance counter stats for 'sleep 1':
0.51 msec task-clock:u # 0.001 CPUs utilized 0 context-switches:u # 0.000 /sec 0 cpu-migrations:u # 0.000 /sec 68 page-faults:u # 132.683 K/sec 545,228 cycles:u # 1.064 GHz 555,509 instructions:u # 1.02 insn per cycle 123,574 branches:u # 241.120 M/sec 6,957 branch-misses:u # 5.63% of all branches TopdownL1 # 17.5 % tma_backend_bound # 22.6 % tma_bad_speculation # 42.7 % tma_frontend_bound # 17.1 % tma_retiring TopdownL2 # 21.8 % tma_branch_mispredicts # 11.5 % tma_core_bound # 13.4 % tma_fetch_bandwidth # 29.3 % tma_fetch_latency # 2.7 % tma_heavy_operations # 14.5 % tma_light_operations # 0.8 % tma_machine_clears # 6.1 % tma_memory_bound
1.001712086 seconds time elapsed
0.000151000 seconds user 0.001618000 seconds sys
Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ahmad Yasin <ahmad.yasin@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20230616031420.3751973-3-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
show more ...
|
#
b0a9e8f8 |
| 15-Jun-2023 |
Kan Liang <kan.liang@linux.intel.com> |
perf stat,jevents: Introduce Default tags for the default mode
Introduce a new metricgroup, Default, to tag all the metric groups which will be collected in the default mode.
Add a new field, Defau
perf stat,jevents: Introduce Default tags for the default mode
Introduce a new metricgroup, Default, to tag all the metric groups which will be collected in the default mode.
Add a new field, DefaultMetricgroupName, in the JSON file to indicate the real metric group name. It will be printed in the default output to replace the event names.
There is nothing changed for the output format.
On SPR, both TopdownL1 and TopdownL2 are displayed in the default output.
On ARM, Intel ICL and later platforms (before SPR), only TopdownL1 is displayed in the default output.
Suggested-by: Stephane Eranian <eranian@google.com> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ahmad Yasin <ahmad.yasin@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20230615135315.3662428-4-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
show more ...
|