.\" Copyright (c) 2003-2008 Joseph Koshy
.\" Copyright (c) 2007 The FreeBSD Foundation
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" This software is provided by Joseph Koshy ``as is'' and
.\" any express or implied warranties, including, but not limited to, the
.\" implied warranties of merchantability and fitness for a particular purpose
.\" are disclaimed.  in no event shall Joseph Koshy be liable
.\" for any direct, indirect, incidental, special, exemplary, or consequential
.\" damages (including, but not limited to, procurement of substitute goods
.\" or services; loss of use, data, or profits; or business interruption)
.\" however caused and on any theory of liability, whether in contract, strict
.\" liability, or tort (including negligence or otherwise) arising in any way
.\" out of the use of this software, even if advised of the possibility of
.\" such damage.
.\"
.\" $FreeBSD$
.\"
.Dd May 31, 2023
.Dt PMCSTAT 8
.Os
.Sh NAME
.Nm pmcstat
.Nd "performance measurement with performance monitoring hardware"
.Sh SYNOPSIS
.Nm
.Op Fl A
.Op Fl C
.Op Fl D Ar pathname
.Op Fl E
.Op Fl F Ar pathname
.Op Fl G Ar pathname
.Op Fl I
.Op Fl L
.Op Fl M Ar mapfilename
.Op Fl N
.Op Fl O Ar logfilename
.Op Fl P Ar event-spec
.Op Fl R Ar logfilename
.Op Fl S Ar event-spec
.Op Fl T
.Op Fl U
.Op Fl W
.Op Fl a Ar pathname
.Op Fl c Ar cpu-spec
.Op Fl d
.Op Fl e
.Op Fl f Ar pluginopt
.Op Fl g
.Op Fl i Ar lwp
.Op Fl l Ar secs
.Op Fl m Ar pathname
.Op Fl n Ar rate
.Op Fl o Ar outputfile
.Op Fl p Ar event-spec
.Op Fl q
.Op Fl r Ar fsroot
.Op Fl s Ar event-spec
.Op Fl t Ar process-spec
.Op Fl u Ar event-spec
.Op Fl v
.Op Fl w Ar secs
.Op Fl z Ar graphdepth
.Op Ar command Op Ar args
.Sh DESCRIPTION
The
.Nm
utility measures system performance using the facilities provided by
.Xr hwpmc 4 .
.Pp
The
.Nm
utility can measure both hardware events seen by the system as a
whole, and those seen when a specified set of processes is executing
on the system's CPUs.
If a specific set of processes is being targeted (for example,
if the
.Fl t Ar process-spec
option is specified, or if a command line is specified using
.Ar command ) ,
then measurement occurs till
.Ar command
exits, or till all target processes specified by the
.Fl t Ar process-spec
options exit, or till the
.Nm
utility is interrupted by the user.
If a specific set of processes is not targeted for measurement, then
.Nm
will perform system-wide measurements till interrupted by the
user.
.Pp
A given invocation of
.Nm
can mix allocations of system-mode and process-mode PMCs, of both
counting and sampling flavors.
The values of all counting PMCs are printed in human-readable form
at regular intervals by
.Nm .
The format of
.Nm Ns 's
human-readable textual output is not stable, and could change
in the future.
The output of sampling PMCs may be configured to go to a log file for
subsequent offline analysis, or, at the expense of greater
overhead, may be configured to be printed in text form on the fly.
.Pp
Hardware events to measure are specified to
.Nm
using event specifier strings
.Ar event-spec .
The syntax of these event specifiers is machine dependent and is
documented in
.Xr pmc 3 .
.Pp
A process-mode PMC may be configured to be inheritable by the target
process' current and future children.
.Sh OPTIONS
The following options are available:
.Bl -tag -width indent
.It Fl A
Skip symbol lookup and display address instead.
.It Fl C
Toggle between showing cumulative or incremental counts for
subsequent counting mode PMCs specified on the command line.
The default is to show incremental counts.
.It Fl D Ar pathname
Create files with per-program samples in the directory named
by
.Ar pathname .
The default is to create these files in the current directory.
.It Fl E
Toggle showing per-process counts at the time a tracked process
exits for subsequent process-mode PMCs specified on the command line.
This option is useful for mapping the performance characteristics of a
complex pipeline of processes when used in conjunction with the
.Fl d
option.
The default is not to enable per-process tracking.
.It Fl F Ar pathname
Print calltree (Kcachegrind) information to file
.Ar pathname .
If argument
.Ar pathname
is a
.Dq Li -
this information is sent to the output file specified by the
.Fl o
option.
.It Fl G Ar pathname
Print callchain information to file
.Ar pathname .
If argument
.Ar pathname
is a
.Dq Li -
this information is sent to the output file specified by the
.Fl o
option.
.It Fl I
Show the offset of the instruction pointer into the symbol.
.It Fl L
List all event names.
.It Fl M Ar mapfilename
Write the mapping between executable objects encountered in the event
log and the abbreviated pathnames used for
.Xr gprof 1
profiles to file
.Ar mapfilename .
If this option is not specified, mapping information is not written.
Argument
.Ar mapfilename
may be a
.Dq Li -
in which case this mapping information is sent to the output
file configured by the
.Fl o
option.
.It Fl N
Toggle capturing callchain information for subsequent sampling PMCs.
The default is for sampling PMCs to capture callchain information.
.It Fl O Ar logfilename
Send logging output to file
.Ar logfilename .
If
.Ar logfilename
is of the form
.Ar hostname Ns : Ns Ar port ,
where
.Ar hostname
does not start with a
.Ql \&.
or a
.Ql / ,
then
.Nm
will open a network socket to host
.Ar hostname
on port
.Ar port .
.Pp
If the
.Fl O
option is not specified and one of the logging options is requested,
then
.Nm
will print a textual form of the logged events to the configured
output file.
.It Fl P Ar event-spec
Allocate a process mode sampling PMC measuring hardware events
specified in
.Ar event-spec .
.It Fl R Ar logfilename
Perform offline analysis using sampling data in file
.Ar logfilename .
.It Fl S Ar event-spec
Allocate a system mode sampling PMC measuring hardware events
specified in
.Ar event-spec .
.It Fl T
Use a
.Xr top 1 Ns -like
mode for sampling PMCs.
The following hotkeys can be used:
.Pp
.Bl -tag -compact -width "Ctrl+a" -offset 4n
.It Ic A
Toggle symbol resolution
.Sm off
.It Ic Ctrl + a
.Sm on
Switch to accumulative mode
.Sm off
.It Ic Ctrl + d
.Sm on
Switch to delta mode
.It Ic f
Represent the
.Dq f
cost under
threshold as a dot (calltree only)
.It Ic I
Toggle showing offsets into symbols
.It Ic m
Merge PMCs
.It Ic n
Change view
.It Ic p
Show next PMC
.It Ic q
Quit
.It Ic Space
Pause
.El
.It Fl U
Toggle capturing user-space call traces while in kernel mode.
The default is for sampling PMCs to capture user-space callchain information
while in user-space mode, and kernel callchain information while in kernel mode.
.It Fl W
Toggle logging the incremental counts seen by the threads of a
tracked process each time they are scheduled on a CPU.
This is an experimental feature intended to help analyse the
dynamic behaviour of processes in the system.
It may incur substantial overhead if enabled.
The default is for this feature to be disabled.
.It Fl a Ar pathname
Perform a symbol and file:line lookup for each address in each
callgraph and save the output to
.Ar pathname .
Unlike
.Fl m ,
which only resolves the first symbol in the graph, this option resolves
every node in the callgraph, or prints out addresses if no
lookup information is available.
This option requires the
.Fl R
option to read in samples that were previously collected and
saved with the
.Fl O
option.
.It Fl c Ar cpu-spec
Set the CPUs for subsequent system mode PMCs specified on the
command line to
.Ar cpu-spec .
Argument
.Ar cpu-spec
is a comma-separated list of CPU numbers, or the literal
.Sq *
denoting all available CPUs.
The default is to allocate system mode PMCs on all available
CPUs.
.It Fl d
Toggle between process mode PMCs measuring events for the target
process' current and future children or only measuring events for
the target process.
The default is to measure events for the target process alone.
This option must be passed on the command line prior to
.Fl p ,
.Fl s ,
.Fl P ,
or
.Fl S .
.It Fl e
Specify that the gprof profile files will use a wide history counter.
These files are produced in a format compatible with
.Xr gprof 1 .
However, other tools that cannot fully parse a BSD-style
gmon header might be unable to correctly parse these files.
.It Fl f Ar pluginopt
Pass option string to the active plugin.
.br
threshold=<float> do not display cost under specified value (Top).
.br
skiplink=0|1 replace node with cost under threshold by a dot (Top).
.It Fl g
Produce profiles in a format compatible with
.Xr gprof 1 .
A separate profile file is generated for each executable object
encountered.
Profile files are placed in sub-directories named by their PMC
event name.
.It Fl i Ar lwp
Filter on thread ID
.Ar lwp ,
which you can get from
.Xr ps 1
.Fl o
.Li lwp .
.It Fl l Ar secs
Set the duration of system-wide performance measurement to
.Ar secs
seconds.
The argument
.Ar secs
may be a fractional value.
.It Fl m Ar pathname
Print the sampled PCs along with the name and the start and end addresses
of the function within which they live.
The
.Ar pathname
argument is mandatory and indicates where the information will be stored.
If argument
.Ar pathname
is a
.Dq Li -
this information is sent to the output file specified by the
.Fl o
option.
This option requires the
.Fl R
option to read in samples that were previously collected and
saved with the
.Fl O
option.
.It Fl n Ar rate
Set the default sampling rate for subsequent sampling mode
PMCs specified on the command line.
The default is to configure PMCs to sample the CPU's instruction
pointer every 65536 events.
.It Fl o Ar outputfile
Send counter readings and textual representations of logged data
to file
.Ar outputfile .
The default is to send output to
.Pa stderr
when collecting live data and to
.Pa stdout
when processing a pre-existing logfile.
.It Fl p Ar event-spec
Allocate a process mode counting PMC measuring hardware events
specified in
.Ar event-spec .
.It Fl q
Decrease verbosity.
.It Fl r Ar fsroot
Set the top of the filesystem hierarchy under which executables
are located to argument
.Ar fsroot .
The default is
.Pa / .
.It Fl s Ar event-spec
Allocate a system mode counting PMC measuring hardware events
specified in
.Ar event-spec .
.It Fl t Ar process-spec
Attach process mode PMCs to the processes named by argument
.Ar process-spec .
Argument
.Ar process-spec
may be a non-negative integer denoting a specific process id, or a
regular expression for selecting processes based on their command names.
.It Fl u Ar event-spec
Provide a short description of the event specified in
.Ar event-spec .
.It Fl v
Increase verbosity.
.It Fl w Ar secs
Print the values of all counting mode PMCs or sampling mode PMCs
for top mode every
.Ar secs
seconds.
The argument
.Ar secs
may be a fractional value.
The default interval is 5 seconds.
.It Fl z Ar graphdepth
When printing system-wide callgraphs, limit callgraphs to the depth
specified by argument
.Ar graphdepth .
.El
.Pp
If
.Ar command
is specified, it is executed using
.Xr execvp 3 .
.Sh EXAMPLES
To perform system-wide statistical sampling on an AMD Athlon CPU with
samples taken every 32768 instruction retirals and data being sampled
to file
.Pa sample.stat ,
use:
.Dl "pmcstat -O sample.stat -n 32768 -S k7-retired-instructions"
.Pp
To execute
.Nm firefox
and measure the number of data cache misses suffered
by it and its children every 12 seconds on an AMD Athlon, use:
.Dl "pmcstat -d -w 12 -p k7-dc-misses firefox"
.Pp
To measure instructions retired for all processes named
.Dq emacs
use:
.Dl "pmcstat -t '^emacs$' -p instructions"
.Pp
To measure instructions retired for processes named
.Dq emacs
for a period of 10 seconds use:
.Dl "pmcstat -t '^emacs$' -p instructions sleep 10"
.Pp
To count instruction TLB misses on CPUs 0 and 2 on an Intel
Pentium Pro/Pentium III SMP system use:
.Dl "pmcstat -c 0,2 -s p6-itlb-miss"
.Pp
To collect profiling information for a specific process with pid 1234
based on instruction cache misses seen by it use:
.Dl "pmcstat -P ic-misses -t 1234 -O /tmp/sample.out"
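.Pp
The two forms of
.Ar process-spec
used in the examples above, a regular expression and a process id,
can be told apart with a few lines of
.Xr sh 1 .
The sketch below is illustrative only and assumes the rule given for the
.Fl t
option; the helper name
.Li process_spec_kind
is hypothetical and is not part of
.Nm :
.Bd -literal -offset indent
```shell
# Hypothetical helper, not part of pmcstat: report how a -t argument
# would be read according to the description of the -t option.
process_spec_kind() {
    case "$1" in
        ''|*[!0-9]*) echo regex ;;  # empty, or contains a non-digit
        *)           echo pid ;;    # digits only: a process id
    esac
}

process_spec_kind 1234       # prints "pid"
process_spec_kind '^emacs$'  # prints "regex"
```
.Ed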
.Pp
To perform system-wide sampling on all configured processors
based on processor instructions retired use:
.Dl "pmcstat -S instructions -O /tmp/sample.out"
If callgraph capture is not desired use:
.Dl "pmcstat -N -S instructions -O /tmp/sample.out"
.Pp
To send the generated event log to a remote machine use:
.Dl "pmcstat -S instructions -O remotehost:port"
On the remote machine, the sample log can be collected using
.Xr nc 1 :
.Dl "nc -l remotehost port > /tmp/sample.out"
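.Pp
Whether an argument to
.Fl O
names a file or a network destination follows from the rule given for
that option.
As a rough
.Xr sh 1
sketch of that rule (the helper name
.Li log_destination_kind
and the port number 9000 are made up for illustration, and this is not
the code
.Nm
itself uses):
.Bd -literal -offset indent
```shell
# Hypothetical helper, not part of pmcstat: classify an -O argument
# following the description of the -O option.
log_destination_kind() {
    case "$1" in
        .*|/*) echo file ;;    # starts with '.' or '/': treated as a file
        *:*)   echo socket ;;  # hostname:port form: a network socket
        *)     echo file ;;    # anything else: a plain file name
    esac
}

log_destination_kind remotehost:9000  # prints "socket"
log_destination_kind ./host:9000      # prints "file"
log_destination_kind /tmp/sample.out  # prints "file"
```
.Ed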
.Pp
To generate
.Xr gprof 1
compatible profiles from a sample file use:
.Dl "pmcstat -R /tmp/sample.out -g"
.Pp
To print a system-wide profile with callgraphs to file
.Pa "foo.graph"
use:
.Dl "pmcstat -R /tmp/sample.out -G foo.graph"
.Sh DIAGNOSTICS
If option
.Fl v
is specified,
.Nm
may issue the following diagnostic messages:
.Bl -diag
.It "#callchain/dubious-frames"
The number of callchain records that had an
.Dq impossible
value for a return address.
.It "#exec handling errors"
The number of
.Xr exec 2
events in the log file that named executables that could not be
analyzed.
.It "#exec/elf"
The number of
.Xr exec 2
events that named ELF executables.
.It "#exec/unknown"
The number of
.Xr exec 2
events that named executables with unrecognized formats.
.It "#samples/total"
The total number of samples in the log file.
.It "#samples/unclaimed"
The number of samples that could not be correlated to a known
executable object (i.e., to an executable, shared library, the
kernel or the runtime loader).
.It "#samples/unknown-object"
The number of samples that were associated with an executable
with an unrecognized object format.
.El
.Pp
.Ex -std
.Sh COMPATIBILITY
Due to the limitations of the
.Pa gmon.out
file format,
.Xr gprof 1
compatible profiles generated by the
.Fl g
option do not contain information about calls that cross executable
boundaries.
The generated
.Pa gmon.out
files are also only meaningful for native executables.
.Sh SEE ALSO
.Xr gprof 1 ,
.Xr nc 1 ,
.Xr execvp 3 ,
.Xr pmc 3 ,
.Xr pmclog 3 ,
.Xr hwpmc 4 ,
.Xr pmccontrol 8 ,
.Xr sysctl 8
.Sh HISTORY
The
.Nm
utility first appeared in
.Fx 6.0 .
.Sh AUTHORS
.An Joseph Koshy Aq Mt jkoshy@FreeBSD.org
.Sh BUGS
The
.Nm
utility cannot yet analyse
.Xr hwpmc 4
logs generated by non-native architectures.