1'\" te
2.\" CDDL HEADER START
3.\"
4.\" The contents of this file are subject to the terms of the
5.\" Common Development and Distribution License (the "License").
6.\" You may not use this file except in compliance with the License.
7.\"
8.\" You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
9.\" or http://www.opensolaris.org/os/licensing.
10.\" See the License for the specific language governing permissions
11.\" and limitations under the License.
12.\"
13.\" When distributing Covered Code, include this CDDL HEADER in each
14.\" file and include the License file at usr/src/OPENSOLARIS.LICENSE.
15.\" If applicable, add the following below this CDDL HEADER, with the
16.\" fields enclosed by brackets "[]" replaced with your own identifying
17.\" information: Portions Copyright [yyyy] [name of copyright owner]
18.\"
19.\" CDDL HEADER END
20.\" Copyright (c) 2008, Sun Microsystems, Inc. All Rights Reserved.
21.\"
22.\" $FreeBSD$
23.\"
24.Dd September 29, 2015
25.Dt LOCKSTAT 1
26.Os
27.Sh NAME
28.Nm lockstat
29.Nd report kernel lock and profiling statistics
30.Sh SYNOPSIS
31.Nm
32.Op Fl ACEHIV
33.Op Fl e Ar event-list
34.Op Fl i Ar rate
35.Op Fl b | t | h | s Ar depth
36.Op Fl n Ar num-records
37.Op Fl l Ar lock Oo Ns , Ns Ar size Oc
38.Op Fl d Ar duration
39.Op Fl f Ar function Oo Ns , Ns Ar size Oc
40.Op Fl T
41.Op Fl kgwWRpP
42.Op Fl D Ar count
43.Op Fl o filename
44.Op Fl x Ar opt Oo Ns = Ns Ar val Oc
45.Ar command
46.Op Oo Ar args Oc
47.Sh DESCRIPTION
48The
49.Nm
50utility gathers and displays kernel locking and profiling statistics.
51.Nm
52allows you to specify which events to watch (for example, spin on adaptive
53mutex, block on read access to rwlock due to waiting writers, and so forth), how
54much data to gather for each event, and how to display the data.
55By default,
56.Nm
57monitors all lock contention events, gathers frequency and timing data about
58those events, and displays the data in decreasing frequency order, so that the
59most common events appear first.
60.Pp
61.Nm
62gathers data until the specified command completes.
63For example, to gather statistics for a fixed-time interval, use
64.Xr sleep 1
65as the command, as follows:
66.Pp
67.Dl # lockstat sleep 5
68.Pp
69When the
70.Fl I
71option is specified,
72.Nm lockstat
73establishes a per-processor high-level periodic interrupt source to gather
74profiling data.
75The interrupt handler simply generates a
76.Nm
77event whose caller is the interrupted PC (program counter).
78The profiling event is just like any other
79.Nm lockstat
80event, so all of the normal
81.Nm lockstat
82options are applicable.
83.Pp
84.Nm
85relies on DTrace to modify the running kernel's text to intercept events of
86interest.
87This imposes a small but measurable overhead on all system activity, so access
88to
89.Nm
90is restricted to super-user by default.
91.Sh OPTIONS
92The following options are supported:
93.Bl -tag -width indent
94.It Fl V
95Print the D program used to gather the requested data.
96.El
97.Ss Event Selection
98If no event selection options are specified, the default is
99.Fl C .
100.Bl -tag -width indent
101.It Fl A
102Watch all lock events.
103.Fl A
104is equivalent to
105.Fl CH .
106.It Fl C
107Watch contention events.
108.It Fl E
109Watch error events.
110.It Fl e Ar event-list
111Only watch the specified events.
112.Ar event-list
113is a comma-separated list of events or ranges of events such as 1,4-7,35.
114Run
115.Nm
116with no arguments to get a brief description of all events.
117.It Fl H
118Watch hold events.
119.It Fl I
120Watch profiling interrupt events.
121.It Fl i Ar rate
122Interrupt rate (per second) for
123.Fl I .
124The default is 97 Hz, so that profiling doesn't run in lockstep with the clock
125interrupt (which runs at 100 Hz).
126.El
127.Ss Data Gathering
128.Bl -tag -width indent
129.It Fl x Ar arg Oo Ns = Ns Ar val Oc
130Enable or modify a
131.Xr dtrace 1
132runtime option or D compiler option.
133Boolean options are enabled by specifying their name.
134Options with values are set by separating the option name and value with an
135equals sign.
136.El
137.Ss "Data Gathering (Mutually Exclusive)"
138.Bl -tag -width indent
139.It Fl b
140Basic statistics: lock, caller, number of events.
141.It Fl h
142Histogram: timing plus time-distribution histograms.
143.It Fl s Ar depth
144Stack trace: histogram plus stack traces up to
145.Ar depth
146frames deep.
147.It Fl t
148Timing: Basic plus timing for all events (default).
149.El
150.Ss "Data Filtering"
151.Bl -tag -width indent
152.It Fl d Ar duration
153Only watch events longer than
154.Ar duration .
155.It Fl f Ar func Ns Oo Ns , Ns Ar size Oc Ns
156Only watch events generated by
157.Ar func ,
158which can be specified as a symbolic name or hex address.
159.Ar size
160defaults to the ELF symbol size if available, or 1 if not.
161.It Fl l Ar lock Ns Oo Ns , Ns Ar size Oc Ns
162Only watch
163.Ar lock ,
164which can be specified as a symbolic name or hex address.
165.Ar size
166defaults to the ELF symbol size or 1 if the symbol size is not available.
167.It Fl n Ar num-records
168Maximum number of data records.
169.It Fl T
170Trace (rather than sample) events.
171This is off by default.
172.El
173.Ss Data Reporting
174.Bl -tag -width indent
175.It Fl D Ar count
176Only display the top
177.Ar count
178events of each type.
179.It Fl g
180Show total events generated by function.
181For example, if
182.Fn foo
183calls
184.Fn bar
185in a loop, the work done by
186.Fn bar
187counts as work generated by
188.Fn foo
189(along with any work done by
190.Fn foo
191itself).
192The
193.Fl g
194option works by counting the total number of stack frames in which each function
195appears.
196This implies two things: (1) the data reported by
197.Fl g
198can be misleading if the stack traces are not deep enough, and (2) functions
199that are called recursively might show greater than 100% activity.
200In light of issue (1), the default data gathering mode when using
201.Fl g
202is
203.Fl s 50 .
204.It Fl k
205Coalesce PCs within functions.
206.It Fl o Ar filename
207Direct output to
208.Ar filename .
209.It Fl P
210Sort data by (\fIcount * time\fR) product.
211.It Fl p
212Parsable output format.
213.It Fl R
214Display rates (events per second) rather than counts.
215.It Fl W
216Whichever: distinguish events only by caller, not by lock.
217.It Fl w
218Wherever: distinguish events only by lock, not by caller.
219.El
220.Sh DISPLAY FORMATS
221The following headers appear over various columns of data.
222.Bl -tag -width indent
223.It Count or ops/s
224Number of times this event occurred, or the rate (times per second) if
225.Fl R
226was specified.
227.It indv
228Percentage of all events represented by this individual event.
229.It genr
230Percentage of all events generated by this function.
231.It cuml
232Cumulative percentage; a running total of the individuals.
233.It rcnt
234Average reference count.
235This will always be 1 for exclusive locks (mutexes,
236spin locks, rwlocks held as writer) but can be greater than 1 for shared locks
237(rwlocks held as reader).
238.It nsec
239Average duration of the events in nanoseconds, as appropriate for the event.
240For the profiling event, duration means interrupt latency.
241.It Lock
242Address of the lock; displayed symbolically if possible.
243.It CPU+Pri_Class
244CPU plus the priority class of the interrupted thread.
245For example, if CPU 4 is interrupted while running a timeshare thread, this
246will be reported as
247.Ql cpu[4]+TShar .
248.It Caller
249Address of the caller; displayed symbolically if possible.
250.El
251.Sh EXAMPLES
252.Bl -tag -width 0n
253.It Example 1 Measuring Kernel Lock Contention
254.Pp
255.Li # lockstat sleep 5
256.Bd -literal
257Adaptive mutex spin: 41411 events in 5.011 seconds (8263 events/sec)
258
259Count indv cuml rcnt     nsec Lock                   Caller
260-------------------------------------------------------------------------------
26113750  33%  33% 0.00       72 vm_page_queue_free_mtx vm_page_free_toq+0x12e
26213648  33%  66% 0.00       66 vm_page_queue_free_mtx vm_page_alloc+0x138
263 4023  10%  76% 0.00       51 vm_dom+0x80            vm_page_dequeue+0x68
264 2672   6%  82% 0.00      186 vm_dom+0x80            vm_page_enqueue+0x63
265  618   1%  84% 0.00       31 0xfffff8000cd83a88     qsyncvp+0x37
266  506   1%  85% 0.00      164 0xfffff8000cb3f098     vputx+0x5a
267  477   1%  86% 0.00       69 0xfffff8000c7eb180     uma_dbg_getslab+0x5b
268  288   1%  87% 0.00       77 0xfffff8000cd8b000     vn_finished_write+0x29
269  263   1%  88% 0.00      103 0xfffff8000cbad448     vinactive+0xdc
270  259   1%  88% 0.00       53 0xfffff8000cd8b000     vfs_ref+0x24
271  237   1%  89% 0.00       20 0xfffff8000cbad448     vfs_hash_get+0xcc
272  233   1%  89% 0.00       22 0xfffff8000bfd9480     uma_dbg_getslab+0x5b
273  223   1%  90% 0.00       20 0xfffff8000cb3f098     cache_lookup+0x561
274  193   0%  90% 0.00       16 0xfffff8000cb40ba8     vref+0x27
275  175   0%  91% 0.00       34 0xfffff8000cbad448     vputx+0x5a
276  169   0%  91% 0.00       51 0xfffff8000cd8b000     vfs_unbusy+0x27
277  164   0%  92% 0.00       31 0xfffff8000cb40ba8     vputx+0x5a
278[...]
279
280Adaptive mutex block: 10 events in 5.011 seconds (2 events/sec)
281
282Count indv cuml rcnt     nsec Lock                   Caller
283-------------------------------------------------------------------------------
284    3  30%  30% 0.00    17592 vm_page_queue_free_mtx vm_page_alloc+0x138
285    2  20%  50% 0.00    20528 vm_dom+0x80            vm_page_enqueue+0x63
286    2  20%  70% 0.00    55502 0xfffff8000cb40ba8     vputx+0x5a
287    1  10%  80% 0.00    12007 vm_page_queue_free_mtx vm_page_free_toq+0x12e
288    1  10%  90% 0.00     9125 0xfffff8000cbad448     vfs_hash_get+0xcc
289    1  10% 100% 0.00     7864 0xfffff8000cd83a88     qsyncvp+0x37
290-------------------------------------------------------------------------------
291[...]
292.Ed
293.It Example 2 Measuring Hold Times
294.Pp
295.Li # lockstat -H -D 10 sleep 1
296.Bd -literal
297Adaptive mutex hold: 109589 events in 1.039 seconds (105526 events/sec)
298
299Count indv cuml rcnt     nsec Lock                   Caller
300-------------------------------------------------------------------------------
301 8998   8%   8% 0.00      617 0xfffff8000c7eb180     uma_dbg_getslab+0xd4
302 5901   5%  14% 0.00      917 vm_page_queue_free_mtx vm_object_terminate+0x16a
303 5040   5%  18% 0.00      902 vm_dom+0x80            vm_page_free_toq+0x88
304 4884   4%  23% 0.00     1056 vm_page_queue_free_mtx vm_page_alloc+0x44e
305 4664   4%  27% 0.00      759 vm_dom+0x80            vm_fault_hold+0x1a13
306 4011   4%  31% 0.00      888 vm_dom                 vm_page_advise+0x11b
307 4010   4%  34% 0.00      957 vm_dom+0x80            _vm_page_deactivate+0x5c
308 3743   3%  38% 0.00      582 0xfffff8000cf04838     pmap_is_prefaultable+0x158
309 2254   2%  40% 0.00      952 vm_dom                 vm_page_free_toq+0x88
310 1639   1%  41% 0.00      591 0xfffff800d60065b8     trap_pfault+0x1f7
311-------------------------------------------------------------------------------
312[...]
313
314R/W writer hold: 64314 events in 1.039 seconds (61929 events/sec)
315
316Count indv cuml rcnt     nsec Lock                   Caller
317-------------------------------------------------------------------------------
318 7421  12%  12% 0.00     2994 pvh_global_lock        pmap_page_is_mapped+0xb6
319 4668   7%  19% 0.00     3313 pvh_global_lock        pmap_enter+0x9ae
320 1639   3%  21% 0.00      733 0xfffff80168d10200     vm_object_deallocate+0x683
321 1639   3%  24% 0.00     3061 0xfffff80168d10200     unlock_and_deallocate+0x2b
322 1639   3%  26% 0.00     2966 0xfffff80168d10200     vm_fault_hold+0x16ee
323 1567   2%  29% 0.00      733 0xfffff80168d10200     vm_fault_hold+0x19bc
324  821   1%  30% 0.00      786 0xfffff801eb0cc000     vm_object_madvise+0x32d
325  649   1%  31% 0.00     4918 0xfffff80191105300     vm_fault_hold+0x16ee
326  648   1%  32% 0.00     8112 0xfffff80191105300     unlock_and_deallocate+0x2b
327  647   1%  33% 0.00     1261 0xfffff80191105300     vm_object_deallocate+0x683
328-------------------------------------------------------------------------------
329.Ed
330.It Example 3 Measuring Hold Times for Stack Traces Containing a Specific Function
331.Pp
332.Li # lockstat -H -f tcp_input -s 50 -D 10 sleep 1
333.Bd -literal
334Adaptive mutex hold: 68 events in 1.026 seconds (66 events/sec)
335
336-------------------------------------------------------------------------------
337Count indv cuml rcnt     nsec Lock                   Caller
338   32  47%  47% 0.00     1631 0xfffff800686f50d8     tcp_do_segment+0x284b
339
340      nsec ------ Time Distribution ------ count     Stack
341      1024 |@@@@@@@@@@                     11        tcp_input+0xf54
342      2048 |@@@@@@@@@@@@@                  14        ip_input+0xc8
343      4096 |@@@@@                          6         swi_net+0x192
344      8192 |                               1         intr_event_execute_handlers+0x93
345                                                     ithread_loop+0xa6
346                                                     fork_exit+0x84
347                                                     0xffffffff808cf9ee
348-------------------------------------------------------------------------------
349Count indv cuml rcnt     nsec Lock                   Caller
350   29  43%  90% 0.00     4851 0xfffff800686f50d8     sowakeup+0xf8
351
352      nsec ------ Time Distribution ------ count     Stack
353      4096 |@@@@@@@@@@@@@@@                15        tcp_do_segment+0x2423
354      8192 |@@@@@@@@@@@@                   12        tcp_input+0xf54
355     16384 |@@                             2         ip_input+0xc8
356                                                     swi_net+0x192
357                                                     intr_event_execute_handlers+0x93
358                                                     ithread_loop+0xa6
359                                                     fork_exit+0x84
360                                                     0xffffffff808cf9ee
361-------------------------------------------------------------------------------
362[...]
363.Ed
364.El
365.Sh SEE ALSO
366.Xr dtrace 1 ,
367.Xr ksyms 4 ,
368.Xr locking 9
369.Sh NOTES
370Tail-call elimination can affect call sites.
371For example, if
372.Fn foo Ns +0x50
373calls
374.Fn bar
375and the last thing
376.Fn bar
377does is call
378.Fn mtx_unlock ,
379the compiler can arrange for
380.Fn bar
381to branch to
382.Fn mtx_unlock
383with a return address of
384.Fn foo Ns +0x58.
385Thus, the
386.Fn mtx_unlock
387in
388.Fn bar
389will appear as though it occurred at
390.Fn foo Ns +0x58.
391.Pp
392The PC in the stack frame in which an interrupt occurs can be bogus because,
393between function calls, the compiler is free to use the return address register
394for local storage.
395.Pp
396When using the
397.Fl I
398and
399.Fl s
400options together, the interrupted PC will usually not appear anywhere in the
401stack since the interrupt handler is entered asynchronously, not by a function
402call from that PC.
403