.\"
.\" Copyright (c) 1998, 1999 Kenneth D. Merry.
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. The name of the author may not be used to endorse or promote products
.\"    derived from this software without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd August 22, 2018
.Dt DEVSTAT 9
.Os
.Sh NAME
.Nm devstat ,
.Nm devstat_add_entry ,
.Nm devstat_end_transaction ,
.Nm devstat_end_transaction_bio ,
.Nm devstat_end_transaction_bio_bt ,
.Nm devstat_remove_entry ,
.Nm devstat_start_transaction ,
.Nm devstat_start_transaction_bio
.Nd kernel interface for keeping device statistics
.Sh SYNOPSIS
.In sys/devicestat.h
.Ft void
.Fo devstat_add_entry
.Fa "struct devstat *ds"
.Fa "const char *dev_name"
.Fa "int unit_number"
.Fa "uint32_t block_size"
.Fa "devstat_support_flags flags"
.Fa "devstat_type_flags device_type"
.Fa "devstat_priority priority"
.Fc
.Ft void
.Fn devstat_remove_entry "struct devstat *ds"
.Ft void
.Fo devstat_start_transaction
.Fa "struct devstat *ds"
.Fa "const struct bintime *now"
.Fc
.Ft void
.Fo devstat_start_transaction_bio
.Fa "struct devstat *ds"
.Fa "struct bio *bp"
.Fc
.Ft void
.Fo devstat_end_transaction
.Fa "struct devstat *ds"
.Fa "uint32_t bytes"
.Fa "devstat_tag_type tag_type"
.Fa "devstat_trans_flags flags"
.Fa "const struct bintime *now"
.Fa "const struct bintime *then"
.Fc
.Ft void
.Fo devstat_end_transaction_bio
.Fa "struct devstat *ds"
.Fa "const struct bio *bp"
.Fc
.Ft void
.Fo devstat_end_transaction_bio_bt
.Fa "struct devstat *ds"
.Fa "const struct bio *bp"
.Fa "const struct bintime *now"
.Fc
.Sh DESCRIPTION
The devstat subsystem is an interface for recording device
statistics, as its name implies.
The idea is to keep reasonably detailed
statistics while utilizing a minimum amount of CPU time to record them.
Thus, no statistical calculations are actually performed in the kernel
portion of the
.Nm
code.
Instead, that is left for user programs to handle.
.Pp
The historical and antiquated
.Nm
model assumed a single active IO operation per device, which is not accurate
for most disk-like drivers in the 2000s and beyond.
New consumers of the interface should almost certainly use only the "bio"
variants of the start and end transaction routines.
.Pp
.Fn devstat_add_entry
registers a device with the
.Nm
subsystem.
The caller is expected to have already allocated
.Sy and zeroed
the devstat structure before calling this function.
.Fn devstat_add_entry
takes several arguments:
.Bl -tag -width device_type
.It ds
The
.Va devstat
structure, allocated and zeroed by the client.
.It dev_name
The device name, e.g., da, cd, sa.
.It unit_number
Device unit number.
.It block_size
Block size of the device, if supported.
If the device does not support a
block size, or if the block size is unknown at the time the device is added
to the
.Nm
list, it should be set to 0.
.It flags
Flags indicating operations supported or not supported by the device.
See below for details.
.It device_type
The device type.
This is broken into three sections: base device type
(e.g., direct access, CDROM, sequential access), interface type (IDE, SCSI
or other) and a pass-through flag to indicate pass-through devices.
See below for a complete list of types.
.It priority
The device priority.
The priority is used to determine how devices are
sorted within
.Nm devstat Ns 's
list of devices.
Devices are sorted first by priority (highest to lowest),
and then by attach order.
See below for a complete list of available
priorities.
.El
.Pp
.Fn devstat_remove_entry
removes a device from the
.Nm
subsystem.
It takes the devstat structure for the device in question as
an argument.
The
.Nm
generation number is incremented and the number of devices is decremented.
.Pp
.Fn devstat_start_transaction
registers the start of a transaction with the
.Nm
subsystem.
Optionally, if the caller already has a
.Fn binuptime
value available, it may be passed in
.Fa *now .
Usually the caller can just pass
.Dv NULL
for
.Fa now ,
and the routine will gather the current
.Fn binuptime
itself.
The busy count is incremented with each transaction start.
When a device goes from idle to busy, the system uptime is recorded in the
.Va busy_from
field of the
.Va devstat
structure.
.Pp
.Fn devstat_start_transaction_bio
records the
.Fn binuptime
in the provided bio's
.Fa bio_t0
and then invokes
.Fn devstat_start_transaction .
.Pp
.Fn devstat_end_transaction
registers the end of a transaction with the
.Nm
subsystem.
It takes six arguments:
.Bl -tag -width tag_type
.It ds
The
.Va devstat
structure for the device in question.
.It bytes
The number of bytes transferred in this transaction.
.It tag_type
Transaction tag type.
See below for tag types.
.It flags
Transaction flags indicating whether the transaction was a read, write, or
whether no data was transferred.
.It now
The
.Fn binuptime
at the end of the transaction, or
.Dv NULL .
.It then
The
.Fn binuptime
at the beginning of the transaction, or
.Dv NULL .
.El
.Pp
If
.Fa now
is
.Dv NULL ,
it collects the current time from
.Fn binuptime .
If
.Fa then
is
.Dv NULL ,
the operation is not tracked in the
.Va devstat
.Fa duration
table.
.Pp
.Fn devstat_end_transaction_bio
is a thin wrapper for
.Fn devstat_end_transaction_bio_bt
with a
.Dv NULL
.Fa now
parameter.
.Pp
.Fn devstat_end_transaction_bio_bt
is a wrapper for
.Fn devstat_end_transaction
which pulls all needed information from a
.Va "struct bio"
prepared by
.Fn devstat_start_transaction_bio .
The bio must be ready for
.Fn biodone
(i.e.,
.Fa bio_bcount
and
.Fa bio_resid
must be correctly initialized).
.Pp
The
.Va devstat
structure is composed of the following fields:
.Bl -tag -width dev_creation_time
.It sequence0 ,
.It sequence1
An implementation detail used to gather consistent snapshots of device
statistics.
.It start_count
Number of operations started.
.It end_count
Number of operations completed.
The
.Dq busy_count
can be calculated by subtracting
.Fa end_count
from
.Fa start_count .
.Fa ( sequence0
and
.Fa sequence1
are used to get a consistent snapshot.)
The busy count is the current number of outstanding transactions for the device.
It should never go below zero, and on an idle device it should be zero.
If either one of these conditions is not true, it indicates a problem.
.Pp
There should be one and only one
transaction start event and one transaction end event for each transaction.
.It dev_links
Each
.Va devstat
structure is placed in a linked list when it is registered.
The
.Va dev_links
field contains a pointer to the next entry in the list of
.Va devstat
structures.
.It device_number
The device number is a unique identifier for each device.
The device
number is incremented for each new device that is registered.
The device
number is currently only a 32-bit integer, but it could be enlarged if
someone has a system with more than four billion device arrival events.
.It device_name
The device name is a text string given by the registering driver to
identify itself
(e.g.,
.Dq da ,
.Dq cd ,
.Dq sa ) .
.It unit_number
The unit number identifies the particular instance of the peripheral driver
in question.
.It bytes[4]
This array contains the number of bytes that have been read (index
.Dv DEVSTAT_READ ) ,
written (index
.Dv DEVSTAT_WRITE ) ,
freed or erased (index
.Dv DEVSTAT_FREE ) ,
or other (index
.Dv DEVSTAT_NO_DATA ) .
All values are unsigned 64-bit integers.
.It operations[4]
This array contains the number of operations of a given type that have been
performed.
The indices are identical to those for
.Fa bytes
above.
.Dv DEVSTAT_NO_DATA
or "other" represents the number of transactions to the device which are
neither reads, writes, nor frees.
For instance,
.Tn SCSI
drivers often send a test unit ready command to
.Tn SCSI
devices.
The test unit ready command does not read or write any data.
It merely causes the device to return its status.
.It duration[4]
This array contains the total bintime corresponding to completed operations of
a given type.
The indices are identical to those for
.Fa bytes
above.
(Operations that complete using the historical
.Fn devstat_end_transaction
API and do not provide a non-NULL
.Fa then
are not accounted for.)
.It busy_time
This is the amount of time that the device busy count has been greater than
zero.
This is only updated when the busy count returns to zero.
.It creation_time
This is the time, as reported by
.Fn getmicrotime ,
that the device was registered.
.It block_size
This is the block size of the device, if the device has a block size.
.It tag_types
This is an array of counters to record the number of various tag types that
are sent to a device.
See below for a list of tag types.
.It busy_from
If the device is not busy, this was the time that a transaction last completed.
If the device is busy, this is the most recent of either the time that the device
became busy, or the time that the last transaction completed.
.It flags
These flags indicate which statistics measurements are supported by a
particular device.
These flags are primarily intended to serve as an aid
to userland programs that decipher the statistics.
.It device_type
This is the device type.
It consists of three parts: the device type
(e.g., direct access, CDROM, sequential access), the interface (IDE,
SCSI or other) and whether or not the device in question is a pass-through
driver.
See below for a complete list of device types.
.It priority
This is the priority.
This is the first parameter used to determine where
to insert a device in the
.Nm
list.
The second parameter is attach order.
See below for a list of available priorities.
.El
.Pp
Each device is given a device type.
Pass-through devices have the same underlying device type and interface as the
device they provide an interface for, but they also have the pass-through flag
set.
The base device types are identical to the
.Tn SCSI
device type numbers, so with
.Tn SCSI
peripherals, the device type returned from an inquiry is usually ORed with the
.Tn SCSI
interface type and the pass-through flag if appropriate.
The device type
flags are as follows:
.Bd -literal -offset indent
typedef enum {
	DEVSTAT_TYPE_DIRECT	= 0x000,
	DEVSTAT_TYPE_SEQUENTIAL	= 0x001,
	DEVSTAT_TYPE_PRINTER	= 0x002,
	DEVSTAT_TYPE_PROCESSOR	= 0x003,
	DEVSTAT_TYPE_WORM	= 0x004,
	DEVSTAT_TYPE_CDROM	= 0x005,
	DEVSTAT_TYPE_SCANNER	= 0x006,
	DEVSTAT_TYPE_OPTICAL	= 0x007,
	DEVSTAT_TYPE_CHANGER	= 0x008,
	DEVSTAT_TYPE_COMM	= 0x009,
	DEVSTAT_TYPE_ASC0	= 0x00a,
	DEVSTAT_TYPE_ASC1	= 0x00b,
	DEVSTAT_TYPE_STORARRAY	= 0x00c,
	DEVSTAT_TYPE_ENCLOSURE	= 0x00d,
	DEVSTAT_TYPE_FLOPPY	= 0x00e,
	DEVSTAT_TYPE_MASK	= 0x00f,
	DEVSTAT_TYPE_IF_SCSI	= 0x010,
	DEVSTAT_TYPE_IF_IDE	= 0x020,
	DEVSTAT_TYPE_IF_OTHER	= 0x030,
	DEVSTAT_TYPE_IF_MASK	= 0x0f0,
	DEVSTAT_TYPE_PASS	= 0x100
} devstat_type_flags;
.Ed
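.Pp
The three parts of a device type can be separated again with the two mask
values and the pass-through bit.
The following is a minimal userland sketch, not code from the kernel.
The helper names
.Fn devtype_base ,
.Fn devtype_interface
and
.Fn devtype_is_pass
are hypothetical, and only the enumerators needed here are repeated from the
listing above.

```c
/* Subset of devstat_type_flags, values copied from the listing above. */
typedef enum {
	DEVSTAT_TYPE_DIRECT	= 0x000,
	DEVSTAT_TYPE_CDROM	= 0x005,
	DEVSTAT_TYPE_MASK	= 0x00f,
	DEVSTAT_TYPE_IF_SCSI	= 0x010,
	DEVSTAT_TYPE_IF_IDE	= 0x020,
	DEVSTAT_TYPE_IF_OTHER	= 0x030,
	DEVSTAT_TYPE_IF_MASK	= 0x0f0,
	DEVSTAT_TYPE_PASS	= 0x100
} devstat_type_flags;

/* Hypothetical helper: base device type (low four bits). */
static int
devtype_base(int type)
{
	return (type & DEVSTAT_TYPE_MASK);
}

/* Hypothetical helper: interface type (bits 4 through 7). */
static int
devtype_interface(int type)
{
	return (type & DEVSTAT_TYPE_IF_MASK);
}

/* Hypothetical helper: non-zero if the pass-through flag is set. */
static int
devtype_is_pass(int type)
{
	return ((type & DEVSTAT_TYPE_PASS) != 0);
}
```

For a value built as
.Dv DEVSTAT_TYPE_CDROM | DEVSTAT_TYPE_IF_SCSI | DEVSTAT_TYPE_PASS ,
the helpers return
.Dv DEVSTAT_TYPE_CDROM ,
.Dv DEVSTAT_TYPE_IF_SCSI
and 1 respectively.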
.Pp
Devices have a priority associated with them, which controls roughly where
they are placed in the
.Nm
list.
The priorities are as follows:
.Bd -literal -offset indent
typedef enum {
	DEVSTAT_PRIORITY_MIN	= 0x000,
	DEVSTAT_PRIORITY_OTHER	= 0x020,
	DEVSTAT_PRIORITY_PASS	= 0x030,
	DEVSTAT_PRIORITY_FD	= 0x040,
	DEVSTAT_PRIORITY_WFD	= 0x050,
	DEVSTAT_PRIORITY_TAPE	= 0x060,
	DEVSTAT_PRIORITY_CD	= 0x090,
	DEVSTAT_PRIORITY_DISK	= 0x110,
	DEVSTAT_PRIORITY_ARRAY	= 0x120,
	DEVSTAT_PRIORITY_MAX	= 0xfff
} devstat_priority;
.Ed
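.Pp
The resulting order (highest priority first, ties broken by attach order) can
be modelled in userland with a comparison function.
This is only a sketch of the ordering rule.
The kernel keeps a linked list rather than sorting an array, and
.Vt struct dev_entry ,
its
.Fa attach_order
field,
.Fn dev_entry_cmp
and
.Fn sort_devices
are hypothetical names introduced for the illustration.

```c
#include <stdlib.h>

/* Subset of devstat_priority, values copied from the listing above. */
typedef enum {
	DEVSTAT_PRIORITY_PASS	= 0x030,
	DEVSTAT_PRIORITY_CD	= 0x090,
	DEVSTAT_PRIORITY_DISK	= 0x110
} devstat_priority;

/* Hypothetical record; the real struct devstat has many more fields. */
struct dev_entry {
	const char		*name;
	devstat_priority	 priority;
	unsigned int		 attach_order;	/* lower means attached earlier */
};

/* Order by priority, highest first, then by attach order, earliest first. */
static int
dev_entry_cmp(const void *a, const void *b)
{
	const struct dev_entry *x = a;
	const struct dev_entry *y = b;

	if (x->priority != y->priority)
		return (x->priority > y->priority ? -1 : 1);
	if (x->attach_order != y->attach_order)
		return (x->attach_order < y->attach_order ? -1 : 1);
	return (0);
}

/* Sort an array of entries into the order described above. */
static void
sort_devices(struct dev_entry *v, size_t n)
{
	qsort(v, n, sizeof(*v), dev_entry_cmp);
}
```

Because
.Fn qsort
is not guaranteed to be stable, the attach order is compared explicitly
instead of relying on the input order.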
.Pp
Each device has associated with it flags to indicate what operations are
supported or not supported.
The
.Va devstat_support_flags
values are as follows:
.Bl -tag -width DEVSTAT_NO_ORDERED_TAGS
.It DEVSTAT_ALL_SUPPORTED
Every statistic type is supported by the device.
.It DEVSTAT_NO_BLOCKSIZE
This device does not have a blocksize.
.It DEVSTAT_NO_ORDERED_TAGS
This device does not support ordered tags.
.It DEVSTAT_BS_UNAVAILABLE
This device supports a blocksize, but it is currently unavailable.
This
flag is most often used with removable media drives.
.El
.Pp
Transactions to a device fall into one of four categories, which are
represented in the
.Va flags
passed into
.Fn devstat_end_transaction .
The transaction types are as follows:
.Bd -literal -offset indent
typedef enum {
	DEVSTAT_NO_DATA	= 0x00,
	DEVSTAT_READ	= 0x01,
	DEVSTAT_WRITE	= 0x02,
	DEVSTAT_FREE	= 0x03
} devstat_trans_flags;
.Ed
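.Pp
Since the transaction type doubles as the index into the
.Fa bytes
and
.Fa operations
arrays, the bookkeeping for one transaction can be modelled in a few lines.
The following userland sketch is a simplified stand-in and not the kernel
implementation: it ignores timing, tag types and the sequence counters, and
.Vt struct counters ,
.Dv N_TRANS_TYPES ,
.Fn model_start ,
.Fn model_end
and
.Fn model_busy_count
are hypothetical names.

```c
#include <stdint.h>

/* Values copied from the devstat_trans_flags listing above. */
typedef enum {
	DEVSTAT_NO_DATA	= 0x00,
	DEVSTAT_READ	= 0x01,
	DEVSTAT_WRITE	= 0x02,
	DEVSTAT_FREE	= 0x03
} devstat_trans_flags;

#define	N_TRANS_TYPES	4	/* hypothetical name for the array size */

/* Simplified stand-in for a few of the counters in struct devstat. */
struct counters {
	uint32_t	start_count;
	uint32_t	end_count;
	uint64_t	bytes[N_TRANS_TYPES];
	uint64_t	operations[N_TRANS_TYPES];
};

/* One start event per transaction. */
static void
model_start(struct counters *c)
{
	c->start_count++;
}

/* One end event per transaction; flags selects the array slot. */
static void
model_end(struct counters *c, uint32_t bytes, devstat_trans_flags flags)
{
	c->bytes[flags] += bytes;
	c->operations[flags]++;
	c->end_count++;
}

/* Outstanding transactions: start_count minus end_count. */
static uint32_t
model_busy_count(const struct counters *c)
{
	return (c->start_count - c->end_count);
}
```

With a zeroed
.Vt struct counters ,
two starts followed by a 4096-byte read and a no-data operation leave the busy
count at zero, 4096 in
.Fa bytes[DEVSTAT_READ]
and one operation in each of the read and no-data slots.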
.Pp
There are four possible values for the
.Va tag_type
argument to
.Fn devstat_end_transaction :
.Bl -tag -width DEVSTAT_TAG_ORDERED
.It DEVSTAT_TAG_SIMPLE
The transaction had a simple tag.
.It DEVSTAT_TAG_HEAD
The transaction had a head of queue tag.
.It DEVSTAT_TAG_ORDERED
The transaction had an ordered tag.
.It DEVSTAT_TAG_NONE
The device does not support tags.
.El
.Pp
The tag type values correspond to the lower four bits of the
.Tn SCSI
tag definitions.
In CAM, for instance, the
.Va tag_action
from the CCB is ANDed with 0xf to determine the tag type to pass in to
.Fn devstat_end_transaction .
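.Pp
The masking step can be sketched as follows.
The numeric values are assumptions that do not appear in this page: the
.Tn SCSI
queue tag message codes are taken to be 0x20, 0x21 and 0x22, and the
.Vt devstat_tag_type
enumerators are taken to be 0 through 3 in the order listed above.
Check both against the system headers before relying on them.
.Fn tag_action_to_devstat
is a hypothetical helper, not part of the
.Nm
interface.

```c
/* Assumed SCSI queue tag message codes (not stated in this page). */
#define	MSG_SIMPLE_Q_TAG	0x20
#define	MSG_HEAD_OF_Q_TAG	0x21
#define	MSG_ORDERED_Q_TAG	0x22

/* Assumed enumerator values (not stated in this page). */
typedef enum {
	DEVSTAT_TAG_SIMPLE	= 0x00,
	DEVSTAT_TAG_HEAD	= 0x01,
	DEVSTAT_TAG_ORDERED	= 0x02,
	DEVSTAT_TAG_NONE	= 0x03
} devstat_tag_type;

/* Hypothetical helper: keep only the lower four bits of the tag action. */
static devstat_tag_type
tag_action_to_devstat(unsigned int tag_action)
{
	return ((devstat_tag_type)(tag_action & 0xf));
}
```

Under these assumptions a simple queue tag maps to
.Dv DEVSTAT_TAG_SIMPLE ,
a head of queue tag to
.Dv DEVSTAT_TAG_HEAD ,
and an ordered tag to
.Dv DEVSTAT_TAG_ORDERED .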
.Pp
There is a macro,
.Dv DEVSTAT_VERSION ,
that is defined in
.In sys/devicestat.h .
This is the current version of the
.Nm
subsystem, and it should be incremented each time a change is made that
would require recompilation of userland programs that access
.Nm
statistics.
Userland programs use this version, via the
.Va kern.devstat.version
.Nm sysctl
variable, to determine whether they are in sync with the kernel
.Nm
structures.
.Sh SEE ALSO
.Xr systat 1 ,
.Xr devstat 3 ,
.Xr iostat 8 ,
.Xr rpc.rstatd 8 ,
.Xr vmstat 8
.Sh HISTORY
The
.Nm
statistics system appeared in
.Fx 3.0 .
.Sh AUTHORS
.An Kenneth Merry Aq Mt ken@FreeBSD.org
.Sh BUGS
There may be a need for
.Fn spl
protection around some of the
.Nm
list manipulation code to ensure, for example, that the list of devices
is not changed while someone is fetching the
.Va kern.devstat.all
.Nm sysctl
variable.