.\"
.\" Copyright (c) 2009-2019 Dell EMC Isilon http://www.isilon.com/
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice(s), this list of conditions and the following disclaimer as
.\"    the first lines of this file unmodified other than the possible
.\"    addition of one or more copyright notices.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice(s), this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDER(S) ``AS IS'' AND ANY
.\" EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
.\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
.\" DISCLAIMED.  IN NO EVENT SHALL THE COPYRIGHT HOLDER(S) BE LIABLE FOR ANY
.\" DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
.\" (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
.\" SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
.\" CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
.\" DAMAGE.
.\"
.Dd June 6, 2019
.Dt FAIL 9
.Os
.Sh NAME
.Nm DEBUG_FP ,
.Nm KFAIL_POINT_CODE ,
.Nm KFAIL_POINT_CODE_FLAGS ,
.Nm KFAIL_POINT_CODE_COND ,
.Nm KFAIL_POINT_ERROR ,
.Nm KFAIL_POINT_EVAL ,
.Nm KFAIL_POINT_DECLARE ,
.Nm KFAIL_POINT_DEFINE ,
.Nm KFAIL_POINT_GOTO ,
.Nm KFAIL_POINT_RETURN ,
.Nm KFAIL_POINT_RETURN_VOID ,
.Nm KFAIL_POINT_SLEEP_CALLBACKS ,
.Nm fail_point
.Nd fail points
.Sh SYNOPSIS
.In sys/fail.h
.Fn KFAIL_POINT_CODE "parent" "name" "code"
.Fn KFAIL_POINT_CODE_FLAGS "parent" "name" "flags" "code"
.Fn KFAIL_POINT_CODE_COND "parent" "name" "cond" "flags" "code"
.Fn KFAIL_POINT_ERROR "parent" "name" "error_var"
.Fn KFAIL_POINT_EVAL "name" "code"
.Fn KFAIL_POINT_DECLARE "name"
.Fn KFAIL_POINT_DEFINE "parent" "name" "flags"
.Fn KFAIL_POINT_GOTO "parent" "name" "error_var" "label"
.Fn KFAIL_POINT_RETURN "parent" "name"
.Fn KFAIL_POINT_RETURN_VOID "parent" "name"
.Fn KFAIL_POINT_SLEEP_CALLBACKS "parent" "name" "pre_func" "pre_arg" "post_func" "post_arg" "code"
.Sh DESCRIPTION
Fail points are used to add code points where errors may be injected
in a user-controlled fashion.
Fail points provide a convenient wrapper around user-provided error
injection code, providing a
.Xr sysctl 9
MIB, and a parser for that MIB that describes how the error
injection code should fire.
.Pp
The base fail point macro is
.Fn KFAIL_POINT_CODE
where
.Fa parent
is a sysctl tree (frequently
.Sy DEBUG_FP
for kernel fail points, but various subsystems may wish to provide
their own fail point trees),
.Fa name
is the name of the MIB in that tree, and
.Fa code
is the error injection code.
The
.Fa code
argument does not require braces, but it is considered good style to
use braces for any multi-line code arguments.
Inside the
.Fa code
argument,
.Sy RETURN_VALUE
evaluates to the argument given to the
.Sy return
action set in the sysctl MIB.
.Pp
Additionally,
.Fn KFAIL_POINT_CODE_FLAGS
provides a
.Fa flags
argument which controls the fail point's behaviour.
This can be used, for example, to mark the fail point's context as
non-sleepable, which causes the
.Sy sleep
action to be coerced to a busy wait.
The supported flags are:
.Bl -ohang -offset indent
.It FAIL_POINT_USE_TIMEOUT_PATH
Rather than sleeping on a
.Fn sleep
call, just fire the post-sleep function after a timeout fires.
.It FAIL_POINT_NONSLEEPABLE
Mark the fail point as being in a non-sleepable context, which coerces
.Fn sleep
calls to
.Fn delay
calls.
.El
.Pp
Likewise,
.Fn KFAIL_POINT_CODE_COND
supplies a
.Fa cond
argument, which allows you to set the condition under which the fail point's
code may fire.
This is equivalent to:
.Bd -literal
	if (cond)
		KFAIL_POINT_CODE_FLAGS(...);

.Ed
See
.Sx SYSCTL VARIABLES
below.
.Pp
The remaining
.Fn KFAIL_POINT_*
macros are wrappers around common error injection paths:
.Bl -inset
.It Fn KFAIL_POINT_RETURN parent name
is the equivalent of
.Sy KFAIL_POINT_CODE(..., return RETURN_VALUE)
.It Fn KFAIL_POINT_RETURN_VOID parent name
is the equivalent of
.Sy KFAIL_POINT_CODE(..., return)
.It Fn KFAIL_POINT_ERROR parent name error_var
is the equivalent of
.Sy KFAIL_POINT_CODE(..., error_var = RETURN_VALUE)
.It Fn KFAIL_POINT_GOTO parent name error_var label
is the equivalent of
.Sy KFAIL_POINT_CODE(..., { error_var = RETURN_VALUE; goto label;})
.El
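.Pp
The wrapper equivalences listed above can be tried out in userland with a small mock.
The sketch below is not the kernel implementation: the
MOCK_ macros, the globals mock_armed and mock_value, and the functions
read_block and write_block are all hypothetical names invented for this illustration.
In the kernel, whether the fail point fires and what RETURN_VALUE holds are
decided by the sysctl MIB instead.
```c
/*
 * Userland stand-ins for illustration only; NOT the kernel implementation.
 * Two hypothetical globals play the role of the sysctl MIB.
 */
static int mock_armed;	/* hypothetical: nonzero means the fail point fires */
static int mock_value;	/* hypothetical: stands in for the return(N) argument */

#define MOCK_KFAIL_POINT_CODE(parent, name, code) do {			\
	if (mock_armed) {						\
		int RETURN_VALUE = mock_value;				\
		(void)RETURN_VALUE;					\
		code;							\
	}								\
} while (0)

/* The wrappers, spelled out as the equivalences listed above. */
#define MOCK_KFAIL_POINT_RETURN(parent, name)				\
	MOCK_KFAIL_POINT_CODE(parent, name, return RETURN_VALUE)
#define MOCK_KFAIL_POINT_GOTO(parent, name, error_var, label)		\
	MOCK_KFAIL_POINT_CODE(parent, name,				\
	    { error_var = RETURN_VALUE; goto label; })

/* Early return: yields RETURN_VALUE when the mock fail point fires. */
static int
read_block(void)
{
	MOCK_KFAIL_POINT_RETURN(DEBUG_FP, read_block_fail);
	return (0);
}

/* Error path with cleanup: sets error and jumps to the shared exit label. */
static int
write_block(int *cleaned_up)
{
	int error = 0;

	MOCK_KFAIL_POINT_GOTO(DEBUG_FP, write_block_fail, error, out);
	/* ... normal work would go here ... */
out:
	*cleaned_up = 1;
	return (error);
}
```
The GOTO form is the one to reach for when the function has cleanup to run
on its error path, since the injected error then takes the same exit as a real one.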
.Pp
You can also introduce fail points by separating the declaration,
definition, and evaluation portions.
.Bl -inset
.It Fn KFAIL_POINT_DECLARE name
is used to declare the
.Sy fail_point
struct.
.It Fn KFAIL_POINT_DEFINE parent name flags
defines and initializes the
.Sy fail_point
and sets up its
.Xr sysctl 9 .
.It Fn KFAIL_POINT_EVAL name code
is used at the point that the fail point is executed.
.El
.Sh SYSCTL VARIABLES
The
.Fn KFAIL_POINT_*
macros add sysctl MIBs where specified.
Many base kernel MIBs can be found in the
.Sy debug.fail_point
tree (referenced in code by
.Sy DEBUG_FP ) .
.Pp
The sysctl variable is set to a string of the following form:
.Bd -literal
  [<pct>%][<cnt>*]<type>[(args...)][-><more terms>]
.Ed
.Pp
The <type> argument specifies which action to take; it can be one of:
.Bl -tag -width ".Dv return"
.It Sy off
Take no action (does not trigger fail point code).
.It Sy return
Trigger fail point code with specified argument.
.It Sy sleep
Sleep the specified number of milliseconds.
.It Sy panic
Panic.
.It Sy break
Break into the debugger, or trap if there is no debugger support.
.It Sy print
Print that the fail point executed.
.It Sy pause
Threads sleep at the fail point until the fail point is set to
.Sy off .
.It Sy yield
Thread yields the CPU when the fail point is evaluated.
.It Sy delay
Similar to sleep, but busy waits the CPU.
(Useful in non-sleepable contexts.)
.El
.Pp
The <pct>% and <cnt>* modifiers prior to <type> control when
<type> is executed.
The <pct>% form (e.g. "1.2%") can be used to specify a
probability that <type> will execute.
This is a decimal in the range (0, 100] which can specify up to
1/10,000% precision.
The <cnt>* form (e.g. "5*") can be used to specify the number of
times <type> should be executed before this <term> is disabled.
Only the last probability and the last count are used if multiple
are specified, i.e. "1.2%2%" is the same as "2%".
When both a probability and a count are specified, the probability
is evaluated before the count, i.e. "2%5*" means "2% of the time,
but only 5 times total".
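.Pp
The modifier rules above can be made concrete with a small userland sketch.
It is not the kernel's parser: the names term_prefix, parse_scaled,
parse_prefix, term_fires and PROB_MAX are hypothetical, probabilities are
stored in units of 1/10,000 of a percent to match the stated precision, and
there are no overflow checks.
It also assumes that a missed probability roll does not use up a count.
```c
#include <ctype.h>

#define PROB_MAX 1000000	/* 100%, in units of 1/10,000 of a percent */

struct term_prefix {
	long prob;		/* firing probability, PROB_MAX when omitted */
	long count;		/* executions left, -1 when unlimited */
	const char *type;	/* start of <type> in the input string */
};

/* Read a decimal with up to four fractional digits, scaled by 10,000. */
static const char *
parse_scaled(const char *p, long *out)
{
	long val = 0, scale = 1000;

	while (isdigit((unsigned char)*p))
		val = val * 10 + (*p++ - '0');
	val *= 10000;
	if (*p == '.') {
		for (p++; isdigit((unsigned char)*p); p++, scale /= 10)
			val += (*p - '0') * scale;
	}
	*out = val;
	return (p);
}

/* Parse the leading [<pct>%][<cnt>*] modifiers; returns 0 on success. */
static int
parse_prefix(const char *s, struct term_prefix *t)
{
	const char *end;
	long v;

	t->prob = PROB_MAX;
	t->count = -1;
	while (isdigit((unsigned char)*s) || *s == '.') {
		end = parse_scaled(s, &v);
		if (*end == '%' && v > 0 && v <= PROB_MAX)
			t->prob = v;		/* a later probability wins */
		else if (*end == '*' && v % 10000 == 0)
			t->count = v / 10000;	/* a later count wins */
		else
			return (-1);
		s = end + 1;
	}
	t->type = s;
	return (0);
}

/*
 * Decide whether a term fires.  roll is a caller-supplied random number
 * in [0, PROB_MAX).  The probability is checked before the count.
 */
static int
term_fires(struct term_prefix *t, long roll)
{
	if (t->count == 0 || roll >= t->prob)
		return (0);
	if (t->count > 0)
		t->count--;
	return (1);
}
```
Passing the roll in from the caller keeps the sketch deterministic, so the
"2% of the time, but only 5 times total" behaviour can be checked by hand.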
.Pp
The operator -> can be used to express cascading terms.
If you specify <term1>-><term2>, it means that if <term1> does not
.Ql execute ,
<term2> is evaluated.
For the purpose of this operator, the
.Fn return
and
.Fn print
operators are the only types that cascade.
A
.Fn return
term only cascades if the code executes, and a
.Fn print
term only cascades when passed a non-zero argument.
A pid can optionally be specified.
The fail point term is only executed when invoked by a process with a
matching p_pid.
.Sh EXAMPLES
.Bl -tag -width Sy
.It Sy sysctl debug.fail_point.foobar="2.1%return(5)"
21/1000ths of the time, execute
.Fa code
with RETURN_VALUE set to 5.
.It Sy sysctl debug.fail_point.foobar="2%return(5)->5%return(22)"
2/100ths of the time, execute
.Fa code
with RETURN_VALUE set to 5.
If that does not happen, 5% of the time execute
.Fa code
with RETURN_VALUE set to 22.
.It Sy sysctl debug.fail_point.foobar="5*return(5)->0.1%return(22)"
For 5 times, return 5.
After that, 1/1000th of the time, return 22.
.It Sy sysctl debug.fail_point.foobar="0.1%5*return(5)"
Return 5 for 1 in 1000 executions, but only 5 times total.
.It Sy sysctl debug.fail_point.foobar="1%sleep(50)"
1/100th of the time, sleep 50ms.
.It Sy sysctl debug.fail_point.foobar="1*return(5)[pid 1234]"
Return 5 once, when pid 1234 executes the fail point.
.El
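.Pp
The cascading examples above can be traced with a small userland model.
This is a sketch under stated assumptions, not the kernel's evaluator:
it handles only return terms, it follows the reading given by the examples
(a term that does not execute hands over to the next one), and the names
term, eval_chain and PROB_MAX are hypothetical.
```c
#define PROB_MAX 1000000	/* 100%, in units of 1/10,000 of a percent */

struct term {
	long prob;	/* firing probability out of PROB_MAX */
	long count;	/* executions left, -1 when unlimited */
	int value;	/* argument of return(N) */
};

/*
 * Walk the chain left to right.  rolls[i] is a caller-supplied random
 * number in [0, PROB_MAX) for term i.  Returns 1 and stores the value
 * of the first term that executes, or 0 if none does.
 */
static int
eval_chain(struct term *terms, int nterms, const long *rolls, int *valuep)
{
	int i;

	for (i = 0; i < nterms; i++) {
		struct term *t = &terms[i];

		if (t->count == 0 || rolls[i] >= t->prob)
			continue;	/* did not execute: try the next term */
		if (t->count > 0)
			t->count--;
		*valuep = t->value;
		return (1);
	}
	return (0);
}
```
Modelling "5*return(5)->0.1%return(22)" as a two-entry array, the first five
calls yield 5; after that the first term is exhausted and only the second
term's probability decides whether 22 is returned.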
.Sh AUTHORS
.An -nosplit
This manual page was written by
.Pp
.An Matthew Bryan Aq Mt matthew.bryan@isilon.com
and
.Pp
.An Zach Loafman Aq Mt zml@FreeBSD.org .
.Sh CAVEATS
It is easy to shoot yourself in the foot by setting fail points too
aggressively or setting too many in combination.
For example, forcing
.Fn malloc
to fail consistently is potentially harmful to uptime.
.Pp
The
.Fn sleep
sysctl setting may not be appropriate in all situations.
Currently,
.Fn fail_point_eval
does not verify whether the context is appropriate for calling
.Fn msleep .
You can force it to evaluate a
.Sy sleep
action as a
.Sy delay
action by specifying the
.Sy FAIL_POINT_NONSLEEPABLE
flag at the point the fail point is declared.