.\"
.\" Copyright (c) 2012-2016 Intel Corporation
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions, and the following disclaimer,
.\"    without modification.
.\" 2. Redistributions in binary form must reproduce at minimum a disclaimer
.\"    substantially similar to the "NO WARRANTY" disclaimer below
.\"    ("Disclaimer") and any redistribution must be conditioned upon
.\"    including a substantially similar Disclaimer requirement for further
.\"    binary redistribution.
.\"
.\" NO WARRANTY
.\" THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
.\" "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
.\" LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR
.\" A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
.\" HOLDERS OR CONTRIBUTORS BE LIABLE FOR SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
.\" STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
.\" IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
.\" POSSIBILITY OF SUCH DAMAGES.
.\"
.\" nvme driver man page.
.\"
.\" Author: Jim Harris <jimharris@FreeBSD.org>
.\"
.\" $FreeBSD$
.\"
.Dd June 6, 2020
.Dt NVME 4
.Os
.Sh NAME
.Nm nvme
.Nd NVM Express core driver
.Sh SYNOPSIS
To compile this driver into your kernel,
place the following line in your kernel configuration file:
.Bd -ragged -offset indent
.Cd "device nvme"
.Ed
.Pp
Or, to load the driver as a module at boot, place the following line in
.Xr loader.conf 5 :
.Bd -literal -offset indent
nvme_load="YES"
.Ed
.Pp
Most users will also want to enable
.Xr nvd 4
or
.Xr nda 4
to expose NVM Express namespaces as disk devices which can be
partitioned.
Note that in NVM Express terms, a namespace is roughly equivalent to a
SCSI LUN.
.Sh DESCRIPTION
The
.Nm
driver provides support for NVM Express (NVMe) controllers, including:
.Bl -bullet
.It
Hardware initialization
.It
Per-CPU I/O queue pairs
.It
API for registering NVMe namespace consumers such as
.Xr nvd 4
or
.Xr nda 4
.It
API for submitting NVM commands to namespaces
.It
Ioctls for controller and namespace configuration and management
.El
.Pp
The
.Nm
driver creates controller device nodes in the format
.Pa /dev/nvmeX
and namespace device nodes in
the format
.Pa /dev/nvmeXnsY .
Note that the NVM Express specification starts numbering namespaces at 1,
not 0, and this driver follows that convention.
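.Pp
As an illustration of the naming scheme, assuming a hypothetical system
with a single controller that exposes two namespaces, the device nodes
would be expected to look like this:
.Bd -literal -offset indent
/dev/nvme0
/dev/nvme0ns1
/dev/nvme0ns2
.Ed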
.Sh CONFIGURATION
By default,
.Nm
will create an I/O queue pair for each CPU, provided enough MSI-X vectors
and NVMe queue pairs can be allocated.
If not enough vectors or queue pairs are available,
.Nm
will use a smaller number of queue pairs and
assign multiple CPUs per queue pair.
.Pp
To force a single I/O queue pair shared by all CPUs, set the following
tunable value in
.Xr loader.conf 5 :
.Bd -literal -offset indent
hw.nvme.per_cpu_io_queues=0
.Ed
.Pp
To assign more than one CPU per I/O queue pair, thereby reducing the number
of MSI-X vectors consumed by the device, set the following tunable value in
.Xr loader.conf 5 :
.Bd -literal -offset indent
hw.nvme.min_cpus_per_ioq=X
.Ed
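.Pp
As a sketch with an illustrative value (4 is an assumption, not a
recommendation), the following line asks for at least four CPUs to share
each I/O queue pair.
On a hypothetical 16-CPU system this would be expected to result in no more
than four I/O queue pairs:
.Bd -literal -offset indent
hw.nvme.min_cpus_per_ioq=4
.Ed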
.Pp
To force legacy interrupts for all
.Nm
driver instances, set the following tunable value in
.Xr loader.conf 5 :
.Bd -literal -offset indent
hw.nvme.force_intx=1
.Ed
.Pp
Note that use of INTx implies disabling of per-CPU I/O queue pairs.
.Pp
To control the maximum amount of system RAM, in bytes, to use as a Host Memory
Buffer for capable devices, set the following tunable:
.Bd -literal -offset indent
hw.nvme.hmb_max
.Ed
.Pp
The default value is 5% of physical memory size per device.
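.Pp
For example, assuming the tunable is set in
.Xr loader.conf 5
like the other tunables described here, the following line would cap the
Host Memory Buffer at 256 MiB (268435456 bytes).
The value is illustrative only:
.Bd -literal -offset indent
hw.nvme.hmb_max=268435456
.Ed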
.Pp
By default, the
.Xr nvd 4
driver is used to provide a disk driver to the system.
The
.Xr nda 4
driver can be used instead.
The
.Xr nvd 4
driver performs better with smaller transactions and few TRIM
commands.
It sends all commands directly to the drive immediately.
The
.Xr nda 4
driver performs better with larger transactions and also collapses
TRIM commands, giving better performance.
It can queue commands to the drive; combine
.Dv BIO_DELETE
commands into a single trip; and
use the CAM I/O scheduler to bias one type of operation over another.
To select the
.Xr nda 4
driver, set the following tunable value in
.Xr loader.conf 5 :
.Bd -literal -offset indent
hw.nvme.use_nvd=0
.Ed
.Pp
This value may also be set in the kernel config file with
.Bd -literal -offset indent
.Cd options NVME_USE_NVD=0
.Ed
.Pp
When there is an error,
.Nm
prints only the most relevant information about the command by default.
To enable dumping of all information about the command, set the following tunable
value in
.Xr loader.conf 5 :
.Bd -literal -offset indent
hw.nvme.verbose_cmd_dump=1
.Ed
.Pp
Prior versions of the driver reset the card twice on boot.
This proved to be unnecessary and inefficient, so the driver now resets the
drive controller only once.
The old behavior may be restored in the kernel config file with
.Bd -literal -offset indent
.Cd options NVME_2X_RESET
.Ed
.Sh SYSCTL VARIABLES
The following controller-level sysctls are currently implemented:
.Bl -tag -width indent
.It Va dev.nvme.0.num_cpus_per_ioq
(R) Number of CPUs associated with each I/O queue pair.
.It Va dev.nvme.0.int_coal_time
(R/W) Interrupt coalescing timer period in microseconds.
Set to 0 to disable.
.It Va dev.nvme.0.int_coal_threshold
(R/W) Interrupt coalescing threshold in number of command completions.
Set to 0 to disable.
.El
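.Pp
As a brief sketch, the read/write variables above can be changed at run time
with
.Xr sysctl 8 .
The values shown (a 100 microsecond timer and a threshold of 8 completions)
are arbitrary examples, not tuning advice, and the unit number 0 assumes the
first controller:
.Bd -literal -offset indent
sysctl dev.nvme.0.int_coal_time=100
sysctl dev.nvme.0.int_coal_threshold=8
.Ed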
.Pp
The following queue pair-level sysctls are currently implemented.
Admin queue sysctls take the format of
.Va dev.nvme.0.adminq
and I/O queue sysctls take the format of
.Va dev.nvme.0.ioq0 .
.Bl -tag -width indent
.It Va dev.nvme.0.ioq0.num_entries
(R) Number of entries in this queue pair's submission and completion queues.
.It Va dev.nvme.0.ioq0.num_tr
(R) Number of nvme_tracker structures currently allocated for this queue pair.
.It Va dev.nvme.0.ioq0.num_prp_list
(R) Number of nvme_prp_list structures currently allocated for this queue pair.
.It Va dev.nvme.0.ioq0.sq_head
(R) Current location of the submission queue head pointer as observed by
the driver.
The head pointer is incremented by the controller as it takes commands off
the submission queue.
.It Va dev.nvme.0.ioq0.sq_tail
(R) Current location of the submission queue tail pointer as observed by
the driver.
The driver increments the tail pointer after writing a command
into the submission queue to signal that a new command is ready to be
processed.
.It Va dev.nvme.0.ioq0.cq_head
(R) Current location of the completion queue head pointer as observed by
the driver.
The driver increments the head pointer after finishing
with a completion entry that was posted by the controller.
.It Va dev.nvme.0.ioq0.num_cmds
(R) Number of commands that have been submitted on this queue pair.
.It Va dev.nvme.0.ioq0.dump_debug
(W) Writing 1 to this sysctl will dump the full contents of the submission
and completion queues to the console.
.El
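.Pp
For instance, assuming the first controller and its first I/O queue pair,
the queue pointers can be inspected and a queue dump requested with
.Xr sysctl 8
as sketched below:
.Bd -literal -offset indent
sysctl dev.nvme.0.ioq0.sq_head dev.nvme.0.ioq0.sq_tail
sysctl dev.nvme.0.ioq0.cq_head
sysctl dev.nvme.0.ioq0.dump_debug=1
.Ed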
.Pp
In addition to the typical PCI attachment, the
.Nm
driver supports attaching to an
.Xr ahci 4
device.
Intel's Rapid Storage Technology (RST) hides the NVMe device
behind the AHCI device due to limitations in Windows.
However, this effectively hides it from the
.Fx
kernel.
To work around this limitation,
.Fx
detects when the AHCI device supports RST and RST is enabled.
See
.Xr ahci 4
for more details.
.Sh SEE ALSO
.Xr nda 4 ,
.Xr nvd 4 ,
.Xr pci 4 ,
.Xr nvmecontrol 8 ,
.Xr disk 9
.Sh HISTORY
The
.Nm
driver first appeared in
.Fx 9.2 .
.Sh AUTHORS
.An -nosplit
The
.Nm
driver was developed by Intel and originally written by
.An Jim Harris Aq Mt jimharris@FreeBSD.org ,
with contributions from
.An Joe Golio
at EMC.
.Pp
This man page was written by
.An Jim Harris Aq Mt jimharris@FreeBSD.org .