.\" Copyright (c) 2016 The DragonFly Project. All rights reserved.
.\"
.\" This code is derived from software contributed to The DragonFly Project
.\" by Matthew Dillon <dillon@backplane.com>
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\"
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in
.\"    the documentation and/or other materials provided with the
.\"    distribution.
.\" 3. Neither the name of The DragonFly Project nor the names of its
.\"    contributors may be used to endorse or promote products derived
.\"    from this software without specific, prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
.\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
.\" LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
.\" FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
.\" COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
.\" INCIDENTAL, SPECIAL, EXEMPLARY OR CONSEQUENTIAL DAMAGES (INCLUDING,
.\" BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
.\" LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
.\" AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
.\" OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
.\" OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.Dd June 5, 2015
.Dt NVME 4
.Os
.Sh NAME
.Nm nvme
.Nd NVM Express Controller for PCIe-based SSDs
.Sh SYNOPSIS
To compile this driver into the kernel,
place the following line in your
kernel configuration file:
.Bd -ragged -offset indent
.Cd "device nvme"
.Ed
.Pp
Alternatively, to load the driver as a
module at boot time, place the following line in
.Xr loader.conf 5 :
.Bd -literal -offset indent
nvme_load="YES"
.Ed
.Sh DESCRIPTION
The
.Nm
driver provides support for PCIe storage controllers conforming to the
NVM Express Controller Interface specification.
An NVMe controller attaches directly to the host over PCIe and connects
directly to the underlying non-volatile (typically flash) storage,
yielding much lower latency and higher performance than
.Xr ahci 4 .
.Pp
In addition, NVMe controllers are capable of supporting up to 65535
independent submission and completion queues, each able to support upwards
of 16384 queue entries.
Each queue may be assigned its own interrupt
vector out of the controller's pool (up to 2048).
.Pp
Actual controllers typically implement lower limits.
While most controllers allow a maximal number of queue entries,
the total number of queues is often limited to far fewer than 65535.
Support for 8 to 32 queues is common.
Similarly, while the specification allows up to 2048 MSI-X vectors,
actual controllers typically support fewer.
Still, having several MSI-X vectors allows interrupts to be distributed
to multiple CPUs, reducing bottlenecks and improving performance.
The driver can divide the queues among the available CPU cores and can
also split them by the type of I/O operation being performed
(such as giving read and write I/O commands their own queues).
This also significantly reduces bottlenecks and improves performance,
particularly in mixed read-write environments.
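.Pp
As a rough illustration of how queues might be split across CPU cores and
I/O types as described above, the following C sketch maps a CPU number and
an I/O type to an I/O queue number.
It is not taken from the
.Nm
driver: the function name
.Fn pick_queue ,
the
.Vt io_type
enumeration, and the fallback order are all assumptions made for this
example.
The only detail borrowed from the NVMe specification is that queue 0 is the
admin queue, so I/O queues are numbered from 1.

```c
/* Hypothetical I/O classes; the names are illustrative only. */
enum io_type { IO_READ = 0, IO_WRITE = 1 };

/*
 * Pick an I/O queue for a command issued on `cpu`.
 *
 * nqueues is the number of I/O queues the controller granted and ncpus
 * is the number of CPU cores.  I/O queues are numbered 1..nqueues
 * because queue 0 is the admin queue.  Returns -1 on invalid arguments.
 *
 * The fallback order below is an assumption for this sketch, not the
 * driver's actual policy.
 */
int
pick_queue(int nqueues, int ncpus, int cpu, enum io_type type)
{
	if (nqueues < 1 || ncpus < 1 || cpu < 0 || cpu >= ncpus)
		return (-1);

	/* Enough queues: every CPU gets its own read and write queue. */
	if (nqueues >= 2 * ncpus)
		return (1 + 2 * cpu + (int)type);

	/* One queue per CPU, shared by reads and writes. */
	if (nqueues >= ncpus)
		return (1 + cpu);

	/* Fewer queues than CPUs: split by I/O type only. */
	if (nqueues >= 2)
		return (1 + (int)type);

	/* A single I/O queue shared by everything. */
	return (1);
}
```

With 8 I/O queues and 4 CPUs, for example, a write issued on CPU 3 maps to
queue 8, while with only 2 queues every CPU shares queue 1 for reads and
queue 2 for writes.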
.Sh FORM FACTOR
NVMe boards usually come in one of two flavors: either a tiny form factor
with an M.2 or NGFF connector supplying 2 or 4 PCIe lanes, or a larger
form that slips into a normal PCIe slot.
The larger form typically implements 2, 4, or 8 lanes.
Also note that inexpensive adapter cards are available which fit into
normal PCIe slots and can mount the smaller M.2/NGFF NVMe cards.
.Sh PERFORMANCE
Typical performance for a 2-lane (x2) board is in the 700 MBytes/sec to
1.5 GBytes/sec range.
Boards with 4 lanes (x4) typically range from 1.0 GBytes/sec to
2.5 GBytes/sec.
Full-blown PCIe cards run the whole gamut; 2.5 GBytes/sec is fairly typical,
but performance can exceed 5 GBytes/sec in a high-end card.
.Pp
Multi-threaded random-read performance can exceed 300,000 IOPS on an x4 board.
Single-threaded performance is usually in the 40,000 to 100,000 IOPS range.
Sequential submission/completion latencies are typically below
35 microseconds, while random submission/completion latencies are
typically below 110 microseconds.
Performance (uncached) through a filesystem will be bottlenecked by additional
factors, particularly if testing is only being done on a single file.
.Pp
The biggest differentiator between boards is usually write performance.
Small boards with only a few flash chips have relatively low write
performance, usually in the 150 MBytes/sec range.
Higher-end boards will have significantly better write performance,
potentially exceeding 1.0 GBytes/sec.
.Pp
For reference, the SATA-III physical interface is limited to 600 MBytes/sec,
and its extra phy layer results in higher latencies.
AHCI controllers are also limited to a single 32-entry queue.
.Sh FEATURES
The
.Dx
.Nm
driver automatically selects the best SMP-friendly and
I/O-typing queue configuration possible based on what the controller
supports.
It uses a direct disk device API which bypasses CAM, so kernel code paths
to read and write blocks are SMP-friendly and, depending on the queue
configuration, potentially conflict-free.
The driver is capable of submitting commands and processing responses on
multiple queues simultaneously in an SMP environment.
.Pp
The driver pre-reserves DMA memory for all necessary descriptors, queue
entries, and internal driver structures, and allows for a very generous
number of queue entries (1024 x NQueues) for maximum performance.
.Sh HINTS ON NVME CARDS
So far I've only been able to test one Samsung NVMe M.2 card and
an Intel 750 HHHL (half-height / half-length) PCIe card.
.Pp
My recommendation is to go with Samsung.
The firmware is pretty good.
It appears to be implemented reasonably well regardless of the queue
configuration or I/O blocksize employed, giving expected scaling without
any quirky behavior.
.Pp
The Intel 750 has very poorly implemented firmware.
For example, the more queues the driver configures, the poorer
the single-threaded read performance is.
And no matter the queue configuration, adding a second concurrent reader
appears to drop performance drastically, after which it slowly increases
as more concurrent readers are added.
In addition, on the 750, the firmware degrades horribly when
reads use a blocksize of 64KB.
The best performance is at 32KB.
In fact, performance again degrades horribly if you drop down to 16KB.
And if that weren't bad enough, the 750 takes over 13 seconds to become
ready after a machine power-up or reset.
.Pp
The grand result of all of this is that filesystem performance through an
Intel NVMe card is going to be hit-or-miss, depending on inconsequential
differences in blocksize and queue configuration.
Regardless of whatever hacks Intel might be employing in their own drivers,
this is just totally unacceptable driver behavior.
.Pp
I do not recommend rebranders like Plextor or Kingston.
For one thing, if you do buy these, be very careful to get one that is
actually an NVMe card and not an M.2 card with an AHCI controller on it.
Plextor's performance is particularly bad.
Kingston seems to have done a better job, and reading
at 1.0GB/s+ is possible despite the CPU overhead of going through an AHCI
controller (the flash in both cases is directly connected to the controller,
so there is no SATA Phy to get in the way).
Of course, if you actually want an AHCI card, then these might be the way
to go, and you might even be able to boot from them.
.Sh HINTS ON CONFIGURING MACHINES (BIOS)
If
.Nm
locks up while trying to probe, the BIOS did something horrible to
the PCIe card.
If you have enabled your BIOS's FastBoot option, turn it off;
this may fix the issue.
.Pp
Most BIOSes cannot boot from an NVMe card, or if they can, it generally
requires a UEFI boot, which is a bit of a headache to configure on
.Dx .
So take this into consideration as well.
.Sh SEE ALSO
.Xr ahci 4 ,
.Xr intro 4 ,
.Xr pci 4 ,
.Xr loader.conf 5
.Sh HISTORY
The
.Nm
driver first appeared in
.Dx 4.5 .
.Sh AUTHORS
.An -nosplit
The
.Nm
driver for
.Dx
was written from scratch by
.An Matthew Dillon Aq Mt dillon@backplane.com
based on the NVM Express 1.2a specification.