Revision tags: v6.2.1, v6.2.0, v6.3.0
# b5b25080 | 25-Dec-2021 | Matthew Dillon <dillon@apollo.backplane.com>
nvme - Back-off if driver lies about reported queue limits
* Apparently some low-rent nvme controllers lie about how many queues they support.
* If the controller lies about queue support and a queue create command fails, attempt to back off to fewer queues rather than giving up immediately. Complain mightily on the console.
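A minimal, self-contained sketch of the back-off idea described above; every name here is illustrative rather than the driver's actual API, and the halving strategy is only one plausible way to back off:

    /* Sketch only: retry queue creation with fewer queues when the
     * controller cannot actually deliver what it advertised. */
    #include <stdio.h>

    #define QUEUES_THE_CHIP_CAN_REALLY_DO 4     /* pretend the controller lied */

    /* Stand-in for the queue-create admin command. */
    static int
    create_io_queues(int n)
    {
        return (n <= QUEUES_THE_CHIP_CAN_REALLY_DO) ? 0 : -1;
    }

    /* Ask for 'want' queues, backing off on each failure. */
    static int
    setup_io_queues(int want)
    {
        for (int n = want; n >= 1; n /= 2) {
            if (create_io_queues(n) == 0) {
                if (n != want)
                    fprintf(stderr,
                        "nvme: controller lied, backed off %d -> %d queues\n",
                        want, n);
                return n;
            }
        }
        return 0;               /* could not create even one I/O queue */
    }

    int
    main(void)
    {
        printf("got %d I/O queues\n", setup_io_queues(32));
        return 0;
    }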
Revision tags: v6.0.1, v6.0.0, v6.0.0rc1, v6.1.0, v5.8.3, v5.8.2, v5.8.1, v5.8.0, v5.9.0, v5.8.0rc1, v5.6.3, v5.6.2, v5.6.1, v5.6.0, v5.6.0rc1, v5.7.0, v5.4.3, v5.4.2, v5.4.1, v5.4.0, v5.5.0, v5.4.0rc1
# 1014e37c | 11-Jul-2018 | Matthew Dillon <dillon@apollo.backplane.com>
nvme - Do a better job backing out of probe errors
* AWS provides nvme interfaces which might not have attached volumes. Improve stability on AWS nvme (still needs more work). Play nice when queue creation fails. Do a better job tracking MSI-X interrupt installation and removal.
Revision tags: v5.2.2, v5.2.1
# 049f03b7 | 13-Apr-2018 | Matthew Dillon <dillon@apollo.backplane.com>
nvme - Improve likelihood of dump success
* Get rid of blocking locks in the dump path. This can cause severe problems if curthread is the idle thread.
* Set aside a request on every queue for dump operation. This request can be retrieved and returned trivially.
* Add a few functions to support dump requests and polling for completions.
* Remove the unused 'ticks' argument from nvme_wait_request().
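A rough, self-contained illustration of the reserved-request-plus-polling pattern described above; all structure and function names are invented for the example, not taken from the driver:

    /* Sketch only: a pre-reserved per-queue request lets the dump path
     * submit I/O without allocating or sleeping. */
    #include <stdio.h>

    struct request { int cmd_id; };

    struct queue {
        struct request  dump_req;    /* set aside at attach time for dumps */
        volatile int    dump_done;   /* set once the completion is seen */
    };

    /* Stand-in for scanning the completion ring; here it completes at once. */
    static void
    scan_completion_queue(struct queue *q)
    {
        q->dump_done = 1;
    }

    /* Retrieving the reserved request is trivial: nothing else may touch it
     * once the system is in the dump path, so no blocking locks are needed. */
    static struct request *
    get_dump_request(struct queue *q)
    {
        q->dump_done = 0;
        return &q->dump_req;
    }

    /* Busy-poll instead of sleeping: curthread may be the idle thread during
     * a dump, so blocking waits are off limits. */
    static void
    poll_dump_request(struct queue *q)
    {
        while (!q->dump_done)
            scan_completion_queue(q);
    }

    int
    main(void)
    {
        struct queue q = { .dump_req = { .cmd_id = 1 } };
        struct request *r = get_dump_request(&q);
        (void)r;                      /* submission of the dump I/O would go here */
        poll_dump_request(&q);
        puts("dump I/O completed");
        return 0;
    }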
Revision tags: v5.2.0, v5.3.0, v5.2.0rc, v5.0.2, v5.0.1, v5.0.0
# b9045046 | 05-Oct-2017 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - Change index fields from unsigned to signed
* We use a signed trick for (j), fix the code so it actually works.
* The chipset field used to index (i) cannot exceed 1024 anyway.
Reported-by: lubos Bug #3020
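To see why the signedness matters, here is a tiny stand-alone example of the count-down loop the "signed trick" depends on; it is illustrative only and not the code that was fixed:

    #include <stdio.h>

    int
    main(void)
    {
        int j;                          /* signed: the loop below ends at j == -1 */

        for (j = 3; j >= 0; --j)
            printf("j=%d\n", j);

        /* With "unsigned int j" the condition j >= 0 is always true, so the
         * loop would wrap around to UINT_MAX and spin forever, which is the
         * class of bug a signed index avoids. */
        return 0;
    }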
Revision tags: v5.0.0rc2, v5.1.0, v5.0.0rc1, v4.8.1
# 2d746837 | 19-May-2017 | Matthew Dillon <dillon@apollo.backplane.com>
nvme - Fix interrupt pin support when MSI-X is unavailable.
* Real hardware (so far) all supports MSI-X, but VMs emulating NVMe have been found not to.
* Fix numerous assertions that were getting hit due to the non-MSI-X case not installing the sc->cputovect[i] mapping.
Install a fake cputovect[] mapping. This mapping is primarily to allow multiple submission queues (per-cpu when possible). Completion queues will be further limited to reduce loop-check overheads.
* For the non-MSI-X case, limit the number of completion queues to 4, since there is really no point in having more when there is only one interrupt vector. We use 4 to allow the chipset side to run optimally even though it is not necessarily useful to have that many on the cpu side. Though to be fair, in cases where the cpu-side driver polls for completions, having multiple completion queues CAN help even with only one interrupt, since each completion queue is separately locked.
* Properly set the interrupt masking registers in the non-MSI-X case (probably not needed). Note that these registers are explicitly not supposed to be accessed by the host when MSI-X is used.
* Fix a bug where the maximum number of queues possible was one too high. This limit is *never* reached anyway, but fix the code just in case.
* Fix a bug where we assumed that the number of queues returned by the NVME_FID_NUMQUEUES command would always be <= the number of queues requested. In fact, this is not the case for at least one chipset or for some VM emulations. Limit the returned values to no more than the requested values.
* Set the queue->nqe field last when creating a completion queue. This prevents interrupts which poll multiple completion queues from attempting to poll a completion queue that has not finished getting set up. This case always occurs when pin-based interrupts are used and sometimes occurs when MSI-X vectors are used, depending on the topology.
* NOTES ON DISABLING MSI-X. Not all chipsets implement pin-based interrupts properly for NVMe. The BPX NVMe card, for example, appears to just leave the pin interrupt in a stuck state (the chipset docs say the level interrupt is cleared once all doorbell heads are synchronized for the completion queues, but this does not happen). So NVMe users should not explicitly disable MSI-X when it is nominally supported, except for testing.
Reported-by: sinetek
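A small sketch of the clamping fix mentioned above. The dword-0 layout (allocated submission/completion queue counts, 0-based) follows the NVMe Set Features / Number of Queues reply; the function and variable names are purely illustrative:

    #include <stdint.h>
    #include <stdio.h>

    /* Never trust the granted counts to be <= the requested counts. */
    static void
    clamp_queue_counts(uint32_t cdw0, int req_sub, int req_com,
                       int *got_sub, int *got_com)
    {
        int sub = (cdw0 & 0xffff) + 1;          /* granted submission queues */
        int com = ((cdw0 >> 16) & 0xffff) + 1;  /* granted completion queues */

        if (sub > req_sub)
            sub = req_sub;
        if (com > req_com)
            com = req_com;
        *got_sub = sub;
        *got_com = com;
    }

    int
    main(void)
    {
        int s, c;

        /* Hypothetical reply claiming 64/64 when only 8/8 were requested. */
        clamp_queue_counts(0x003f003f, 8, 8, &s, &c);
        printf("using %d submission, %d completion queues\n", s, c);
        return 0;
    }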
Revision tags: v4.8.0, v4.6.2, v4.9.0, v4.8.0rc, v4.6.1, v4.6.0, v4.6.0rc2, v4.6.0rc, v4.7.0
# 14676c8a | 14-Jul-2016 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - Distribute queues in rw-sep map.
* Instead of forcing all cpus to share the same submission queue in the ncpus > nsubqs case, distribute available submission queues to the cpus to try to reduce conflicts.
* Will also distribute available completion queues to the submission queues.
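An illustrative round-robin spread, roughly the kind of distribution described above; the constants and array names are made up for the example and do not reflect the driver's actual mapping code:

    #include <stdio.h>

    #define NCPUS   8
    #define NSUBQS  3   /* fewer submission queues than cpus */
    #define NCOMQS  2   /* and even fewer completion queues */

    int
    main(void)
    {
        int cpu_to_subq[NCPUS];
        int subq_to_comq[NSUBQS];

        /* Spread the available submission queues across the cpus instead of
         * pointing every cpu at queue 0. */
        for (int cpu = 0; cpu < NCPUS; ++cpu)
            cpu_to_subq[cpu] = cpu % NSUBQS;

        /* Likewise spread the completion queues across the submission queues. */
        for (int sq = 0; sq < NSUBQS; ++sq)
            subq_to_comq[sq] = sq % NCOMQS;

        for (int cpu = 0; cpu < NCPUS; ++cpu)
            printf("cpu %d -> subq %d -> comq %d\n",
                   cpu, cpu_to_subq[cpu], subq_to_comq[cpu_to_subq[cpu]]);
        return 0;
    }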
# 235fb4ac | 14-Jul-2016 | Matthew Dillon <dillon@apollo.backplane.com>
nvme - Fix comq mappings when too many cpus.
* Fix the rw-sep, minimal, and basic comq mappings. These mappings occur when there are too many cpus to accommodate available submission and completion queues.
* Fixes a bug where a bad completion queue was being specified in the creation of a submission queue.
# 92a276eb | 24-Jun-2016 | Sepherosa Ziehau <sephe@dragonflybsd.org>
nvme: Use high frequency interrupt for CQ processing
Suggested-by: dillon@
Reviewed-by: dillon@
# 43844926 | 18-Jun-2016 | Matthew Dillon <dillon@apollo.backplane.com>
nvme - Implement ioctl support to retrieve log pages
* Implement general ioctl support
* Implement NVMEIOCGETLOG which retrieves a log page.
# 911f2d4f | 18-Jun-2016 | Matthew Dillon <dillon@apollo.backplane.com>
nvme - Fail gracefully if chip cannot be enabled
* Fail gracefully rather than locking up if the chip refuses to enable. The admin thread is not running yet, so don't wait forever for it to 'stop'.
# 6a504c32 | 10-Jun-2016 | Matthew Dillon <dillon@apollo.backplane.com>
nvme - Add kernel dump support
* Add kernel dump support to the nvme driver.
* Issue a FLUSH and chip shutdown sequence after the dump completes.
# 23bba4b5 | 08-Jun-2016 | Matthew Dillon <dillon@apollo.backplane.com>
nvme - Add interrupt coalescing support
* Add interrupt coalescing support. However, disable it in the code for now by setting its parameters to 0. I tried minimal parameters (time set to 1 which is 100uS and aggregation threshold set to 4) and it completely destroyed performance in all my tests on the Intel 750.
Even in tests where the interrupt rate was less than 10,000/sec, the Intel controller is clearly implementing a broken algorithm and actually enforces that 100uS of latency even when the interrupt rate never reaches the coalescing threshold. So even relatively large transfers had horrible performance.
So for now the code is in, but it's turned off.
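A sketch of how coalescing parameters like the ones mentioned above can be packed for Set Features. The feature identifier (08h) and the dword-11 layout (aggregation threshold in bits 7:0, aggregation time in 100uS units in bits 15:8) follow the NVMe spec; the function name and the surrounding harness are illustrative:

    #include <stdint.h>
    #include <stdio.h>

    #define NVME_FID_INTCOALESCE  0x08  /* Interrupt Coalescing feature id */

    /* Build command dword 11 for the Interrupt Coalescing feature.
     * Passing 0/0 disables coalescing, which is what the driver ends up
     * doing given the Intel 750 behavior described above. */
    static uint32_t
    intcoalesce_cdw11(uint8_t time_100us, uint8_t threshold)
    {
        return ((uint32_t)time_100us << 8) | threshold;
    }

    int
    main(void)
    {
        printf("disabled:      cdw11=0x%08x\n", intcoalesce_cdw11(0, 0));
        printf("100uS, thr=4:  cdw11=0x%08x\n", intcoalesce_cdw11(1, 4));
        return 0;
    }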
# 34885004 | 08-Jun-2016 | Matthew Dillon <dillon@apollo.backplane.com>
nvme - Adjust queue mapping
* Add more fu to the manual page.
* Adjust queue mappings. Get rid of the multi-priority read and write for the optimal mapping (4 queues per cpu). Instead just have 2 (a read and a write queue), which allows the card to use an optimal mapping when 31 queues are supported.
# 70394f3f | 07-Jun-2016 | Matthew Dillon <dillon@apollo.backplane.com>
nvme - Check admin_cap
* Check admin command capabilities; do not attempt to query the controller list or namespace list if namespace management is not supported.
NOTE: The Intel 750 returns total garbage for unsupported ns management commands without returning any error code in the status.
* Minor man-page fixes.
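A minimal sketch of the capability gate described above. Only the OACS bit position (bit 3 = Namespace Management/Attachment supported) comes from the NVMe Identify Controller data; the function name and sample values are illustrative:

    #include <stdint.h>
    #include <stdio.h>

    #define OACS_NSMANAGE  0x0008   /* OACS bit 3: namespace management supported */

    /* Only query the controller/namespace lists if the capability bit is set;
     * relying on an error status from an unsupported command is not safe. */
    static int
    can_query_ns_lists(uint16_t oacs)
    {
        return (oacs & OACS_NSMANAGE) != 0;
    }

    int
    main(void)
    {
        printf("oacs=0x000e -> %s\n", can_query_ns_lists(0x000e) ? "query lists" : "skip lists");
        printf("oacs=0x0006 -> %s\n", can_query_ns_lists(0x0006) ? "query lists" : "skip lists");
        return 0;
    }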
# 28a5c21e | 06-Jun-2016 | Matthew Dillon <dillon@apollo.backplane.com>
nvme - Fix minor cpu mapping issues
* Fix some issues with the cpu mapping. cpu 0 was not getting properly accounted for due to an array overflow bug. And do a few other things.
* With these changes, extints are nicely distributed across all cpus on large concurrent workloads, and IPIs are minimal-to-none.
# 18d2384b | 06-Jun-2016 | Matthew Dillon <dillon@apollo.backplane.com>
nvme - Implement MSIX and reverse comq mapping
* Implement MSIX. Map completion queues to cpus via a rotation.
* Adjust the comq mapping code. For now prioritize assigning a 1:1 cpu mapping for submission and completion queues over creating separate queues for reads and writes.
* Tested: systat -pv 1 shows this is capable of pushing 50,000+ interrupts per second on EACH cpu (all 8 in the xeon box I tested), and of running 250,000 IOPS x 2 cards (500,000 IOPS) using interrupt-based comq handling.
# 7d057aea | 05-Jun-2016 | Matthew Dillon <dillon@apollo.backplane.com>
nvme - Iterate disk units for multiple devices
* Just iterate disk units starting at 0 for now, instead of trying to use the namespace id, preventing collisions when multiple nvme controllers are present.
# 11759406 | 05-Jun-2016 | Matthew Dillon <dillon@apollo.backplane.com>
nvme - Flesh out the driver more
* Handle the case where there are an insufficient number of queue entries available to handle BIOs (tested by forcing maxqe to 4).
* Issue delete-queue commands, then issue and wait for controller shutdown, on a normal halt/reboot as per spec.
* Disallow new device open()s during unload.
# 7e782064 | 05-Jun-2016 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - Flesh out nvme interrupts (non-msi for now)
* MSI/MSIX not working currently so just turn it off for the moment.
* Normal interrupt now operational. Implement a real nvme_intr() and clean up some of our polling hacks now that interrupts work.
* Rearrange shutdown so admin polling continues to work while the devfs disk infrastructure is being torn down.
* Tests with this little samsung mini-pcie nvme card:
120,000 IOPS (concurrent 512 byte dd)
1.5 GBytes/sec (sequential read, uncompressable file through filesystem)
1.5 GBytes/sec reading via tar

test40# ls -la /mnt2/test.dat
-rw-r--r--  1 root  wheel  7516192768 Jun  4 23:50 /mnt2/test.dat
test40# time tar cf /dev/null /mnt2/test.dat
0.062u 3.937s 0:04.84 82.4% 28+69k 28642+0io 0pf+0w   (from media)
0.164u 1.367s 0:01.81 83.9% 29+71k 978+0io 0pf+0w     (from buffer cache)
# 97a077a0 | 05-Jun-2016 | Matthew Dillon <dillon@apollo.backplane.com>
kernel - Initial native DragonFly NVME driver commit
* Initial from-scratch NVME implementation using the NVM Express 1.2a chipset specification pdf. Nothing ported from anywhere else.
Basic implementation.
* Not yet connected to the build, interrupts are not yet functional (it currently just polls at 100hz for testing), some additional error handling is needed, and we will need ioctl support and a userland utility to do various administrative tasks like formatting.
* Near full header spec keyed in including the bits we don't use (yet).
* Full SMP BIO interface and four different queue topologies implemented depending on how many queues the chipset lets us create. The best is ncpus * 4 queues, i.e. (low, high priority) x (read, write) per cpu. The second best is just (low, high priority) x (read, write) shared between all cpus.
Extremely low BIO overhead. Full strategy support and beginnings of optimizations for low-latency I/Os (currently a hack).
* Initial testing with multiple concurrent sequential dd's on a little samsung nvme mini-pcie card:
1.2 GBytes/sec   16KB
2.0 GBytes/sec   32KB
2.5 GBytes/sec   64KB