History log of /dragonfly/sys/dev/disk/nvme/nvme_admin.c (Results 1 – 20 of 20)
Revision   Date   Author   Comments
Revision tags: v6.2.1, v6.2.0, v6.3.0
# b5b25080 25-Dec-2021 Matthew Dillon <dillon@apollo.backplane.com>

nvme - Back off if controller lies about reported queue limits

* Apparently some low-rent nvme controllers lie about how many
queues they support.

* If the nvme controller lies about queue support and a queue
create command fails, attempt to back off to fewer queues
first, rather than giving up immediately. Complain mightily
on the console.
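
A minimal sketch of the back-off strategy described above (illustration only; the function name and halving policy here are hypothetical, not the driver's actual code):

/* Hypothetical sketch of the back-off-on-failure idea. */
#include <stdio.h>

/* Stand-in for the real queue-create admin command; returns 0 on success. */
static int create_io_queues(int nqueues)
{
    /* Pretend the controller only really supports 4 queues. */
    return (nqueues <= 4) ? 0 : -1;
}

int main(void)
{
    int nqueues = 16;           /* count reported by the controller */

    while (create_io_queues(nqueues) != 0) {
        if (nqueues == 1) {
            printf("nvme: unable to create even one I/O queue\n");
            return 1;
        }
        nqueues /= 2;           /* back off and complain */
        printf("nvme: controller rejected queue count, retrying with %d\n",
               nqueues);
    }
    printf("nvme: using %d I/O queues\n", nqueues);
    return 0;
}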



Revision tags: v6.0.1, v6.0.0, v6.0.0rc1, v6.1.0, v5.8.3, v5.8.2, v5.8.1, v5.8.0, v5.9.0, v5.8.0rc1, v5.6.3, v5.6.2, v5.6.1, v5.6.0, v5.6.0rc1, v5.7.0, v5.4.3, v5.4.2, v5.4.1, v5.4.0, v5.5.0, v5.4.0rc1
# 1014e37c 11-Jul-2018 Matthew Dillon <dillon@apollo.backplane.com>

nvme - Do a better job backing out of probe errors

* AWS provides nvme interfaces which might not have attached volumes.
Improve stability on AWS nvme (still needs more work). Play nice
when queue creation fails. Do a better job tracking MSI-X
interrupt installation and removal.



Revision tags: v5.2.2, v5.2.1
# 049f03b7 13-Apr-2018 Matthew Dillon <dillon@apollo.backplane.com>

nvme - Improve likelihood of dump success

* Get rid of blocking locks in the dump path. This can cause severe
problems if curthread is the idle thread.

* Set aside a request on every queue for dump operation. This
request can be retrieved and returned trivially.

* Add a few functions to support dump requests and polling for
completions.

* Remove the unused 'ticks' argument from nvme_wait_request().
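
A rough sketch of the reserved-request idea above (illustration only; the struct and function names are invented, not the driver's): each queue keeps one request permanently aside so the dump path can obtain it without blocking, then busy-polls for completion.

/* Illustrative sketch only; struct and function names are hypothetical. */
#include <stdbool.h>

struct dump_request {
    bool in_use;
    volatile bool done;             /* set when the command completes */
};

struct queue {
    struct dump_request dump_req;   /* permanently reserved, never on a freelist */
};

/* Grab the reserved request without taking any blocking lock. */
static struct dump_request *get_dump_request(struct queue *q)
{
    struct dump_request *req = &q->dump_req;
    req->in_use = true;
    req->done = false;
    return req;
}

/* Busy-poll for completion; a real driver would scan completion entries here. */
static int poll_dump_completion(struct dump_request *req)
{
    for (long spins = 0; spins < 1000000L; ++spins) {
        if (req->done) {
            req->in_use = false;
            return 0;
        }
    }
    return -1;                      /* timed out */
}

int main(void)
{
    struct queue q = { .dump_req = { false, false } };
    struct dump_request *req = get_dump_request(&q);

    req->done = true;               /* simulate the completion arriving */
    return poll_dump_completion(req) == 0 ? 0 : 1;
}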



Revision tags: v5.2.0, v5.3.0, v5.2.0rc, v5.0.2, v5.0.1, v5.0.0
# b9045046 05-Oct-2017 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Change index fields from unsigned to signed

* We use a signed trick for (j), fix the code so it actually works.

* The chipset field used to index (i) cannot exceed 1024 anyway.

Reported-by: lubos Bug #3020
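
For context on why the signed type matters here, a generic illustration (not the actual kernel code): a countdown loop whose termination test is "went below zero" can never terminate that way with an unsigned index.

#include <stdio.h>

int main(void)
{
    /* Countdown with a signed index: the j >= 0 test terminates correctly. */
    for (int j = 3; j >= 0; --j)
        printf("j = %d\n", j);

    /*
     * With an unsigned index the same test never becomes false:
     * "unsigned j; ... j >= 0" is always true, so --j wraps to UINT_MAX
     * and the loop spins forever.  Hence the switch to signed fields.
     */
    return 0;
}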



Revision tags: v5.0.0rc2, v5.1.0, v5.0.0rc1, v4.8.1
# 2d746837 19-May-2017 Matthew Dillon <dillon@apollo.backplane.com>

nvme - Fix interrupt pin support when MSI-X is unavailable.

* Real hardware (so far) all supports MSI-X, but VMs emulating NVMe
have been found not to.

* Fix numerous assertions that were getting hit due to the non-MSI-X
case not installing the sc->cputovect[i] mapping.

Install a fake cputovect[] mapping. This mapping is primarily to allow
multiple submission queues (per-cpu when possible). Completion queues
will be further limited to reduce loop-check overheads.

* For the non-MSI-X case, limit the number of completion queues to 4,
since there is really no point in having more when there is only one
interrupt vector. We use 4 to allow the chipset side to run optimally
even though it is not necessarily useful to have that many on the cpu
side. Though to be fair, in cases where the cpu-side driver polls for
completions, having multiple completion queues CAN help even if there
is only one interrupt, as each completion queue is separately locked.

* Properly set the interrupt masking registers in the non-MSI-X case
(probably not needed). Note that these registers are explicitly not
supposed to be accessed by the host when MSI-X is used.

* Fix a bug where the maximum number of queues possible was one too high.
This limit is *never* reached anyway, but fix the code just in case.

* Fix a bug where we assumed that the number of queues returned by the
NVME_FID_NUMQUEUES command would always be <= the number of queues
requested. In fact, this is not the case for at least one chipset
or for some VM emulations. Limit the returned values to no more than
the requested values.

* Set the queue->nqe field last when creating a completion queue. This
prevents interrupts which poll multiple completion queues from attempting
to poll a completion queue that has not finished getting set up. This
case always occurs when pin-based interrupts are used and sometimes
occurs when MSI-X vectors are used, depending on the topology.

* NOTES ON DISABLING MSI-X. Not all chipsets implement pin-based interrupts
properly for NVMe. The BPX NVMe card, for example, appears to just leave
the pin interrupt in a stuck state (the chipset docs say the level
interrupt is cleared once all doorbell heads are synchronized for the
completion queues, but this does not happen). So NVMe users should not
explicitly disable MSI-X when it is nominally supported, except for
testing.

Reported-by: sinetek
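
Two of the fixes above reduce to simple patterns; the sketch below (hypothetical identifiers, not the driver's code) shows clamping the reported queue counts to the requested ones, and publishing a completion queue's nqe field last so pollers never see a half-initialized queue.

#include <stdio.h>

struct comqueue {
    void *entries;
    int   head;
    volatile int nqe;               /* 0 means "not ready"; set last */
};

static int imin(int a, int b) { return a < b ? a : b; }

/* Clamp what the controller claims to what we actually requested. */
static void clamp_queue_counts(int req_sub, int req_com,
                               int *got_sub, int *got_com)
{
    *got_sub = imin(*got_sub, req_sub);
    *got_com = imin(*got_com, req_com);
}

/* Initialize everything, then set nqe so pollers can see the queue. */
static void comqueue_publish(struct comqueue *q, void *entries, int nqe)
{
    q->entries = entries;
    q->head = 0;
    q->nqe = nqe;                   /* last store; pollers skip nqe == 0 */
}

/* A poller checks nqe before touching the queue. */
static int comqueue_ready(const struct comqueue *q)
{
    return q->nqe != 0;
}

int main(void)
{
    struct comqueue q = { 0 };
    int entries[8];
    int got_sub = 31, got_com = 31;     /* what the controller claims */

    clamp_queue_counts(8, 8, &got_sub, &got_com);
    printf("using %d submission / %d completion queues\n", got_sub, got_com);

    printf("ready before publish: %d\n", comqueue_ready(&q));
    comqueue_publish(&q, entries, 8);
    printf("ready after publish:  %d\n", comqueue_ready(&q));
    return 0;
}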



Revision tags: v4.8.0, v4.6.2, v4.9.0, v4.8.0rc, v4.6.1, v4.6.0, v4.6.0rc2, v4.6.0rc, v4.7.0
# 14676c8a 14-Jul-2016 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Distribute queues in rw-sep map.

* Instead of forcing all cpus to share the same submission queue in
the ncpus > nsubqs case, distribute available submission queues
to the cpus to try to reduce conflicts.

* Will also distribute available completion queues to the submission
queues.
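
A plausible shape for the distribution described above (sketch only, not the actual mapping code): rotate cpus across the available submission queues, and submission queues across the available completion queues.

#include <stdio.h>

int main(void)
{
    int ncpus = 8, nsubqs = 3, ncomqs = 2;

    /*
     * Spread cpus across submission queues instead of sharing one,
     * and spread submission queues across completion queues.
     */
    for (int cpu = 0; cpu < ncpus; ++cpu) {
        int subq = cpu % nsubqs;
        int comq = subq % ncomqs;
        printf("cpu %d -> subq %d -> comq %d\n", cpu, subq, comq);
    }
    return 0;
}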



# 235fb4ac 14-Jul-2016 Matthew Dillon <dillon@apollo.backplane.com>

nvme - Fix comq mappings when too many cpus.

* Fix the rw-sep, minimal, and basic comq mappings. These mappings occur
when there are too many cpus to accommodate available submission and
completion queues.

* Fixes a bug where a bad completion queue was being specified in the creation
of a submission queue.



# 92a276eb 24-Jun-2016 Sepherosa Ziehau <sephe@dragonflybsd.org>

nvme: Use high frequency interrupt for CQ processing

Suggested-by: dillon@
Reviewed-by: dillon@


# 43844926 18-Jun-2016 Matthew Dillon <dillon@apollo.backplane.com>

nvme - Implement ioctl support to retrieve log pages

* Implement general ioctl support

* Implement NVMEIOCGETLOG which retrieves a log page.


# 911f2d4f 18-Jun-2016 Matthew Dillon <dillon@apollo.backplane.com>

nvme - Fail gracefully if chip cannot be enabled

* Fail gracefully rather than lock up if the chip refuses to enable.
The admin thread is not running yet, so don't wait forever for
it to 'stop'.


# 6a504c32 10-Jun-2016 Matthew Dillon <dillon@apollo.backplane.com>

nvme - Add kernel dump support

* Add kernel dump support to the nvme driver.

* Issue a FLUSH and chip shutdown sequence after the dump completes.
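
For reference, the shutdown half of that sequence follows the standard NVMe pattern (sketch only; the register offsets and bit fields are taken from the NVMe spec, while the simulated MMIO accessors are made up): set CC.SHN to "normal shutdown", then poll CSTS.SHST until the controller reports shutdown complete.

#include <stdint.h>
#include <stdio.h>

#define NVME_REG_CC         0x14
#define NVME_REG_CSTS       0x1c
#define NVME_CC_SHN_MASK    (3u << 14)
#define NVME_CC_SHN_NORMAL  (1u << 14)
#define NVME_CSTS_SHST_MASK 0x0cu
#define NVME_CSTS_SHST_DONE 0x08u

/* Fake register file standing in for the controller's MMIO BAR. */
static uint32_t fake_regs[0x20 / 4];

static uint32_t nvme_read32(uint32_t reg) { return fake_regs[reg / 4]; }

static void nvme_write32(uint32_t reg, uint32_t val)
{
    fake_regs[reg / 4] = val;
    /* Pretend the controller finishes its shutdown immediately. */
    if (reg == NVME_REG_CC && (val & NVME_CC_SHN_MASK))
        fake_regs[NVME_REG_CSTS / 4] |= NVME_CSTS_SHST_DONE;
}

int main(void)
{
    uint32_t cc = nvme_read32(NVME_REG_CC);

    /* Request a normal shutdown via CC.SHN. */
    nvme_write32(NVME_REG_CC, (cc & ~NVME_CC_SHN_MASK) | NVME_CC_SHN_NORMAL);

    /* Poll CSTS.SHST until the controller reports shutdown complete. */
    for (int i = 0; i < 1000000; ++i) {
        if ((nvme_read32(NVME_REG_CSTS) & NVME_CSTS_SHST_MASK) ==
            NVME_CSTS_SHST_DONE) {
            printf("controller shutdown complete\n");
            return 0;
        }
    }
    printf("shutdown timed out\n");
    return 1;
}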


# 23bba4b5 08-Jun-2016 Matthew Dillon <dillon@apollo.backplane.com>

nvme - Add interrupt coalescing support

* Add interrupt coalescing support. However, disable it in the code for
now by setting its parameters to 0. I tried minimal parameters (time
set to 1 which is 100uS and aggregation threshold set to 4) and it
completely destroyed performance in all my tests on the Intel 750.

Even in tests where the interrupt rate was less than 10,000/sec, the
Intel controller is clearly implementing a broken algorithm and is
actually enforcing that 100uS of latency even when the interrupt rate
has not exceeded the threshold. So even relatively large transfers had
horrible performance.

So for now the code is in, but it is turned off.
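
For reference, the knob involved is the NVMe Interrupt Coalescing feature (Feature ID 08h), whose command dword 11 carries the aggregation threshold in bits 7:0 and the aggregation time (100uS units) in bits 15:8. The sketch below (illustration only; it glosses over the spec's 0's-based encoding of the threshold field) shows how the values mentioned above would be packed.

#include <stdint.h>
#include <stdio.h>

#define NVME_FID_INTCOALESCE 0x08

/*
 * CDW11 layout for Set Features / Interrupt Coalescing:
 *   bits  7:0  aggregation threshold (THR, a 0's based value in the spec)
 *   bits 15:8  aggregation time (TIME, in 100uS increments, 0 = no delay)
 */
static uint32_t intcoal_cdw11(uint8_t thr, uint8_t time_100us)
{
    return (uint32_t)thr | ((uint32_t)time_100us << 8);
}

int main(void)
{
    /* Roughly the parameters the commit message mentions trying: */
    printf("tried:    fid=0x%02x cdw11=0x%08x\n",
           NVME_FID_INTCOALESCE, (unsigned)intcoal_cdw11(4, 1));

    /* What the driver ships with: both fields zero, coalescing disabled. */
    printf("disabled: fid=0x%02x cdw11=0x%08x\n",
           NVME_FID_INTCOALESCE, (unsigned)intcoal_cdw11(0, 0));
    return 0;
}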



# 34885004 08-Jun-2016 Matthew Dillon <dillon@apollo.backplane.com>

nvme - Adjust queue mapping

* Add more fu to the manual page.

* Adjust queue mappings. Get rid of the multi-priority read and write
for the optimal mapping (4 queues per cpu). Instead just have 2 (a read
and a write queue), which allows the card to use an optimal mapping
when 31 queues are supported.



# 70394f3f 07-Jun-2016 Matthew Dillon <dillon@apollo.backplane.com>

nvme - Check admin_cap

* Check admin command capabilities, do not attempt to query the controller
list or namespace list if namespace management is not supported.

NOTE: The Intel 750 returns total garbage for unsupported ns management
commands without returning any error code in the status.

* Minor man-page fixes.
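
The capability check described above maps to the OACS field of the Identify Controller data; per the NVMe 1.2 spec, bit 3 advertises Namespace Management/Attachment support, so the guard looks roughly like this (sketch only, not the driver's code):

#include <stdint.h>
#include <stdio.h>

#define NVME_OACS_NSMGMT (1u << 3)   /* Namespace Management/Attachment */

static int ns_management_supported(uint16_t oacs)
{
    return (oacs & NVME_OACS_NSMGMT) != 0;
}

int main(void)
{
    uint16_t oacs = 0x0006;          /* example value: bit 3 not set */

    if (!ns_management_supported(oacs)) {
        /*
         * Skip the controller-list / namespace-list queries entirely,
         * since some controllers return garbage without flagging an error.
         */
        printf("ns management unsupported; skipping ns list query\n");
    }
    return 0;
}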



# 28a5c21e 06-Jun-2016 Matthew Dillon <dillon@apollo.backplane.com>

nvme - Fix minor cpu mapping issues

* Fix some issues with the cpu mapping. cpu 0 was not getting properly
accounted for due to an array overflow bug. And do a few other things.

* With these changes, extints are nicely distributed across all cpus on
large concurrent workloads, and IPIs are minimal-to-none.



# 18d2384b 06-Jun-2016 Matthew Dillon <dillon@apollo.backplane.com>

nvme - Implement MSIX and reverse comq mapping

* Implement MSIX. Map completion queues to cpus via a rotation.

* Adjust the comq mapping code. For now prioritize assigning a 1:1 cpu
mapping for submission and completion queues over creating separate
queues for reads and writes.

* Tested: systat -pv 1 shows this is capable of pushing 50,000+ interrupts
per second on EACH cpu (all 8 in the xeon box I tested), and can run
250,000 IOPS x 2 cards (500,000 IOPS) using interrupt-based comq handling.



# 7d057aea 05-Jun-2016 Matthew Dillon <dillon@apollo.backplane.com>

nvme - Iterate disk units for multiple devices

* Just iterate disk units starting at 0 for now, instead of trying to
use the namespace id, preventing collisions when multiple nvme
controllers are present.



# 11759406 05-Jun-2016 Matthew Dillon <dillon@apollo.backplane.com>

nvme - Flesh out the driver more

* Handle the case where there are an insufficient number of queue entries
available to handle BIOs (tested by forcing maxqe to 4).

* Issue delete queue commands, then issue a controller shutdown and wait
for it to complete on a normal halt/reboot, as per the spec.

* Disallow new device open()s during unload.



# 7e782064 05-Jun-2016 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Flesh out nvme interrupts (non-msi for now)

* MSI/MSIX not working currently so just turn it off for the moment.

* Normal interrupt now operational. Implement a real nvme_intr() and
clean up some of our polling hacks now that interrupts work.

* Rearrange shutdown so admin polling continues to work while the devfs
disk infrastructure is being torn down.

* Tests with this little samsung mini-pcie nvme card:

120,000 IOPS (concurrent 512 byte dd)
1.5 GBytes/sec (sequential read of an incompressible file through the filesystem)
1.5 GBytes/sec reading via tar.

test40# ls -la /mnt2/test.dat
-rw-r--r-- 1 root wheel 7516192768 Jun 4 23:50 /mnt2/test.dat
test40# time tar cf /dev/null /mnt2/test.dat
0.062u 3.937s 0:04.84 82.4% 28+69k 28642+0io 0pf+0w (from media)
0.164u 1.367s 0:01.81 83.9% 29+71k 978+0io 0pf+0w (from buffer cache)



# 97a077a0 05-Jun-2016 Matthew Dillon <dillon@apollo.backplane.com>

kernel - Initial native DragonFly NVME driver commit

* Initial from-scratch NVME implementation using the NVM Express 1.2a
chipset specification pdf. Nothing ported from anywhere else.

Basic implementation.

* Not yet connected to the build, interrupts are not yet functional
(it currently just polls at 100Hz for testing), some additional error
handling is needed, and we will need ioctl support and a userland utility
to do various administrative tasks like formatting.

* Near full header spec keyed in including the bits we don't use (yet).

* Full SMP BIO interface and four different queue topologies implemented
depending on how many queues the chipset lets us create. The best is
ncpus * 4 queues, i.e. (low, high priority) x (read, write) per cpu.
The second best is just (low, high priority) x (read, write) shared between
all cpus.

Extremely low BIO overhead. Full strategy support and beginnings of
optimizations for low-latency I/Os (currently a hack).
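
To make the best-case "ncpus * 4" topology above concrete, here is a sketch of how a queue index could be derived from (cpu, priority, read/write) (illustration only; the index math is hypothetical, not taken from the driver):

#include <stdio.h>

enum { QPRIO_LOW = 0, QPRIO_HIGH = 1 };
enum { QRW_READ = 0, QRW_WRITE = 1 };

/* Best case: ncpus * 4 queues, i.e. (low, high priority) x (read, write) per cpu. */
static int queue_index(int cpu, int prio, int rw)
{
    return cpu * 4 + prio * 2 + rw;
}

int main(void)
{
    int ncpus = 2;

    for (int cpu = 0; cpu < ncpus; ++cpu)
        for (int prio = QPRIO_LOW; prio <= QPRIO_HIGH; ++prio)
            for (int rw = QRW_READ; rw <= QRW_WRITE; ++rw)
                printf("cpu %d prio %d rw %d -> queue %d\n",
                       cpu, prio, rw, queue_index(cpu, prio, rw));
    return 0;
}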

* Initial testing with multiple concurrent sequential dd's on a little
samsung nvme mini-pcie card:

1.2 GBytes/sec 16KB
2.0 GBytes/sec 32KB
2.5 GBytes/sec 64KB
