#
55251fbd |
| 28-Mar-2024 |
Damien Le Moal <dlemoal@kernel.org> |
block: Do not force full zone append completion in req_bio_endio()
This reverts commit 748dc0b65ec2b4b7b3dbd7befcc4a54fdcac7988.
Partial zone append completions cannot be supported as there is no g
block: Do not force full zone append completion in req_bio_endio()
This reverts commit 748dc0b65ec2b4b7b3dbd7befcc4a54fdcac7988.
Partial zone append completions cannot be supported as there is no guarantees that the fragmented data will be written sequentially in the same manner as with a full command. Commit 748dc0b65ec2 ("block: fix partial zone append completion handling in req_bio_endio()") changed req_bio_endio() to always advance a partially failed BIO by its full length, but this can lead to incorrect accounting. So revert this change and let low level device drivers handle this case by always failing completely zone append operations. With this revert, users will still see an IO error for a partially completed zone append BIO.
Fixes: 748dc0b65ec2 ("block: fix partial zone append completion handling in req_bio_endio()") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240328004409.594888-2-dlemoal@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
ec30b461 |
| 28-Feb-2024 |
Ming Lei <ming.lei@redhat.com> |
blk-mq: don't change nr_hw_queues and nr_maps for kdump kernel
For most of ARCHs, 'nr_cpus=1' is passed for kdump kernel, so nr_hw_queues for each mapping is supposed to be 1 already.
More importan
blk-mq: don't change nr_hw_queues and nr_maps for kdump kernel
For most of ARCHs, 'nr_cpus=1' is passed for kdump kernel, so nr_hw_queues for each mapping is supposed to be 1 already.
More importantly, this way may cause trouble for driver, because blk-mq and driver see different queue mapping since driver should setup hardware queue setting before calling into allocating blk-mq tagset.
So not overriding nr_hw_queues and nr_maps for kdump kernel.
Cc: Wen Xiong <wenxiong@us.ibm.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240228040857.306483-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
af550e4c |
| 23-Feb-2024 |
Qais Yousef <qyousef@layalina.io> |
block/blk-mq: Don't complete locally if capacities are different
The logic in blk_mq_complete_need_ipi() assumes SMP systems where all CPUs have equal compute capacities and only LLC cache can make
block/blk-mq: Don't complete locally if capacities are different
The logic in blk_mq_complete_need_ipi() assumes SMP systems where all CPUs have equal compute capacities and only LLC cache can make a different on perceived performance. But this assumption falls apart on HMP systems where LLC is shared, but the CPUs have different capacities. Staying local then can have a big performance impact if the IO request was done from a CPU with higher capacity but the interrupt is serviced on a lower capacity CPU.
Use the new cpus_equal_capacity() function to check if we need to send an IPI.
Without the patch I see the BLOCK softirq always running on little cores (where the hardirq is serviced). With it I can see it running on all cores.
This was noticed after the topology change [1] where now on a big.LITTLE we truly get that the LLC is shared between all cores where as in the past it was being misrepresented for historical reasons. The logic exposed a missing dependency on capacities for such systems where there can be a big performance difference between the CPUs.
This of course introduced a noticeable change in behavior depending on how the topology is presented. Leading to regressions in some workloads as the performance of the BLOCK softirq on littles can be noticeably worse on some platforms.
Worth noting that we could have checked for capacities being greater than or equal instead for equality. This will lead to favouring higher performance always. But opted for equality instead to match the performance of the requester without making an assumption that can lead to power trade-offs which these systems tend to be sensitive about. If the requester would like to run faster, it's better to rely on the scheduler to give the IO requester via some facility to run on a faster core; and then if the interrupt triggered on a CPU with different capacity we'll make sure to match the performance the requester is supposed to run at.
[1] https://lpc.events/event/16/contributions/1342/attachments/962/1883/LPC-2022-Android-MC-Phantom-Domains.pdf
Signed-off-by: Qais Yousef <qyousef@layalina.io> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240223155749.2958009-3-qyousef@layalina.io Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
0eb4db47 |
| 23-Feb-2024 |
Keith Busch <kbusch@kernel.org> |
block: io wait hang check helper
This is the same in two places, and another will be added soon. Create a helper for it.
Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <
block: io wait hang check helper
This is the same in two places, and another will be added soon. Create a helper for it.
Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20240223155910.3622666-4-kbusch@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
27e32cd2 |
| 13-Feb-2024 |
Christoph Hellwig <hch@lst.de> |
block: pass a queue_limits argument to blk_mq_alloc_disk
Pass a queue_limits to blk_mq_alloc_disk and apply it if non-NULL. This will allow allocating queues with valid queue limits instead of sett
block: pass a queue_limits argument to blk_mq_alloc_disk
Pass a queue_limits to blk_mq_alloc_disk and apply it if non-NULL. This will allow allocating queues with valid queue limits instead of setting the values one at a time later.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20240213073425.1621680-11-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
9ac4dd8c |
| 13-Feb-2024 |
Christoph Hellwig <hch@lst.de> |
block: pass a queue_limits argument to blk_mq_init_queue
Pass a queue_limits to blk_mq_init_queue and apply it if non-NULL. This will allow allocating queues with valid queue limits instead of sett
block: pass a queue_limits argument to blk_mq_init_queue
Pass a queue_limits to blk_mq_init_queue and apply it if non-NULL. This will allow allocating queues with valid queue limits instead of setting the values one at a time later.
Also rename the function to blk_mq_alloc_queue as that is a much better name for a function that allocates a queue and always pass the queuedata argument instead of having a separate version for the extra argument.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20240213073425.1621680-10-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
ad751ba1 |
| 13-Feb-2024 |
Christoph Hellwig <hch@lst.de> |
block: pass a queue_limits argument to blk_alloc_queue
Pass a queue_limits to blk_alloc_queue and apply it after validating and capping the values using blk_validate_limits. This will allow allocat
block: pass a queue_limits argument to blk_alloc_queue
Pass a queue_limits to blk_alloc_queue and apply it after validating and capping the values using blk_validate_limits. This will allow allocating queues with valid queue limits instead of setting the values one at a time later.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20240213073425.1621680-9-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
44981351 |
| 02-Feb-2024 |
Bart Van Assche <bvanassche@acm.org> |
block, fs: Restore the per-bio/request data lifetime fields
Restore support for passing data lifetime information from filesystems to block drivers. This patch reverts commit b179c98f7697 ("block: R
block, fs: Restore the per-bio/request data lifetime fields
Restore support for passing data lifetime information from filesystems to block drivers. This patch reverts commit b179c98f7697 ("block: Remove request.write_hint") and commit c75e707fe1aa ("block: remove the per-bio/request write hint").
This patch does not modify the size of struct bio because the new bi_write_hint member fills a hole in struct bio. pahole reports the following for struct bio on an x86_64 system with this patch applied:
/* size: 112, cachelines: 2, members: 20 */ /* sum members: 110, holes: 1, sum holes: 2 */ /* last cacheline: 48 bytes */
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240202203926.2478590-7-bvanassche@acm.org Signed-off-by: Christian Brauner <brauner@kernel.org>
show more ...
|
#
08420cf7 |
| 15-Jan-2024 |
Jens Axboe <axboe@kernel.dk> |
block: add blk_time_get_ns() and blk_time_get() helpers
Convert any user of ktime_get_ns() to use blk_time_get_ns(), and ktime_get() to blk_time_get(), so we have a unified API for querying the curr
block: add blk_time_get_ns() and blk_time_get() helpers
Convert any user of ktime_get_ns() to use blk_time_get_ns(), and ktime_get() to blk_time_get(), so we have a unified API for querying the current time in nanoseconds or as ktime.
No functional changes intended, this patch just wraps ktime_get_ns() and ktime_get() with a block helper.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
72e84e90 |
| 24-Jan-2024 |
Christoph Hellwig <hch@lst.de> |
blk-mq: special case cached requests less
Share the main merge / split / integrity preparation code between the cached request vs newly allocated request cases, and add comments explaining the cache
blk-mq: special case cached requests less
Share the main merge / split / integrity preparation code between the cached request vs newly allocated request cases, and add comments explaining the cached request handling.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Tested-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20240124092658.2258309-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
337e89fe |
| 24-Jan-2024 |
Christoph Hellwig <hch@lst.de> |
blk-mq: introduce a blk_mq_peek_cached_request helper
Add a new helper to check if there is suitable cached request in blk_mq_submit_bio. This removes open coded logic in blk_mq_submit_bio and move
blk-mq: introduce a blk_mq_peek_cached_request helper
Add a new helper to check if there is suitable cached request in blk_mq_submit_bio. This removes open coded logic in blk_mq_submit_bio and moves some checks that so far are in blk_mq_use_cached_rq to be performed earlier. This avoids the case where we first do check with the cached request but then later end up allocating a new one anyway and need to grab a queue reference.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Tested-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20240124092658.2258309-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
0f299da5 |
| 24-Jan-2024 |
Christoph Hellwig <hch@lst.de> |
blk-mq: move blk_mq_attempt_bio_merge out blk_mq_get_new_requests
blk_mq_attempt_bio_merge has nothing to do with allocating a new request, it avoids allocating a new request. Move the call out of
blk-mq: move blk_mq_attempt_bio_merge out blk_mq_get_new_requests
blk_mq_attempt_bio_merge has nothing to do with allocating a new request, it avoids allocating a new request. Move the call out of blk_mq_get_new_requests and into the only caller.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Tested-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20240124092658.2258309-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
f3c89983 |
| 30-Jan-2024 |
Hongyu Jin <hongyu.jin@unisoc.com> |
block: Fix where bio IO priority gets set
Commit 82b74cac2849 ("blk-ioprio: Convert from rqos policy to direct call") pushed setting bio I/O priority down into blk_mq_submit_bio() -- which is too lo
block: Fix where bio IO priority gets set
Commit 82b74cac2849 ("blk-ioprio: Convert from rqos policy to direct call") pushed setting bio I/O priority down into blk_mq_submit_bio() -- which is too low within block core's submit_bio() because it skips setting I/O priority for block drivers that implement fops->submit_bio() (e.g. DM, MD, etc).
Fix this by moving bio_set_ioprio() up from blk-mq.c to blk-core.c and call it from submit_bio(). This ensures all block drivers call bio_set_ioprio() during initial bio submission.
Fixes: a78418e6a04c ("block: Always initialize bio IO priority on submit") Co-developed-by: Yibin Ding <yibin.ding@unisoc.com> Signed-off-by: Yibin Ding <yibin.ding@unisoc.com> Signed-off-by: Hongyu Jin <hongyu.jin@unisoc.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Mikulas Patocka <mpatocka@redhat.com> [snitzer: revised commit header] Signed-off-by: Mike Snitzer <snitzer@kernel.org> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20240130202638.62600-2-snitzer@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
7b4f36cd |
| 12-Jan-2024 |
Jens Axboe <axboe@kernel.dk> |
block: ensure we hold a queue reference when using queue limits
q_usage_counter is the only thing preventing us from the limits changing under us in __bio_split_to_limits, but blk_mq_submit_bio does
block: ensure we hold a queue reference when using queue limits
q_usage_counter is the only thing preventing us from the limits changing under us in __bio_split_to_limits, but blk_mq_submit_bio doesn't hold it while calling into it.
Move the splitting inside the region where we know we've got a queue reference. Ideally this could still remain a shared section of code, but let's keep the fix simple and defer any refactoring here to later.
Reported-by: Christoph Hellwig <hch@lst.de> Fixes: 900e08075202 ("block: move queue enter logic into blk_mq_submit_bio()") Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
309ce674 |
| 11-Jan-2024 |
Christoph Hellwig <hch@lst.de> |
blk-mq: rename blk_mq_can_use_cached_rq
blk_mq_can_use_cached_rq doesn't just check if we can use the request, but also performs the work to actually use it. Remove the _can in the naming, and impr
blk-mq: rename blk_mq_can_use_cached_rq
blk_mq_can_use_cached_rq doesn't just check if we can use the request, but also performs the work to actually use it. Remove the _can in the naming, and improve the comment describing the function.
Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240111135705.2155518-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
5266caaf |
| 12-Jan-2024 |
Ming Lei <ming.lei@redhat.com> |
blk-mq: fix IO hang from sbitmap wakeup race
In blk_mq_mark_tag_wait(), __add_wait_queue() may be re-ordered with the following blk_mq_get_driver_tag() in case of getting driver tag failure.
Then i
blk-mq: fix IO hang from sbitmap wakeup race
In blk_mq_mark_tag_wait(), __add_wait_queue() may be re-ordered with the following blk_mq_get_driver_tag() in case of getting driver tag failure.
Then in __sbitmap_queue_wake_up(), waitqueue_active() may not observe the added waiter in blk_mq_mark_tag_wait() and wake up nothing, meantime blk_mq_mark_tag_wait() can't get driver tag successfully.
This issue can be reproduced by running the following test in loop, and fio hang can be observed in < 30min when running it on my test VM in laptop.
modprobe -r scsi_debug modprobe scsi_debug delay=0 dev_size_mb=4096 max_queue=1 host_max_queue=1 submit_queues=4 dev=`ls -d /sys/bus/pseudo/drivers/scsi_debug/adapter*/host*/target*/*/block/* | head -1 | xargs basename` fio --filename=/dev/"$dev" --direct=1 --rw=randrw --bs=4k --iodepth=1 \ --runtime=100 --numjobs=40 --time_based --name=test \ --ioengine=libaio
Fix the issue by adding one explicit barrier in blk_mq_mark_tag_wait(), which is just fine in case of running out of tag.
Cc: Jan Kara <jack@suse.cz> Cc: Kemeng Shi <shikemeng@huaweicloud.com> Reported-by: Changhui Zhong <czhong@redhat.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20240112122626.4181044-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
748dc0b6 |
| 10-Jan-2024 |
Damien Le Moal <dlemoal@kernel.org> |
block: fix partial zone append completion handling in req_bio_endio()
Partial completions of zone append request is not allowed but if a zone append completion indicates a number of completed bytes
block: fix partial zone append completion handling in req_bio_endio()
Partial completions of zone append request is not allowed but if a zone append completion indicates a number of completed bytes different from the original BIO size, only the BIO status is set to error. This leads to bio_advance() not setting the BIO size to 0 and thus to not call bio_endio() at the end of req_bio_endio().
Make sure a partially completed zone append is failed and completed immediately by forcing the completed number of bytes (nbytes) to be equal to the BIO size, thus ensuring that bio_endio() is called.
Fixes: 297db731847e ("block: fix req_bio_endio append error handling") Cc: stable@kernel.vger.org Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20240110092942.442334-1-dlemoal@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
847c5bcd |
| 23-Nov-2023 |
Kundan Kumar <kundan.kumar@samsung.com> |
block: skip QUEUE_FLAG_STATS and rq-qos for passthrough io
Write-back throttling (WBT) enables QUEUE_FLAG_STATS on the request queue. But WBT does not make sense for passthrough io, so skip QUEUE_FL
block: skip QUEUE_FLAG_STATS and rq-qos for passthrough io
Write-back throttling (WBT) enables QUEUE_FLAG_STATS on the request queue. But WBT does not make sense for passthrough io, so skip QUEUE_FLAG_STATS processing.
Also skip rq_qos_issue/done for passthrough io.
Overall, the change gives ~11% hike in peak performance.
Signed-off-by: Kundan Kumar <kundan.kumar@samsung.com> Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> Link: https://lore.kernel.org/r/20231123190331.7934-1-kundan.kumar@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
0e4237ae |
| 01-Dec-2023 |
Ming Lei <ming.lei@redhat.com> |
blk-mq: don't count completed flush data request as inflight in case of quiesce
Request queue quiesce may interrupt flush sequence, and the original request may have been marked as COMPLETE, but can
blk-mq: don't count completed flush data request as inflight in case of quiesce
Request queue quiesce may interrupt flush sequence, and the original request may have been marked as COMPLETE, but can't get finished because of queue quiesce.
This way is fine from driver viewpoint, because flush sequence is block layer concept, and it isn't related with driver.
However, driver(such as dm-rq) can call blk_mq_queue_inflight() to count & drain inflight requests, then the wait & drain never gets done because the completed & not-finished flush request is counted as inflight.
Fix this issue by not counting completed flush data request as inflight in case of quiesce.
Cc: Mike Snitzer <snitzer@kernel.org> Cc: David Jeffery <djeffery@redhat.com> Cc: John Pittman <jpittman@redhat.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20231201085605.577730-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
b0077e26 |
| 13-Nov-2023 |
Christoph Hellwig <hch@infradead.org> |
blk-mq: make sure active queue usage is held for bio_integrity_prep()
blk_integrity_unregister() can come if queue usage counter isn't held for one bio with integrity prepared, so this request may b
blk-mq: make sure active queue usage is held for bio_integrity_prep()
blk_integrity_unregister() can come if queue usage counter isn't held for one bio with integrity prepared, so this request may be completed with calling profile->complete_fn, then kernel panic.
Another constraint is that bio_integrity_prep() needs to be called before bio merge.
Fix the issue by:
- call bio_integrity_prep() with one queue usage counter grabbed reliably
- call bio_integrity_prep() before bio merge
Fixes: 900e080752025f00 ("block: move queue enter logic into blk_mq_submit_bio()") Reported-by: Yi Zhang <yi.zhang@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Tested-by: Yi Zhang <yi.zhang@redhat.com> Link: https://lore.kernel.org/r/20231113035231.2708053-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
217b613a |
| 13-Sep-2023 |
Chengming Zhou <zhouchengming@bytedance.com> |
blk-mq: update driver tags request table when start request
Now we update driver tags request table in blk_mq_get_driver_tag(), so the driver that support queue_rqs() have to update that inflight ta
blk-mq: update driver tags request table when start request
Now we update driver tags request table in blk_mq_get_driver_tag(), so the driver that support queue_rqs() have to update that inflight table by itself.
Move it to blk_mq_start_request(), which is a better place where we setup the deadline for request timeout check. And it's just where the request becomes inflight.
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230913151616.3164338-5-chengming.zhou@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
434097ee |
| 13-Sep-2023 |
Chengming Zhou <zhouchengming@bytedance.com> |
blk-mq: support batched queue_rqs() on shared tags queue
Since active requests have been accounted when allocate driver tags, we can remove this limit now.
Signed-off-by: Chengming Zhou <zhouchengm
blk-mq: support batched queue_rqs() on shared tags queue
Since active requests have been accounted when allocate driver tags, we can remove this limit now.
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230913151616.3164338-4-chengming.zhou@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
48554df6 |
| 13-Sep-2023 |
Chengming Zhou <zhouchengming@bytedance.com> |
blk-mq: remove RQF_MQ_INFLIGHT
Since the previous patch change to only account active requests when we really allocate the driver tag, the RQF_MQ_INFLIGHT can be removed and no double account proble
blk-mq: remove RQF_MQ_INFLIGHT
Since the previous patch change to only account active requests when we really allocate the driver tag, the RQF_MQ_INFLIGHT can be removed and no double account problem.
1. none elevator: flush request will use the first pending request's driver tag, won't double account.
2. other elevator: flush request will be accounted when allocate driver tag when issue, and will be unaccounted when it put the driver tag.
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230913151616.3164338-3-chengming.zhou@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
b8643d68 |
| 13-Sep-2023 |
Chengming Zhou <zhouchengming@bytedance.com> |
blk-mq: account active requests when get driver tag
There is a limit that batched queue_rqs() can't work on shared tags queue, since the account of active requests can't be done there.
Now we accou
blk-mq: account active requests when get driver tag
There is a limit that batched queue_rqs() can't work on shared tags queue, since the account of active requests can't be done there.
Now we account the active requests only in blk_mq_get_driver_tag(), which is not the time we get driver tag actually (with none elevator).
To support batched queue_rqs() on shared tags queue, we move the account of active requests to where we get the driver tag:
1. none elevator: blk_mq_get_tags() and blk_mq_get_tag() 2. other elevator: __blk_mq_alloc_driver_tag()
This is clearer and match with the unaccount side, which just happen when we put the driver tag.
The other good point is that we don't need RQF_MQ_INFLIGHT trick anymore, which used to avoid double account of flush request. Now we only account when actually get the driver tag, so all is good. We will remove RQF_MQ_INFLIGHT in the next patch.
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230913151616.3164338-2-chengming.zhou@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
6be6d112 |
| 08-Sep-2023 |
Chengming Zhou <zhouchengming@bytedance.com> |
blk-mq: fix tags UAF when shrinking q->nr_hw_queues
When nr_hw_queues shrink, we free the excess tags before realloc'ing hw_ctxs for each queue. During that resize, we may need to access those tags,
blk-mq: fix tags UAF when shrinking q->nr_hw_queues
When nr_hw_queues shrink, we free the excess tags before realloc'ing hw_ctxs for each queue. During that resize, we may need to access those tags, like blk_mq_tag_idle(hctx) will access queue shared tags.
This can cause a slab use-after-free, as reported by KASAN. Fix it by moving the releasing of excess tags to the end.
Fixes: e1dd7bc93029 ("blk-mq: fix tags leak when shrink nr_hw_queues") Reported-by: Yi Zhang <yi.zhang@redhat.com> Closes: https://lore.kernel.org/all/CAHj4cs_CK63uoDpGBGZ6DN4OCTpzkR3UaVgK=LX8Owr8ej2ieQ@mail.gmail.com/ Cc: Ming Lei <ming.lei@redhat.com> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20230908005702.2183908-1-chengming.zhou@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|