#
49e60333 |
| 17-Jan-2024 |
Bart Van Assche <bvanassche@acm.org> |
blk-mq: Remove the hctx 'run' debugfs attribute
Nobody uses the debugfs hctx 'run' attribute. Hence remove this attribute and also the code that updates the corresponding member variable.
Suggested-by: Jens Axboe <axboe@kernel.dk> Cc: Gabriel Ryan <gabe@cs.columbia.edu> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240117203609.4122520-1-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
48554df6 |
| 13-Sep-2023 |
Chengming Zhou <zhouchengming@bytedance.com> |
blk-mq: remove RQF_MQ_INFLIGHT
Since the previous patch changed to only account active requests when the driver tag is really allocated, RQF_MQ_INFLIGHT can be removed without causing a double-accounting problem.
1. none elevator: the flush request uses the first pending request's driver tag, so it is not double accounted.
2. other elevator: the flush request is accounted when its driver tag is allocated at issue time, and unaccounted when it puts the driver tag.
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230913151616.3164338-3-chengming.zhou@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
4f1731df |
| 10-Jun-2023 |
Yu Kuai <yukuai3@huawei.com> |
blk-mq: fix potential io hang by wrong 'wake_batch'
In __blk_mq_tag_busy/idle(), updating 'active_queues' and calculating 'wake_batch' is not atomic:
t1:                              t2:
_blk_mq_tag_busy                 blk_mq_tag_busy
inc active_queues
// assume 1->2
                                 inc active_queues
                                 // 2 -> 3
                                 blk_mq_update_wake_batch
                                 // calculate based on 3
blk_mq_update_wake_batch
/* calculate based on 2, while active_queues is actually 3. */
Fix this problem by protecting them with 'tags->lock'. This is not a hot path, so performance should not be a concern. Now that all writers are inside the lock, switch 'active_queues' from atomic to unsigned int.
Fixes: 180dccb0dba4 ("blk-mq: fix tag_get wait task can't be awakened") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230610023043.2559121-1-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
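The fix described in this commit, updating the counter and the value derived from it inside one critical section, might be sketched in user space as follows. This is only an illustration under stated assumptions: `struct tags_model`, the `model_*` helpers, and the batch formula are hypothetical stand-ins, and a pthread mutex takes the place of 'tags->lock'. It is not the kernel code.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical model of the tag state; not the kernel's struct. */
struct tags_model {
	pthread_mutex_t lock;
	unsigned int active_queues;	/* plain integer: all writers hold the lock */
	unsigned int wake_batch;	/* derived from active_queues */
	unsigned int depth;
};

/* Illustrative formula only: split the depth among users, clamp to [1, 8]. */
static unsigned int model_calc_wake_batch(unsigned int depth, unsigned int users)
{
	unsigned int batch = depth / (users ? users : 1);

	if (batch < 1)
		batch = 1;
	if (batch > 8)
		batch = 8;
	return batch;
}

/* Increment and recalculation happen under the same lock, so no other
 * thread can change active_queues between the two steps. */
static void model_tag_busy(struct tags_model *t)
{
	assert(t != NULL);
	pthread_mutex_lock(&t->lock);
	t->active_queues++;
	t->wake_batch = model_calc_wake_batch(t->depth, t->active_queues);
	pthread_mutex_unlock(&t->lock);
}

static void model_tag_idle(struct tags_model *t)
{
	assert(t != NULL);
	pthread_mutex_lock(&t->lock);
	if (t->active_queues > 0)
		t->active_queues--;
	t->wake_batch = model_calc_wake_batch(t->depth, t->active_queues);
	pthread_mutex_unlock(&t->lock);
}
```

Because every writer takes the lock, the counter no longer needs to be atomic, which mirrors the switch to unsigned int mentioned in the message.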
|
#
9a67aa52 |
| 19-May-2023 |
Christoph Hellwig <hch@lst.de> |
blk-mq: don't use the requeue list to queue flush commands
Currently both requeues of commands that were already sent to the driver and flush commands submitted from the flush state machine share the same requeue_list in struct request_queue, despite requeues doing head insertions and flushes not. Switch to using two separate lists instead.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20230519044050.107790-8-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
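The idea of keeping two lists with different insertion disciplines could look roughly like the sketch below. All names (`struct req_model`, `struct queue_model`, `flush_list`, the `model_*` helpers) are hypothetical, and the assumption that flushes are appended at the tail is mine; the message only says that flushes do not use head insertion.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request and singly linked list; not kernel types. */
struct req_model {
	int id;
	struct req_model *next;
};

struct list_model {
	struct req_model *head;
	struct req_model *tail;
};

struct queue_model {
	struct list_model requeue_list;	/* requeued commands, head insertion */
	struct list_model flush_list;	/* flush commands, assumed tail insertion */
};

static void model_list_add_head(struct list_model *l, struct req_model *r)
{
	r->next = l->head;
	l->head = r;
	if (!l->tail)
		l->tail = r;
}

static void model_list_add_tail(struct list_model *l, struct req_model *r)
{
	r->next = NULL;
	if (l->tail)
		l->tail->next = r;
	else
		l->head = r;
	l->tail = r;
}

/* A requeued command goes to the front of its own list. */
static void model_requeue(struct queue_model *q, struct req_model *r)
{
	assert(q != NULL && r != NULL);
	model_list_add_head(&q->requeue_list, r);
}

/* A flush command is appended to a separate list. */
static void model_queue_flush(struct queue_model *q, struct req_model *r)
{
	assert(q != NULL && r != NULL);
	model_list_add_tail(&q->flush_list, r);
}
```

With separate lists, neither path has to know about the other's ordering rule.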
|
#
dd6216bb |
| 18-May-2023 |
Christoph Hellwig <hch@lst.de> |
blk-mq: make sure elevator callbacks aren't called for passthrough request
In case of q->elevator, passthrough request can still be marked as RQF_ELV, so some elevator callbacks will be called for them.
Fix this by splitting RQF_SCHED_TAGS, which is set for all requests that are issued on a queue that uses an I/O scheduler, and RQF_USE_SCHED for non-flush, non-passthrough requests on such a queue.
Roughly based on two different patches from Ming Lei <ming.lei@redhat.com>.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230518053101.760632-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
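The flag split described above can be modelled with a small sketch. The `MODEL_*` constants and `model_*` functions are hypothetical and their bit values are arbitrary; only the rule itself (one flag for every request on a scheduler queue, a second flag for non-flush, non-passthrough requests) is taken from the message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits; the values are arbitrary for this sketch. */
#define MODEL_RQF_SCHED_TAGS	(1u << 0)
#define MODEL_RQF_USE_SCHED	(1u << 1)

/* Decide which flags a new request gets. */
static unsigned int model_rq_flags(bool queue_has_sched, bool is_flush,
				   bool is_passthrough)
{
	unsigned int flags = 0;

	if (!queue_has_sched)
		return flags;

	/* Every request on a scheduler queue gets this flag. */
	flags |= MODEL_RQF_SCHED_TAGS;

	/* Only non-flush, non-passthrough requests go through the scheduler. */
	if (!is_flush && !is_passthrough)
		flags |= MODEL_RQF_USE_SCHED;

	return flags;
}

/* Elevator callbacks are gated on the narrower flag. */
static bool model_calls_elevator(unsigned int flags)
{
	return (flags & MODEL_RQF_USE_SCHED) != 0;
}
```

A passthrough request on a scheduler queue then carries only the first flag, so the elevator gate rejects it.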
|
#
fdcab6cd |
| 18-May-2023 |
Christoph Hellwig <hch@lst.de> |
blk-mq: remove RQF_ELVPRIV
RQF_ELVPRIV is set for all non-flush requests that have RQF_ELV set. Expand this condition in the two users of the flag and remove it.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20230518053101.760632-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
d5fb8726 |
| 18-May-2023 |
Bart Van Assche <bvanassche@acm.org> |
block: Decode all flag names in the debugfs output
See also:
* Commit 4d337cebcb1c ("blk-mq: avoid to touch q->elevator without any protection").
* Commit 414dd48e882c ("blk-mq: add tagset quiesce interface").
Cc: Christoph Hellwig <hch@lst.de> Cc: Damien Le Moal <dlemoal@kernel.org> Cc: Ming Lei <ming.lei@redhat.com> Cc: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20230518222708.1190867-1-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
90110e04 |
| 13-Apr-2023 |
Christoph Hellwig <hch@lst.de> |
blk-mq: include <linux/blk-mq.h> in block/blk-mq.h
block/blk-mq.h needs various definitions from <linux/blk-mq.h>, include it there instead of relying on the source files to include both.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20230413064057.707578-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
bebe84eb |
| 13-Apr-2023 |
Christoph Hellwig <hch@lst.de> |
blk-mq: remove blk-mq-tag.h
blk-mq-tag.h is always included by blk-mq.h, and causes recursive inclusion hell with further changes. Just merge it into blk-mq.h instead.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20230413064057.707578-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
54bdd67d |
| 20-Mar-2023 |
Keith Busch <kbusch@kernel.org> |
blk-mq: remove hybrid polling
io_uring provides the only way user space can poll completions, and that always sets BLK_POLL_NOSLEEP. This effectively makes hybrid polling dead code, so remove it and everything supporting it.
Hybrid polling was effectively killed off with 9650b453a3d4b1, "block: ignore RWF_HIPRI hint for sync dio", but still potentially reachable through io_uring until d729cf9acb93119, "io_uring: don't sleep when polling for I/O", but hybrid polling probably should not have been reachable through that async interface from the beginning.
Fixes: 9650b453a3d4 ("block: ignore RWF_HIPRI hint for sync dio") Fixes: d729cf9acb93 ("io_uring: don't sleep when polling for I/O") Signed-off-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20230320194926.3353144-1-kbusch@meta.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
ba91c849 |
| 03-Feb-2023 |
Christoph Hellwig <hch@lst.de> |
blk-rq-qos: store a gendisk instead of request_queue in struct rq_qos
This is what about half of the users already want, and it's only going to grow more.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Andreas Herrmann <aherrmann@suse.de> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20230203150400.3199230-16-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
9713a670 |
| 16-Sep-2022 |
Li Jinlin <lijinlin3@huawei.com> |
block/blk-rq-qos: delete useless enmu RQ_QOS_IOPRIO
Since blk-ioprio handling was converted from a rqos policy to a direct call, RQ_QOS_IOPRIO is not used anymore; just delete it.
Signed-off-by: Li Jinlin <lijinlin3@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220916023241.32926-1-lijinlin3@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
745ed372 |
| 08-Sep-2022 |
Jens Axboe <axboe@kernel.dk> |
block: add missing request flags to debugfs code
We're missing TIMED_OUT and RESV. Particularly the former is handy for debugging, let's get them added.
Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
16458cf3 |
| 14-Jul-2022 |
Bart Van Assche <bvanassche@acm.org> |
block: Use the new blk_opf_t type
Use the new blk_opf_t type for arguments and variables that represent request flags or a bitwise combination of a request operation and request flags. Rename the function arguments and also a structure member that hold a request operation and flags from 'rw' into 'opf'.
This patch does not change any functionality.
Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: Hannes Reinecke <hare@suse.de> Cc: Damien Le Moal <damien.lemoal@wdc.com> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220714180729.1065367-7-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
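As a rough illustration of a value that holds "a bitwise combination of a request operation and request flags", consider the sketch below. The type `model_opf_t`, the `MODEL_*` constants, and the 8-bit split between operation and flags are assumptions chosen for the example and do not claim to match the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical combined "operation + flags" type. */
typedef uint32_t model_opf_t;

/* Hypothetical operations, stored in the low bits. */
enum model_req_op {
	MODEL_OP_READ = 0,
	MODEL_OP_WRITE = 1,
};

#define MODEL_OP_BITS	8
#define MODEL_OP_MASK	((((model_opf_t)1) << MODEL_OP_BITS) - 1)

/* Hypothetical flags, stored above the operation bits. */
#define MODEL_FLAG_SYNC	(((model_opf_t)1) << MODEL_OP_BITS)
#define MODEL_FLAG_META	(((model_opf_t)1) << (MODEL_OP_BITS + 1))

/* Extract the operation from a combined value. */
static enum model_req_op model_op(model_opf_t opf)
{
	return (enum model_req_op)(opf & MODEL_OP_MASK);
}

/* Extract only the flag bits from a combined value. */
static model_opf_t model_flags(model_opf_t opf)
{
	model_opf_t flags = opf & ~MODEL_OP_MASK;

	/* The flag part never overlaps the operation bits. */
	assert((flags & MODEL_OP_MASK) == 0);
	return flags;
}
```

Giving the combined value its own named type, and naming parameters 'opf' rather than 'rw', makes it clearer at each call site whether a bare operation or an operation with flags is expected.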
|
#
77e7ffd7 |
| 14-Jul-2022 |
Bart Van Assche <bvanassche@acm.org> |
block: Use enum req_op where appropriate
Change the type of the arguments that are used to pass a REQ_OP_* value from int or unsigned int into enum req_op to improve static type checking.
Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: Hannes Reinecke <hare@suse.de> Cc: Damien Le Moal <damien.lemoal@wdc.com> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220714180729.1065367-3-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
f3ec5d11 |
| 11-Jul-2022 |
Ming Lei <ming.lei@redhat.com> |
blk-mq: don't create hctx debugfs dir until q->debugfs_dir is created
blk_mq_debugfs_register_hctx() can be called by blk_mq_update_nr_hw_queues when gendisk isn't added yet, such as nvme tcp.
Fixes the warning of 'debugfs: Directory 'hctx0' with parent '/' already present!' which can be observed reliably when running blktests nvme/005.
Fixes: 6cfc0081b046 ("blk-mq: no need to check return value of debugfs_create functions") Reported-by: Yi Zhang <yi.zhang@redhat.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Tested-by: Yi Zhang <yi.zhang@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220711090808.259682-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
2dd6532e |
| 06-Jul-2022 |
John Garry <john.garry@huawei.com> |
blk-mq: Drop 'reserved' arg of busy_tag_iter_fn
We no longer use the 'reserved' arg in busy_tag_iter_fn for any iter function so it may be dropped.
Signed-off-by: John Garry <john.garry@huawei.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> #nvme Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/1657109034-206040-6-git-send-email-john.garry@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
1f90307e |
| 19-Jun-2022 |
Christoph Hellwig <hch@lst.de> |
block: remove QUEUE_FLAG_DEAD
Disallow setting the blk-mq state on any queue that is already dying as setting the state even then is a bad idea, and remove the now unused QUEUE_FLAG_DEAD flag.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20220619060552.1850436-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
99d055b4 |
| 14-Jun-2022 |
Christoph Hellwig <hch@lst.de> |
block: remove per-disk debugfs files in blk_unregister_queue
The block debugfs files are created in blk_register_queue, which is called by add_disk and use a naming scheme based on the disk_name. After del_gendisk returns that name can be reused and thus we must not leave these debugfs files around, otherwise the kernel is unhappy and spews messages like:
Directory XXXXX with parent 'block' already present!
and the newly created devices will not have working debugfs files.
Move the unregistration to blk_unregister_queue instead (which matches the sysfs unregistration) to make sure the debugfs life time rules match those of the disk name.
As part of the move also make sure the whole debugfs unregistration is inside a single debugfs_mutex critical section.
Note that this breaks blktests block/002, which checks that the debugfs directory has not been removed while blktests is running, but that particular check should simply be removed from the test case.
Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220614074827.458955-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
5cf9c91b |
| 14-Jun-2022 |
Christoph Hellwig <hch@lst.de> |
block: serialize all debugfs operations using q->debugfs_mutex
Various places like I/O schedulers or the QOS infrastructure try to register debugfs files on demand, which can race with creating and removing the main queue debugfs directory. Use the existing debugfs_mutex to serialize all debugfs operations that rely on q->debugfs_dir or the directories hanging off it.
To make the teardown code a little simpler, declare all debugfs dentry pointers, and not just the main one, unconditionally in blkdev.h.
Move debugfs_mutex next to the dentries that it protects and document what it is used for.
Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220614074827.458955-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
44abff2c |
| 15-Apr-2022 |
Christoph Hellwig <hch@lst.de> |
block: decouple REQ_OP_SECURE_ERASE from REQ_OP_DISCARD
Secure erase is a very different operation from discard in that it is a data integrity operation vs hint. Fully split the limits and helper infrastructure to make the separation more clear.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> [drbd] Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> [nifs2] Acked-by: Jaegeuk Kim <jaegeuk@kernel.org> [f2fs] Acked-by: Coly Li <colyli@suse.de> [bcache] Acked-by: David Sterba <dsterba@suse.com> [btrfs] Acked-by: Chao Yu <chao@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220415045258.199825-27-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
70200574 |
| 15-Apr-2022 |
Christoph Hellwig <hch@lst.de> |
block: remove QUEUE_FLAG_DISCARD
Just use a non-zero max_discard_sectors as an indicator for discard support, similar to what is done for write zeroes.
The only place that needs special attention is the RAID5 driver, which must clear discard support for security reasons by default, even if the default stacking rules would allow for it.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> [drbd] Acked-by: Jan Höppner <hoeppner@linux.ibm.com> [s390] Acked-by: Coly Li <colyli@suse.de> [bcache] Acked-by: David Sterba <dsterba@suse.com> [btrfs] Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220415045258.199825-25-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
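Using a limit value instead of a dedicated flag as the capability indicator can be sketched as below. `struct limits_model` and `model_supports_discard()` are hypothetical names used for illustration; only the field name max_discard_sectors comes from the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of queue limits. */
struct limits_model {
	unsigned int max_discard_sectors;
};

/* Discard is considered supported exactly when the limit is non-zero,
 * so no separate flag has to be kept in sync with the limit. */
static bool model_supports_discard(const struct limits_model *lim)
{
	assert(lim != NULL);
	return lim->max_discard_sectors != 0;
}
```

A driver that must disable discard, as the RAID5 case above does by default, would then simply set the limit to zero in this model.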
|
#
4f481208 |
| 08-Mar-2022 |
Ming Lei <ming.lei@redhat.com> |
blk-mq: prepare for implementing hctx table via xarray
A use-after-free on q->queue_hw_ctx between queue_for_each_hw_ctx() and blk_mq_update_nr_hw_queues() is currently unavoidable. Converting to an xarray fixes the UAF, and the code gets cleaner as well.
Prepare for converting q->queue_hw_ctx into an xarray. One thing to note is that xa_for_each() only accepts 'unsigned long' as the index, so change the type of the hctx index in queue_for_each_hw_ctx() to 'unsigned long'.
Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20220308073219.91173-6-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
c75e707f |
| 04-Mar-2022 |
Christoph Hellwig <hch@lst.de> |
block: remove the per-bio/request write hint
With the NVMe support for this gone, there are no consumers of these hints left, so remove them.
Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220304175556.407719-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
18d78171 |
| 02-Dec-2021 |
Ming Lei <ming.lei@redhat.com> |
blk-mq: check q->poll_stat in queue_poll_stat_show
Without checking q->poll_stat in queue_poll_stat_show(), kernel panic may be caused if q->poll_stat isn't allocated.
Fixes: 48b5c1fbcd8c ("block: only allocate poll_stats if there's a user of them") Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20211202090716.3292244-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
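The kind of guard this fix adds, checking a lazily allocated pointer before a show routine dereferences it, might look like the sketch below. The structures, field names, output format, and the choice to return 0 when nothing is allocated are all hypothetical; the sketch does not reproduce queue_poll_stat_show().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical statistics record that is only allocated on demand. */
struct poll_stat_model {
	unsigned long nr_samples;
	unsigned long mean;
};

/* Hypothetical queue holding an optional pointer to the statistics. */
struct pollq_model {
	struct poll_stat_model *poll_stat;	/* may be NULL */
};

/* Format the statistics into buf; return the number of characters
 * written, or 0 if the statistics were never allocated. */
static int model_poll_stat_show(const struct pollq_model *q, char *buf,
				size_t len)
{
	assert(q != NULL && buf != NULL);

	/* Without this check a NULL poll_stat would be dereferenced. */
	if (!q->poll_stat)
		return 0;

	return snprintf(buf, len, "samples=%lu mean=%lu\n",
			q->poll_stat->nr_samples, q->poll_stat->mean);
}
```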
|