#
08420cf7 |
| 15-Jan-2024 |
Jens Axboe <axboe@kernel.dk> |
block: add blk_time_get_ns() and blk_time_get() helpers
Convert any user of ktime_get_ns() to use blk_time_get_ns(), and ktime_get() to blk_time_get(), so we have a unified API for querying the curr
block: add blk_time_get_ns() and blk_time_get() helpers
Convert any user of ktime_get_ns() to use blk_time_get_ns(), and ktime_get() to blk_time_get(), so we have a unified API for querying the current time in nanoseconds or as ktime.
No functional changes intended, this patch just wraps ktime_get_ns() and ktime_get() with a block helper.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
faffaab2 |
| 13-Apr-2023 |
Tejun Heo <tj@kernel.org> |
blkcg: Restructure blkg_conf_prep() and friends
We want to support lazy init of rq-qos policies so that iolatency is enabled lazily on configuration instead of gendisk initialization. The way blkg c
blkcg: Restructure blkg_conf_prep() and friends
We want to support lazy init of rq-qos policies so that iolatency is enabled lazily on configuration instead of gendisk initialization. The way blkg config helpers are structured now is a bit awkward for that. Let's restructure:
* blkcg_conf_open_bdev() is renamed to blkg_conf_open_bdev(). The blkcg_ prefix was used because the bdev opening step is blkg-independent. However, the distinction is too subtle and confuses more than helps. Let's switch to blkg prefix so that it's consistent with the type and other helper names.
* struct blkg_conf_ctx now remembers the original input string and is always initialized by the new blkg_conf_init().
* blkg_conf_open_bdev() is updated to take a pointer to blkg_conf_ctx like blkg_conf_prep() and can be called multiple times safely. Instead of modifying the double pointer to input string directly, blkg_conf_open_bdev() now sets blkg_conf_ctx->body.
* blkg_conf_finish() is renamed to blkg_conf_exit() for symmetry and now must be called on all blkg_conf_ctx's which were initialized with blkg_conf_init().
Combined, this allows the users to either open the bdev first or do it altogether with blkg_conf_prep() which will help implementing lazy init of rq-qos policies.
blkg_conf_init/exit() will also be used implement synchronization against device removal. This is necessary because iolat / iocost are configured through cgroupfs instead of one of the files under /sys/block/DEVICE. As cgroupfs operations aren't synchronized with block layer, the lazy init and other configuration operations may race against device removal. This patch makes blkg_conf_init/exit() used consistently for all cgroup-orginating configurations making them a good place to implement explicit synchronization.
Users are updated accordingly. No behavior change is intended by this patch.
v2: bfq wasn't updated in v1 causing a build error. Fixed.
v3: Update the description to include future use of blkg_conf_init/exit() as synchronization points.
Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Yu Kuai <yukuai1@huaweicloud.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230413000649.115785-3-tj@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
650e2cb5 |
| 06-Apr-2023 |
Chengming Zhou <zhouchengming@bytedance.com> |
blk-cgroup: delete cpd_init_fn of blkcg_policy
blkcg_policy cpd_init_fn() is used to just initialize some default fields of policy data, which is enough to do in cpd_alloc_fn().
This patch delete t
blk-cgroup: delete cpd_init_fn of blkcg_policy
blkcg_policy cpd_init_fn() is used to just initialize some default fields of policy data, which is enough to do in cpd_alloc_fn().
This patch delete the only user bfq_cpd_init(), and remove cpd_init_fn from blkcg_policy.
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20230406145050.49914-4-zhouchengming@bytedance.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
e9f2f3f5 |
| 06-Apr-2023 |
Chengming Zhou <zhouchengming@bytedance.com> |
block, bfq: remove BFQ_WEIGHT_LEGACY_DFL
BFQ_WEIGHT_LEGACY_DFL is the same as CGROUP_WEIGHT_DFL, which means we don't need cpd_bind_fn() callback to update default weight when attached to a hierarch
block, bfq: remove BFQ_WEIGHT_LEGACY_DFL
BFQ_WEIGHT_LEGACY_DFL is the same as CGROUP_WEIGHT_DFL, which means we don't need cpd_bind_fn() callback to update default weight when attached to a hierarchy.
This patch remove BFQ_WEIGHT_LEGACY_DFL and cpd_bind_fn().
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20230406145050.49914-2-zhouchengming@bytedance.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
a06377c5 |
| 14-Feb-2023 |
Christoph Hellwig <hch@lst.de> |
Revert "blk-cgroup: pin the gendisk in struct blkcg_gq"
This reverts commit 84d7d462b16dd5f0bf7c7ca9254bf81db2c952a2.
Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/2
Revert "blk-cgroup: pin the gendisk in struct blkcg_gq"
This reverts commit 84d7d462b16dd5f0bf7c7ca9254bf81db2c952a2.
Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230214183308.1658775-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
1231039d |
| 14-Feb-2023 |
Christoph Hellwig <hch@lst.de> |
Revert "blk-cgroup: move the cgroup information to struct gendisk"
This reverts commit 3f13ab7c80fdb0ada86a8e3e818960bc1ccbaa59 as a patch it depends on caused a few problems.
Signed-off-by: Christ
Revert "blk-cgroup: move the cgroup information to struct gendisk"
This reverts commit 3f13ab7c80fdb0ada86a8e3e818960bc1ccbaa59 as a patch it depends on caused a few problems.
Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230214183308.1658775-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
f37bf75c |
| 02-Feb-2023 |
Yu Kuai <yukuai3@huawei.com> |
block, bfq: cleanup 'bfqg->online'
After commit dfd6200a0954 ("blk-cgroup: support to track if policy is online"), there is no need to do this again in bfq.
However, 'pd->online' is not protected b
block, bfq: cleanup 'bfqg->online'
After commit dfd6200a0954 ("blk-cgroup: support to track if policy is online"), there is no need to do this again in bfq.
However, 'pd->online' is not protected by 'bfqd->lock', in order to make sure bfq won't see that 'pd->online' is still set after bfq_pd_offline(), clear it before bfq_pd_offline() is called. This is fine because other polices doesn't use 'pd->online' and bfq_pd_offline() will move active bfqq to root cgroup anyway.
Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230202134913.2364549-1-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
3f13ab7c |
| 03-Feb-2023 |
Christoph Hellwig <hch@lst.de> |
blk-cgroup: move the cgroup information to struct gendisk
cgroup information only makes sense on a live gendisk that allows file system I/O (which includes the raw block device). So move over the c
blk-cgroup: move the cgroup information to struct gendisk
cgroup information only makes sense on a live gendisk that allows file system I/O (which includes the raw block device). So move over the cgroup related members.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Andreas Herrmann <aherrmann@suse.de> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20230203150400.3199230-20-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
0a0b4f79 |
| 03-Feb-2023 |
Christoph Hellwig <hch@lst.de> |
blk-cgroup: pass a gendisk to pd_alloc_fn
No need to the request_queue here, pass a gendisk and extract the node ids from that.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Andreas He
blk-cgroup: pass a gendisk to pd_alloc_fn
No need to the request_queue here, pass a gendisk and extract the node ids from that.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Andreas Herrmann <aherrmann@suse.de> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20230203150400.3199230-18-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
40e4996e |
| 03-Feb-2023 |
Christoph Hellwig <hch@lst.de> |
blk-cgroup: pass a gendisk to blkcg_{de,}activate_policy
Prepare for storing the blkcg information in the gendisk instead of the request_queue.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewe
blk-cgroup: pass a gendisk to blkcg_{de,}activate_policy
Prepare for storing the blkcg information in the gendisk instead of the request_queue.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Andreas Herrmann <aherrmann@suse.de> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20230203150400.3199230-17-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
84d7d462 |
| 03-Feb-2023 |
Christoph Hellwig <hch@lst.de> |
blk-cgroup: pin the gendisk in struct blkcg_gq
Currently each blkcg_gq holds a request_queue reference, which is what is used in the policies. But a lot of these interfaces will move over to use a
blk-cgroup: pin the gendisk in struct blkcg_gq
Currently each blkcg_gq holds a request_queue reference, which is what is used in the policies. But a lot of these interfaces will move over to use a gendisk, so store a disk in struct blkcg_gq and hold a reference to it.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Andreas Herrmann <aherrmann@suse.de> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20230203150400.3199230-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
b600de2d |
| 30-Jan-2023 |
Yu Kuai <yukuai3@huawei.com> |
block, bfq: fix uaf for bfqq in bic_set_bfqq()
After commit 64dc8c732f5c ("block, bfq: fix possible uaf for 'bfqq->bic'"), bic->bfqq will be accessed in bic_set_bfqq(), however, in some context bic-
block, bfq: fix uaf for bfqq in bic_set_bfqq()
After commit 64dc8c732f5c ("block, bfq: fix possible uaf for 'bfqq->bic'"), bic->bfqq will be accessed in bic_set_bfqq(), however, in some context bic->bfqq will be freed, and bic_set_bfqq() is called with the freed bic->bfqq.
Fix the problem by always freeing bfqq after bic_set_bfqq().
Fixes: 64dc8c732f5c ("block, bfq: fix possible uaf for 'bfqq->bic'") Reported-and-tested-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com> Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230130014136.591038-1-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
2d31c684 |
| 03-Jan-2023 |
Davide Zini <davidezini2@gmail.com> |
block, bfq: inject I/O to underutilized actuators
The main service scheme of BFQ for sync I/O is serving one sync bfq_queue at a time, for a while. In particular, BFQ enforces this scheme when it de
block, bfq: inject I/O to underutilized actuators
The main service scheme of BFQ for sync I/O is serving one sync bfq_queue at a time, for a while. In particular, BFQ enforces this scheme when it deems the latter necessary to boost throughput or to preserve service guarantees. Unfortunately, when BFQ enforces this policy, only one actuator at a time gets served for a while, because each bfq_queue contains I/O only for one actuator. The other actuators may remain underutilized.
Actually, BFQ may serve (inject) extra I/O, taken from other bfq_queues, in parallel with that of the in-service queue. This injection mechanism may provide the ground for dealing also with the above actuator-underutilization problem. Yet BFQ does not take the actuator load into account when choosing which queue to pick extra I/O from. In addition, BFQ may happen to inject extra I/O only when the in-service queue is temporarily empty.
In view of these facts, this commit extends the injection mechanism in such a way that the latter: (1) takes into account also the actuator load; (2) checks such a load on each dispatch, and injects I/O for an underutilized actuator, if there is one and there is I/O for it.
To perform the check in (2), this commit introduces a load threshold, currently set to 4. A linear scan of each actuator is performed, until an actuator is found for which the following two conditions hold: the load of the actuator is below the threshold, and there is at least one non-in-service queue that contains I/O for that actuator. If such a pair (actuator, queue) is found, then the head request of that queue is returned for dispatch, instead of the head request of the in-service queue.
We have set the threshold, empirically, to the minimum possible value for which an actuator is fully utilized, or close to be fully utilized. By doing so, injected I/O 'steals' as few drive-queue slots as possibile to the in-service queue. This reduces as much as possible the probability that the service of I/O from the in-service bfq_queue gets delayed because of slot exhaustion, i.e., because all the slots of the drive queue are filled with I/O injected from other queues (NCQ provides for 32 slots).
This new mechanism also counters actuator underutilization in the case of asymmetric configurations of bfq_queues. Namely if there are few bfq_queues containing I/O for some actuators and many bfq_queues containing I/O for other actuators. Or if the bfq_queues containing I/O for some actuators have lower weights than the other bfq_queues.
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Davide Zini <davidezini2@gmail.com> Link: https://lore.kernel.org/r/20230103145503.71712-8-paolo.valente@linaro.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
9778369a |
| 03-Jan-2023 |
Paolo Valente <paolo.valente@linaro.org> |
block, bfq: split sync bfq_queues on a per-actuator basis
Single-LUN multi-actuator SCSI drives, as well as all multi-actuator SATA drives appear as a single device to the I/O subsystem [1]. Yet th
block, bfq: split sync bfq_queues on a per-actuator basis
Single-LUN multi-actuator SCSI drives, as well as all multi-actuator SATA drives appear as a single device to the I/O subsystem [1]. Yet they address commands to different actuators internally, as a function of Logical Block Addressing (LBAs). A given sector is reachable by only one of the actuators. For example, Seagate’s Serial Advanced Technology Attachment (SATA) version contains two actuators and maps the lower half of the SATA LBA space to the lower actuator and the upper half to the upper actuator.
Evidently, to fully utilize actuators, no actuator must be left idle or underutilized while there is pending I/O for it. The block layer must somehow control the load of each actuator individually. This commit lays the ground for allowing BFQ to provide such a per-actuator control.
BFQ associates an I/O-request sync bfq_queue with each process doing synchronous I/O, or with a group of processes, in case of queue merging. Then BFQ serves one bfq_queue at a time. While in service, a bfq_queue is emptied in request-position order. Yet the same process, or group of processes, may generate I/O for different actuators. In this case, different streams of I/O (each for a different actuator) get all inserted into the same sync bfq_queue. So there is basically no individual control on when each stream is served, i.e., on when the I/O requests of the stream are picked from the bfq_queue and dispatched to the drive.
This commit enables BFQ to control the service of each actuator individually for synchronous I/O, by simply splitting each sync bfq_queue into N queues, one for each actuator. In other words, a sync bfq_queue is now associated to a pair (process, actuator). As a consequence of this split, the per-queue proportional-share policy implemented by BFQ will guarantee that the sync I/O generated for each actuator, by each process, receives its fair share of service.
This is just a preparatory patch. If the I/O of the same process happens to be sent to different queues, then each of these queues may undergo queue merging. To handle this event, the bfq_io_cq data structure must be properly extended. In addition, stable merging must be disabled to avoid loss of control on individual actuators. Finally, also async queues must be split. These issues are described in detail and addressed in next commits. As for this commit, although multiple per-process bfq_queues are provided, the I/O of each process or group of processes is still sent to only one queue, regardless of the actuator the I/O is for. The forwarding to distinct bfq_queues will be enabled after addressing the above issues.
[1] https://www.linaro.org/blog/budget-fair-queueing-bfq-linux-io-scheduler-optimizations-for-multi-actuator-sata-hard-drives/
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Gabriele Felici <felicigb@gmail.com> Signed-off-by: Carmine Zaccagnino <carmine@carminezacc.com> Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Link: https://lore.kernel.org/r/20230103145503.71712-2-paolo.valente@linaro.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
216f7647 |
| 03-Jan-2023 |
Yu Kuai <yukuai3@huawei.com> |
block, bfq: switch 'bfqg->ref' to use atomic refcount apis
The updating of 'bfqg->ref' should be protected by 'bfqd->lock', however, during code review, we found that bfq_pd_free() update 'bfqg->ref
block, bfq: switch 'bfqg->ref' to use atomic refcount apis
The updating of 'bfqg->ref' should be protected by 'bfqd->lock', however, during code review, we found that bfq_pd_free() update 'bfqg->ref' without holding the lock, which is problematic:
1) bfq_pd_free() triggered by removing cgroup is called asynchronously; 2) bfqq will grab bfqg reference, and exit bfqq will drop the reference, which can concurrent with 1).
Unfortunately, 'bfqd->lock' can't be held here because 'bfqd' might already be freed in bfq_pd_free(). Fix the problem by using atomic refcount apis.
Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230103084755.1256479-1-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
337366e0 |
| 14-Dec-2022 |
Yu Kuai <yukuai3@huawei.com> |
block, bfq: replace 0/1 with false/true in bic apis
Just to make the code a litter cleaner, there are no functional changes.
Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Jan Kara <jack@
block, bfq: replace 0/1 with false/true in bic apis
Just to make the code a litter cleaner, there are no functional changes.
Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20221214033155.3455754-3-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
452af7dc |
| 14-Dec-2022 |
Yu Kuai <yukuai3@huawei.com> |
block, bfq: don't return bfqg from __bfq_bic_change_cgroup()
The return value is not used, hence remove it.
Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: ht
block, bfq: don't return bfqg from __bfq_bic_change_cgroup()
The return value is not used, hence remove it.
Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20221214033155.3455754-2-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
f02be900 |
| 08-Nov-2022 |
Yu Kuai <yukuai3@huawei.com> |
block, bfq: fix null pointer dereference in bfq_bio_bfqg()
Out test found a following problem in kernel 5.10, and the same problem should exist in mainline:
BUG: kernel NULL pointer dereference, ad
block, bfq: fix null pointer dereference in bfq_bio_bfqg()
Out test found a following problem in kernel 5.10, and the same problem should exist in mainline:
BUG: kernel NULL pointer dereference, address: 0000000000000094 PGD 0 P4D 0 Oops: 0000 [#1] SMP CPU: 7 PID: 155 Comm: kworker/7:1 Not tainted 5.10.0-01932-g19e0ace2ca1d-dirty 4 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-b4 Workqueue: kthrotld blk_throtl_dispatch_work_fn RIP: 0010:bfq_bio_bfqg+0x52/0xc0 Code: 94 00 00 00 00 75 2e 48 8b 40 30 48 83 05 35 06 c8 0b 01 48 85 c0 74 3d 4b RSP: 0018:ffffc90001a1fba0 EFLAGS: 00010002 RAX: ffff888100d60400 RBX: ffff8881132e7000 RCX: 0000000000000000 RDX: 0000000000000017 RSI: ffff888103580a18 RDI: ffff888103580a18 RBP: ffff8881132e7000 R08: 0000000000000000 R09: ffffc90001a1fe10 R10: 0000000000000a20 R11: 0000000000034320 R12: 0000000000000000 R13: ffff888103580a18 R14: ffff888114447000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff88881fdc0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000094 CR3: 0000000100cdb000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: bfq_bic_update_cgroup+0x3c/0x350 ? ioc_create_icq+0x42/0x270 bfq_init_rq+0xfd/0x1060 bfq_insert_requests+0x20f/0x1cc0 ? ioc_create_icq+0x122/0x270 blk_mq_sched_insert_requests+0x86/0x1d0 blk_mq_flush_plug_list+0x193/0x2a0 blk_flush_plug_list+0x127/0x170 blk_finish_plug+0x31/0x50 blk_throtl_dispatch_work_fn+0x151/0x190 process_one_work+0x27c/0x5f0 worker_thread+0x28b/0x6b0 ? rescuer_thread+0x590/0x590 kthread+0x153/0x1b0 ? kthread_flush_work+0x170/0x170 ret_from_fork+0x1f/0x30 Modules linked in: CR2: 0000000000000094 ---[ end trace e2e59ac014314547 ]--- RIP: 0010:bfq_bio_bfqg+0x52/0xc0 Code: 94 00 00 00 00 75 2e 48 8b 40 30 48 83 05 35 06 c8 0b 01 48 85 c0 74 3d 4b RSP: 0018:ffffc90001a1fba0 EFLAGS: 00010002 RAX: ffff888100d60400 RBX: ffff8881132e7000 RCX: 0000000000000000 RDX: 0000000000000017 RSI: ffff888103580a18 RDI: ffff888103580a18 RBP: ffff8881132e7000 R08: 0000000000000000 R09: ffffc90001a1fe10 R10: 0000000000000a20 R11: 0000000000034320 R12: 0000000000000000 R13: ffff888103580a18 R14: ffff888114447000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff88881fdc0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000094 CR3: 0000000100cdb000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Root cause is quite complex:
1) use bfq elevator for the test device. 2) create a cgroup CG 3) config blk throtl in CG
blkg_conf_prep blkg_create
4) create a thread T1 and issue async io in CG:
bio_init bio_associate_blkg ... submit_bio submit_bio_noacct blk_throtl_bio -> io is throttled // io submit is done
5) switch elevator:
bfq_exit_queue blkcg_deactivate_policy list_for_each_entry(blkg, &q->blkg_list, q_node) blkg->pd[] = NULL // bfq policy is removed
5) thread t1 exist, then remove the cgroup CG:
blkcg_unpin_online blkcg_destroy_blkgs blkg_destroy list_del_init(&blkg->q_node) // blkg is removed from queue list
6) switch elevator back to bfq
bfq_init_queue bfq_create_group_hierarchy blkcg_activate_policy list_for_each_entry_reverse(blkg, &q->blkg_list) // blkg is removed from list, hence bfq policy is still NULL
7) throttled io is dispatched to bfq:
bfq_insert_requests bfq_init_rq bfq_bic_update_cgroup bfq_bio_bfqg bfqg = blkg_to_bfqg(blkg) // bfqg is NULL because bfq policy is NULL
The problem is only possible in bfq because only bfq can be deactivated and activated while queue is online, while others can only be deactivated while the device is removed.
Fix the problem in bfq by checking if blkg is online before calling blkg_to_bfqg().
Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20221108103434.2853269-1-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
aa625117 |
| 02-Nov-2022 |
Yu Kuai <yukuai3@huawei.com> |
block, bfq: don't declare 'bfqd' as type 'void *' in bfq_group
Prevent unnecessary format conversion for bfqg->bfqd in multiple places.
Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Jan
block, bfq: don't declare 'bfqd' as type 'void *' in bfq_group
Prevent unnecessary format conversion for bfqg->bfqd in multiple places.
Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Paolo Valente <paolo.valente@unimore.it> Link: https://lore.kernel.org/r/20221102022542.3621219-6-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
60a6e10c |
| 16-Sep-2022 |
Yu Kuai <yukuai3@huawei.com> |
block, bfq: record how many queues have pending requests
Prepare to refactor the counting of 'num_groups_with_pending_reqs'.
Add a counter in bfq_group, update it while tracking if bfqq have pendin
block, bfq: record how many queues have pending requests
Prepare to refactor the counting of 'num_groups_with_pending_reqs'.
Add a counter in bfq_group, update it while tracking if bfqq have pending requests and when bfq_bfqq_move() is called.
Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Paolo Valente <paolo.valente@linaro.org> Link: https://lore.kernel.org/r/20220916071942.214222-3-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
c2090eab |
| 16-Aug-2022 |
Yu Kuai <yukuai3@huawei.com> |
block, bfq: remove unused functions
While doing code coverage testing(CONFIG_BFQ_CGROUP_DEBUG is disabled), we found that some functions doesn't have caller, thus remove them.
Signed-off-by: Yu Kua
block, bfq: remove unused functions
While doing code coverage testing(CONFIG_BFQ_CGROUP_DEBUG is disabled), we found that some functions doesn't have caller, thus remove them.
Signed-off-by: Yu Kuai <yukuai3@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20220816015631.1323948-2-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
dc469ba2 |
| 14-Jul-2022 |
Bart Van Assche <bvanassche@acm.org> |
block/bfq: Use the new blk_opf_t type
Use the new blk_opf_t type for arguments and variables that represent request flags or a bitwise combination of a request operation and request flags. Rename th
block/bfq: Use the new blk_opf_t type
Use the new blk_opf_t type for arguments and variables that represent request flags or a bitwise combination of a request operation and request flags. Rename those variables from 'op' into 'opf'.
This patch does not change any functionality.
Cc: Jan Kara <jack@suse.cz> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220714180729.1065367-8-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
1d87be82 |
| 17-Jun-2022 |
Bart Van Assche <bvanassche@acm.org> |
block: bfq: Fix kernel-doc headers
Fix the following warnings:
block/bfq-cgroup.c:721: warning: Function parameter or member 'bfqg' not described in '__bfq_bic_change_cgroup' block/bfq-cgroup.c:721
block: bfq: Fix kernel-doc headers
Fix the following warnings:
block/bfq-cgroup.c:721: warning: Function parameter or member 'bfqg' not described in '__bfq_bic_change_cgroup' block/bfq-cgroup.c:721: warning: Excess function parameter 'blkcg' description in '__bfq_bic_change_cgroup' block/bfq-cgroup.c:870: warning: Function parameter or member 'ioprio_class' not described in 'bfq_reparent_leaf_entity' block/bfq-cgroup.c:900: warning: Function parameter or member 'ioprio_class' not described in 'bfq_reparent_active_queues'
Cc: Jan Kara <jack@suse.cz> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/20220617210859.106623-1-bvanassche@acm.org
show more ...
|
#
c28c49b0 |
| 17-Jun-2022 |
Bart Van Assche <bvanassche@acm.org> |
block: bfq: Remove an unused function definition
This patch is the result of the analysis of a sparse report.
Cc: Jan Kara <jack@suse.cz> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-
block: bfq: Remove an unused function definition
This patch is the result of the analysis of a sparse report.
Cc: Jan Kara <jack@suse.cz> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/20220617204433.102022-1-bvanassche@acm.org
show more ...
|
#
075a53b7 |
| 01-Apr-2022 |
Jan Kara <jack@suse.cz> |
bfq: Make sure bfqg for which we are queueing requests is online
Bios queued into BFQ IO scheduler can be associated with a cgroup that was already offlined. This may then cause insertion of this bf
bfq: Make sure bfqg for which we are queueing requests is online
Bios queued into BFQ IO scheduler can be associated with a cgroup that was already offlined. This may then cause insertion of this bfq_group into a service tree. But this bfq_group will get freed as soon as last bio associated with it is completed leading to use after free issues for service tree users. Fix the problem by making sure we always operate on online bfq_group. If the bfq_group associated with the bio is not online, we pick the first online parent.
CC: stable@vger.kernel.org Fixes: e21b7a0b9887 ("block, bfq: add full hierarchical scheduling and cgroups support") Tested-by: "yukuai (C)" <yukuai3@huawei.com> Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220401102752.8599-9-jack@suse.cz Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|