#
08420cf7 |
| 15-Jan-2024 |
Jens Axboe <axboe@kernel.dk> |
block: add blk_time_get_ns() and blk_time_get() helpers
Convert any user of ktime_get_ns() to use blk_time_get_ns(), and ktime_get() to blk_time_get(), so we have a unified API for querying the current time in nanoseconds or as ktime.
No functional changes intended, this patch just wraps ktime_get_ns() and ktime_get() with a block helper.
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
18267a03 |
| 10-Aug-2023 |
Jens Axboe <axboe@kernel.dk> |
block: fix bad lockdep annotation in blk-iolatency
A previous commit added a lockdep annotation, but botched it. Use the right type.
Fixes: 4eb44d10766a ("block: remove init_mutex and open-code blk_iolatency_try_init")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
4eb44d10 |
| 10-Aug-2023 |
Li Lingfeng <lilingfeng3@huawei.com> |
block: remove init_mutex and open-code blk_iolatency_try_init
Commit a13696b83da4 ("blk-iolatency: Make initialization lazy") adds a mutex named "init_mutex" in blk_iolatency_try_init for the race condition of initializing RQ_QOS_LATENCY. Now a new lock has been added to struct request_queue by commit a13bd91be223 ("block/rq_qos: protect rq_qos apis with a new lock"), and it is held in blkg_conf_open_bdev before blk_iolatency_init is called. So it's not necessary to keep init_mutex in blk_iolatency_try_init; just remove it.
Since init_mutex has been removed, blk_iolatency_try_init can be open-coded back to iolatency_set_limit() like ioc_qos_write().
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Link: https://lore.kernel.org/r/20230810035111.2236335-1-lilingfeng@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
a13696b8 |
| 13-Apr-2023 |
Tejun Heo <tj@kernel.org> |
blk-iolatency: Make initialization lazy
Other rq_qos policies such as wbt and iocost are lazy-initialized when they are configured for the first time for the device but iolatency is initialized unconditionally from blkcg_init_disk() during gendisk init. Lazy init is beneficial because rq_qos policies add runtime overhead when initialized as every IO has to walk all registered rq_qos callbacks.
This patch switches iolatency to lazy initialization too so that it only registers its rq_qos policy when it is first configured.
Note that there is a known race condition between blkcg config file writes and del_gendisk() and this patch makes iolatency susceptible to it by exposing the init path to race against the deletion path. However, that problem already exists in iocost and is being worked on.
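The lazy-init idea described above can be modeled in user space as follows. This is only an illustrative sketch, not the kernel code: struct toy_disk, toy_iolat_set_target() and toy_submit_io() are hypothetical names, and the hook table is a stand-in for the list of registered rq_qos callbacks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_POLICIES 4

typedef void (*io_hook_fn)(void);

/* Hypothetical stand-in for a disk and its registered QoS callbacks. */
struct toy_disk {
	io_hook_fn hooks[MAX_POLICIES];
	size_t nr_hooks;
	bool iolat_ready;              /* has the latency policy been registered? */
	unsigned long iolat_target_us;
};

static void toy_iolat_hook(void)
{
	/* Per-IO work of the latency policy would go here. */
}

/* Configure a latency target; register the policy only on first use. */
static int toy_iolat_set_target(struct toy_disk *d, unsigned long target_us)
{
	assert(d != NULL);
	if (!d->iolat_ready) {
		if (d->nr_hooks == MAX_POLICIES)
			return -1;     /* no room for another policy */
		d->hooks[d->nr_hooks++] = toy_iolat_hook;
		d->iolat_ready = true;
	}
	d->iolat_target_us = target_us;
	return 0;
}

/* Every IO walks all registered hooks; returns how many were called. */
static size_t toy_submit_io(const struct toy_disk *d)
{
	assert(d != NULL);
	for (size_t i = 0; i < d->nr_hooks; i++)
		d->hooks[i]();
	return d->nr_hooks;
}
```

Until toy_iolat_set_target() is called for a disk, toy_submit_io() walks no hook for the latency policy, which is the overhead saving the message refers to.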
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Josef Bacik <josef@toxicpanda.com>
Link: https://lore.kernel.org/r/20230413000649.115785-5-tj@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
33049187 |
| 13-Apr-2023 |
Tejun Heo <tj@kernel.org> |
blk-iolatency: s/blkcg_rq_qos/iolat_rq_qos/
The name was too generic given that there are multiple blkcg rq-qos policies.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Josef Bacik <josef@toxicpanda.com>
Link: https://lore.kernel.org/r/20230413000649.115785-4-tj@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
faffaab2 |
| 13-Apr-2023 |
Tejun Heo <tj@kernel.org> |
blkcg: Restructure blkg_conf_prep() and friends
We want to support lazy init of rq-qos policies so that iolatency is enabled lazily on configuration instead of gendisk initialization. The way blkg config helpers are structured now is a bit awkward for that. Let's restructure:
* blkcg_conf_open_bdev() is renamed to blkg_conf_open_bdev(). The blkcg_ prefix was used because the bdev opening step is blkg-independent. However, the distinction is too subtle and confuses more than helps. Let's switch to blkg prefix so that it's consistent with the type and other helper names.
* struct blkg_conf_ctx now remembers the original input string and is always initialized by the new blkg_conf_init().
* blkg_conf_open_bdev() is updated to take a pointer to blkg_conf_ctx like blkg_conf_prep() and can be called multiple times safely. Instead of modifying the double pointer to input string directly, blkg_conf_open_bdev() now sets blkg_conf_ctx->body.
* blkg_conf_finish() is renamed to blkg_conf_exit() for symmetry and now must be called on all blkg_conf_ctx's which were initialized with blkg_conf_init().
Combined, this allows the users to either open the bdev first or do it altogether with blkg_conf_prep() which will help implementing lazy init of rq-qos policies.
blkg_conf_init/exit() will also be used to implement synchronization against device removal. This is necessary because iolat / iocost are configured through cgroupfs instead of one of the files under /sys/block/DEVICE. As cgroupfs operations aren't synchronized with block layer, the lazy init and other configuration operations may race against device removal. This patch makes blkg_conf_init/exit() used consistently for all cgroup-originating configurations, making them a good place to implement explicit synchronization.
Users are updated accordingly. No behavior change is intended by this patch.
v2: bfq wasn't updated in v1 causing a build error. Fixed.
v3: Update the description to include future use of blkg_conf_init/exit() as synchronization points.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Yu Kuai <yukuai1@huaweicloud.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230413000649.115785-3-tj@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
a06377c5 |
| 14-Feb-2023 |
Christoph Hellwig <hch@lst.de> |
Revert "blk-cgroup: pin the gendisk in struct blkcg_gq"
This reverts commit 84d7d462b16dd5f0bf7c7ca9254bf81db2c952a2.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230214183308.1658775-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
1231039d |
| 14-Feb-2023 |
Christoph Hellwig <hch@lst.de> |
Revert "blk-cgroup: move the cgroup information to struct gendisk"
This reverts commit 3f13ab7c80fdb0ada86a8e3e818960bc1ccbaa59 as a patch it depends on caused a few problems.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230214183308.1658775-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
3f13ab7c |
| 03-Feb-2023 |
Christoph Hellwig <hch@lst.de> |
blk-cgroup: move the cgroup information to struct gendisk
cgroup information only makes sense on a live gendisk that allows file system I/O (which includes the raw block device). So move over the cgroup related members.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Andreas Herrmann <aherrmann@suse.de>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20230203150400.3199230-20-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
0a0b4f79 |
| 03-Feb-2023 |
Christoph Hellwig <hch@lst.de> |
blk-cgroup: pass a gendisk to pd_alloc_fn
No need for the request_queue here, pass a gendisk and extract the node ids from that.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Andreas Herrmann <aherrmann@suse.de>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20230203150400.3199230-18-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
40e4996e |
| 03-Feb-2023 |
Christoph Hellwig <hch@lst.de> |
blk-cgroup: pass a gendisk to blkcg_{de,}activate_policy
Prepare for storing the blkcg information in the gendisk instead of the request_queue.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Andreas Herrmann <aherrmann@suse.de>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20230203150400.3199230-17-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
ba91c849 |
| 03-Feb-2023 |
Christoph Hellwig <hch@lst.de> |
blk-rq-qos: store a gendisk instead of request_queue in struct rq_qos
This is what about half of the users already want, and it's only going to grow more.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Andreas Herrmann <aherrmann@suse.de>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20230203150400.3199230-16-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
3963d84d |
| 03-Feb-2023 |
Christoph Hellwig <hch@lst.de> |
blk-rq-qos: constify rq_qos_ops
These op vectors are constant, so mark them const.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Andreas Herrmann <aherrmann@suse.de>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20230203150400.3199230-15-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
ce57b558 |
| 03-Feb-2023 |
Christoph Hellwig <hch@lst.de> |
blk-rq-qos: make rq_qos_add and rq_qos_del more useful
Switch to passing a gendisk, and make rq_qos_add initialize all required fields and drop the not required q argument from rq_qos_del.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Andreas Herrmann <aherrmann@suse.de>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20230203150400.3199230-14-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
84d7d462 |
| 03-Feb-2023 |
Christoph Hellwig <hch@lst.de> |
blk-cgroup: pin the gendisk in struct blkcg_gq
Currently each blkcg_gq holds a request_queue reference, which is what is used in the policies. But a lot of these interfaces will move over to use a gendisk, so store a disk in struct blkcg_gq and hold a reference to it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Andreas Herrmann <aherrmann@suse.de>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20230203150400.3199230-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
292a089d |
| 20-Dec-2022 |
Steven Rostedt (Google) <rostedt@goodmis.org> |
treewide: Convert del_timer*() to timer_shutdown*()
Due to several bugs caused by timers being re-armed after they are shutdown and just before they are freed, a new state of timers was added called "shutdown". After a timer is set to this state, then it can no longer be re-armed.
The following script was run to find all the trivial locations where del_timer() or del_timer_sync() is called in the same function that the object holding the timer is freed. It also ignores any locations where the timer->function is modified between the del_timer*() and the free(), as that is not considered a "trivial" case.
This was created by using a coccinelle script and the following commands:
    $ cat timer.cocci
    @@
    expression ptr, slab;
    identifier timer, rfield;
    @@
    (
    -       del_timer(&ptr->timer);
    +       timer_shutdown(&ptr->timer);
    |
    -       del_timer_sync(&ptr->timer);
    +       timer_shutdown_sync(&ptr->timer);
    )
      ... when strict
          when != ptr->timer
    (
            kfree_rcu(ptr, rfield);
    |
            kmem_cache_free(slab, ptr);
    |
            kfree(ptr);
    )

    $ spatch timer.cocci . > /tmp/t.patch
    $ patch -p1 < /tmp/t.patch
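The difference between deleting a timer and shutting it down, as described in this message, can be pictured with a toy user-space model. This is a sketch under assumptions, not the kernel timer implementation: struct toy_timer and the toy_timer_*() helpers are hypothetical names that only mimic the state transitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of a timer with a terminal "shutdown" state. */
struct toy_timer {
	bool pending;           /* armed and waiting to fire */
	bool shutdown;          /* once set, arming is refused */
	unsigned long expires;
};

/* Arm the timer; fails if the timer has been shut down. */
static bool toy_timer_arm(struct toy_timer *t, unsigned long expires)
{
	assert(t != NULL);
	if (t->shutdown)
		return false;
	t->expires = expires;
	t->pending = true;
	return true;
}

/* Plain delete: deactivates, but the timer may be re-armed later. */
static bool toy_timer_del(struct toy_timer *t)
{
	assert(t != NULL);
	bool was_pending = t->pending;
	t->pending = false;
	return was_pending;
}

/* Shutdown: deactivates and blocks any later re-arm. */
static bool toy_timer_shutdown(struct toy_timer *t)
{
	assert(t != NULL);
	bool was_pending = toy_timer_del(t);
	t->shutdown = true;
	return was_pending;
}
```

In this model, a re-arm that slips in after toy_timer_del() succeeds, which is the window that leads to a use-after-free when the containing object is then freed; after toy_timer_shutdown() the same re-arm is rejected.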
Link: https://lore.kernel.org/lkml/20221123201306.823305113@linutronix.de/
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Pavel Machek <pavel@ucw.cz> [ LED ]
Acked-by: Kalle Valo <kvalo@kernel.org> [ wireless ]
Acked-by: Paolo Abeni <pabeni@redhat.com> [ networking ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
#
dc572f41 |
| 18-Oct-2022 |
Kemeng Shi <shikemeng@huawei.com> |
block: Replace struct rq_depth with unsigned int in struct iolatency_grp
We only need a max queue depth for every iolatency to limit the inflight io number. Replace struct rq_depth with unsigned int to simplify "struct iolatency_grp" and save memory.
Signed-off-by: Kemeng Shi <shikemeng@huawei.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Link: https://lore.kernel.org/r/20221018111240.22612-4-shikemeng@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
6891f968 |
| 18-Oct-2022 |
Kemeng Shi <shikemeng@huawei.com> |
block: Correct comment for scale_cookie_change
Default queue depth of iolatency_grp is unlimited, so we scale down quickly (once by half) in scale_cookie_change. Remove the "subtract 1/16th" part, which is not accurate, and describe the actual way we scale down.
Signed-off-by: Kemeng Shi <shikemeng@huawei.com>
Link: https://lore.kernel.org/r/20221018111240.22612-3-shikemeng@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
db5896e9 |
| 18-Oct-2022 |
Kemeng Shi <shikemeng@huawei.com> |
block: Remove redundant parent blkcg_gp check in check_scale_change
Function blkcg_iolatency_throttle makes sure blkg->parent is not NULL before calling check_scale_change, and check_scale_change is only called from blkcg_iolatency_throttle.
Signed-off-by: Kemeng Shi <shikemeng@huawei.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Link: https://lore.kernel.org/r/20221018111240.22612-2-shikemeng@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
de185b56 |
| 21-Sep-2022 |
Christoph Hellwig <hch@lst.de> |
blk-cgroup: pass a gendisk to blkcg_schedule_throttle
Pass the gendisk to blkcg_schedule_throttle as part of moving the blk-cgroup infrastructure to be gendisk based. Remove the unused !BLK_CGROUP stub while we're at it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Andreas Herrmann <aherrmann@suse.de>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20220921180501.1539876-17-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
16fac1b5 |
| 21-Sep-2022 |
Christoph Hellwig <hch@lst.de> |
blk-iolatency: pass a gendisk to blk_iolatency_init
Pass the gendisk to blk_iolatency_init as part of moving the blk-cgroup infrastructure to be gendisk based.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Andreas Herrmann <aherrmann@suse.de>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20220921180501.1539876-9-hch@lst.de
[axboe: missed inline for blk_iolatency_init() and !CONFIG_BLK_CGROUP_IOLATENCY]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
14a6e2eb |
| 20-Jul-2022 |
Jinke Han <hanjinke.666@bytedance.com> |
block: don't allow the same type rq_qos add more than once
In our test of iocost, we encountered some list add/del corruptions of inner_walk list in ioc_timer_fn.
The reason can be described as follows:
    cpu 0                           cpu 1
    ioc_qos_write                   ioc_qos_write

    ioc = q_to_ioc(queue);
    if (!ioc) {
            ioc = kzalloc();
                                    ioc = q_to_ioc(queue);
                                    if (!ioc) {
                                            ioc = kzalloc();
                                            ...
                                            rq_qos_add(q, rqos);
                                    }
            ...
            rq_qos_add(q, rqos);
            ...
    }
When the io.cost.qos file is written by two cpus concurrently, rq_qos may be added to one disk twice. In that case, there will be two iocs enabled and running on one disk. They own different iocgs on their active list. In the ioc_timer_fn function, because the iocgs from the two iocs have the same root iocg, the root iocg's walk_list may be overwritten by each other, and this leads to list add/del corruptions in building or destroying the inner_walk list.
And so far, the blk-rq-qos framework assumes by default that there is one instance of each rq_qos type per queue. This patch makes this explicit and also fixes the crash above.
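The "one instance per type" rule can be sketched in user space as a duplicate check before linking a new entry. This is a simplified illustration, not the kernel's rq_qos_add(): struct toy_queue, struct toy_qos, toy_qos_add() and the -EBUSY return value are assumptions, and the lock that would have to cover both the check and the insert in a concurrent setting is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical policy types, one instance of each allowed per queue. */
enum toy_qos_id { TOY_QOS_WBT, TOY_QOS_LATENCY, TOY_QOS_COST };

struct toy_qos {
	enum toy_qos_id id;
	struct toy_qos *next;
};

struct toy_queue {
	struct toy_qos *head;   /* singly linked list of registered policies */
};

/*
 * Link new_qos into q unless a policy of the same type is already
 * registered. A real implementation would hold a lock across the scan
 * and the insert so two writers cannot both pass the check.
 */
static int toy_qos_add(struct toy_queue *q, struct toy_qos *new_qos)
{
	assert(q != NULL && new_qos != NULL);
	for (struct toy_qos *p = q->head; p; p = p->next)
		if (p->id == new_qos->id)
			return -EBUSY;
	new_qos->next = q->head;
	q->head = new_qos;
	return 0;
}
```

With this check, the second writer in the race above would get an error back and could free its freshly allocated object instead of registering a second instance.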
Signed-off-by: Jinke Han <hanjinke.666@bytedance.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20220720093616.70584-1-hanjinke.666@bytedance.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
aee8960c |
| 12-Jul-2022 |
Uros Bizjak <ubizjak@gmail.com> |
blk-iolatency: Use atomic{,64}_try_cmpxchg
Use atomic_try_cmpxchg instead of atomic_cmpxchg (*ptr, old, new) == old in check_scale_change and atomic64_try_cmpxchg in blkcg_iolatency_done_bio. x86 CMPXCHG instruction returns success in ZF flag, so this change saves a compare after cmpxchg (and related move instruction in front of cmpxchg).
No functional change intended.
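The two calling conventions can be compared in a small user-space sketch, assuming C11 <stdatomic.h> as a stand-in for the kernel's atomic API. The helper names cmpxchg_int(), try_cmpxchg_int() and add_capped() are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * cmpxchg-style: returns the value seen in *v before the operation.
 * The caller must compare the result with 'old' to learn whether the
 * exchange happened.
 */
static int cmpxchg_int(atomic_int *v, int old, int new_val)
{
	int expected = old;

	atomic_compare_exchange_strong(v, &expected, new_val);
	return expected;
}

/*
 * try_cmpxchg-style: returns true on success. On failure, *old is
 * refreshed with the current value, so a retry loop needs no reload.
 */
static bool try_cmpxchg_int(atomic_int *v, int *old, int new_val)
{
	return atomic_compare_exchange_strong(v, old, new_val);
}

/* Typical retry loop: add delta to *v, but never exceed limit. */
static int add_capped(atomic_int *v, int delta, int limit)
{
	assert(v != NULL);
	int old = atomic_load(v);
	int new_val;

	do {
		new_val = old + delta;
		if (new_val > limit)
			new_val = limit;
	} while (!try_cmpxchg_int(v, &old, new_val));

	return new_val;
}
```

The loop in add_capped() shows why the second form is convenient: the success flag drives the loop directly, and the failure path already has the fresh value in old.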
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/20220712151947.6783-1-ubizjak@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
8a177a36 |
| 14-May-2022 |
Tejun Heo <tj@kernel.org> |
blk-iolatency: Fix inflight count imbalances and IO hangs on offline
iolatency needs to track the number of inflight IOs per cgroup. As this tracking can be expensive, it is disabled when no cgroup has iolatency configured for the device. To ensure that the inflight counters stay balanced, iolatency_set_limit() freezes the request_queue while manipulating the enabled counter, which ensures that no IO is in flight and thus all counters are zero.
Unfortunately, iolatency_set_limit() isn't the only place where the enabled counter is manipulated. iolatency_pd_offline() can also dec the counter and trigger disabling. As this disabling happens without freezing the q, this can easily happen while some IOs are in flight and thus leak the counts.
This can be easily demonstrated by turning on iolatency on one empty cgroup while IOs are in flight in other cgroups and then removing the cgroup. Note that iolatency shouldn't have been enabled elsewhere in the system to ensure that removing the cgroup disables iolatency for the whole device.
The following keeps flipping on and off iolatency on sda:
    echo +io > /sys/fs/cgroup/cgroup.subtree_control

    while true; do
        mkdir -p /sys/fs/cgroup/test
        echo '8:0 target=100000' > /sys/fs/cgroup/test/io.latency
        sleep 1
        rmdir /sys/fs/cgroup/test
        sleep 1
    done
and there's concurrent fio generating direct rand reads:
    fio --name test --filename=/dev/sda --direct=1 --rw=randread \
        --runtime=600 --time_based --iodepth=256 --numjobs=4 --bs=4k
while monitoring with the following drgn script:
    while True:
        for css in css_for_each_descendant_pre(prog['blkcg_root'].css.address_of_()):
            for pos in hlist_for_each(container_of(css, 'struct blkcg', 'css').blkg_list):
                blkg = container_of(pos, 'struct blkcg_gq', 'blkcg_node')
                pd = blkg.pd[prog['blkcg_policy_iolatency'].plid]
                if pd.value_() == 0:
                    continue
                iolat = container_of(pd, 'struct iolatency_grp', 'pd')
                inflight = iolat.rq_wait.inflight.counter.value_()
                if inflight:
                    print(f'inflight={inflight} {disk_name(blkg.q.disk).decode("utf-8")} '
                          f'{cgroup_path(css.cgroup).decode("utf-8")}')
        time.sleep(1)
The monitoring output looks like the following:
    inflight=1 sda /user.slice
    inflight=1 sda /user.slice
    ...
    inflight=14 sda /user.slice
    inflight=13 sda /user.slice
    inflight=17 sda /user.slice
    inflight=15 sda /user.slice
    inflight=18 sda /user.slice
    inflight=17 sda /user.slice
    inflight=20 sda /user.slice
    inflight=19 sda /user.slice <- fio stopped, inflight stuck at 19
    inflight=19 sda /user.slice
    inflight=19 sda /user.slice
If a cgroup with stuck inflight ends up getting throttled, the throttled IOs will never get issued as there's no completion event to wake it up leading to an indefinite hang.
This patch fixes the bug by unifying enable handling into a work item which is automatically kicked off from iolatency_set_min_lat_nsec() which is called from both iolatency_set_limit() and iolatency_pd_offline() paths. Punting to a work item is necessary as iolatency_pd_offline() is called under spinlocks while freezing a request_queue requires a sleepable context.
This also simplifies the code reducing LOC sans the comments and avoids the unnecessary freezes which were happening whenever a cgroup's latency target is newly set or cleared.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Liu Bo <bo.liu@linux.alibaba.com>
Fixes: 8c772a9bfc7c ("blk-iolatency: fix IO hang due to negative inflight counter")
Cc: stable@vger.kernel.org # v5.0+
Link: https://lore.kernel.org/r/Yn9ScX6Nx2qIiQQi@slm.duckdns.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
#
3607849d |
| 11-Jan-2022 |
Wolfgang Bumiller <w.bumiller@proxmox.com> |
blk-cgroup: always terminate io.stat lines
With the removal of seq_get_buf in blkcg_print_one_stat, we cannot make adding the newline conditional on there being relevant stats because the name was already written out unconditionally. Otherwise we may end up with multiple device names in one line which is confusing and doesn't follow the nested-keyed file format.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Fixes: 252c651a4c85 ("blk-cgroup: stop using seq_get_buf")
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20220111083159.42340-1-w.bumiller@proxmox.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|