History log of /linux/block/blk-iolatency.c (Results 1 – 25 of 73)
Revision Date Author Comments
# 08420cf7 15-Jan-2024 Jens Axboe <axboe@kernel.dk>

block: add blk_time_get_ns() and blk_time_get() helpers

Convert any user of ktime_get_ns() to use blk_time_get_ns(), and
ktime_get() to blk_time_get(), so we have a unified API for querying the
curr

block: add blk_time_get_ns() and blk_time_get() helpers

Convert any user of ktime_get_ns() to use blk_time_get_ns(), and
ktime_get() to blk_time_get(), so we have a unified API for querying the
current time in nanoseconds or as ktime.

No functional changes intended, this patch just wraps ktime_get_ns()
and ktime_get() with a block helper.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

show more ...


# 18267a03 10-Aug-2023 Jens Axboe <axboe@kernel.dk>

block: fix bad lockdep annotation in blk-iolatency

A previous commit added a lockdep annotation, but botched it. Use the
right type.

Fixes: 4eb44d10766a ("block: remove init_mutex and open-code blk

block: fix bad lockdep annotation in blk-iolatency

A previous commit added a lockdep annotation, but botched it. Use the
right type.

Fixes: 4eb44d10766a ("block: remove init_mutex and open-code blk_iolatency_try_init")
Signed-off-by: Jens Axboe <axboe@kernel.dk>

show more ...


# 4eb44d10 10-Aug-2023 Li Lingfeng <lilingfeng3@huawei.com>

block: remove init_mutex and open-code blk_iolatency_try_init

Commit a13696b83da4 ("blk-iolatency: Make initialization lazy") adds
a mutex named "init_mutex" in blk_iolatency_try_init for the race
c

block: remove init_mutex and open-code blk_iolatency_try_init

Commit a13696b83da4 ("blk-iolatency: Make initialization lazy") adds
a mutex named "init_mutex" in blk_iolatency_try_init for the race
condition of initializing RQ_QOS_LATENCY.
Now a new lock has been add to struct request_queue by commit a13bd91be223
("block/rq_qos: protect rq_qos apis with a new lock"). And it has been
held in blkg_conf_open_bdev before calling blk_iolatency_init.
So it's not necessary to keep init_mutex in blk_iolatency_try_init, just
remove it.

Since init_mutex has been removed, blk_iolatency_try_init can be
open-coded back to iolatency_set_limit() like ioc_qos_write().

Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Link: https://lore.kernel.org/r/20230810035111.2236335-1-lilingfeng@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

show more ...


# a13696b8 13-Apr-2023 Tejun Heo <tj@kernel.org>

blk-iolatency: Make initialization lazy

Other rq_qos policies such as wbt and iocost are lazy-initialized when they
are configured for the first time for the device but iolatency is
initialized unco

blk-iolatency: Make initialization lazy

Other rq_qos policies such as wbt and iocost are lazy-initialized when they
are configured for the first time for the device but iolatency is
initialized unconditionally from blkcg_init_disk() during gendisk init. Lazy
init is beneficial because rq_qos policies add runtime overhead when
initialized as every IO has to walk all registered rq_qos callbacks.

This patch switches iolatency to lazy initialization too so that it only
registered its rq_qos policy when it is first configured.

Note that there is a known race condition between blkcg config file writes
and del_gendisk() and this patch makes iolatency susceptible to it by
exposing the init path to race against the deletion path. However, that
problem already exists in iocost and is being worked on.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Josef Bacik <josef@toxicpanda.com>
Link: https://lore.kernel.org/r/20230413000649.115785-5-tj@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>

show more ...


# 33049187 13-Apr-2023 Tejun Heo <tj@kernel.org>

blk-iolatency: s/blkcg_rq_qos/iolat_rq_qos/

The name was too generic given that there are multiple blkcg rq-qos
policies.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Christoph Hellwig <hc

blk-iolatency: s/blkcg_rq_qos/iolat_rq_qos/

The name was too generic given that there are multiple blkcg rq-qos
policies.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Josef Bacik <josef@toxicpanda.com>
Link: https://lore.kernel.org/r/20230413000649.115785-4-tj@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>

show more ...


# faffaab2 13-Apr-2023 Tejun Heo <tj@kernel.org>

blkcg: Restructure blkg_conf_prep() and friends

We want to support lazy init of rq-qos policies so that iolatency is enabled
lazily on configuration instead of gendisk initialization. The way blkg
c

blkcg: Restructure blkg_conf_prep() and friends

We want to support lazy init of rq-qos policies so that iolatency is enabled
lazily on configuration instead of gendisk initialization. The way blkg
config helpers are structured now is a bit awkward for that. Let's
restructure:

* blkcg_conf_open_bdev() is renamed to blkg_conf_open_bdev(). The blkcg_
prefix was used because the bdev opening step is blkg-independent.
However, the distinction is too subtle and confuses more than helps. Let's
switch to blkg prefix so that it's consistent with the type and other
helper names.

* struct blkg_conf_ctx now remembers the original input string and is always
initialized by the new blkg_conf_init().

* blkg_conf_open_bdev() is updated to take a pointer to blkg_conf_ctx like
blkg_conf_prep() and can be called multiple times safely. Instead of
modifying the double pointer to input string directly,
blkg_conf_open_bdev() now sets blkg_conf_ctx->body.

* blkg_conf_finish() is renamed to blkg_conf_exit() for symmetry and now
must be called on all blkg_conf_ctx's which were initialized with
blkg_conf_init().

Combined, this allows the users to either open the bdev first or do it
altogether with blkg_conf_prep() which will help implementing lazy init of
rq-qos policies.

blkg_conf_init/exit() will also be used implement synchronization against
device removal. This is necessary because iolat / iocost are configured
through cgroupfs instead of one of the files under /sys/block/DEVICE. As
cgroupfs operations aren't synchronized with block layer, the lazy init and
other configuration operations may race against device removal. This patch
makes blkg_conf_init/exit() used consistently for all cgroup-orginating
configurations making them a good place to implement explicit
synchronization.

Users are updated accordingly. No behavior change is intended by this patch.

v2: bfq wasn't updated in v1 causing a build error. Fixed.

v3: Update the description to include future use of blkg_conf_init/exit() as
synchronization points.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Yu Kuai <yukuai1@huaweicloud.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230413000649.115785-3-tj@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>

show more ...


# a06377c5 14-Feb-2023 Christoph Hellwig <hch@lst.de>

Revert "blk-cgroup: pin the gendisk in struct blkcg_gq"

This reverts commit 84d7d462b16dd5f0bf7c7ca9254bf81db2c952a2.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/2

Revert "blk-cgroup: pin the gendisk in struct blkcg_gq"

This reverts commit 84d7d462b16dd5f0bf7c7ca9254bf81db2c952a2.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230214183308.1658775-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

show more ...


# 1231039d 14-Feb-2023 Christoph Hellwig <hch@lst.de>

Revert "blk-cgroup: move the cgroup information to struct gendisk"

This reverts commit 3f13ab7c80fdb0ada86a8e3e818960bc1ccbaa59 as a patch
it depends on caused a few problems.

Signed-off-by: Christ

Revert "blk-cgroup: move the cgroup information to struct gendisk"

This reverts commit 3f13ab7c80fdb0ada86a8e3e818960bc1ccbaa59 as a patch
it depends on caused a few problems.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230214183308.1658775-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

show more ...


# 3f13ab7c 03-Feb-2023 Christoph Hellwig <hch@lst.de>

blk-cgroup: move the cgroup information to struct gendisk

cgroup information only makes sense on a live gendisk that allows
file system I/O (which includes the raw block device). So move over
the c

blk-cgroup: move the cgroup information to struct gendisk

cgroup information only makes sense on a live gendisk that allows
file system I/O (which includes the raw block device). So move over
the cgroup related members.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Andreas Herrmann <aherrmann@suse.de>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20230203150400.3199230-20-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

show more ...


# 0a0b4f79 03-Feb-2023 Christoph Hellwig <hch@lst.de>

blk-cgroup: pass a gendisk to pd_alloc_fn

No need to the request_queue here, pass a gendisk and extract the
node ids from that.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Andreas He

blk-cgroup: pass a gendisk to pd_alloc_fn

No need to the request_queue here, pass a gendisk and extract the
node ids from that.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Andreas Herrmann <aherrmann@suse.de>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20230203150400.3199230-18-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

show more ...


# 40e4996e 03-Feb-2023 Christoph Hellwig <hch@lst.de>

blk-cgroup: pass a gendisk to blkcg_{de,}activate_policy

Prepare for storing the blkcg information in the gendisk instead of
the request_queue.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewe

blk-cgroup: pass a gendisk to blkcg_{de,}activate_policy

Prepare for storing the blkcg information in the gendisk instead of
the request_queue.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Andreas Herrmann <aherrmann@suse.de>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20230203150400.3199230-17-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

show more ...


# ba91c849 03-Feb-2023 Christoph Hellwig <hch@lst.de>

blk-rq-qos: store a gendisk instead of request_queue in struct rq_qos

This is what about half of the users already want, and it's only going to
grow more.

Signed-off-by: Christoph Hellwig <hch@lst.

blk-rq-qos: store a gendisk instead of request_queue in struct rq_qos

This is what about half of the users already want, and it's only going to
grow more.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Andreas Herrmann <aherrmann@suse.de>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20230203150400.3199230-16-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

show more ...


# 3963d84d 03-Feb-2023 Christoph Hellwig <hch@lst.de>

blk-rq-qos: constify rq_qos_ops

These op vectors are constant, so mark them const.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Andreas Herrmann <aherrmann@suse.de>
Acked-by: Tejun He

blk-rq-qos: constify rq_qos_ops

These op vectors are constant, so mark them const.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Andreas Herrmann <aherrmann@suse.de>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20230203150400.3199230-15-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

show more ...


# ce57b558 03-Feb-2023 Christoph Hellwig <hch@lst.de>

blk-rq-qos: make rq_qos_add and rq_qos_del more useful

Switch to passing a gendisk, and make rq_qos_add initialize all required
fields and drop the not required q argument from rq_qos_del.

Signed-o

blk-rq-qos: make rq_qos_add and rq_qos_del more useful

Switch to passing a gendisk, and make rq_qos_add initialize all required
fields and drop the not required q argument from rq_qos_del.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Andreas Herrmann <aherrmann@suse.de>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20230203150400.3199230-14-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

show more ...


# 84d7d462 03-Feb-2023 Christoph Hellwig <hch@lst.de>

blk-cgroup: pin the gendisk in struct blkcg_gq

Currently each blkcg_gq holds a request_queue reference, which is what
is used in the policies. But a lot of these interfaces will move over to
use a

blk-cgroup: pin the gendisk in struct blkcg_gq

Currently each blkcg_gq holds a request_queue reference, which is what
is used in the policies. But a lot of these interfaces will move over to
use a gendisk, so store a disk in struct blkcg_gq and hold a reference to
it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Andreas Herrmann <aherrmann@suse.de>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20230203150400.3199230-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

show more ...


# 292a089d 20-Dec-2022 Steven Rostedt (Google) <rostedt@goodmis.org>

treewide: Convert del_timer*() to timer_shutdown*()

Due to several bugs caused by timers being re-armed after they are
shutdown and just before they are freed, a new state of timers was added
called

treewide: Convert del_timer*() to timer_shutdown*()

Due to several bugs caused by timers being re-armed after they are
shutdown and just before they are freed, a new state of timers was added
called "shutdown". After a timer is set to this state, then it can no
longer be re-armed.

The following script was run to find all the trivial locations where
del_timer() or del_timer_sync() is called in the same function that the
object holding the timer is freed. It also ignores any locations where
the timer->function is modified between the del_timer*() and the free(),
as that is not considered a "trivial" case.

This was created by using a coccinelle script and the following
commands:

$ cat timer.cocci
@@
expression ptr, slab;
identifier timer, rfield;
@@
(
- del_timer(&ptr->timer);
+ timer_shutdown(&ptr->timer);
|
- del_timer_sync(&ptr->timer);
+ timer_shutdown_sync(&ptr->timer);
)
... when strict
when != ptr->timer
(
kfree_rcu(ptr, rfield);
|
kmem_cache_free(slab, ptr);
|
kfree(ptr);
)

$ spatch timer.cocci . > /tmp/t.patch
$ patch -p1 < /tmp/t.patch

Link: https://lore.kernel.org/lkml/20221123201306.823305113@linutronix.de/
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Pavel Machek <pavel@ucw.cz> [ LED ]
Acked-by: Kalle Valo <kvalo@kernel.org> [ wireless ]
Acked-by: Paolo Abeni <pabeni@redhat.com> [ networking ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

show more ...


# dc572f41 18-Oct-2022 Kemeng Shi <shikemeng@huawei.com>

block: Replace struct rq_depth with unsigned int in struct iolatency_grp

We only need a max queue depth for every iolatency to limit the inflight io
number. Replace struct rq_depth with unsigned int

block: Replace struct rq_depth with unsigned int in struct iolatency_grp

We only need a max queue depth for every iolatency to limit the inflight io
number. Replace struct rq_depth with unsigned int to simplfy "struct
iolatency_grp" and save memory.

Signed-off-by: Kemeng Shi <shikemeng@huawei.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Link: https://lore.kernel.org/r/20221018111240.22612-4-shikemeng@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

show more ...


# 6891f968 18-Oct-2022 Kemeng Shi <shikemeng@huawei.com>

block: Correct comment for scale_cookie_change

Default queue depth of iolatency_grp is unlimited, so we scale down
quickly(once by half) in scale_cookie_change. Remove the "subtract
1/16th" part whi

block: Correct comment for scale_cookie_change

Default queue depth of iolatency_grp is unlimited, so we scale down
quickly(once by half) in scale_cookie_change. Remove the "subtract
1/16th" part which is not the truth and add the actual way we
scale down.

Signed-off-by: Kemeng Shi <shikemeng@huawei.com>
Link: https://lore.kernel.org/r/20221018111240.22612-3-shikemeng@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

show more ...


# db5896e9 18-Oct-2022 Kemeng Shi <shikemeng@huawei.com>

block: Remove redundant parent blkcg_gp check in check_scale_change

Function blkcg_iolatency_throttle will make sure blkg->parent is not
NULL before calls check_scale_change. And function check_scal

block: Remove redundant parent blkcg_gp check in check_scale_change

Function blkcg_iolatency_throttle will make sure blkg->parent is not
NULL before calls check_scale_change. And function check_scale_change
is only called in blkcg_iolatency_throttle.

Signed-off-by: Kemeng Shi <shikemeng@huawei.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Link: https://lore.kernel.org/r/20221018111240.22612-2-shikemeng@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

show more ...


# de185b56 21-Sep-2022 Christoph Hellwig <hch@lst.de>

blk-cgroup: pass a gendisk to blkcg_schedule_throttle

Pass the gendisk to blkcg_schedule_throttle as part of moving the
blk-cgroup infrastructure to be gendisk based. Remove the unused
!BLK_CGROUP

blk-cgroup: pass a gendisk to blkcg_schedule_throttle

Pass the gendisk to blkcg_schedule_throttle as part of moving the
blk-cgroup infrastructure to be gendisk based. Remove the unused
!BLK_CGROUP stub while we're at it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Andreas Herrmann <aherrmann@suse.de>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20220921180501.1539876-17-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

show more ...


# 16fac1b5 21-Sep-2022 Christoph Hellwig <hch@lst.de>

blk-iolatency: pass a gendisk to blk_iolatency_init

Pass the gendisk to blk_iolatency_init as part of moving the blk-cgroup
infrastructure to be gendisk based.

Signed-off-by: Christoph Hellwig <hch

blk-iolatency: pass a gendisk to blk_iolatency_init

Pass the gendisk to blk_iolatency_init as part of moving the blk-cgroup
infrastructure to be gendisk based.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Andreas Herrmann <aherrmann@suse.de>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20220921180501.1539876-9-hch@lst.de
[axboe: missed inline for blk_iolatency_init() and !CONFIG_BLK_CGROUP_IOLATENCY]
Signed-off-by: Jens Axboe <axboe@kernel.dk>

show more ...


# 14a6e2eb 20-Jul-2022 Jinke Han <hanjinke.666@bytedance.com>

block: don't allow the same type rq_qos add more than once

In our test of iocost, we encountered some list add/del corruptions of
inner_walk list in ioc_timer_fn.

The reason can be described as fol

block: don't allow the same type rq_qos add more than once

In our test of iocost, we encountered some list add/del corruptions of
inner_walk list in ioc_timer_fn.

The reason can be described as follows:

cpu 0 cpu 1
ioc_qos_write ioc_qos_write

ioc = q_to_ioc(queue);
if (!ioc) {
ioc = kzalloc();
ioc = q_to_ioc(queue);
if (!ioc) {
ioc = kzalloc();
...
rq_qos_add(q, rqos);
}
...
rq_qos_add(q, rqos);
...
}

When the io.cost.qos file is written by two cpus concurrently, rq_qos may
be added to one disk twice. In that case, there will be two iocs enabled
and running on one disk. They own different iocgs on their active list. In
the ioc_timer_fn function, because of the iocgs from two iocs have the
same root iocg, the root iocg's walk_list may be overwritten by each other
and this leads to list add/del corruptions in building or destroying the
inner_walk list.

And so far, the blk-rq-qos framework works in case that one instance for
one type rq_qos per queue by default. This patch make this explicit and
also fix the crash above.

Signed-off-by: Jinke Han <hanjinke.666@bytedance.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20220720093616.70584-1-hanjinke.666@bytedance.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

show more ...


# aee8960c 12-Jul-2022 Uros Bizjak <ubizjak@gmail.com>

blk-iolatency: Use atomic{,64}_try_cmpxchg

Use atomic_try_cmpxchg instead of atomic_cmpxchg (*ptr, old, new) == old
in check_scale_change and atomic64_try_cmpxchg in blkcg_iolatency_done_bio.
x86 CM

blk-iolatency: Use atomic{,64}_try_cmpxchg

Use atomic_try_cmpxchg instead of atomic_cmpxchg (*ptr, old, new) == old
in check_scale_change and atomic64_try_cmpxchg in blkcg_iolatency_done_bio.
x86 CMPXCHG instruction returns success in ZF flag, so this change saves a
compare after cmpxchg (and related move instruction in front of cmpxchg).

No functional change intended.

Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/20220712151947.6783-1-ubizjak@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

show more ...


# 8a177a36 14-May-2022 Tejun Heo <tj@kernel.org>

blk-iolatency: Fix inflight count imbalances and IO hangs on offline

iolatency needs to track the number of inflight IOs per cgroup. As this
tracking can be expensive, it is disabled when no cgroup

blk-iolatency: Fix inflight count imbalances and IO hangs on offline

iolatency needs to track the number of inflight IOs per cgroup. As this
tracking can be expensive, it is disabled when no cgroup has iolatency
configured for the device. To ensure that the inflight counters stay
balanced, iolatency_set_limit() freezes the request_queue while manipulating
the enabled counter, which ensures that no IO is in flight and thus all
counters are zero.

Unfortunately, iolatency_set_limit() isn't the only place where the enabled
counter is manipulated. iolatency_pd_offline() can also dec the counter and
trigger disabling. As this disabling happens without freezing the q, this
can easily happen while some IOs are in flight and thus leak the counts.

This can be easily demonstrated by turning on iolatency on an one empty
cgroup while IOs are in flight in other cgroups and then removing the
cgroup. Note that iolatency shouldn't have been enabled elsewhere in the
system to ensure that removing the cgroup disables iolatency for the whole
device.

The following keeps flipping on and off iolatency on sda:

echo +io > /sys/fs/cgroup/cgroup.subtree_control
while true; do
mkdir -p /sys/fs/cgroup/test
echo '8:0 target=100000' > /sys/fs/cgroup/test/io.latency
sleep 1
rmdir /sys/fs/cgroup/test
sleep 1
done

and there's concurrent fio generating direct rand reads:

fio --name test --filename=/dev/sda --direct=1 --rw=randread \
--runtime=600 --time_based --iodepth=256 --numjobs=4 --bs=4k

while monitoring with the following drgn script:

while True:
for css in css_for_each_descendant_pre(prog['blkcg_root'].css.address_of_()):
for pos in hlist_for_each(container_of(css, 'struct blkcg', 'css').blkg_list):
blkg = container_of(pos, 'struct blkcg_gq', 'blkcg_node')
pd = blkg.pd[prog['blkcg_policy_iolatency'].plid]
if pd.value_() == 0:
continue
iolat = container_of(pd, 'struct iolatency_grp', 'pd')
inflight = iolat.rq_wait.inflight.counter.value_()
if inflight:
print(f'inflight={inflight} {disk_name(blkg.q.disk).decode("utf-8")} '
f'{cgroup_path(css.cgroup).decode("utf-8")}')
time.sleep(1)

The monitoring output looks like the following:

inflight=1 sda /user.slice
inflight=1 sda /user.slice
...
inflight=14 sda /user.slice
inflight=13 sda /user.slice
inflight=17 sda /user.slice
inflight=15 sda /user.slice
inflight=18 sda /user.slice
inflight=17 sda /user.slice
inflight=20 sda /user.slice
inflight=19 sda /user.slice <- fio stopped, inflight stuck at 19
inflight=19 sda /user.slice
inflight=19 sda /user.slice

If a cgroup with stuck inflight ends up getting throttled, the throttled IOs
will never get issued as there's no completion event to wake it up leading
to an indefinite hang.

This patch fixes the bug by unifying enable handling into a work item which
is automatically kicked off from iolatency_set_min_lat_nsec() which is
called from both iolatency_set_limit() and iolatency_pd_offline() paths.
Punting to a work item is necessary as iolatency_pd_offline() is called
under spinlocks while freezing a request_queue requires a sleepable context.

This also simplifies the code reducing LOC sans the comments and avoids the
unnecessary freezes which were happening whenever a cgroup's latency target
is newly set or cleared.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Liu Bo <bo.liu@linux.alibaba.com>
Fixes: 8c772a9bfc7c ("blk-iolatency: fix IO hang due to negative inflight counter")
Cc: stable@vger.kernel.org # v5.0+
Link: https://lore.kernel.org/r/Yn9ScX6Nx2qIiQQi@slm.duckdns.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>

show more ...


# 3607849d 11-Jan-2022 Wolfgang Bumiller <w.bumiller@proxmox.com>

blk-cgroup: always terminate io.stat lines

With the removal of seq_get_buf in blkcg_print_one_stat, we
cannot make adding the newline conditional on there being
relevant stats because the name was a

blk-cgroup: always terminate io.stat lines

With the removal of seq_get_buf in blkcg_print_one_stat, we
cannot make adding the newline conditional on there being
relevant stats because the name was already written out
unconditionally.
Otherwise we may end up with multiple device names in one
line which is confusing and doesn't follow the nested-keyed
file format.

Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Fixes: 252c651a4c85 ("blk-cgroup: stop using seq_get_buf")
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20220111083159.42340-1-w.bumiller@proxmox.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

show more ...


123