History log of /linux/drivers/md/raid0.c (Results 1 – 25 of 190)
Revision Date Author Comments
# 396799eb 03-Mar-2024 Christoph Hellwig <hch@lst.de>

md: remove mddev->queue

Just use the request_queue from the gendisk pointer in the relatively
few places that sill need it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed--by: Song Liu <son

md: remove mddev->queue

Just use the request_queue from the gendisk pointer in the relatively
few places that sill need it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed--by: Song Liu <song@kernel.org>
Tested-by: Song Liu <song@kernel.org>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20240303140150.5435-11-hch@lst.de

show more ...


# 56cf22d6 03-Mar-2024 Christoph Hellwig <hch@lst.de>

md/raid0: use the atomic queue limit update APIs

Build the queue limits outside the queue and apply them using
queue_limits_set. To make the code more obvious also split the queue
limits handling i

md/raid0: use the atomic queue limit update APIs

Build the queue limits outside the queue and apply them using
queue_limits_set. To make the code more obvious also split the queue
limits handling into a separate helper function.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed--by: Song Liu <song@kernel.org>
Tested-by: Song Liu <song@kernel.org>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20240303140150.5435-6-hch@lst.de

show more ...


# 176df894 03-Mar-2024 Christoph Hellwig <hch@lst.de>

md: add a mddev_is_dm helper

Add a helper to check for a DM-mapped MD device instead of using
the obfuscated ->gendisk or ->queue NULL checks.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed

md: add a mddev_is_dm helper

Add a helper to check for a DM-mapped MD device instead of using
the obfuscated ->gendisk or ->queue NULL checks.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed--by: Song Liu <song@kernel.org>
Tested-by: Song Liu <song@kernel.org>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20240303140150.5435-4-hch@lst.de

show more ...


# c396b90e 03-Mar-2024 Christoph Hellwig <hch@lst.de>

md: add a mddev_trace_remap helper

Add a helper to trace bio remapping that hides some argument
dereferences and the check for a DM-mapped MD device.

Signed-off-by: Christoph Hellwig <hch@lst.de>
R

md: add a mddev_trace_remap helper

Add a helper to trace bio remapping that hides some argument
dereferences and the check for a DM-mapped MD device.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed--by: Song Liu <song@kernel.org>
Tested-by: Song Liu <song@kernel.org>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20240303140150.5435-2-hch@lst.de

show more ...


# cc22b540 16-Aug-2023 David Jeffery <djeffery@redhat.com>

md: raid0: account for split bio in iostat accounting

When a bio is split by md raid0, the newly created bio will not be tracked
by md for I/O accounting. Only the portion of I/O still assigned to t

md: raid0: account for split bio in iostat accounting

When a bio is split by md raid0, the newly created bio will not be tracked
by md for I/O accounting. Only the portion of I/O still assigned to the
original bio which was reduced by the split will be accounted for. This
results in md iostat data sometimes showing I/O values far below the actual
amount of data being sent through md.

md_account_bio() needs to be called for all bio generated by the bio split.

A simple example of the issue was generated using a raid0 device on partitions
to the same device. Since all raid0 I/O then goes to one device, it makes it
easy to see a gap between the md device and its sd storage. Reading an lvm
device on top of the md device, the iostat output (some 0 columns and extra
devices removed to make the data more compact) was:

Device tps kB_read/s kB_wrtn/s kB_dscd/s kB_read
md2 0.00 0.00 0.00 0.00 0
sde 0.00 0.00 0.00 0.00 0
md2 1364.00 411496.00 0.00 0.00 411496
sde 1734.00 646144.00 0.00 0.00 646144
md2 1699.00 510680.00 0.00 0.00 510680
sde 2155.00 802784.00 0.00 0.00 802784
md2 803.00 241480.00 0.00 0.00 241480
sde 1016.00 377888.00 0.00 0.00 377888
md2 0.00 0.00 0.00 0.00 0
sde 0.00 0.00 0.00 0.00 0

I/O was generated doing large direct I/O reads (12M) with dd to a linear
lvm volume on top of the 4 leg raid0 device.

The md2 reads were showing as roughly 2/3 of the reads to the sde device
containing all of md2's raid partitions. The sum of reads to sde was
1826816 kB, which was the expected amount as it was the amount read by
dd. With the patch, the total reads from md will match the reads from
sde and be consistent with the amount of I/O generated.

Fixes: 10764815ff47 ("md: add io accounting for raid0 and raid5")
Signed-off-by: David Jeffery <djeffery@redhat.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230816181433.13289-1-djeffery@redhat.com

show more ...


# 319ff40a 14-Aug-2023 Jan Kara <jack@suse.cz>

md/raid0: Fix performance regression for large sequential writes

Commit f00d7c85be9e ("md/raid0: fix up bio splitting.") among other
things changed how bio that needs to be split is submitted. Befor

md/raid0: Fix performance regression for large sequential writes

Commit f00d7c85be9e ("md/raid0: fix up bio splitting.") among other
things changed how bio that needs to be split is submitted. Before this
commit, we have split the bio, mapped and submitted each part. After
this commit, we map only the first part of the split bio and submit the
second part unmapped. Due to bio sorting in __submit_bio_noacct() this
results in the following request ordering:

9,0 18 1181 0.525037895 15995 Q WS 1479315464 + 63392

Split off chunk-sized (1024 sectors) request:

9,0 18 1182 0.629019647 15995 X WS 1479315464 / 1479316488

Request is unaligned to the chunk so it's split in
raid0_make_request(). This is the first part mapped and punted to
bio_list:

8,0 18 7053 0.629020455 15995 A WS 739921928 + 1016 <- (9,0) 1479315464

Now raid0_make_request() returns, second part is postponed on
bio_list. __submit_bio_noacct() resorts the bio_list, mapped request
is submitted to the underlying device:

8,0 18 7054 0.629022782 15995 G WS 739921928 + 1016

Now we take another request from the bio_list which is the remainder
of the original huge request. Split off another chunk-sized bit from
it and the situation repeats:

9,0 18 1183 0.629024499 15995 X WS 1479316488 / 1479317512
8,16 18 6998 0.629025110 15995 A WS 739921928 + 1016 <- (9,0) 1479316488
8,16 18 6999 0.629026728 15995 G WS 739921928 + 1016
...
9,0 18 1184 0.629032940 15995 X WS 1479317512 / 1479318536 [libnetacq-write]
8,0 18 7059 0.629033294 15995 A WS 739922952 + 1016 <- (9,0) 1479317512
8,0 18 7060 0.629033902 15995 G WS 739922952 + 1016
...

This repeats until we consume the whole original huge request. Now we
finally get to processing the second parts of the split off requests
(in reverse order):

8,16 18 7181 0.629161384 15995 A WS 739952640 + 8 <- (9,0) 1479377920
8,0 18 7239 0.629162140 15995 A WS 739952640 + 8 <- (9,0) 1479376896
8,16 18 7186 0.629163881 15995 A WS 739951616 + 8 <- (9,0) 1479375872
8,0 18 7242 0.629164421 15995 A WS 739951616 + 8 <- (9,0) 1479374848
...

I guess it is obvious that this IO pattern is extremely inefficient way
to perform sequential IO. It also makes bio_list to grow to rather long
lengths.

Change raid0_make_request() to map both parts of the split bio. Since we
know we are provided with at most chunk-sized bios, we will always need
to split the incoming bio at most once.

Fixes: f00d7c85be9e ("md/raid0: fix up bio splitting.")
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20230814092720.3931-2-jack@suse.cz
Signed-off-by: Song Liu <song@kernel.org>

show more ...


# af50e20a 14-Aug-2023 Jan Kara <jack@suse.cz>

md/raid0: Factor out helper for mapping and submitting a bio

Factor out helper function for mapping and submitting a bio out of
raid0_make_request(). We will use it later for submitting both parts o

md/raid0: Factor out helper for mapping and submitting a bio

Factor out helper function for mapping and submitting a bio out of
raid0_make_request(). We will use it later for submitting both parts of
a split bio.

Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20230814092720.3931-1-jack@suse.cz
Signed-off-by: Song Liu <song@kernel.org>

show more ...


# c567c86b 21-Jun-2023 Yu Kuai <yukuai3@huawei.com>

md: move initialization and destruction of 'io_acct_set' to md.c

'io_acct_set' is only used for raid0 and raid456, prepare to use it for
raid1 and raid10, so that io accounting from different levels

md: move initialization and destruction of 'io_acct_set' to md.c

'io_acct_set' is only used for raid0 and raid456, prepare to use it for
raid1 and raid10, so that io accounting from different levels can be
consistent.

By the way, follow up patches will also use this io clone mechanism to
make sure 'active_io' represents in flight io, not io that is dispatching,
so that mddev_suspend will wait for io to be done as designed.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230621165110.1498313-2-yukuai1@huaweicloud.com

show more ...


# e8360070 23-Jun-2023 Jason Baron <jbaron@akamai.com>

md/raid0: add discard support for the 'original' layout

We've found that using raid0 with the 'original' layout and discard
enabled with different disk sizes (such that at least two zones are
create

md/raid0: add discard support for the 'original' layout

We've found that using raid0 with the 'original' layout and discard
enabled with different disk sizes (such that at least two zones are
created) can result in data corruption. This is due to the fact that
the discard handling in 'raid0_handle_discard()' assumes the 'alternate'
layout. We've seen this corruption using ext4 but other filesystems are
likely susceptible as well.

More specifically, while multiple zones are necessary to create the
corruption, the corruption may not occur with multiple zones if they
layout in such a way the layout matches what the 'alternate' layout
would have produced. Thus, not all raid0 devices with the 'original'
layout, different size disks and discard enabled will encounter this
corruption.

The 3.14 kernel inadvertently changed the raid0 disk layout for different
size disks. Thus, running a pre-3.14 kernel and post-3.14 kernel on the
same raid0 array could corrupt data. This lead to the creation of the
'original' layout (to match the pre-3.14 layout) and the 'alternate' layout
(to match the post 3.14 layout) in the 5.4 kernel time frame and an option
to tell the kernel which layout to use (since it couldn't be autodetected).
However, when the 'original' layout was added back to 5.4 discard support
for the 'original' layout was not added leading this issue.

I've been able to reliably reproduce the corruption with the following
test case:

1. create raid0 array with different size disks using original layout
2. mkfs
3. mount -o discard
4. create lots of files
5. remove 1/2 the files
6. fstrim -a (or just the mount point for the raid0 array)
7. umount
8. fsck -fn /dev/md0 (spews all sorts of corruptions)

Let's fix this by adding proper discard support to the 'original' layout.
The fix 'maps' the 'original' layout disks to the order in which they are
read/written such that we can compare the disks in the same way that the
current 'alternate' layout does. A 'disk_shift' field is added to
'struct strip_zone'. This could be computed on the fly in
raid0_handle_discard() but by adding this field, we save some computation
in the discard path.

Note we could also potentially fix this by re-ordering the disks in the
zones that follow the first one, and then always read/writing them using
the 'alternate' layout. However, that is seen as a more substantial change,
and we are attempting the least invasive fix at this time to remedy the
corruption.

I've verified the change using the reproducer mentioned above. Typically,
the corruption is seen after less than 3 iterations, while the patch has
run 500+ iterations.

Cc: NeilBrown <neilb@suse.de>
Cc: Song Liu <song@kernel.org>
Fixes: c84a1372df92 ("md/raid0: avoid RAID0 data corruption due to layout confusion.")
Cc: stable@vger.kernel.org
Signed-off-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230623180523.1901230-1-jbaron@akamai.com

show more ...


# c31fea2f 06-Mar-2023 Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>

md: add error_handlers for raid0 and linear

After the commit 9631abdbf406c("md: Set MD_BROKEN for RAID1 and RAID10")
MD_BROKEN must be set if array is failed because state_store() checks it.
If it i

md: add error_handlers for raid0 and linear

After the commit 9631abdbf406c("md: Set MD_BROKEN for RAID1 and RAID10")
MD_BROKEN must be set if array is failed because state_store() checks it.
If it is set then -EBUSY is returned to userspace.

For raid0 and linear MD_BROKEN is not set by error_handler(). As a result
mdadm is unable to trigger clean-up actions. It is a regression.

This patch adds appropriate error_handler for raid0 and linear. The
error handler sets MD_BROKEN for this device.

Reviewed-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230306130317.3418-1-mariusz.tkaczyk@linux.intel.com

show more ...


# 8e1a2279 02-Nov-2022 Xiao Ni <xni@redhat.com>

md/raid0, raid10: Don't set discard sectors for request queue

It should use disk_stack_limits to get a proper max_discard_sectors
rather than setting a value by stack drivers.

And there is a bug. I

md/raid0, raid10: Don't set discard sectors for request queue

It should use disk_stack_limits to get a proper max_discard_sectors
rather than setting a value by stack drivers.

And there is a bug. If all member disks are rotational devices,
raid0/raid10 set max_discard_sectors. So the member devices are
not ssd/nvme, but raid0/raid10 export the wrong value. It reports
warning messages in function __blkdev_issue_discard when mkfs.xfs
like this:

[ 4616.022599] ------------[ cut here ]------------
[ 4616.027779] WARNING: CPU: 4 PID: 99634 at block/blk-lib.c:50 __blkdev_issue_discard+0x16a/0x1a0
[ 4616.140663] RIP: 0010:__blkdev_issue_discard+0x16a/0x1a0
[ 4616.146601] Code: 24 4c 89 20 31 c0 e9 fe fe ff ff c1 e8 09 8d 48 ff 4c 89 f0 4c 09 e8 48 85 c1 0f 84 55 ff ff ff b8 ea ff ff ff e9 df fe ff ff <0f> 0b 48 8d 74 24 08 e8 ea d6 00 00 48 c7 c6 20 1e 89 ab 48 c7 c7
[ 4616.167567] RSP: 0018:ffffaab88cbffca8 EFLAGS: 00010246
[ 4616.173406] RAX: ffff9ba1f9e44678 RBX: 0000000000000000 RCX: ffff9ba1c9792080
[ 4616.181376] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9ba1c9792080
[ 4616.189345] RBP: 0000000000000cc0 R08: ffffaab88cbffd10 R09: 0000000000000000
[ 4616.197317] R10: 0000000000000012 R11: 0000000000000000 R12: 0000000000000000
[ 4616.205288] R13: 0000000000400000 R14: 0000000000000cc0 R15: ffff9ba1c9792080
[ 4616.213259] FS: 00007f9a5534e980(0000) GS:ffff9ba1b7c80000(0000) knlGS:0000000000000000
[ 4616.222298] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4616.228719] CR2: 000055a390a4c518 CR3: 0000000123e40006 CR4: 00000000001706e0
[ 4616.236689] Call Trace:
[ 4616.239428] blkdev_issue_discard+0x52/0xb0
[ 4616.244108] blkdev_common_ioctl+0x43c/0xa00
[ 4616.248883] blkdev_ioctl+0x116/0x280
[ 4616.252977] __x64_sys_ioctl+0x8a/0xc0
[ 4616.257163] do_syscall_64+0x5c/0x90
[ 4616.261164] ? handle_mm_fault+0xc5/0x2a0
[ 4616.265652] ? do_user_addr_fault+0x1d8/0x690
[ 4616.270527] ? do_syscall_64+0x69/0x90
[ 4616.274717] ? exc_page_fault+0x62/0x150
[ 4616.279097] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 4616.284748] RIP: 0033:0x7f9a55398c6b

Signed-off-by: Xiao Ni <xni@redhat.com>
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Song Liu <song@kernel.org>

show more ...


# 1727fd50 23-Aug-2022 Saurabh Sengar <ssengar@linux.microsoft.com>

md: Replace snprintf with scnprintf

Current code produces a warning as shown below when total characters
in the constituent block device names plus the slashes exceeds 200.
snprintf() returns the nu

md: Replace snprintf with scnprintf

Current code produces a warning as shown below when total characters
in the constituent block device names plus the slashes exceeds 200.
snprintf() returns the number of characters generated from the given
input, which could cause the expression “200 – len” to wrap around
to a large positive number. Fix this by using scnprintf() instead,
which returns the actual number of characters written into the buffer.

[ 1513.267938] ------------[ cut here ]------------
[ 1513.267943] WARNING: CPU: 15 PID: 37247 at <snip>/lib/vsprintf.c:2509 vsnprintf+0x2c8/0x510
[ 1513.267944] Modules linked in: <snip>
[ 1513.267969] CPU: 15 PID: 37247 Comm: mdadm Not tainted 5.4.0-1085-azure #90~18.04.1-Ubuntu
[ 1513.267969] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 05/09/2022
[ 1513.267971] RIP: 0010:vsnprintf+0x2c8/0x510
<-snip->
[ 1513.267982] Call Trace:
[ 1513.267986] snprintf+0x45/0x70
[ 1513.267990] ? disk_name+0x71/0xa0
[ 1513.267993] dump_zones+0x114/0x240 [raid0]
[ 1513.267996] ? _cond_resched+0x19/0x40
[ 1513.267998] raid0_run+0x19e/0x270 [raid0]
[ 1513.268000] md_run+0x5e0/0xc50
[ 1513.268003] ? security_capable+0x3f/0x60
[ 1513.268005] do_md_run+0x19/0x110
[ 1513.268006] md_ioctl+0x195e/0x1f90
[ 1513.268007] blkdev_ioctl+0x91f/0x9f0
[ 1513.268010] block_ioctl+0x3d/0x50
[ 1513.268012] do_vfs_ioctl+0xa9/0x640
[ 1513.268014] ? __fput+0x162/0x260
[ 1513.268016] ksys_ioctl+0x75/0x80
[ 1513.268017] __x64_sys_ioctl+0x1a/0x20
[ 1513.268019] do_syscall_64+0x5e/0x200
[ 1513.268021] entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: 766038846e875 ("md/raid0: replace printk() with pr_*()")
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Acked-by: Guoqing Jiang <guoqing.jiang@linux.dev>
Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Signed-off-by: Song Liu <song@kernel.org>

show more ...


# 0f2571ad 12-May-2022 Xiao Ni <xni@redhat.com>

md: Don't set mddev private to NULL in raid0 pers->free

In normal stop process, it does like this:
do_md_stop
|
__md_stop (pers->free(); mddev->private=NULL)
|
md_free (free mdd

md: Don't set mddev private to NULL in raid0 pers->free

In normal stop process, it does like this:
do_md_stop
|
__md_stop (pers->free(); mddev->private=NULL)
|
md_free (free mddev)
__md_stop sets mddev->private to NULL after pers->free. The raid device
will be stopped and mddev memory is free. But in reshape, it doesn't
free the mddev and mddev will still be used in new raid.

In reshape, it first sets mddev->private to new_pers and then runs
old_pers->free(). Now raid0 sets mddev->private to NULL in raid0_free.
The new raid can't work anymore. It will panic when dereference
mddev->private because of NULL pointer dereference.

It can panic like this:
[63010.814972] kernel BUG at drivers/md/raid10.c:928!
[63010.819778] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[63010.825011] CPU: 3 PID: 44437 Comm: md0_resync Kdump: loaded Not tainted 5.14.0-86.el9.x86_64 #1
[63010.833789] Hardware name: Dell Inc. PowerEdge R6415/07YXFK, BIOS 1.15.0 09/11/2020
[63010.841440] RIP: 0010:raise_barrier+0x161/0x170 [raid10]
[63010.865508] RSP: 0018:ffffc312408bbc10 EFLAGS: 00010246
[63010.870734] RAX: 0000000000000000 RBX: ffffa00bf7d39800 RCX: 0000000000000000
[63010.877866] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffa00bf7d39800
[63010.884999] RBP: 0000000000000000 R08: fffffa4945e74400 R09: 0000000000000000
[63010.892132] R10: ffffa00eed02f798 R11: 0000000000000000 R12: ffffa00bbc435200
[63010.899266] R13: ffffa00bf7d39800 R14: 0000000000000400 R15: 0000000000000003
[63010.906399] FS: 0000000000000000(0000) GS:ffffa00eed000000(0000) knlGS:0000000000000000
[63010.914485] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[63010.920229] CR2: 00007f5cfbe99828 CR3: 0000000105efe000 CR4: 00000000003506e0
[63010.927363] Call Trace:
[63010.929822] ? bio_reset+0xe/0x40
[63010.933144] ? raid10_alloc_init_r10buf+0x60/0xa0 [raid10]
[63010.938629] raid10_sync_request+0x756/0x1610 [raid10]
[63010.943770] md_do_sync.cold+0x3e4/0x94c
[63010.947698] md_thread+0xab/0x160
[63010.951024] ? md_write_inc+0x50/0x50
[63010.954688] kthread+0x149/0x170
[63010.957923] ? set_kthread_struct+0x40/0x40
[63010.962107] ret_from_fork+0x22/0x30

Removing the code that sets mddev->private to NULL in raid0 can fix
problem.

Fixes: 0c031fd37f69 (md: Move alloc/free acct bioset in to personality)
Reported-by: Fine Fan <ffan@redhat.com>
Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Song Liu <song@kernel.org>

show more ...


# 913cce5a 12-May-2022 Christoph Hellwig <hch@lst.de>

md: remove most calls to bdevname

Use the %pg format specifier to save on stack consumption and code size.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Song Liu <song@kernel.org>


# ea23994e 13-Apr-2022 Pascal Hambourg <pascal@plouf.fr.eu.org>

md/raid0: Ignore RAID0 layout if the second zone has only one device

The RAID0 layout is irrelevant if all members have the same size so the
array has only one zone. It is *also* irrelevant if the a

md/raid0: Ignore RAID0 layout if the second zone has only one device

The RAID0 layout is irrelevant if all members have the same size so the
array has only one zone. It is *also* irrelevant if the array has two
zones and the second zone has only one device, for example if the array
has two members of different sizes.

So in that case it makes sense to allow assembly even when the layout is
undefined, like what is done when the array has only one zone.

Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Pascal Hambourg <pascal@plouf.fr.eu.org>
Signed-off-by: Song Liu <song@kernel.org>

show more ...


# 70200574 15-Apr-2022 Christoph Hellwig <hch@lst.de>

block: remove QUEUE_FLAG_DISCARD

Just use a non-zero max_discard_sectors as an indicator for discard
support, similar to what is done for write zeroes.

The only places where needs special attention

block: remove QUEUE_FLAG_DISCARD

Just use a non-zero max_discard_sectors as an indicator for discard
support, similar to what is done for write zeroes.

The only places where needs special attention is the RAID5 driver,
which must clear discard support for security reasons by default,
even if the default stacking rules would allow for it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> [drbd]
Acked-by: Jan Höppner <hoeppner@linux.ibm.com> [s390]
Acked-by: Coly Li <colyli@suse.de> [bcache]
Acked-by: David Sterba <dsterba@suse.com> [btrfs]
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220415045258.199825-25-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

show more ...


# 10fa225c 09-Feb-2022 Christoph Hellwig <hch@lst.de>

scsi: md: Remove WRITE_SAME support

There are no more end-users of REQ_OP_WRITE_SAME left, so we can start
deleting it.

Link: https://lore.kernel.org/r/20220209082828.2629273-6-hch@lst.de
Signed-of

scsi: md: Remove WRITE_SAME support

There are no more end-users of REQ_OP_WRITE_SAME left, so we can start
deleting it.

Link: https://lore.kernel.org/r/20220209082828.2629273-6-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

show more ...


# 0c031fd3 10-Dec-2021 Xiao Ni <xni@redhat.com>

md: Move alloc/free acct bioset in to personality

bioset acct is only needed for raid0 and raid5. Therefore, md_run only
allocates it for raid0 and raid5. However, this does not cover
personality ta

md: Move alloc/free acct bioset in to personality

bioset acct is only needed for raid0 and raid5. Therefore, md_run only
allocates it for raid0 and raid5. However, this does not cover
personality takeover, which may cause uninitialized bioset. For example,
the following repro steps:

mdadm -CR /dev/md0 -l1 -n2 /dev/loop0 /dev/loop1
mdadm --wait /dev/md0
mkfs.xfs /dev/md0
mdadm /dev/md0 --grow -l5
mount /dev/md0 /mnt

causes panic like:

[ 225.933939] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 225.934903] #PF: supervisor instruction fetch in kernel mode
[ 225.935639] #PF: error_code(0x0010) - not-present page
[ 225.936361] PGD 0 P4D 0
[ 225.936677] Oops: 0010 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN PTI
[ 225.937525] CPU: 27 PID: 1133 Comm: mount Not tainted 5.16.0-rc3+ #706
[ 225.938416] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.module_el8.4.0+547+a85d02ba 04/01/2014
[ 225.939922] RIP: 0010:0x0
[ 225.940289] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
[ 225.941196] RSP: 0018:ffff88815897eff0 EFLAGS: 00010246
[ 225.941897] RAX: 0000000000000000 RBX: 0000000000092800 RCX: ffffffff81370a39
[ 225.942813] RDX: dffffc0000000000 RSI: 0000000000000000 RDI: 0000000000092800
[ 225.943772] RBP: 1ffff1102b12fe04 R08: fffffbfff0b43c01 R09: fffffbfff0b43c01
[ 225.944807] R10: ffffffff85a1e007 R11: fffffbfff0b43c00 R12: ffff88810eaaaf58
[ 225.945757] R13: 0000000000000000 R14: ffff88810eaaafb8 R15: ffff88815897f040
[ 225.946709] FS: 00007ff3f2505080(0000) GS:ffff888fb5e00000(0000) knlGS:0000000000000000
[ 225.947814] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 225.948556] CR2: ffffffffffffffd6 CR3: 000000015aa5a006 CR4: 0000000000370ee0
[ 225.949537] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 225.950455] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 225.951414] Call Trace:
[ 225.951787] <TASK>
[ 225.952120] mempool_alloc+0xe5/0x250
[ 225.952625] ? mempool_resize+0x370/0x370
[ 225.953187] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 225.953862] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 225.954464] ? sched_clock_cpu+0x15/0x120
[ 225.955019] ? find_held_lock+0xac/0xd0
[ 225.955564] bio_alloc_bioset+0x1ed/0x2a0
[ 225.956080] ? lock_downgrade+0x3a0/0x3a0
[ 225.956644] ? bvec_alloc+0xc0/0xc0
[ 225.957135] bio_clone_fast+0x19/0x80
[ 225.957651] raid5_make_request+0x1370/0x1b70
[ 225.958286] ? sched_clock_cpu+0x15/0x120
[ 225.958797] ? __lock_acquire+0x8b2/0x3510
[ 225.959339] ? raid5_get_active_stripe+0xce0/0xce0
[ 225.959986] ? lock_is_held_type+0xd8/0x130
[ 225.960528] ? rcu_read_lock_sched_held+0xa1/0xd0
[ 225.961135] ? rcu_read_lock_bh_held+0xb0/0xb0
[ 225.961703] ? sched_clock_cpu+0x15/0x120
[ 225.962232] ? lock_release+0x27a/0x6c0
[ 225.962746] ? do_wait_intr_irq+0x130/0x130
[ 225.963302] ? lock_downgrade+0x3a0/0x3a0
[ 225.963815] ? lock_release+0x6c0/0x6c0
[ 225.964348] md_handle_request+0x342/0x530
[ 225.964888] ? set_in_sync+0x170/0x170
[ 225.965397] ? blk_queue_split+0x133/0x150
[ 225.965988] ? __blk_queue_split+0x8b0/0x8b0
[ 225.966524] ? submit_bio_checks+0x3b2/0x9d0
[ 225.967069] md_submit_bio+0x127/0x1c0
[...]

Fix this by moving alloc/free of acct bioset to pers->run and pers->free.

While we are on this, properly handle md_integrity_register() error in
raid0_run().

Fixes: daee2024715d (md: check level before create and exit io_acct_set)
Cc: stable@vger.kernel.org
Acked-by: Guoqing Jiang <guoqing.jiang@linux.dev>
Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Song Liu <song@kernel.org>

show more ...


# 10764815 25-May-2021 Guoqing Jiang <jgq516@gmail.com>

md: add io accounting for raid0 and raid5

We introduce a new bioset (io_acct_set) for raid0 and raid5 since they
don't own clone infrastructure to accounting io. And the bioset is added
to mddev ins

md: add io accounting for raid0 and raid5

We introduce a new bioset (io_acct_set) for raid0 and raid5 since they
don't own clone infrastructure to accounting io. And the bioset is added
to mddev instead of to raid0 and raid5 layer, because with this way, we
can put common functions to md.h and reuse them in raid0 and raid5.

Also struct md_io_acct is added accordingly which includes io start_time,
the origin bio and cloned bio. Then we can call bio_{start,end}_io_acct
to get related io status.

Signed-off-by: Guoqing Jiang <jiangguoqing@kylinos.cn>
Signed-off-by: Song Liu <song@kernel.org>

show more ...


# cf78408f 04-Feb-2021 Xiao Ni <xni@redhat.com>

md: add md_submit_discard_bio() for submitting discard bio

Move these logic from raid0.c to md.c, so that we can also use it in
raid10.c.

Reviewed-by: Coly Li <colyli@suse.de>
Reviewed-by: Guoqing

md: add md_submit_discard_bio() for submitting discard bio

Move these logic from raid0.c to md.c, so that we can also use it in
raid10.c.

Reviewed-by: Coly Li <colyli@suse.de>
Reviewed-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Tested-by: Adrian Huang <ahuang12@lenovo.com>
Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Song Liu <songliubraving@fb.com>

show more ...


# 57a0f3a8 09-Dec-2020 Song Liu <songliubraving@fb.com>

Revert "md: add md_submit_discard_bio() for submitting discard bio"

This reverts commit 2628089b74d5a64bd0bcb5d247a18f78d7b6f4d0.

Matthew Ruffell reported data corruption in raid10 due to the chang

Revert "md: add md_submit_discard_bio() for submitting discard bio"

This reverts commit 2628089b74d5a64bd0bcb5d247a18f78d7b6f4d0.

Matthew Ruffell reported data corruption in raid10 due to the changes
in discard handling [1]. Revert these changes before we find a proper fix.

[1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1907262/
Cc: Matthew Ruffell <matthew.ruffell@canonical.com>
Cc: Xiao Ni <xni@redhat.com>
Signed-off-by: Song Liu <songliubraving@fb.com>

show more ...


# 1c02fca6 03-Dec-2020 Christoph Hellwig <hch@lst.de>

block: remove the request_queue argument to the block_bio_remap tracepoint

The request_queue can trivially be derived from the bio.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien

block: remove the request_queue argument to the block_bio_remap tracepoint

The request_queue can trivially be derived from the bio.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

show more ...


# d7a1c483 29-Sep-2020 Jason Yan <yanaijie@huawei.com>

md/raid0: remove unused function is_io_in_chunk_boundary()

This function is no longger needed after commit 20d0189b1012 ("block:
Introduce new bio_split()").

Signed-off-by: Jason Yan <yanaijie@huaw

md/raid0: remove unused function is_io_in_chunk_boundary()

This function is no longger needed after commit 20d0189b1012 ("block:
Introduce new bio_split()").

Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Song Liu <songliubraving@fb.com>

show more ...


# 2628089b 25-Aug-2020 Xiao Ni <xni@redhat.com>

md: add md_submit_discard_bio() for submitting discard bio

Move these logic from raid0.c to md.c, so that we can also use it in
raid10.c.

Reviewed-by: Coly Li <colyli@suse.de>
Reviewed-by: Guoqing

md: add md_submit_discard_bio() for submitting discard bio

Move these logic from raid0.c to md.c, so that we can also use it in
raid10.c.

Reviewed-by: Coly Li <colyli@suse.de>
Reviewed-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: Xiao Ni <xni@redhat.com>
Signed-off-by: Song Liu <songliubraving@fb.com>

show more ...


# c2e4cd57 24-Sep-2020 Christoph Hellwig <hch@lst.de>

block: lift setting the readahead size into the block layer

Drivers shouldn't really mess with the readahead size, as that is a VM
concept. Instead set it based on the optimal I/O size by lifting t

block: lift setting the readahead size into the block layer

Drivers shouldn't really mess with the readahead size, as that is a VM
concept. Instead set it based on the optimal I/O size by lifting the
algorithm from the md driver when registering the disk. Also set
bdi->io_pages there as well by applying the same scheme based on
max_sectors. To ensure the limits work well for stacking drivers a
new helper is added to update the readahead limits from the block
limits, which is also called from disk_stack_limits.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Coly Li <colyli@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

show more ...


12345678