#
85f273a6 |
| 27-Jan-2024 |
Kent Overstreet <kent.overstreet@linux.dev> |
fs/pipe: Convert to lockdep_cmp_fn
*_lock_nested() is fundamentally broken; lockdep needs to check lock ordering, but we cannot device a total ordering on an unbounded number of elements with only a
fs/pipe: Convert to lockdep_cmp_fn
*_lock_nested() is fundamentally broken; lockdep needs to check lock ordering, but we cannot device a total ordering on an unbounded number of elements with only a few subclasses.
the replacement is to define lock ordering with a proper comparison function.
fs/pipe.c was already doing everything correctly otherwise, nothing much changes here.
Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev> Link: https://lore.kernel.org/r/20240127020111.487218-2-kent.overstreet@linux.dev Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
show more ...
|
#
9d5b9475 |
| 21-Nov-2023 |
Joel Granados <j.granados@samsung.com> |
fs: Remove the now superfluous sentinel elements from ctl_table array
This commit comes at the tail end of a greater effort to remove the empty elements at the end of the ctl_table arrays (sentinels
fs: Remove the now superfluous sentinel elements from ctl_table array
This commit comes at the tail end of a greater effort to remove the empty elements at the end of the ctl_table arrays (sentinels) which will reduce the overall build time size of the kernel and run time memory bloat by ~64 bytes per sentinel (further information Link : https://lore.kernel.org/all/ZO5Yx5JFogGi%2FcBo@bombadil.infradead.org/)
Remove sentinel elements ctl_table struct. Special attention was placed in making sure that an empty directory for fs/verity was created when CONFIG_FS_VERITY_BUILTIN_SIGNATURES is not defined. In this case we use the register sysctl call that expects a size.
Signed-off-by: Joel Granados <j.granados@samsung.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
show more ...
|
#
e95aada4 |
| 01-Dec-2023 |
Lukas Schauer <lukas@schauer.dev> |
pipe: wakeup wr_wait after setting max_usage
Commit c73be61cede5 ("pipe: Add general notification queue support") a regression was introduced that would lock up resized pipes under certain condition
pipe: wakeup wr_wait after setting max_usage
Commit c73be61cede5 ("pipe: Add general notification queue support") a regression was introduced that would lock up resized pipes under certain conditions. See the reproducer in [1].
The commit resizing the pipe ring size was moved to a different function, doing that moved the wakeup for pipe->wr_wait before actually raising pipe->max_usage. If a pipe was full before the resize occured it would result in the wakeup never actually triggering pipe_write.
Set @max_usage and @nr_accounted before waking writers if this isn't a watch queue.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=212295 [1] Link: https://lore.kernel.org/r/20231201-orchideen-modewelt-e009de4562c6@brauner Fixes: c73be61cede5 ("pipe: Add general notification queue support") Reviewed-by: David Howells <dhowells@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Lukas Schauer <lukas@schauer.dev> [Christian Brauner <brauner@kernel.org>: rewrite to account for watch queues] Signed-off-by: Christian Brauner <brauner@kernel.org>
show more ...
|
#
055ca835 |
| 24-Nov-2023 |
Jann Horn <jannh@google.com> |
fs/pipe: Fix lockdep false-positive in watchqueue pipe_write()
When you try to splice between a normal pipe and a notification pipe, get_pipe_info(..., true) fails, so splice() falls back to treatin
fs/pipe: Fix lockdep false-positive in watchqueue pipe_write()
When you try to splice between a normal pipe and a notification pipe, get_pipe_info(..., true) fails, so splice() falls back to treating the notification pipe like a normal pipe - so we end up in iter_file_splice_write(), which first locks the input pipe, then calls vfs_iter_write(), which locks the output pipe.
Lockdep complains about that, because we're taking a pipe lock while already holding another pipe lock.
I think this probably (?) can't actually lead to deadlocks, since you'd need another way to nest locking a normal pipe into locking a watch_queue pipe, but the lockdep annotations don't make that clear.
Bail out earlier in pipe_write() for notification pipes, before taking the pipe lock.
Reported-and-tested-by: <syzbot+011e4ea1da6692cf881c@syzkaller.appspotmail.com> Closes: https://syzkaller.appspot.com/bug?extid=011e4ea1da6692cf881c Fixes: c73be61cede5 ("pipe: Add general notification queue support") Signed-off-by: Jann Horn <jannh@google.com> Link: https://lore.kernel.org/r/20231124150822.2121798-1-jannh@google.com Signed-off-by: Christian Brauner <brauner@kernel.org>
show more ...
|
#
478dbf12 |
| 21-Sep-2023 |
Max Kellermann <max.kellermann@ionos.com> |
fs/pipe: use spinlock in pipe_read() only if there is a watch_queue
If there is no watch_queue, holding the pipe mutex is enough to prevent concurrent writes, and we can avoid the spinlock.
O_NOTIF
fs/pipe: use spinlock in pipe_read() only if there is a watch_queue
If there is no watch_queue, holding the pipe mutex is enough to prevent concurrent writes, and we can avoid the spinlock.
O_NOTIFICATION_QUEUE is an exotic and rarely used feature, and of all the pipes that exist at any given time, only very few actually have a watch_queue, therefore it appears worthwile to optimize the common case.
This patch does not optimize pipe_resize_ring() where the spinlocks could be avoided as well; that does not seem like a worthwile optimization because this function is not called often.
Related commits:
- commit 8df441294dd3 ("pipe: Check for ring full inside of the spinlock in pipe_write()") - commit b667b8673443 ("pipe: Advance tail pointer inside of wait spinlock in pipe_read()") - commit 189b0ddc2451 ("pipe: Fix missing lock in pipe_resize_ring()")
Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Message-Id: <20230921075755.1378787-4-max.kellermann@ionos.com> Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
show more ...
|
#
dfaabf91 |
| 21-Sep-2023 |
Max Kellermann <max.kellermann@ionos.com> |
fs/pipe: remove unnecessary spinlock from pipe_write()
This reverts commit 8df441294dd3 ("pipe: Check for ring full inside of the spinlock in pipe_write()") which was obsoleted by commit c73be61cede
fs/pipe: remove unnecessary spinlock from pipe_write()
This reverts commit 8df441294dd3 ("pipe: Check for ring full inside of the spinlock in pipe_write()") which was obsoleted by commit c73be61cede ("pipe: Add general notification queue support") because now pipe_write() fails early with -EXDEV if there is a watch_queue.
Without a watch_queue, no notifications can be posted to the pipe and mutex protection is enough, as can be seen in splice_pipe_to_pipe() which does not use the spinlock either.
Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Message-Id: <20230921075755.1378787-3-max.kellermann@ionos.com> Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
show more ...
|
#
b4bd6b4b |
| 21-Sep-2023 |
Max Kellermann <max.kellermann@ionos.com> |
fs/pipe: move check to pipe_has_watch_queue()
This declutters the code by reducing the number of #ifdefs and makes the watch_queue checks simpler. This has no runtime effect; the machine code is id
fs/pipe: move check to pipe_has_watch_queue()
This declutters the code by reducing the number of #ifdefs and makes the watch_queue checks simpler. This has no runtime effect; the machine code is identical.
Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Message-Id: <20230921075755.1378787-2-max.kellermann@ionos.com> Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
show more ...
|
#
68279f9c |
| 11-Oct-2023 |
Alexey Dobriyan <adobriyan@gmail.com> |
treewide: mark stuff as __ro_after_init
__read_mostly predates __ro_after_init. Many variables which are marked __read_mostly should have been __ro_after_init from day 1.
Also, mark some stuff as "
treewide: mark stuff as __ro_after_init
__read_mostly predates __ro_after_init. Many variables which are marked __read_mostly should have been __ro_after_init from day 1.
Also, mark some stuff as "const" and "__init" while I'm at it.
[akpm@linux-foundation.org: revert sysctl_nr_open_min, sysctl_nr_open_max changes due to arm warning] [akpm@linux-foundation.org: coding-style cleanups] Link: https://lkml.kernel.org/r/4f6bb9c0-abba-4ee4-a7aa-89265e886817@p183 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
#
16a94965 |
| 04-Oct-2023 |
Jeff Layton <jlayton@kernel.org> |
fs: convert core infrastructure to new timestamp accessors
Convert the core vfs code to use the new timestamp accessor functions.
Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.
fs: convert core infrastructure to new timestamp accessors
Convert the core vfs code to use the new timestamp accessor functions.
Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20231004185239.80830-2-jlayton@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
show more ...
|
#
ae81711c |
| 19-Sep-2023 |
Max Kellermann <max.kellermann@ionos.com> |
fs/pipe: remove duplicate "offset" initializer
This code duplication was introduced by commit a194dfe6e6f6 ("pipe: Rearrange sequence in pipe_write() to preallocate slot"), but since the pipe's mute
fs/pipe: remove duplicate "offset" initializer
This code duplication was introduced by commit a194dfe6e6f6 ("pipe: Rearrange sequence in pipe_write() to preallocate slot"), but since the pipe's mutex is locked, nobody else can modify the value meanwhile.
Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Message-Id: <20230919074045.1066796-1-max.kellermann@ionos.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
show more ...
|
#
fbaa530e |
| 18-Aug-2023 |
Colin Ian King <colin.i.king@gmail.com> |
fs/pipe: remove redundant initialization of pointer buf
The pointer buf is being initializated with a value that is never read, it is being re-assigned later on at the pointer where it is being used
fs/pipe: remove redundant initialization of pointer buf
The pointer buf is being initializated with a value that is never read, it is being re-assigned later on at the pointer where it is being used. The initialization is redundant and can be removed. Cleans up clang scan build warning:
fs/pipe.c:492:24: warning: Value stored to 'buf' during its initialization is never read [deadcode.DeadStores]
Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Message-Id: <20230818144556.1208082-1-colin.i.king@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
show more ...
|
#
2276e5ba |
| 05-Jul-2023 |
Jeff Layton <jlayton@kernel.org> |
fs: convert to ctime accessor functions
In later patches, we're going to change how the inode's ctime field is used. Switch to using accessor functions instead of raw accesses of inode->i_ctime.
Re
fs: convert to ctime accessor functions
In later patches, we're going to change how the inode's ctime field is used. Switch to using accessor functions instead of raw accesses of inode->i_ctime.
Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jeff Layton <jlayton@kernel.org> Message-Id: <20230705190309.579783-23-jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
show more ...
|
#
515c5046 |
| 01-Feb-2023 |
Luca Vizzarro <Luca.Vizzarro@arm.com> |
pipe: Pass argument of pipe_fcntl as int
The interface for fcntl expects the argument passed for the command F_SETPIPE_SZ to be of type int. The current code wrongly treats it as a long. In order to
pipe: Pass argument of pipe_fcntl as int
The interface for fcntl expects the argument passed for the command F_SETPIPE_SZ to be of type int. The current code wrongly treats it as a long. In order to avoid access to undefined bits, we should explicitly cast the argument to int.
Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jeff Layton <jlayton@kernel.org> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Kevin Brodsky <Kevin.Brodsky@arm.com> Cc: Vincenzo Frascino <Vincenzo.Frascino@arm.com> Cc: Szabolcs Nagy <Szabolcs.Nagy@arm.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: David Laight <David.Laight@ACULAB.com> Cc: Mark Rutland <Mark.Rutland@arm.com> Cc: linux-fsdevel@vger.kernel.org Cc: linux-morello@op-lists.linaro.org Signed-off-by: Luca Vizzarro <Luca.Vizzarro@arm.com> Message-Id: <20230414152459.816046-4-Luca.Vizzarro@arm.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
show more ...
|
#
c04fe8e3 |
| 09-May-2023 |
Jens Axboe <axboe@kernel.dk> |
pipe: check for IOCB_NOWAIT alongside O_NONBLOCK
Pipe reads or writes need to enable nonblocking attempts, if either O_NONBLOCK is set on the file, or IOCB_NOWAIT is set in the iocb being passed in.
pipe: check for IOCB_NOWAIT alongside O_NONBLOCK
Pipe reads or writes need to enable nonblocking attempts, if either O_NONBLOCK is set on the file, or IOCB_NOWAIT is set in the iocb being passed in. The latter isn't currently true, ensure we check for both before waiting on data or space.
Fixes: afed6271f5b0 ("pipe: set FMODE_NOWAIT on pipes") Signed-off-by: Jens Axboe <axboe@kernel.dk> Message-Id: <e5946d67-4e5e-b056-ba80-656bab12d9f6@kernel.dk> Signed-off-by: Christian Brauner <brauner@kernel.org>
show more ...
|
#
afed6271 |
| 08-Mar-2023 |
Jens Axboe <axboe@kernel.dk> |
pipe: set FMODE_NOWAIT on pipes
Pipes themselves do not hold the the pipe lock across IO, and hence are safe for RWF_NOWAIT/IOCB_NOWAIT usage. The "contract" for NOWAIT is really "should not do IO u
pipe: set FMODE_NOWAIT on pipes
Pipes themselves do not hold the the pipe lock across IO, and hence are safe for RWF_NOWAIT/IOCB_NOWAIT usage. The "contract" for NOWAIT is really "should not do IO under this lock", not strictly that we cannot block or that the below code is in any way atomic. Pipes fulfil that criteria.
Acked-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
#
0f60d288 |
| 30-Jan-2022 |
Al Viro <viro@zeniv.linux.org.uk> |
dynamic_dname(): drop unused dentry argument
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
189b0ddc |
| 26-May-2022 |
David Howells <dhowells@redhat.com> |
pipe: Fix missing lock in pipe_resize_ring()
pipe_resize_ring() needs to take the pipe->rd_wait.lock spinlock to prevent post_one_notification() from trying to insert into the ring whilst the ring i
pipe: Fix missing lock in pipe_resize_ring()
pipe_resize_ring() needs to take the pipe->rd_wait.lock spinlock to prevent post_one_notification() from trying to insert into the ring whilst the ring is being replaced.
The occupancy check must be done after the lock is taken, and the lock must be taken after the new ring is allocated.
The bug can lead to an oops looking something like:
BUG: KASAN: use-after-free in post_one_notification.isra.0+0x62e/0x840 Read of size 4 at addr ffff88801cc72a70 by task poc/27196 ... Call Trace: post_one_notification.isra.0+0x62e/0x840 __post_watch_notification+0x3b7/0x650 key_create_or_update+0xb8b/0xd20 __do_sys_add_key+0x175/0x340 __x64_sys_add_key+0xbe/0x140 do_syscall_64+0x5c/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xae
Reported by Selim Enes Karaduman @Enesdex working with Trend Micro Zero Day Initiative.
Fixes: c73be61cede5 ("pipe: Add general notification queue support") Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-17291 Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
#
f485922d |
| 29-Apr-2022 |
Kuniyuki Iwashima <kuniyu@amazon.co.jp> |
pipe: make poll_usage boolean and annotate its access
Patch series "Fix data-races around epoll reported by KCSAN."
This series suppresses a false positive KCSAN's message and fixes a real data-rac
pipe: make poll_usage boolean and annotate its access
Patch series "Fix data-races around epoll reported by KCSAN."
This series suppresses a false positive KCSAN's message and fixes a real data-race.
This patch (of 2):
pipe_poll() runs locklessly and assigns 1 to poll_usage. Once poll_usage is set to 1, it never changes in other places. However, concurrent writes of a value trigger KCSAN, so let's make KCSAN happy.
BUG: KCSAN: data-race in pipe_poll / pipe_poll
write to 0xffff8880042f6678 of 4 bytes by task 174 on cpu 3: pipe_poll (fs/pipe.c:656) ep_item_poll.isra.0 (./include/linux/poll.h:88 fs/eventpoll.c:853) do_epoll_wait (fs/eventpoll.c:1692 fs/eventpoll.c:1806 fs/eventpoll.c:2234) __x64_sys_epoll_wait (fs/eventpoll.c:2246 fs/eventpoll.c:2241 fs/eventpoll.c:2241) do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113)
write to 0xffff8880042f6678 of 4 bytes by task 177 on cpu 1: pipe_poll (fs/pipe.c:656) ep_item_poll.isra.0 (./include/linux/poll.h:88 fs/eventpoll.c:853) do_epoll_wait (fs/eventpoll.c:1692 fs/eventpoll.c:1806 fs/eventpoll.c:2234) __x64_sys_epoll_wait (fs/eventpoll.c:2246 fs/eventpoll.c:2241 fs/eventpoll.c:2241) do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113)
Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 177 Comm: epoll_race Not tainted 5.17.0-58927-gf443e374ae13 #6 Hardware name: Red Hat KVM, BIOS 1.11.0-2.amzn2 04/01/2014
Link: https://lkml.kernel.org/r/20220322002653.33865-1-kuniyu@amazon.co.jp Link: https://lkml.kernel.org/r/20220322002653.33865-2-kuniyu@amazon.co.jp Fixes: 3b844826b6c6 ("pipe: avoid unnecessary EPOLLET wakeups under normal loads") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Cc: Alexander Duyck <alexander.h.duyck@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Kuniyuki Iwashima <kuni1840@gmail.com> Cc: "Soheil Hassas Yeganeh" <soheil@google.com> Cc: "Sridhar Samudrala" <sridhar.samudrala@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
#
906f9040 |
| 20-Apr-2022 |
Linus Torvalds <torvalds@linux-foundation.org> |
Revert "fs/pipe: use kvcalloc to allocate a pipe_buffer array"
This reverts commit 5a519c8fe4d620912385f94372fc8472fa98c662.
It turns out that making the pipe almost arbitrarily large has some rath
Revert "fs/pipe: use kvcalloc to allocate a pipe_buffer array"
This reverts commit 5a519c8fe4d620912385f94372fc8472fa98c662.
It turns out that making the pipe almost arbitrarily large has some rather unexpected downsides. The kernel test robot reports a kernel warning that is due to pipe->max_usage now growing to the point where the iter_file_splice_write() buffer allocation can no longer be satisfied as a slab allocation, and the
int nbufs = pipe->max_usage; struct bio_vec *array = kcalloc(nbufs, sizeof(struct bio_vec), GFP_KERNEL);
code sequence there will now always fail as a result.
That code could be modified to use kvcalloc() too, but I feel very uncomfortable making those kinds of changes for a very niche use case that really should have other options than make these kinds of fundamental changes to pipe behavior.
Maybe the CRIU process dumping should be multi-threaded, and use multiple pipes and multiple cores, rather than try to use one larger pipe to minimize splice() calls.
Reported-by: kernel test robot <oliver.sang@intel.com> Link: https://lore.kernel.org/all/20220420073717.GD16310@xsang-OptiPlex-9020/ Cc: Andrei Vagin <avagin@gmail.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
#
aeb213cd |
| 23-Mar-2022 |
Andrei Vagin <avagin@gmail.com> |
fs/pipe.c: local vars have to match types of proper pipe_inode_info fields
head, tail, ring_size are declared as unsigned int, so all local variables that operate with these fields have to be unsign
fs/pipe.c: local vars have to match types of proper pipe_inode_info fields
head, tail, ring_size are declared as unsigned int, so all local variables that operate with these fields have to be unsigned to avoid signed integer overflow.
Right now, it isn't an issue because the maximum pipe size is limited by 1U<<31.
Link: https://lkml.kernel.org/r/20220106171946.36128-1-avagin@gmail.com Signed-off-by: Andrei Vagin <avagin@gmail.com> Suggested-by: Dmitry Safonov <0x7f454c46@gmail.com> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
#
5a519c8f |
| 23-Mar-2022 |
Andrei Vagin <avagin@gmail.com> |
fs/pipe: use kvcalloc to allocate a pipe_buffer array
Right now, kcalloc is used to allocate a pipe_buffer array. The size of the pipe_buffer struct is 40 bytes. kcalloc allows allocating reliably
fs/pipe: use kvcalloc to allocate a pipe_buffer array
Right now, kcalloc is used to allocate a pipe_buffer array. The size of the pipe_buffer struct is 40 bytes. kcalloc allows allocating reliably chunks with sizes less or equal to PAGE_ALLOC_COSTLY_ORDER (3). It means that the maximum pipe size is 3.2MB in this case.
In CRIU, we use pipes to dump processes memory. CRIU freezes a target process, injects a parasite code into it and then this code splices memory into pipes. If a maximum pipe size is small, we need to do many iterations or create many pipes.
kvcalloc attempt to allocate physically contiguous memory, but upon failure, fall back to non-contiguous (vmalloc) allocation and so it isn't limited by PAGE_ALLOC_COSTLY_ORDER.
The maximum pipe size for non-root users is limited by the /proc/sys/fs/pipe-max-size sysctl that is 1MB by default, so only the root user will be able to trigger vmalloc allocations.
Link: https://lkml.kernel.org/r/20220104171058.22580-1-avagin@gmail.com Signed-off-by: Andrei Vagin <avagin@gmail.com> Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
#
2ed147f0 |
| 11-Mar-2022 |
David Howells <dhowells@redhat.com> |
watch_queue: Fix lack of barrier/sync/lock between post and read
There's nothing to synchronise post_one_notification() versus pipe_read(). Whilst posting is done under pipe->rd_wait.lock, the read
watch_queue: Fix lack of barrier/sync/lock between post and read
There's nothing to synchronise post_one_notification() versus pipe_read(). Whilst posting is done under pipe->rd_wait.lock, the reader only takes pipe->mutex which cannot bar notification posting as that may need to be made from contexts that cannot sleep.
Fix this by setting pipe->head with a barrier in post_one_notification() and reading pipe->head with a barrier in pipe_read().
If that's not sufficient, the rd_wait.lock will need to be taken, possibly in a ->confirm() op so that it only applies to notifications. The lock would, however, have to be dropped before copy_page_to_iter() is invoked.
Fixes: c73be61cede5 ("pipe: Add general notification queue support") Reported-by: Jann Horn <jannh@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
#
db8facfc |
| 11-Mar-2022 |
David Howells <dhowells@redhat.com> |
watch_queue, pipe: Free watchqueue state after clearing pipe ring
In free_pipe_info(), free the watchqueue state after clearing the pipe ring as each pipe ring descriptor has a release function, and
watch_queue, pipe: Free watchqueue state after clearing pipe ring
In free_pipe_info(), free the watchqueue state after clearing the pipe ring as each pipe ring descriptor has a release function, and in the case of a notification message, this is watch_queue_pipe_buf_release() which tries to mark the allocation bitmap that was previously released.
Fix this by moving the put of the pipe's ref on the watch queue to after the ring has been cleared. We still need to call watch_queue_clear() before doing that to make sure that the pipe is disconnected from any notification sources first.
Fixes: c73be61cede5 ("pipe: Add general notification queue support") Reported-by: Jann Horn <jannh@google.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
#
1998f193 |
| 22-Jan-2022 |
Luis Chamberlain <mcgrof@kernel.org> |
fs: move pipe sysctls to is own file
kernel/sysctl.c is a kitchen sink where everyone leaves their dirty dishes, this makes it very difficult to maintain.
To help with this maintenance let's start
fs: move pipe sysctls to is own file
kernel/sysctl.c is a kitchen sink where everyone leaves their dirty dishes, this makes it very difficult to maintain.
To help with this maintenance let's start by moving sysctls to places where they actually belong. The proc sysctl maintainers do not want to know what sysctl knobs you wish to add for your own piece of code, we just care about the core logic.
So move the pipe sysctls to its own file.
Link: https://lkml.kernel.org/r/20211129205548.605569-10-mcgrof@kernel.org Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Antti Palosaari <crope@iki.fi> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Iurii Zaikin <yzaikin@google.com> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Jeff Layton <jlayton@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Lukas Middendorf <kernel@tuxforce.de> Cc: Stephen Kitt <steve@sk2.org> Cc: Xiaoming Ni <nixiaoming@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
#
cd1adf1b |
| 07-Sep-2021 |
Linus Torvalds <torvalds@linux-foundation.org> |
Revert "mm/gup: remove try_get_page(), call try_get_compound_head() directly"
This reverts commit 9857a17f206ff374aea78bccfb687f145368be2e.
That commit was completely broken, and I should have caug
Revert "mm/gup: remove try_get_page(), call try_get_compound_head() directly"
This reverts commit 9857a17f206ff374aea78bccfb687f145368be2e.
That commit was completely broken, and I should have caught on to it earlier. But happily, the kernel test robot noticed the breakage fairly quickly.
The breakage is because "try_get_page()" is about avoiding the page reference count overflow case, but is otherwise the exact same as a plain "get_page()".
In contrast, "try_get_compound_head()" is an entirely different beast, and uses __page_cache_add_speculative() because it's not just about the page reference count, but also about possibly racing with the underlying page going away.
So all the commentary about how
"try_get_page() has fallen a little behind in terms of maintenance, try_get_compound_head() handles speculative page references more thoroughly"
was just completely wrong: yes, try_get_compound_head() handles speculative page references, but the point is that try_get_page() does not, and must not.
So there's no lack of maintainance - there are fundamentally different semantics.
A speculative page reference would be entirely wrong in "get_page()", and it's entirely wrong in "try_get_page()". It's not about speculation, it's purely about "uhhuh, you can't get this page because you've tried to increment the reference count too much already".
The reason the kernel test robot noticed this bug was that it hit the VM_BUG_ON() in __page_cache_add_speculative(), which is all about verifying that the context of any speculative page access is correct. But since that isn't what try_get_page() is all about, the VM_BUG_ON() tests things that are not correct to test for try_get_page().
Reported-by: kernel test robot <oliver.sang@intel.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|