History log of /linux/fs/pipe.c (Results 1 – 25 of 237)
Revision Date Author Comments
# 85f273a6 27-Jan-2024 Kent Overstreet <kent.overstreet@linux.dev>

fs/pipe: Convert to lockdep_cmp_fn

*_lock_nested() is fundamentally broken; lockdep needs to check lock
ordering, but we cannot device a total ordering on an unbounded number
of elements with only a

fs/pipe: Convert to lockdep_cmp_fn

*_lock_nested() is fundamentally broken; lockdep needs to check lock
ordering, but we cannot device a total ordering on an unbounded number
of elements with only a few subclasses.

the replacement is to define lock ordering with a proper comparison
function.

fs/pipe.c was already doing everything correctly otherwise, nothing
much changes here.

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Link: https://lore.kernel.org/r/20240127020111.487218-2-kent.overstreet@linux.dev
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>

show more ...


# 9d5b9475 21-Nov-2023 Joel Granados <j.granados@samsung.com>

fs: Remove the now superfluous sentinel elements from ctl_table array

This commit comes at the tail end of a greater effort to remove the
empty elements at the end of the ctl_table arrays (sentinels

fs: Remove the now superfluous sentinel elements from ctl_table array

This commit comes at the tail end of a greater effort to remove the
empty elements at the end of the ctl_table arrays (sentinels) which
will reduce the overall build time size of the kernel and run time
memory bloat by ~64 bytes per sentinel (further information Link :
https://lore.kernel.org/all/ZO5Yx5JFogGi%2FcBo@bombadil.infradead.org/)

Remove sentinel elements ctl_table struct. Special attention was placed in
making sure that an empty directory for fs/verity was created when
CONFIG_FS_VERITY_BUILTIN_SIGNATURES is not defined. In this case we use the
register sysctl call that expects a size.

Signed-off-by: Joel Granados <j.granados@samsung.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Acked-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>

show more ...


# e95aada4 01-Dec-2023 Lukas Schauer <lukas@schauer.dev>

pipe: wakeup wr_wait after setting max_usage

Commit c73be61cede5 ("pipe: Add general notification queue support") a
regression was introduced that would lock up resized pipes under certain
condition

pipe: wakeup wr_wait after setting max_usage

Commit c73be61cede5 ("pipe: Add general notification queue support") a
regression was introduced that would lock up resized pipes under certain
conditions. See the reproducer in [1].

The commit resizing the pipe ring size was moved to a different
function, doing that moved the wakeup for pipe->wr_wait before actually
raising pipe->max_usage. If a pipe was full before the resize occured it
would result in the wakeup never actually triggering pipe_write.

Set @max_usage and @nr_accounted before waking writers if this isn't a
watch queue.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=212295 [1]
Link: https://lore.kernel.org/r/20231201-orchideen-modewelt-e009de4562c6@brauner
Fixes: c73be61cede5 ("pipe: Add general notification queue support")
Reviewed-by: David Howells <dhowells@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Lukas Schauer <lukas@schauer.dev>
[Christian Brauner <brauner@kernel.org>: rewrite to account for watch queues]
Signed-off-by: Christian Brauner <brauner@kernel.org>

show more ...


# 055ca835 24-Nov-2023 Jann Horn <jannh@google.com>

fs/pipe: Fix lockdep false-positive in watchqueue pipe_write()

When you try to splice between a normal pipe and a notification pipe,
get_pipe_info(..., true) fails, so splice() falls back to treatin

fs/pipe: Fix lockdep false-positive in watchqueue pipe_write()

When you try to splice between a normal pipe and a notification pipe,
get_pipe_info(..., true) fails, so splice() falls back to treating the
notification pipe like a normal pipe - so we end up in
iter_file_splice_write(), which first locks the input pipe, then calls
vfs_iter_write(), which locks the output pipe.

Lockdep complains about that, because we're taking a pipe lock while
already holding another pipe lock.

I think this probably (?) can't actually lead to deadlocks, since you'd
need another way to nest locking a normal pipe into locking a
watch_queue pipe, but the lockdep annotations don't make that clear.

Bail out earlier in pipe_write() for notification pipes, before taking
the pipe lock.

Reported-and-tested-by: <syzbot+011e4ea1da6692cf881c@syzkaller.appspotmail.com>
Closes: https://syzkaller.appspot.com/bug?extid=011e4ea1da6692cf881c
Fixes: c73be61cede5 ("pipe: Add general notification queue support")
Signed-off-by: Jann Horn <jannh@google.com>
Link: https://lore.kernel.org/r/20231124150822.2121798-1-jannh@google.com
Signed-off-by: Christian Brauner <brauner@kernel.org>

show more ...


# 478dbf12 21-Sep-2023 Max Kellermann <max.kellermann@ionos.com>

fs/pipe: use spinlock in pipe_read() only if there is a watch_queue

If there is no watch_queue, holding the pipe mutex is enough to
prevent concurrent writes, and we can avoid the spinlock.

O_NOTIF

fs/pipe: use spinlock in pipe_read() only if there is a watch_queue

If there is no watch_queue, holding the pipe mutex is enough to
prevent concurrent writes, and we can avoid the spinlock.

O_NOTIFICATION_QUEUE is an exotic and rarely used feature, and of all
the pipes that exist at any given time, only very few actually have a
watch_queue, therefore it appears worthwile to optimize the common
case.

This patch does not optimize pipe_resize_ring() where the spinlocks
could be avoided as well; that does not seem like a worthwile
optimization because this function is not called often.

Related commits:

- commit 8df441294dd3 ("pipe: Check for ring full inside of the
spinlock in pipe_write()")
- commit b667b8673443 ("pipe: Advance tail pointer inside of wait
spinlock in pipe_read()")
- commit 189b0ddc2451 ("pipe: Fix missing lock in pipe_resize_ring()")

Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Message-Id: <20230921075755.1378787-4-max.kellermann@ionos.com>
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>

show more ...


# dfaabf91 21-Sep-2023 Max Kellermann <max.kellermann@ionos.com>

fs/pipe: remove unnecessary spinlock from pipe_write()

This reverts commit 8df441294dd3 ("pipe: Check for ring full inside of
the spinlock in pipe_write()") which was obsoleted by commit
c73be61cede

fs/pipe: remove unnecessary spinlock from pipe_write()

This reverts commit 8df441294dd3 ("pipe: Check for ring full inside of
the spinlock in pipe_write()") which was obsoleted by commit
c73be61cede ("pipe: Add general notification queue support") because
now pipe_write() fails early with -EXDEV if there is a watch_queue.

Without a watch_queue, no notifications can be posted to the pipe and
mutex protection is enough, as can be seen in splice_pipe_to_pipe()
which does not use the spinlock either.

Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Message-Id: <20230921075755.1378787-3-max.kellermann@ionos.com>
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>

show more ...


# b4bd6b4b 21-Sep-2023 Max Kellermann <max.kellermann@ionos.com>

fs/pipe: move check to pipe_has_watch_queue()

This declutters the code by reducing the number of #ifdefs and makes
the watch_queue checks simpler. This has no runtime effect; the
machine code is id

fs/pipe: move check to pipe_has_watch_queue()

This declutters the code by reducing the number of #ifdefs and makes
the watch_queue checks simpler. This has no runtime effect; the
machine code is identical.

Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Message-Id: <20230921075755.1378787-2-max.kellermann@ionos.com>
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>

show more ...


# 68279f9c 11-Oct-2023 Alexey Dobriyan <adobriyan@gmail.com>

treewide: mark stuff as __ro_after_init

__read_mostly predates __ro_after_init. Many variables which are marked
__read_mostly should have been __ro_after_init from day 1.

Also, mark some stuff as "

treewide: mark stuff as __ro_after_init

__read_mostly predates __ro_after_init. Many variables which are marked
__read_mostly should have been __ro_after_init from day 1.

Also, mark some stuff as "const" and "__init" while I'm at it.

[akpm@linux-foundation.org: revert sysctl_nr_open_min, sysctl_nr_open_max changes due to arm warning]
[akpm@linux-foundation.org: coding-style cleanups]
Link: https://lkml.kernel.org/r/4f6bb9c0-abba-4ee4-a7aa-89265e886817@p183
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

show more ...


# 16a94965 04-Oct-2023 Jeff Layton <jlayton@kernel.org>

fs: convert core infrastructure to new timestamp accessors

Convert the core vfs code to use the new timestamp accessor functions.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.

fs: convert core infrastructure to new timestamp accessors

Convert the core vfs code to use the new timestamp accessor functions.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20231004185239.80830-2-jlayton@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>

show more ...


# ae81711c 19-Sep-2023 Max Kellermann <max.kellermann@ionos.com>

fs/pipe: remove duplicate "offset" initializer

This code duplication was introduced by commit a194dfe6e6f6 ("pipe:
Rearrange sequence in pipe_write() to preallocate slot"), but since
the pipe's mute

fs/pipe: remove duplicate "offset" initializer

This code duplication was introduced by commit a194dfe6e6f6 ("pipe:
Rearrange sequence in pipe_write() to preallocate slot"), but since
the pipe's mutex is locked, nobody else can modify the value
meanwhile.

Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
Message-Id: <20230919074045.1066796-1-max.kellermann@ionos.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>

show more ...


# fbaa530e 18-Aug-2023 Colin Ian King <colin.i.king@gmail.com>

fs/pipe: remove redundant initialization of pointer buf

The pointer buf is being initializated with a value that is never read,
it is being re-assigned later on at the pointer where it is being used

fs/pipe: remove redundant initialization of pointer buf

The pointer buf is being initializated with a value that is never read,
it is being re-assigned later on at the pointer where it is being used.
The initialization is redundant and can be removed. Cleans up clang scan
build warning:

fs/pipe.c:492:24: warning: Value stored to 'buf' during its
initialization is never read [deadcode.DeadStores]

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Message-Id: <20230818144556.1208082-1-colin.i.king@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>

show more ...


# 2276e5ba 05-Jul-2023 Jeff Layton <jlayton@kernel.org>

fs: convert to ctime accessor functions

In later patches, we're going to change how the inode's ctime field is
used. Switch to using accessor functions instead of raw accesses of
inode->i_ctime.

Re

fs: convert to ctime accessor functions

In later patches, we're going to change how the inode's ctime field is
used. Switch to using accessor functions instead of raw accesses of
inode->i_ctime.

Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Message-Id: <20230705190309.579783-23-jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>

show more ...


# 515c5046 01-Feb-2023 Luca Vizzarro <Luca.Vizzarro@arm.com>

pipe: Pass argument of pipe_fcntl as int

The interface for fcntl expects the argument passed for the command
F_SETPIPE_SZ to be of type int. The current code wrongly treats it as
a long. In order to

pipe: Pass argument of pipe_fcntl as int

The interface for fcntl expects the argument passed for the command
F_SETPIPE_SZ to be of type int. The current code wrongly treats it as
a long. In order to avoid access to undefined bits, we should explicitly
cast the argument to int.

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Kevin Brodsky <Kevin.Brodsky@arm.com>
Cc: Vincenzo Frascino <Vincenzo.Frascino@arm.com>
Cc: Szabolcs Nagy <Szabolcs.Nagy@arm.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: David Laight <David.Laight@ACULAB.com>
Cc: Mark Rutland <Mark.Rutland@arm.com>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-morello@op-lists.linaro.org
Signed-off-by: Luca Vizzarro <Luca.Vizzarro@arm.com>
Message-Id: <20230414152459.816046-4-Luca.Vizzarro@arm.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>

show more ...


# c04fe8e3 09-May-2023 Jens Axboe <axboe@kernel.dk>

pipe: check for IOCB_NOWAIT alongside O_NONBLOCK

Pipe reads or writes need to enable nonblocking attempts, if either
O_NONBLOCK is set on the file, or IOCB_NOWAIT is set in the iocb being
passed in.

pipe: check for IOCB_NOWAIT alongside O_NONBLOCK

Pipe reads or writes need to enable nonblocking attempts, if either
O_NONBLOCK is set on the file, or IOCB_NOWAIT is set in the iocb being
passed in. The latter isn't currently true, ensure we check for both
before waiting on data or space.

Fixes: afed6271f5b0 ("pipe: set FMODE_NOWAIT on pipes")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Message-Id: <e5946d67-4e5e-b056-ba80-656bab12d9f6@kernel.dk>
Signed-off-by: Christian Brauner <brauner@kernel.org>

show more ...


# afed6271 08-Mar-2023 Jens Axboe <axboe@kernel.dk>

pipe: set FMODE_NOWAIT on pipes

Pipes themselves do not hold the the pipe lock across IO, and hence are
safe for RWF_NOWAIT/IOCB_NOWAIT usage. The "contract" for NOWAIT is
really "should not do IO u

pipe: set FMODE_NOWAIT on pipes

Pipes themselves do not hold the the pipe lock across IO, and hence are
safe for RWF_NOWAIT/IOCB_NOWAIT usage. The "contract" for NOWAIT is
really "should not do IO under this lock", not strictly that we cannot
block or that the below code is in any way atomic. Pipes fulfil that
criteria.

Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

show more ...


# 0f60d288 30-Jan-2022 Al Viro <viro@zeniv.linux.org.uk>

dynamic_dname(): drop unused dentry argument

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# 189b0ddc 26-May-2022 David Howells <dhowells@redhat.com>

pipe: Fix missing lock in pipe_resize_ring()

pipe_resize_ring() needs to take the pipe->rd_wait.lock spinlock to
prevent post_one_notification() from trying to insert into the ring
whilst the ring i

pipe: Fix missing lock in pipe_resize_ring()

pipe_resize_ring() needs to take the pipe->rd_wait.lock spinlock to
prevent post_one_notification() from trying to insert into the ring
whilst the ring is being replaced.

The occupancy check must be done after the lock is taken, and the lock
must be taken after the new ring is allocated.

The bug can lead to an oops looking something like:

BUG: KASAN: use-after-free in post_one_notification.isra.0+0x62e/0x840
Read of size 4 at addr ffff88801cc72a70 by task poc/27196
...
Call Trace:
post_one_notification.isra.0+0x62e/0x840
__post_watch_notification+0x3b7/0x650
key_create_or_update+0xb8b/0xd20
__do_sys_add_key+0x175/0x340
__x64_sys_add_key+0xbe/0x140
do_syscall_64+0x5c/0xc0
entry_SYSCALL_64_after_hwframe+0x44/0xae

Reported by Selim Enes Karaduman @Enesdex working with Trend Micro Zero
Day Initiative.

Fixes: c73be61cede5 ("pipe: Add general notification queue support")
Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-17291
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

show more ...


# f485922d 29-Apr-2022 Kuniyuki Iwashima <kuniyu@amazon.co.jp>

pipe: make poll_usage boolean and annotate its access

Patch series "Fix data-races around epoll reported by KCSAN."

This series suppresses a false positive KCSAN's message and fixes a real
data-rac

pipe: make poll_usage boolean and annotate its access

Patch series "Fix data-races around epoll reported by KCSAN."

This series suppresses a false positive KCSAN's message and fixes a real
data-race.


This patch (of 2):

pipe_poll() runs locklessly and assigns 1 to poll_usage. Once poll_usage
is set to 1, it never changes in other places. However, concurrent writes
of a value trigger KCSAN, so let's make KCSAN happy.

BUG: KCSAN: data-race in pipe_poll / pipe_poll

write to 0xffff8880042f6678 of 4 bytes by task 174 on cpu 3:
pipe_poll (fs/pipe.c:656)
ep_item_poll.isra.0 (./include/linux/poll.h:88 fs/eventpoll.c:853)
do_epoll_wait (fs/eventpoll.c:1692 fs/eventpoll.c:1806 fs/eventpoll.c:2234)
__x64_sys_epoll_wait (fs/eventpoll.c:2246 fs/eventpoll.c:2241 fs/eventpoll.c:2241)
do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113)

write to 0xffff8880042f6678 of 4 bytes by task 177 on cpu 1:
pipe_poll (fs/pipe.c:656)
ep_item_poll.isra.0 (./include/linux/poll.h:88 fs/eventpoll.c:853)
do_epoll_wait (fs/eventpoll.c:1692 fs/eventpoll.c:1806 fs/eventpoll.c:2234)
__x64_sys_epoll_wait (fs/eventpoll.c:2246 fs/eventpoll.c:2241 fs/eventpoll.c:2241)
do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113)

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 177 Comm: epoll_race Not tainted 5.17.0-58927-gf443e374ae13 #6
Hardware name: Red Hat KVM, BIOS 1.11.0-2.amzn2 04/01/2014

Link: https://lkml.kernel.org/r/20220322002653.33865-1-kuniyu@amazon.co.jp
Link: https://lkml.kernel.org/r/20220322002653.33865-2-kuniyu@amazon.co.jp
Fixes: 3b844826b6c6 ("pipe: avoid unnecessary EPOLLET wakeups under normal loads")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Kuniyuki Iwashima <kuni1840@gmail.com>
Cc: "Soheil Hassas Yeganeh" <soheil@google.com>
Cc: "Sridhar Samudrala" <sridhar.samudrala@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

show more ...


# 906f9040 20-Apr-2022 Linus Torvalds <torvalds@linux-foundation.org>

Revert "fs/pipe: use kvcalloc to allocate a pipe_buffer array"

This reverts commit 5a519c8fe4d620912385f94372fc8472fa98c662.

It turns out that making the pipe almost arbitrarily large has some
rath

Revert "fs/pipe: use kvcalloc to allocate a pipe_buffer array"

This reverts commit 5a519c8fe4d620912385f94372fc8472fa98c662.

It turns out that making the pipe almost arbitrarily large has some
rather unexpected downsides. The kernel test robot reports a kernel
warning that is due to pipe->max_usage now growing to the point where
the iter_file_splice_write() buffer allocation can no longer be
satisfied as a slab allocation, and the

int nbufs = pipe->max_usage;
struct bio_vec *array = kcalloc(nbufs, sizeof(struct bio_vec),
GFP_KERNEL);

code sequence there will now always fail as a result.

That code could be modified to use kvcalloc() too, but I feel very
uncomfortable making those kinds of changes for a very niche use case
that really should have other options than make these kinds of
fundamental changes to pipe behavior.

Maybe the CRIU process dumping should be multi-threaded, and use
multiple pipes and multiple cores, rather than try to use one larger
pipe to minimize splice() calls.

Reported-by: kernel test robot <oliver.sang@intel.com>
Link: https://lore.kernel.org/all/20220420073717.GD16310@xsang-OptiPlex-9020/
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

show more ...


# aeb213cd 23-Mar-2022 Andrei Vagin <avagin@gmail.com>

fs/pipe.c: local vars have to match types of proper pipe_inode_info fields

head, tail, ring_size are declared as unsigned int, so all local
variables that operate with these fields have to be unsign

fs/pipe.c: local vars have to match types of proper pipe_inode_info fields

head, tail, ring_size are declared as unsigned int, so all local
variables that operate with these fields have to be unsigned to avoid
signed integer overflow.

Right now, it isn't an issue because the maximum pipe size is limited by
1U<<31.

Link: https://lkml.kernel.org/r/20220106171946.36128-1-avagin@gmail.com
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Suggested-by: Dmitry Safonov <0x7f454c46@gmail.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

show more ...


# 5a519c8f 23-Mar-2022 Andrei Vagin <avagin@gmail.com>

fs/pipe: use kvcalloc to allocate a pipe_buffer array

Right now, kcalloc is used to allocate a pipe_buffer array. The size of
the pipe_buffer struct is 40 bytes. kcalloc allows allocating reliably

fs/pipe: use kvcalloc to allocate a pipe_buffer array

Right now, kcalloc is used to allocate a pipe_buffer array. The size of
the pipe_buffer struct is 40 bytes. kcalloc allows allocating reliably
chunks with sizes less or equal to PAGE_ALLOC_COSTLY_ORDER (3). It
means that the maximum pipe size is 3.2MB in this case.

In CRIU, we use pipes to dump processes memory. CRIU freezes a target
process, injects a parasite code into it and then this code splices
memory into pipes. If a maximum pipe size is small, we need to do many
iterations or create many pipes.

kvcalloc attempt to allocate physically contiguous memory, but upon
failure, fall back to non-contiguous (vmalloc) allocation and so it
isn't limited by PAGE_ALLOC_COSTLY_ORDER.

The maximum pipe size for non-root users is limited by the
/proc/sys/fs/pipe-max-size sysctl that is 1MB by default, so only the
root user will be able to trigger vmalloc allocations.

Link: https://lkml.kernel.org/r/20220104171058.22580-1-avagin@gmail.com
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

show more ...


# 2ed147f0 11-Mar-2022 David Howells <dhowells@redhat.com>

watch_queue: Fix lack of barrier/sync/lock between post and read

There's nothing to synchronise post_one_notification() versus
pipe_read(). Whilst posting is done under pipe->rd_wait.lock, the
read

watch_queue: Fix lack of barrier/sync/lock between post and read

There's nothing to synchronise post_one_notification() versus
pipe_read(). Whilst posting is done under pipe->rd_wait.lock, the
reader only takes pipe->mutex which cannot bar notification posting as
that may need to be made from contexts that cannot sleep.

Fix this by setting pipe->head with a barrier in post_one_notification()
and reading pipe->head with a barrier in pipe_read().

If that's not sufficient, the rd_wait.lock will need to be taken,
possibly in a ->confirm() op so that it only applies to notifications.
The lock would, however, have to be dropped before copy_page_to_iter()
is invoked.

Fixes: c73be61cede5 ("pipe: Add general notification queue support")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

show more ...


# db8facfc 11-Mar-2022 David Howells <dhowells@redhat.com>

watch_queue, pipe: Free watchqueue state after clearing pipe ring

In free_pipe_info(), free the watchqueue state after clearing the pipe
ring as each pipe ring descriptor has a release function, and

watch_queue, pipe: Free watchqueue state after clearing pipe ring

In free_pipe_info(), free the watchqueue state after clearing the pipe
ring as each pipe ring descriptor has a release function, and in the
case of a notification message, this is watch_queue_pipe_buf_release()
which tries to mark the allocation bitmap that was previously released.

Fix this by moving the put of the pipe's ref on the watch queue to after
the ring has been cleared. We still need to call watch_queue_clear()
before doing that to make sure that the pipe is disconnected from any
notification sources first.

Fixes: c73be61cede5 ("pipe: Add general notification queue support")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

show more ...


# 1998f193 22-Jan-2022 Luis Chamberlain <mcgrof@kernel.org>

fs: move pipe sysctls to is own file

kernel/sysctl.c is a kitchen sink where everyone leaves their dirty
dishes, this makes it very difficult to maintain.

To help with this maintenance let's start

fs: move pipe sysctls to is own file

kernel/sysctl.c is a kitchen sink where everyone leaves their dirty
dishes, this makes it very difficult to maintain.

To help with this maintenance let's start by moving sysctls to places
where they actually belong. The proc sysctl maintainers do not want to
know what sysctl knobs you wish to add for your own piece of code, we
just care about the core logic.

So move the pipe sysctls to its own file.

Link: https://lkml.kernel.org/r/20211129205548.605569-10-mcgrof@kernel.org
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Antti Palosaari <crope@iki.fi>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Iurii Zaikin <yzaikin@google.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Lukas Middendorf <kernel@tuxforce.de>
Cc: Stephen Kitt <steve@sk2.org>
Cc: Xiaoming Ni <nixiaoming@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

show more ...


# cd1adf1b 07-Sep-2021 Linus Torvalds <torvalds@linux-foundation.org>

Revert "mm/gup: remove try_get_page(), call try_get_compound_head() directly"

This reverts commit 9857a17f206ff374aea78bccfb687f145368be2e.

That commit was completely broken, and I should have caug

Revert "mm/gup: remove try_get_page(), call try_get_compound_head() directly"

This reverts commit 9857a17f206ff374aea78bccfb687f145368be2e.

That commit was completely broken, and I should have caught on to it
earlier. But happily, the kernel test robot noticed the breakage fairly
quickly.

The breakage is because "try_get_page()" is about avoiding the page
reference count overflow case, but is otherwise the exact same as a
plain "get_page()".

In contrast, "try_get_compound_head()" is an entirely different beast,
and uses __page_cache_add_speculative() because it's not just about the
page reference count, but also about possibly racing with the underlying
page going away.

So all the commentary about how

"try_get_page() has fallen a little behind in terms of maintenance,
try_get_compound_head() handles speculative page references more
thoroughly"

was just completely wrong: yes, try_get_compound_head() handles
speculative page references, but the point is that try_get_page() does
not, and must not.

So there's no lack of maintainance - there are fundamentally different
semantics.

A speculative page reference would be entirely wrong in "get_page()",
and it's entirely wrong in "try_get_page()". It's not about
speculation, it's purely about "uhhuh, you can't get this page because
you've tried to increment the reference count too much already".

The reason the kernel test robot noticed this bug was that it hit the
VM_BUG_ON() in __page_cache_add_speculative(), which is all about
verifying that the context of any speculative page access is correct.
But since that isn't what try_get_page() is all about, the VM_BUG_ON()
tests things that are not correct to test for try_get_page().

Reported-by: kernel test robot <oliver.sang@intel.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

show more ...


12345678910