#
215ab0d8 |
| 03-Aug-2024 |
Joel Savitz <jsavitz@redhat.com> |
file: remove outdated comment after close_fd()
Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: linux-fsdevel@vger.kernel.org
file: remove outdated comment after close_fd()
Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: linux-fsdevel@vger.kernel.org
The comment on EXPORT_SYMBOL(close_fd) was added in commit 2ca2a09d6215 ("fs: add ksys_close() wrapper; remove in-kernel calls to sys_close()"), before commit 8760c909f54a ("file: Rename __close_fd to close_fd and remove the files parameter") gave the function its current name, however commit 1572bfdf21d4 ("file: Replace ksys_close with close_fd") removes the referenced caller entirely, obsoleting this comment.
Signed-off-by: Joel Savitz <jsavitz@redhat.com> Link: https://lore.kernel.org/r/20240803025455.239276-1-jsavitz@redhat.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
show more ...
|
#
de12c339 |
| 31-May-2024 |
Al Viro <viro@zeniv.linux.org.uk> |
add struct fd constructors, get rid of __to_fd()
Make __fdget() et.al. return struct fd directly. New helpers: BORROWED_FD(file) and CLONED_FD(file), for borrowed and cloned file references resp.
add struct fd constructors, get rid of __to_fd()
Make __fdget() et.al. return struct fd directly. New helpers: BORROWED_FD(file) and CLONED_FD(file), for borrowed and cloned file references resp.
NOTE: this might need tuning; in particular, inline on __fget_light() is there to keep the code generation same as before - we probably want to keep it inlined in fdget() et.al. (especially so in fdget_pos()), but that needs profiling.
Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
show more ...
|
#
9a2fa147 |
| 03-Aug-2024 |
Al Viro <viro@zeniv.linux.org.uk> |
fix bitmap corruption on close_range() with CLOSE_RANGE_UNSHARE
copy_fd_bitmaps(new, old, count) is expected to copy the first count/BITS_PER_LONG bits from old->full_fds_bits[] and fill the rest wi
fix bitmap corruption on close_range() with CLOSE_RANGE_UNSHARE
copy_fd_bitmaps(new, old, count) is expected to copy the first count/BITS_PER_LONG bits from old->full_fds_bits[] and fill the rest with zeroes. What it does is copying enough words (BITS_TO_LONGS(count/BITS_PER_LONG)), then memsets the rest. That works fine, *if* all bits past the cutoff point are clear. Otherwise we are risking garbage from the last word we'd copied.
For most of the callers that is true - expand_fdtable() has count equal to old->max_fds, so there's no open descriptors past count, let alone fully occupied words in ->open_fds[], which is what bits in ->full_fds_bits[] correspond to.
The other caller (dup_fd()) passes sane_fdtable_size(old_fdt, max_fds), which is the smallest multiple of BITS_PER_LONG that covers all opened descriptors below max_fds. In the common case (copying on fork()) max_fds is ~0U, so all opened descriptors will be below it and we are fine, by the same reasons why the call in expand_fdtable() is safe.
Unfortunately, there is a case where max_fds is less than that and where we might, indeed, end up with junk in ->full_fds_bits[] - close_range(from, to, CLOSE_RANGE_UNSHARE) with * descriptor table being currently shared * 'to' being above the current capacity of descriptor table * 'from' being just under some chunk of opened descriptors. In that case we end up with observably wrong behaviour - e.g. spawn a child with CLONE_FILES, get all descriptors in range 0..127 open, then close_range(64, ~0U, CLOSE_RANGE_UNSHARE) and watch dup(0) ending up with descriptor #128, despite #64 being observably not open.
The minimally invasive fix would be to deal with that in dup_fd(). If this proves to add measurable overhead, we can go that way, but let's try to fix copy_fd_bitmaps() first.
* new helper: bitmap_copy_and_expand(to, from, bits_to_copy, size). * make copy_fd_bitmaps() take the bitmap size in words, rather than bits; it's 'count' argument is always a multiple of BITS_PER_LONG, so we are not losing any information, and that way we can use the same helper for all three bitmaps - compiler will see that count is a multiple of BITS_PER_LONG for the large ones, so it'll generate plain memcpy()+memset().
Reproducer added to tools/testing/selftests/core/close_range_test.c
Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
show more ...
|
#
8aa37bde |
| 01-Aug-2024 |
Al Viro <viro@zeniv.linux.org.uk> |
protect the fetch of ->fd[fd] in do_dup2() from mispredictions
both callers have verified that fd is not greater than ->max_fds; however, misprediction might end up with tofree = fdt->fd[fd]
protect the fetch of ->fd[fd] in do_dup2() from mispredictions
both callers have verified that fd is not greater than ->max_fds; however, misprediction might end up with tofree = fdt->fd[fd]; being speculatively executed. That's wrong for the same reasons why it's wrong in close_fd()/file_close_fd_locked(); the same solution applies - array_index_nospec(fd, fdt->max_fds) could differ from fd only in case of speculative execution on mispredicted path.
Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
show more ...
|
#
ed8c7fbd |
| 29-May-2024 |
Yuntao Wang <yuntao.wang@linux.dev> |
fs/file: fix the check in find_next_fd()
The maximum possible return value of find_next_zero_bit(fdt->full_fds_bits, maxbit, bitbit) is maxbit. This return value, multiplied by BITS_PER_LONG, gives
fs/file: fix the check in find_next_fd()
The maximum possible return value of find_next_zero_bit(fdt->full_fds_bits, maxbit, bitbit) is maxbit. This return value, multiplied by BITS_PER_LONG, gives the value of bitbit, which can never be greater than maxfd, it can only be equal to maxfd at most, so the following check 'if (bitbit > maxfd)' will never be true.
Moreover, when bitbit equals maxfd, it indicates that there are no unused fds, and the function can directly return.
Fix this check.
Signed-off-by: Yuntao Wang <yuntao.wang@linux.dev> Link: https://lore.kernel.org/r/20240529160656.209352-1-yuntao.wang@linux.dev Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
show more ...
|
#
613aee94 |
| 20-Jan-2024 |
Al Viro <viro@zeniv.linux.org.uk> |
get_file_rcu(): no need to check for NULL separately
IS_ERR(NULL) is false and IS_ERR() already comes with unlikely()...
Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <
get_file_rcu(): no need to check for NULL separately
IS_ERR(NULL) is false and IS_ERR() already comes with unlikely()...
Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
show more ...
|
#
c4aab262 |
| 05-Jan-2024 |
Al Viro <viro@zeniv.linux.org.uk> |
fd_is_open(): move to fs/file.c
no users outside that...
Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
f60d374d |
| 05-Jan-2024 |
Al Viro <viro@zeniv.linux.org.uk> |
close_on_exec(): pass files_struct instead of fdtable
both callers are happier that way...
Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
4e94ddfe |
| 30-Nov-2023 |
Christian Brauner <brauner@kernel.org> |
file: remove __receive_fd()
Honestly, there's little value in having a helper with and without that int __user *ufd argument. It's just messy and doesn't really give us anything. Just expose receive
file: remove __receive_fd()
Honestly, there's little value in having a helper with and without that int __user *ufd argument. It's just messy and doesn't really give us anything. Just expose receive_fd() with that argument and get rid of that helper.
Link: https://lore.kernel.org/r/20231130-vfs-files-fixes-v1-5-e73ca6f4ea83@kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Christian Brauner <brauner@kernel.org>
show more ...
|
#
24fa3ae9 |
| 30-Nov-2023 |
Christian Brauner <brauner@kernel.org> |
file: remove pointless wrapper
Only io_uring uses __close_fd_get_file(). All it does is hide current->files but io_uring accesses files_struct directly right now anyway so it's a bit pointless. Just
file: remove pointless wrapper
Only io_uring uses __close_fd_get_file(). All it does is hide current->files but io_uring accesses files_struct directly right now anyway so it's a bit pointless. Just rename pick_file() to file_close_fd_locked() and let io_uring use it. Add a lockdep assert in there that we expect the caller to hold file_lock while we're at it.
Link: https://lore.kernel.org/r/20231130-vfs-files-fixes-v1-2-e73ca6f4ea83@kernel.org Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
show more ...
|
#
a88c955f |
| 30-Nov-2023 |
Christian Brauner <brauner@kernel.org> |
file: s/close_fd_get_file()/file_close_fd()/g
That really shouldn't have "get" in there as that implies we're bumping the reference count which we don't do at all. We used to but not anmore. Now we'
file: s/close_fd_get_file()/file_close_fd()/g
That really shouldn't have "get" in there as that implies we're bumping the reference count which we don't do at all. We used to but not anmore. Now we're just closing the fd and pick that file from the fdtable without bumping the reference count. Update the wrong documentation while at it.
Link: https://lore.kernel.org/r/20231130-vfs-files-fixes-v1-1-e73ca6f4ea83@kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Christian Brauner <brauner@kernel.org>
show more ...
|
#
253ca867 |
| 26-Nov-2023 |
Linus Torvalds <torvalds@linux-foundation.org> |
Improve __fget_files_rcu() code generation (and thus __fget_light())
Commit 0ede61d8589c ("file: convert to SLAB_TYPESAFE_BY_RCU") caused a performance regression as reported by the kernel test robo
Improve __fget_files_rcu() code generation (and thus __fget_light())
Commit 0ede61d8589c ("file: convert to SLAB_TYPESAFE_BY_RCU") caused a performance regression as reported by the kernel test robot.
The __fget_light() function is one of those critical ones for some loads, and the code generation was unnecessarily impacted. Let's just write that function to better.
Reported-by: kernel test robot <oliver.sang@intel.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Jann Horn <jannh@google.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Closes: https://lore.kernel.org/oe-lkp/202311201406.2022ca3f-oliver.sang@intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/CAHk-=wiCJtLbFWNURB34b9a_R_unaH3CiMRXfkR0-iihB_z68A@mail.gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
show more ...
|
#
61d4fb0b |
| 25-Oct-2023 |
Christian Brauner <brauner@kernel.org> |
file, i915: fix file reference for mmap_singleton()
Today we got a report at [1] for rcu stalls on the i915 testsuite in [2] due to the conversion of files to SLAB_TYPSSAFE_BY_RCU. Afaict, get_file_
file, i915: fix file reference for mmap_singleton()
Today we got a report at [1] for rcu stalls on the i915 testsuite in [2] due to the conversion of files to SLAB_TYPSSAFE_BY_RCU. Afaict, get_file_rcu() goes into an infinite loop trying to carefully verify that i915->gem.mmap_singleton hasn't changed - see the splat below.
So I stared at this code to figure out what it actually does. It seems that the i915->gem.mmap_singleton pointer itself never had rcu semantics.
The i915->gem.mmap_singleton is replaced in file->f_op->release::singleton_release():
static int singleton_release(struct inode *inode, struct file *file) { struct drm_i915_private *i915 = file->private_data;
cmpxchg(&i915->gem.mmap_singleton, file, NULL); drm_dev_put(&i915->drm);
return 0; }
The cmpxchg() is ordered against a concurrent update of i915->gem.mmap_singleton from mmap_singleton(). IOW, when mmap_singleton() fails to get a reference on i915->gem.mmap_singleton:
While mmap_singleton() does
rcu_read_lock(); file = get_file_rcu(&i915->gem.mmap_singleton); rcu_read_unlock();
it allocates a new file via anon_inode_getfile() and does
smp_store_mb(i915->gem.mmap_singleton, file);
So, then what happens in the case of this bug is that at some point fput() is called and drops the file->f_count to zero leaving the pointer in i915->gem.mmap_singleton in tact.
Now, there might be delays until file->f_op->release::singleton_release() is called and i915->gem.mmap_singleton is set to NULL.
Say concurrently another task hits mmap_singleton() and does:
rcu_read_lock(); file = get_file_rcu(&i915->gem.mmap_singleton); rcu_read_unlock();
When get_file_rcu() fails to get a reference via atomic_inc_not_zero() it will try the reload from i915->gem.mmap_singleton expecting it to be NULL, assuming it has comparable semantics as we expect in __fget_files_rcu().
But it hasn't so it reloads the same pointer again, trying the same atomic_inc_not_zero() again and doing so until file->f_op->release::singleton_release() of the old file has been called.
So, in contrast to __fget_files_rcu() here we want to not retry when atomic_inc_not_zero() has failed. We only want to retry in case we managed to get a reference but the pointer did change on reload.
<3> [511.395679] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks: <3> [511.395716] rcu: Tasks blocked on level-1 rcu_node (CPUs 0-9): P6238 <3> [511.395934] rcu: (detected by 16, t=65002 jiffies, g=123977, q=439 ncpus=20) <6> [511.395944] task:i915_selftest state:R running task stack:10568 pid:6238 tgid:6238 ppid:1001 flags:0x00004002 <6> [511.395962] Call Trace: <6> [511.395966] <TASK> <6> [511.395974] ? __schedule+0x3a8/0xd70 <6> [511.395995] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20 <6> [511.396003] ? lockdep_hardirqs_on+0xc3/0x140 <6> [511.396013] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20 <6> [511.396029] ? get_file_rcu+0x10/0x30 <6> [511.396039] ? get_file_rcu+0x10/0x30 <6> [511.396046] ? i915_gem_object_mmap+0xbc/0x450 [i915] <6> [511.396509] ? i915_gem_mmap+0x272/0x480 [i915] <6> [511.396903] ? mmap_region+0x253/0xb60 <6> [511.396925] ? do_mmap+0x334/0x5c0 <6> [511.396939] ? vm_mmap_pgoff+0x9f/0x1c0 <6> [511.396949] ? rcu_is_watching+0x11/0x50 <6> [511.396962] ? igt_mmap_offset+0xfc/0x110 [i915] <6> [511.397376] ? __igt_mmap+0xb3/0x570 [i915] <6> [511.397762] ? igt_mmap+0x11e/0x150 [i915] <6> [511.398139] ? __trace_bprintk+0x76/0x90 <6> [511.398156] ? __i915_subtests+0xbf/0x240 [i915] <6> [511.398586] ? __pfx___i915_live_setup+0x10/0x10 [i915] <6> [511.399001] ? __pfx___i915_live_teardown+0x10/0x10 [i915] <6> [511.399433] ? __run_selftests+0xbc/0x1a0 [i915] <6> [511.399875] ? i915_live_selftests+0x4b/0x90 [i915] <6> [511.400308] ? i915_pci_probe+0x106/0x200 [i915] <6> [511.400692] ? pci_device_probe+0x95/0x120 <6> [511.400704] ? really_probe+0x164/0x3c0 <6> [511.400715] ? __pfx___driver_attach+0x10/0x10 <6> [511.400722] ? __driver_probe_device+0x73/0x160 <6> [511.400731] ? driver_probe_device+0x19/0xa0 <6> [511.400741] ? __driver_attach+0xb6/0x180 <6> [511.400749] ? __pfx___driver_attach+0x10/0x10 <6> [511.400756] ? bus_for_each_dev+0x77/0xd0 <6> [511.400770] ? bus_add_driver+0x114/0x210 <6> [511.400781] ? driver_register+0x5b/0x110 <6> [511.400791] ? i915_init+0x23/0xc0 [i915] <6> [511.401153] ? __pfx_i915_init+0x10/0x10 [i915] <6> [511.401503] ? do_one_initcall+0x57/0x270 <6> [511.401515] ? rcu_is_watching+0x11/0x50 <6> [511.401521] ? kmalloc_trace+0xa3/0xb0 <6> [511.401532] ? do_init_module+0x5f/0x210 <6> [511.401544] ? load_module+0x1d00/0x1f60 <6> [511.401581] ? init_module_from_file+0x86/0xd0 <6> [511.401590] ? init_module_from_file+0x86/0xd0 <6> [511.401613] ? idempotent_init_module+0x17c/0x230 <6> [511.401639] ? __x64_sys_finit_module+0x56/0xb0 <6> [511.401650] ? do_syscall_64+0x3c/0x90 <6> [511.401659] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8 <6> [511.401684] </TASK>
Link: [1]: https://lore.kernel.org/intel-gfx/SJ1PR11MB6129CB39EED831784C331BAFB9DEA@SJ1PR11MB6129.namprd11.prod.outlook.com Link: [2]: https://intel-gfx-ci.01.org/tree/linux-next/next-20231013/bat-dg2-11/igt@i915_selftest@live@mman.html#dmesg-warnings10963 Cc: Jann Horn <jannh@google.com>, Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20231025-formfrage-watscheln-84526cd3bd7d@brauner Signed-off-by: Christian Brauner <brauner@kernel.org>
show more ...
|
#
6cf41fcf |
| 05-Oct-2023 |
Christian Brauner <brauner@kernel.org> |
backing file: free directly
Backing files as used by overlayfs are never installed into file descriptor tables and are explicitly documented as such. They aren't subject to rcu access conditions lik
backing file: free directly
Backing files as used by overlayfs are never installed into file descriptor tables and are explicitly documented as such. They aren't subject to rcu access conditions like regular files are.
Their lifetime is bound to the lifetime of the overlayfs file, i.e., they're stashed in ovl_file->private_data and go away otherwise. If they're set as vma->vm_file - which is their main purpose - then they're subject to regular refcount rules and vma->vm_file can't be installed into an fdtable after having been set. All in all I don't see any need for rcu delay here. So free it directly.
This all hinges on such hybrid beasts to never actually be installed into fdtables which - as mentioned before - is not allowed. So add an explicit WARN_ON_ONCE() so we catch any case where someone is suddenly trying to install one of those things into a file descriptor table so we can have a nice long chat with them.
Link: https://lore.kernel.org/r/20231005-sakralbau-wappnen-f5c31755ed70@brauner Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
show more ...
|
#
0ede61d8 |
| 29-Sep-2023 |
Christian Brauner <brauner@kernel.org> |
file: convert to SLAB_TYPESAFE_BY_RCU
In recent discussions around some performance improvements in the file handling area we discussed switching the file cache to rely on SLAB_TYPESAFE_BY_RCU which
file: convert to SLAB_TYPESAFE_BY_RCU
In recent discussions around some performance improvements in the file handling area we discussed switching the file cache to rely on SLAB_TYPESAFE_BY_RCU which allows us to get rid of call_rcu() based freeing for files completely. This is a pretty sensitive change overall but it might actually be worth doing.
The main downside is the subtlety. The other one is that we should really wait for Jann's patch to land that enables KASAN to handle SLAB_TYPESAFE_BY_RCU UAFs. Currently it doesn't but a patch for this exists.
With SLAB_TYPESAFE_BY_RCU objects may be freed and reused multiple times which requires a few changes. So it isn't sufficient anymore to just acquire a reference to the file in question under rcu using atomic_long_inc_not_zero() since the file might have already been recycled and someone else might have bumped the reference.
In other words, callers might see reference count bumps from newer users. For this reason it is necessary to verify that the pointer is the same before and after the reference count increment. This pattern can be seen in get_file_rcu() and __files_get_rcu().
In addition, it isn't possible to access or check fields in struct file without first aqcuiring a reference on it. Not doing that was always very dodgy and it was only usable for non-pointer data in struct file. With SLAB_TYPESAFE_BY_RCU it is necessary that callers first acquire a reference under rcu or they must hold the files_lock of the fdtable. Failing to do either one of this is a bug.
Thanks to Jann for pointing out that we need to ensure memory ordering between reallocations and pointer check by ensuring that all subsequent loads have a dependency on the second load in get_file_rcu() and providing a fixup that was folded into this patch.
Cc: Jann Horn <jannh@google.com> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
show more ...
|
#
35931eb3 |
| 18-Aug-2023 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
fs: Fix kernel-doc warnings
These have a variety of causes and a corresponding variety of solutions.
Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org> Message-Id: <20230818200824.27200
fs: Fix kernel-doc warnings
These have a variety of causes and a corresponding variety of solutions.
Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org> Message-Id: <20230818200824.2720007-1-willy@infradead.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
show more ...
|
#
7d84d1b9 |
| 06-Aug-2023 |
Christian Brauner <brauner@kernel.org> |
fs: rely on ->iterate_shared to determine f_pos locking
Now that we removed ->iterate we don't need to check for either ->iterate or ->iterate_shared in file_needs_f_pos_lock(). Simply check for ->i
fs: rely on ->iterate_shared to determine f_pos locking
Now that we removed ->iterate we don't need to check for either ->iterate or ->iterate_shared in file_needs_f_pos_lock(). Simply check for ->iterate_shared instead. This will tell us whether we need to unconditionally take the lock. Not just does it allow us to avoid checking f_inode's mode it also actually clearly shows that we're locking because of readdir.
Signed-off-by: Christian Brauner <brauner@kernel.org>
show more ...
|
#
79796425 |
| 03-Aug-2023 |
Linus Torvalds <torvalds@linux-foundation.org> |
file: reinstate f_pos locking optimization for regular files
In commit 20ea1e7d13c1 ("file: always lock position for FMODE_ATOMIC_POS") we ended up always taking the file pos lock, because pidfd_get
file: reinstate f_pos locking optimization for regular files
In commit 20ea1e7d13c1 ("file: always lock position for FMODE_ATOMIC_POS") we ended up always taking the file pos lock, because pidfd_getfd() could get a reference to the file even when it didn't have an elevated file count due to threading of other sharing cases.
But Mateusz Guzik reports that the extra locking is actually measurable, so let's re-introduce the optimization, and only force the locking for directory traversal.
Directories need the lock for correctness reasons, while regular files only need it for "POSIX semantics". Since pidfd_getfd() is about debuggers etc special things that are _way_ outside of POSIX, we can relax the rules for that case.
Reported-by: Mateusz Guzik <mjguzik@gmail.com> Cc: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/linux-fsdevel/20230803095311.ijpvhx3fyrbkasul@f/ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
#
ed192c59 |
| 27-Jul-2023 |
Mateusz Guzik <mjguzik@gmail.com> |
file: mostly eliminate spurious relocking in __range_close
Stock code takes a lock trip for every fd in range, but this can be trivially avoided and real-world consumers do have plenty of already cl
file: mostly eliminate spurious relocking in __range_close
Stock code takes a lock trip for every fd in range, but this can be trivially avoided and real-world consumers do have plenty of already closed cases.
Just booting Debian 12 with a debug printk shows: (sh) min 3 max 17 closed 15 empty 0 (sh) min 19 max 63 closed 31 empty 14 (sh) min 4 max 63 closed 0 empty 60 (spawn) min 3 max 63 closed 13 empty 48 (spawn) min 3 max 63 closed 13 empty 48 (mount) min 3 max 17 closed 15 empty 0 (mount) min 19 max 63 closed 32 empty 13
and so on.
While here use more idiomatic naming.
An avoidable relock is left in place to avoid uglifying the code. The code was not switched to bitmap traversal for the same reason.
Tested with ltp kernel/syscalls/close_range
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Message-Id: <20230727113809.800067-1-mjguzik@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
show more ...
|
#
20ea1e7d |
| 24-Jul-2023 |
Christian Brauner <brauner@kernel.org> |
file: always lock position for FMODE_ATOMIC_POS
The pidfd_getfd() system call allows a caller with ptrace_may_access() abilities on another process to steal a file descriptor from this process. This
file: always lock position for FMODE_ATOMIC_POS
The pidfd_getfd() system call allows a caller with ptrace_may_access() abilities on another process to steal a file descriptor from this process. This system call is used by debuggers, container runtimes, system call supervisors, networking proxies etc. So while it is a special interest system call it is used in common tools.
That ability ends up breaking our long-time optimization in fdget_pos(), which "knew" that if we had exclusive access to the file descriptor nobody else could access it, and we didn't need the lock for the file position.
That check for file_count(file) was always fairly subtle - it depended on __fdget() not incrementing the file count for single-threaded processes and thus included that as part of the rule - but it did mean that we didn't need to take the lock in all those traditional unix process contexts.
So it's sad to see this go, and I'd love to have some way to re-instate the optimization. At the same time, the lock obviously isn't ever contended in the case we optimized, so all we were optimizing away is the atomics and the cacheline dirtying. Let's see if anybody even notices that the optimization is gone.
Link: https://lore.kernel.org/linux-fsdevel/20230724-vfs-fdget_pos-v1-1-a4abfd7103f3@kernel.org/ Fixes: 8649c322f75c ("pid: Implement pidfd_getfd syscall") Cc: stable@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
#
609d5444 |
| 06-Mar-2023 |
Theodore Ts'o <tytso@mit.edu> |
fs: prevent out-of-bounds array speculation when closing a file descriptor
Google-Bug-Id: 114199369 Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
7ee47dcf |
| 31-Oct-2022 |
Jann Horn <jannh@google.com> |
fs: use acquire ordering in __fget_light()
We must prevent the CPU from reordering the files->count read with the FD table access like this, on architectures where read-read reordering is possible:
fs: use acquire ordering in __fget_light()
We must prevent the CPU from reordering the files->count read with the FD table access like this, on architectures where read-read reordering is possible:
files_lookup_fd_raw() close_fd() put_files_struct() atomic_read(&files->count)
I would like to mark this for stable, but the stable rules explicitly say "no theoretical races", and given that the FD table pointer and files->count are explicitly stored in the same cacheline, this sort of reordering seems quite unlikely in practice...
Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
show more ...
|
#
4480c27c |
| 08-Jun-2022 |
Andreas Gruenbacher <agruenba@redhat.com> |
gfs2: Add glockfd debugfs file
When a process has a gfs2 file open, the file is keeping a reference on the underlying gfs2 inode, and the inode is keeping the inode's iopen glock held in shared mode
gfs2: Add glockfd debugfs file
When a process has a gfs2 file open, the file is keeping a reference on the underlying gfs2 inode, and the inode is keeping the inode's iopen glock held in shared mode. In other words, the process depends on the iopen glock of each open gfs2 file. Expose those dependencies in a new "glockfd" debugfs file.
The new debugfs file contains one line for each gfs2 file descriptor, specifying the tgid, file descriptor number, and glock name, e.g.,
1601 6 5/816d
This list is compiled by iterating all tasks on the system using find_ge_pid(), and all file descriptors of each task using task_lookup_next_fd_rcu(). To make that work from gfs2, export those two functions.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
show more ...
|
#
40a19260 |
| 05-Jun-2022 |
Al Viro <viro@zeniv.linux.org.uk> |
fix the breakage in close_fd_get_file() calling conventions change
It used to grab an extra reference to struct file rather than just transferring to caller the one it had removed from descriptor ta
fix the breakage in close_fd_get_file() calling conventions change
It used to grab an extra reference to struct file rather than just transferring to caller the one it had removed from descriptor table. New variant doesn't, and callers need to be adjusted.
Reported-and-tested-by: syzbot+47dd250f527cb7bebf24@syzkaller.appspotmail.com Fixes: 6319194ec57b ("Unify the primitives for file descriptor closing") Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
show more ...
|
#
6319194e |
| 12-May-2022 |
Al Viro <viro@zeniv.linux.org.uk> |
Unify the primitives for file descriptor closing
Currently we have 3 primitives for removing an opened file from descriptor table - pick_file(), __close_fd_get_file() and close_fd_get_file(). Their
Unify the primitives for file descriptor closing
Currently we have 3 primitives for removing an opened file from descriptor table - pick_file(), __close_fd_get_file() and close_fd_get_file(). Their calling conventions are rather odd and there's a code duplication for no good reason. They can be unified -
1) have __range_close() cap max_fd in the very beginning; that way we don't need separate way for pick_file() to report being past the end of descriptor table.
2) make {__,}close_fd_get_file() return file (or NULL) directly, rather than returning it via struct file ** argument. Don't bother with (bogus) return value - nobody wants that -ENOENT.
3) make pick_file() return NULL on unopened descriptor - the only caller that used to care about the distinction between descriptor past the end of descriptor table and finding NULL in descriptor table doesn't give a damn after (1).
4) lift ->files_lock out of pick_file()
That actually simplifies the callers, as well as the primitives themselves. Code duplication is also gone...
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
show more ...
|