#
e001d144 |
| 12-Oct-2023 |
Amir Goldstein <amir73il@gmail.com> |
fs: factor out vfs_parse_monolithic_sep() helper
Factor out vfs_parse_monolithic_sep() from generic_parse_monolithic(), so filesystems could use it with a custom option separator callback.
Acked-by
fs: factor out vfs_parse_monolithic_sep() helper
Factor out vfs_parse_monolithic_sep() from generic_parse_monolithic(), so filesystems could use it with a custom option separator callback.
Acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
show more ...
|
#
35931eb3 |
| 18-Aug-2023 |
Matthew Wilcox (Oracle) <willy@infradead.org> |
fs: Fix kernel-doc warnings
These have a variety of causes and a corresponding variety of solutions.
Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org> Message-Id: <20230818200824.27200
fs: Fix kernel-doc warnings
These have a variety of causes and a corresponding variety of solutions.
Signed-off-by: "Matthew Wilcox (Oracle)" <willy@infradead.org> Message-Id: <20230818200824.2720007-1-willy@infradead.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
show more ...
|
#
d80a8f1b |
| 08-Aug-2023 |
David Howells <dhowells@redhat.com> |
vfs, security: Fix automount superblock LSM init problem, preventing NFS sb sharing
When NFS superblocks are created by automounting, their LSM parameters aren't set in the fs_context struct prior t
vfs, security: Fix automount superblock LSM init problem, preventing NFS sb sharing
When NFS superblocks are created by automounting, their LSM parameters aren't set in the fs_context struct prior to sget_fc() being called, leading to failure to match existing superblocks.
This bug leads to messages like the following appearing in dmesg when fscache is enabled:
NFS: Cache volume key already in use (nfs,4.2,2,108,106a8c0,1,,,,100000,100000,2ee,3a98,1d4c,3a98,1)
Fix this by adding a new LSM hook to load fc->security for submount creation.
Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/165962680944.3334508.6610023900349142034.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/165962729225.3357250.14350728846471527137.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/165970659095.2812394.6868894171102318796.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/166133579016.3678898.6283195019480567275.stgit@warthog.procyon.org.uk/ # v4 Link: https://lore.kernel.org/r/217595.1662033775@warthog.procyon.org.uk/ # v5 Fixes: 9bc61ab18b1d ("vfs: Introduce fs_context, switch vfs_kern_mount() to it.") Fixes: 779df6a5480f ("NFS: Ensure security label is set for root inode") Tested-by: Jeff Layton <jlayton@kernel.org> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Acked-by: "Christian Brauner (Microsoft)" <brauner@kernel.org> Acked-by: Paul Moore <paul@paul-moore.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Message-Id: <20230808-master-v9-1-e0ecde888221@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
show more ...
|
#
22ed7ecd |
| 02-Aug-2023 |
Christian Brauner <brauner@kernel.org> |
fs: add FSCONFIG_CMD_CREATE_EXCL
Summary =======
This introduces FSCONFIG_CMD_CREATE_EXCL which will allows userspace to implement something like mount -t ext4 --exclusive /dev/sda /B which fails i
fs: add FSCONFIG_CMD_CREATE_EXCL
Summary =======
This introduces FSCONFIG_CMD_CREATE_EXCL which will allows userspace to implement something like mount -t ext4 --exclusive /dev/sda /B which fails if a superblock for the requested filesystem does already exist:
Before this patch -----------------
$ sudo ./move-mount -f xfs -o source=/dev/sda4 /A Requesting filesystem type xfs Mount options requested: source=/dev/sda4 Attaching mount at /A Moving single attached mount Setting key(source) with val(/dev/sda4)
$ sudo ./move-mount -f xfs -o source=/dev/sda4 /B Requesting filesystem type xfs Mount options requested: source=/dev/sda4 Attaching mount at /B Moving single attached mount Setting key(source) with val(/dev/sda4)
After this patch with --exclusive as a switch for FSCONFIG_CMD_CREATE_EXCL --------------------------------------------------------------------------
$ sudo ./move-mount -f xfs --exclusive -o source=/dev/sda4 /A Requesting filesystem type xfs Request exclusive superblock creation Mount options requested: source=/dev/sda4 Attaching mount at /A Moving single attached mount Setting key(source) with val(/dev/sda4)
$ sudo ./move-mount -f xfs --exclusive -o source=/dev/sda4 /B Requesting filesystem type xfs Request exclusive superblock creation Mount options requested: source=/dev/sda4 Attaching mount at /B Moving single attached mount Setting key(source) with val(/dev/sda4) Device or resource busy | move-mount.c: 300: do_fsconfig: i xfs: reusing existing filesystem not allowed
Details =======
As mentioned on the list (cf. [1]-[3]) mount requests like mount -t ext4 /dev/sda /A are ambigous for userspace. Either a new superblock has been created and mounted or an existing superblock has been reused and a bind-mount has been created.
This becomes clear in the following example where two processes create the same mount for the same block device:
P1 P2 fd_fs = fsopen("ext4"); fd_fs = fsopen("ext4"); fsconfig(fd_fs, FSCONFIG_SET_STRING, "source", "/dev/sda"); fsconfig(fd_fs, FSCONFIG_SET_STRING, "source", "/dev/sda"); fsconfig(fd_fs, FSCONFIG_SET_STRING, "dax", "always"); fsconfig(fd_fs, FSCONFIG_SET_STRING, "resuid", "1000");
// wins and creates superblock fsconfig(fd_fs, FSCONFIG_CMD_CREATE, ...) // finds compatible superblock of P1 // spins until P1 sets SB_BORN and grabs a reference fsconfig(fd_fs, FSCONFIG_CMD_CREATE, ...)
fd_mnt1 = fsmount(fd_fs); fd_mnt2 = fsmount(fd_fs); move_mount(fd_mnt1, "/A") move_mount(fd_mnt2, "/B")
Not just does P2 get a bind-mount but the mount options that P2 requestes are silently ignored. The VFS itself doesn't, can't and shouldn't enforce filesystem specific mount option compatibility. It only enforces incompatibility for read-only <-> read-write transitions:
mount -t ext4 /dev/sda /A mount -t ext4 -o ro /dev/sda /B
The read-only request will fail with EBUSY as the VFS can't just silently transition a superblock from read-write to read-only or vica versa without risking security issues.
To userspace this silent superblock reuse can become a security issue in because there is currently no straightforward way for userspace to know that they did indeed manage to create a new superblock and didn't just reuse an existing one.
This adds a new FSCONFIG_CMD_CREATE_EXCL command to fsconfig() that returns EBUSY if an existing superblock would be reused. Userspace that needs to be sure that it did create a new superblock with the requested mount options can request superblock creation using this command. If the command succeeds they can be sure that they did create a new superblock with the requested mount options.
This requires the new mount api. With the old mount api it would be necessary to plumb this through every legacy filesystem's file_system_type->mount() method. If they want this feature they are most welcome to switch to the new mount api.
Following is an analysis of the effect of FSCONFIG_CMD_CREATE_EXCL on each high-level superblock creation helper:
(1) get_tree_nodev()
Always allocate new superblock. Hence, FSCONFIG_CMD_CREATE and FSCONFIG_CMD_CREATE_EXCL are equivalent.
The binderfs or overlayfs filesystems are examples.
(4) get_tree_keyed()
Finds an existing superblock based on sb->s_fs_info. Hence, FSCONFIG_CMD_CREATE would reuse an existing superblock whereas FSCONFIG_CMD_CREATE_EXCL would reject it with EBUSY.
The mqueue or nfsd filesystems are examples.
(2) get_tree_bdev()
This effectively works like get_tree_keyed().
The ext4 or xfs filesystems are examples.
(3) get_tree_single()
Only one superblock of this filesystem type can ever exist. Hence, FSCONFIG_CMD_CREATE would reuse an existing superblock whereas FSCONFIG_CMD_CREATE_EXCL would reject it with EBUSY.
The securityfs or configfs filesystems are examples.
Note that some single-instance filesystems never destroy the superblock once it has been created during the first mount. For example, if securityfs has been mounted at least onces then the created superblock will never be destroyed again as long as there is still an LSM making use it. Consequently, even if securityfs is unmounted and the superblock seemingly destroyed it really isn't which means that FSCONFIG_CMD_CREATE_EXCL will continue rejecting reusing an existing superblock.
This is acceptable thugh since special purpose filesystems such as this shouldn't have a need to use FSCONFIG_CMD_CREATE_EXCL anyway and if they do it's probably to make sure that mount options aren't ignored.
Following is an analysis of the effect of FSCONFIG_CMD_CREATE_EXCL on filesystems that make use of the low-level sget_fc() helper directly. They're all effectively variants on get_tree_keyed(), get_tree_bdev(), or get_tree_nodev():
(5) mtd_get_sb()
Similar logic to get_tree_keyed().
(6) afs_get_tree()
Similar logic to get_tree_keyed().
(7) ceph_get_tree()
Similar logic to get_tree_keyed().
Already explicitly allows forcing the allocation of a new superblock via CEPH_OPT_NOSHARE. This turns it into get_tree_nodev().
(8) fuse_get_tree_submount()
Similar logic to get_tree_nodev().
(9) fuse_get_tree()
Forces reuse of existing FUSE superblock.
Forces reuse of existing superblock if passed in file refers to an existing FUSE connection. If FSCONFIG_CMD_CREATE_EXCL is specified together with an fd referring to an existing FUSE connections this would cause the superblock reusal to fail. If reusing is the intent then FSCONFIG_CMD_CREATE_EXCL shouldn't be specified.
(10) fuse_get_tree() -> get_tree_nodev()
Same logic as in get_tree_nodev().
(11) fuse_get_tree() -> get_tree_bdev()
Same logic as in get_tree_bdev().
(12) virtio_fs_get_tree()
Same logic as get_tree_keyed().
(13) gfs2_meta_get_tree()
Forces reuse of existing gfs2 superblock.
Mounting gfs2meta enforces that a gf2s superblock must already exist. If not, it will error out. Consequently, mounting gfs2meta with FSCONFIG_CMD_CREATE_EXCL would always fail. If reusing is the intent then FSCONFIG_CMD_CREATE_EXCL shouldn't be specified.
(14) kernfs_get_tree()
Similar logic to get_tree_keyed().
(15) nfs_get_tree_common()
Similar logic to get_tree_keyed().
Already explicitly allows forcing the allocation of a new superblock via NFS_MOUNT_UNSHARED. This effectively turns it into get_tree_nodev().
Link: [1] https://lore.kernel.org/linux-block/20230704-fasching-wertarbeit-7c6ffb01c83d@brauner Link: [2] https://lore.kernel.org/linux-block/20230705-pumpwerk-vielversprechend-a4b1fd947b65@brauner Link: [3] https://lore.kernel.org/linux-fsdevel/20230725-einnahmen-warnschilder-17779aec0a97@brauner Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Aleksa Sarai <cyphar@cyphar.com> Message-Id: <20230802-vfs-super-exclusive-v2-4-95dc4e41b870@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
show more ...
|
#
62176420 |
| 07-Jun-2023 |
Thomas Weißschuh <linux@weissschuh.net> |
fs: avoid empty option when generating legacy mount string
As each option string fragment is always prepended with a comma it would happen that the whole string always starts with a comma. This coul
fs: avoid empty option when generating legacy mount string
As each option string fragment is always prepended with a comma it would happen that the whole string always starts with a comma. This could be interpreted by filesystem drivers as an empty option and may produce errors.
For example the NTFS driver from ntfs.ko behaves like this and fails when mounted via the new API.
Link: https://github.com/util-linux/util-linux/issues/2298 Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Fixes: 3e1aeb00e6d1 ("vfs: Implement a filesystem superblock creation/configuration context") Cc: stable@vger.kernel.org Message-Id: <20230607-fs-empty-option-v1-1-20c8dbf4671b@weissschuh.net> Signed-off-by: Christian Brauner <brauner@kernel.org>
show more ...
|
#
722d9484 |
| 18-Jan-2022 |
Jamie Hill-Daniel <jamie@hill-daniel.co.uk> |
vfs: fs_context: fix up param length parsing in legacy_parse_param
The "PAGE_SIZE - 2 - size" calculation in legacy_parse_param() is an unsigned type so a large value of "size" results in a high pos
vfs: fs_context: fix up param length parsing in legacy_parse_param
The "PAGE_SIZE - 2 - size" calculation in legacy_parse_param() is an unsigned type so a large value of "size" results in a high positive value instead of a negative value as expected. Fix this by getting rid of the subtraction.
Signed-off-by: Jamie Hill-Daniel <jamie@hill-daniel.co.uk> Signed-off-by: William Liu <willsroot@protonmail.com> Tested-by: Salvatore Bonaccorso <carnil@debian.org> Tested-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Acked-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
#
bb902cb4 |
| 02-Sep-2021 |
Yutian Yang <nglaive@gmail.com> |
memcg: charge fs_context and legacy_fs_context
This patch adds accounting flags to fs_context and legacy_fs_context allocation sites so that kernel could correctly charge these objects.
We have wri
memcg: charge fs_context and legacy_fs_context
This patch adds accounting flags to fs_context and legacy_fs_context allocation sites so that kernel could correctly charge these objects.
We have written a PoC to demonstrate the effect of the missing-charging bugs. The PoC takes around 1,200MB unaccounted memory, while it is charged for only 362MB memory usage. We evaluate the PoC on QEMU x86_64 v5.2.90 + Linux kernel v5.10.19 + Debian buster. All the limitations including ulimits and sysctl variables are set as default. Specifically, the hard NOFILE limit and nr_open in sysctl are both 1,048,576.
/*------------------------- POC code ----------------------------*/
#define _GNU_SOURCE #include <sys/types.h> #include <sys/file.h> #include <time.h> #include <sys/wait.h> #include <stdint.h> #include <stdlib.h> #include <unistd.h> #include <stdio.h> #include <signal.h> #include <sched.h> #include <fcntl.h> #include <linux/mount.h>
#define errExit(msg) do { perror(msg); exit(EXIT_FAILURE); \ } while (0)
#define STACK_SIZE (8 * 1024) #ifndef __NR_fsopen #define __NR_fsopen 430 #endif static inline int fsopen(const char *fs_name, unsigned int flags) { return syscall(__NR_fsopen, fs_name, flags); }
static char thread_stack[512][STACK_SIZE];
int thread_fn(void* arg) { for (int i = 0; i< 800000; ++i) { int fsfd = fsopen("nfs", FSOPEN_CLOEXEC); if (fsfd == -1) { errExit("fsopen"); } } while(1); return 0; }
int main(int argc, char *argv[]) { int thread_pid; for (int i = 0; i < 1; ++i) { thread_pid = clone(thread_fn, thread_stack[i] + STACK_SIZE, \ SIGCHLD, NULL); } while(1); return 0; }
/*-------------------------- end --------------------------------*/
Link: https://lkml.kernel.org/r/1626517201-24086-1-git-send-email-nglaive@gmail.com Signed-off-by: Yutian Yang <nglaive@gmail.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: <shenwenbo@zju.edu.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
#
d1d488d8 |
| 14-Jul-2021 |
Christian Brauner <christian.brauner@ubuntu.com> |
fs: add vfs_parse_fs_param_source() helper
Add a simple helper that filesystems can use in their parameter parser to parse the "source" parameter. A few places open-coded this function and that alre
fs: add vfs_parse_fs_param_source() helper
Add a simple helper that filesystems can use in their parameter parser to parse the "source" parameter. A few places open-coded this function and that already caused a bug in the cgroup v1 parser that we fixed. Let's make it harder to get this wrong by introducing a helper which performs all necessary checks.
Link: https://syzkaller.appspot.com/bug?id=6312526aba5beae046fdae8f00399f87aab48b12 Cc: Christoph Hellwig <hch@lst.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
#
df561f66 |
| 23-Aug-2020 |
Gustavo A. R. Silva <gustavoars@kernel.org> |
treewide: Use fallthrough pseudo-keyword
Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through mar
treewide: Use fallthrough pseudo-keyword
Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case.
[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
show more ...
|
#
55923e4d |
| 14-May-2020 |
Miklos Szeredi <mszeredi@redhat.com> |
vfs: don't parse "silent" option
Parsing "silent" and clearing SB_SILENT makes zero sense.
Parsing "silent" and setting SB_SILENT would make a bit more sense, but apparently nobody cares.
Signed-o
vfs: don't parse "silent" option
Parsing "silent" and clearing SB_SILENT makes zero sense.
Parsing "silent" and setting SB_SILENT would make a bit more sense, but apparently nobody cares.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
show more ...
|
#
caaef1ba |
| 14-May-2020 |
Miklos Szeredi <mszeredi@redhat.com> |
vfs: don't parse "posixacl" option
Unlike the others, this is _not_ a standard option accepted by mount(8).
In fact SB_POSIXACL is an internal flag, and accepting MS_POSIXACL on the mount(2) interf
vfs: don't parse "posixacl" option
Unlike the others, this is _not_ a standard option accepted by mount(8).
In fact SB_POSIXACL is an internal flag, and accepting MS_POSIXACL on the mount(2) interface is possibly a bug.
The only filesystem that apparently wants to handle the "posixacl" option is 9p, but it has special handling of that option besides setting SB_POSIXACL.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
show more ...
|
#
9193ae87 |
| 14-May-2020 |
Miklos Szeredi <mszeredi@redhat.com> |
vfs: don't parse forbidden flags
Makes little sense to keep this blacklist synced with what mount(8) parses and what it doesn't. E.g. it has various forms of "*atime" options, but not "atime"...
S
vfs: don't parse forbidden flags
Makes little sense to keep this blacklist synced with what mount(8) parses and what it doesn't. E.g. it has various forms of "*atime" options, but not "atime"...
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
show more ...
|
#
cc3c0b53 |
| 21-Dec-2019 |
Al Viro <viro@zeniv.linux.org.uk> |
add prefix to fs_context->log
... turning it into struct p_log embedded into fs_context. Initialize the prefix with fs_type->name, turning fs_parse() into a trivial inline wrapper for __fs_parse().
add prefix to fs_context->log
... turning it into struct p_log embedded into fs_context. Initialize the prefix with fs_type->name, turning fs_parse() into a trivial inline wrapper for __fs_parse().
This makes fs_parameter_description->name completely unused.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
show more ...
|
#
9f09f649 |
| 21-Dec-2019 |
Al Viro <viro@zeniv.linux.org.uk> |
teach logfc() to handle prefices, give it saner calling conventions
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
#
34264ae3 |
| 16-Dec-2019 |
Al Viro <viro@zeniv.linux.org.uk> |
don't bother with explicit length argument for __lookup_constant()
Have the arrays of constant_table self-terminated (by NULL ->name in the final entry). Simplifies lookup_constant() and allows to
don't bother with explicit length argument for __lookup_constant()
Have the arrays of constant_table self-terminated (by NULL ->name in the final entry). Simplifies lookup_constant() and allows to reuse the search for enum params as well.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
show more ...
|
#
0f89589a |
| 17-Dec-2019 |
Al Viro <viro@zeniv.linux.org.uk> |
Pass consistent param->type to fs_parse()
As it is, vfs_parse_fs_string() makes "foo" and "foo=" indistinguishable; both get fs_value_is_string for ->type and NULL for ->string. To make it even mor
Pass consistent param->type to fs_parse()
As it is, vfs_parse_fs_string() makes "foo" and "foo=" indistinguishable; both get fs_value_is_string for ->type and NULL for ->string. To make it even more unpleasant, that combination is impossible to produce with fsconfig().
Much saner rules would be "foo" => fs_value_is_flag, NULL "foo=" => fs_value_is_string, "" "foo=bar" => fs_value_is_string, "bar" All cases are distinguishable, all results are expressable by fsconfig(), ->has_value checks are much simpler that way (to the point of the field being useless) and quite a few regressions go away (gfs2 has no business accepting -o nodebug=, for example).
Partially based upon patches from Miklos.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
show more ...
|
#
c7eb6869 |
| 25-Mar-2019 |
David Howells <dhowells@redhat.com> |
vfs: subtype handling moved to fuse
The unused vfs code can be removed. Don't pass empty subtype (same as if ->parse callback isn't called).
The bits that are left involve determining whether it's
vfs: subtype handling moved to fuse
The unused vfs code can be removed. Don't pass empty subtype (same as if ->parse callback isn't called).
The bits that are left involve determining whether it's permitted to split the filesystem type string passed in to mount(2). Consequently, this means that we cannot get rid of the FS_HAS_SUBTYPE flag unless we define that a type string with a dot in it always indicates a subtype specification.
Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
show more ...
|
#
1dd9bc08 |
| 22-Aug-2019 |
Eric Biggers <ebiggers@google.com> |
vfs: set fs_context::user_ns for reconfigure
fs_context::user_ns is used by fuse_parse_param(), even during remount, so it needs to be set to the existing value for reconfigure.
Reproducer:
#incl
vfs: set fs_context::user_ns for reconfigure
fs_context::user_ns is used by fuse_parse_param(), even during remount, so it needs to be set to the existing value for reconfigure.
Reproducer:
#include <fcntl.h> #include <sys/mount.h>
int main() { char opts[128]; int fd = open("/dev/fuse", O_RDWR);
sprintf(opts, "fd=%d,rootmode=040000,user_id=0,group_id=0", fd); mkdir("mnt", 0777); mount("foo", "mnt", "fuse.foo", 0, opts); mount("foo", "mnt", "fuse.foo", MS_REMOUNT, opts); }
Crash: BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] SMP CPU: 0 PID: 129 Comm: syz_make_kuid Not tainted 5.3.0-rc5-next-20190821 #3 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-20181126_142135-anatol 04/01/2014 RIP: 0010:map_id_range_down+0xb/0xc0 kernel/user_namespace.c:291 [...] Call Trace: map_id_down kernel/user_namespace.c:312 [inline] make_kuid+0xe/0x10 kernel/user_namespace.c:389 fuse_parse_param+0x116/0x210 fs/fuse/inode.c:523 vfs_parse_fs_param+0xdb/0x1b0 fs/fs_context.c:145 vfs_parse_fs_string+0x6a/0xa0 fs/fs_context.c:188 generic_parse_monolithic+0x85/0xc0 fs/fs_context.c:228 parse_monolithic_mount_data+0x1b/0x20 fs/fs_context.c:708 do_remount fs/namespace.c:2525 [inline] do_mount+0x39a/0xa60 fs/namespace.c:3107 ksys_mount+0x7d/0xd0 fs/namespace.c:3325 __do_sys_mount fs/namespace.c:3339 [inline] __se_sys_mount fs/namespace.c:3336 [inline] __x64_sys_mount+0x20/0x30 fs/namespace.c:3336 do_syscall_64+0x4a/0x1a0 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe
Reported-by: syzbot+7d6a57304857423318a5@syzkaller.appspotmail.com Fixes: 408cbe695350 ("vfs: Convert fuse to use the new mount API") Cc: David Howells <dhowells@redhat.com> Cc: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
show more ...
|
#
b4d0d230 |
| 20-May-2019 |
Thomas Gleixner <tglx@linutronix.de> |
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 36
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify it under the terms of the
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 36
Based on 1 normalized pattern(s):
this program is free software you can redistribute it and or modify it under the terms of the gnu general public licence as published by the free software foundation either version 2 of the licence or at your option any later version
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-or-later
has been chosen to replace the boilerplate/reference in 114 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190520170857.552531963@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
show more ...
|
#
ecdab150 |
| 01-Nov-2018 |
David Howells <dhowells@redhat.com> |
vfs: syscall: Add fsconfig() for configuring and managing a context
Add a syscall for configuring a filesystem creation context and triggering actions upon it, to be used in conjunction with fsopen,
vfs: syscall: Add fsconfig() for configuring and managing a context
Add a syscall for configuring a filesystem creation context and triggering actions upon it, to be used in conjunction with fsopen, fspick and fsmount.
long fsconfig(int fs_fd, unsigned int cmd, const char *key, const void *value, int aux);
Where fs_fd indicates the context, cmd indicates the action to take, key indicates the parameter name for parameter-setting actions and, if needed, value points to a buffer containing the value and aux can give more information for the value.
The following command IDs are proposed:
(*) FSCONFIG_SET_FLAG: No value is specified. The parameter must be boolean in nature. The key may be prefixed with "no" to invert the setting. value must be NULL and aux must be 0.
(*) FSCONFIG_SET_STRING: A string value is specified. The parameter can be expecting boolean, integer, string or take a path. A conversion to an appropriate type will be attempted (which may include looking up as a path). value points to a NUL-terminated string and aux must be 0.
(*) FSCONFIG_SET_BINARY: A binary blob is specified. value points to the blob and aux indicates its size. The parameter must be expecting a blob.
(*) FSCONFIG_SET_PATH: A non-empty path is specified. The parameter must be expecting a path object. value points to a NUL-terminated string that is the path and aux is a file descriptor at which to start a relative lookup or AT_FDCWD.
(*) FSCONFIG_SET_PATH_EMPTY: As fsconfig_set_path, but with AT_EMPTY_PATH implied.
(*) FSCONFIG_SET_FD: An open file descriptor is specified. value must be NULL and aux indicates the file descriptor.
(*) FSCONFIG_CMD_CREATE: Trigger superblock creation.
(*) FSCONFIG_CMD_RECONFIGURE: Trigger superblock reconfiguration.
For the "set" command IDs, the idea is that the file_system_type will point to a list of parameters and the types of value that those parameters expect to take. The core code can then do the parse and argument conversion and then give the LSM and FS a cooked option or array of options to use.
Source specification is also done the same way same way, using special keys "source", "source1", "source2", etc..
[!] Note that, for the moment, the key and value are just glued back together and handed to the filesystem. Every filesystem that uses options uses match_token() and co. to do this, and this will need to be changed - but not all at once.
Example usage:
fd = fsopen("ext4", FSOPEN_CLOEXEC); fsconfig(fd, fsconfig_set_path, "source", "/dev/sda1", AT_FDCWD); fsconfig(fd, fsconfig_set_path_empty, "journal_path", "", journal_fd); fsconfig(fd, fsconfig_set_fd, "journal_fd", "", journal_fd); fsconfig(fd, fsconfig_set_flag, "user_xattr", NULL, 0); fsconfig(fd, fsconfig_set_flag, "noacl", NULL, 0); fsconfig(fd, fsconfig_set_string, "sb", "1", 0); fsconfig(fd, fsconfig_set_string, "errors", "continue", 0); fsconfig(fd, fsconfig_set_string, "data", "journal", 0); fsconfig(fd, fsconfig_set_string, "context", "unconfined_u:...", 0); fsconfig(fd, fsconfig_cmd_create, NULL, NULL, 0); mfd = fsmount(fd, FSMOUNT_CLOEXEC, MS_NOEXEC);
or:
fd = fsopen("ext4", FSOPEN_CLOEXEC); fsconfig(fd, fsconfig_set_string, "source", "/dev/sda1", 0); fsconfig(fd, fsconfig_cmd_create, NULL, NULL, 0); mfd = fsmount(fd, FSMOUNT_CLOEXEC, MS_NOEXEC);
or:
fd = fsopen("afs", FSOPEN_CLOEXEC); fsconfig(fd, fsconfig_set_string, "source", "#grand.central.org:root.cell", 0); fsconfig(fd, fsconfig_cmd_create, NULL, NULL, 0); mfd = fsmount(fd, FSMOUNT_CLOEXEC, MS_NOEXEC);
or:
fd = fsopen("jffs2", FSOPEN_CLOEXEC); fsconfig(fd, fsconfig_set_string, "source", "mtd0", 0); fsconfig(fd, fsconfig_cmd_create, NULL, NULL, 0); mfd = fsmount(fd, FSMOUNT_CLOEXEC, MS_NOEXEC);
Signed-off-by: David Howells <dhowells@redhat.com> cc: linux-api@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
show more ...
|
#
007ec26c |
| 01-Nov-2018 |
David Howells <dhowells@redhat.com> |
vfs: Implement logging through fs_context
Implement the ability for filesystems to log error, warning and informational messages through the fs_context. These can be extracted by userspace by readi
vfs: Implement logging through fs_context
Implement the ability for filesystems to log error, warning and informational messages through the fs_context. These can be extracted by userspace by reading from an fd created by fsopen().
Error messages are prefixed with "e ", warnings with "w " and informational messages with "i ".
Inside the kernel, formatted messages are malloc'd but unformatted messages are not copied if they're either in the core .rodata section or in the .rodata section of the filesystem module pinned by fs_context::fs_type. The messages are only good till the fs_type is released.
Note that the logging object is shared between duplicated fs_context structures. This is so that such as NFS which do a mount within a mount can get at least some of the errors from the inner mount.
Five logging functions are provided for this:
(1) void logfc(struct fs_context *fc, const char *fmt, ...);
This logs a message into the context. If the buffer is full, the earliest message is discarded.
(2) void errorf(fc, fmt, ...);
This wraps logfc() to log an error.
(3) void invalf(fc, fmt, ...);
This wraps errorf() and returns -EINVAL for convenience.
(4) void warnf(fc, fmt, ...);
This wraps logfc() to log a warning.
(5) void infof(fc, fmt, ...);
This wraps logfc() to log an informational message.
Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
show more ...
|
#
24dcb3d9 |
| 01-Nov-2018 |
David Howells <dhowells@redhat.com> |
vfs: syscall: Add fsopen() to prepare for superblock creation
Provide an fsopen() system call that starts the process of preparing to create a superblock that will then be mountable, using an fd as
vfs: syscall: Add fsopen() to prepare for superblock creation
Provide an fsopen() system call that starts the process of preparing to create a superblock that will then be mountable, using an fd as a context handle. fsopen() is given the name of the filesystem that will be used:
int mfd = fsopen(const char *fsname, unsigned int flags);
where flags can be 0 or FSOPEN_CLOEXEC.
For example:
sfd = fsopen("ext4", FSOPEN_CLOEXEC); fsconfig(sfd, FSCONFIG_SET_PATH, "source", "/dev/sda1", AT_FDCWD); fsconfig(sfd, FSCONFIG_SET_FLAG, "noatime", NULL, 0); fsconfig(sfd, FSCONFIG_SET_FLAG, "acl", NULL, 0); fsconfig(sfd, FSCONFIG_SET_FLAG, "user_xattr", NULL, 0); fsconfig(sfd, FSCONFIG_SET_STRING, "sb", "1", 0); fsconfig(sfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0); fsinfo(sfd, NULL, ...); // query new superblock attributes mfd = fsmount(sfd, FSMOUNT_CLOEXEC, MS_RELATIME); move_mount(mfd, "", sfd, AT_FDCWD, "/mnt", MOVE_MOUNT_F_EMPTY_PATH);
sfd = fsopen("afs", -1); fsconfig(fd, FSCONFIG_SET_STRING, "source", "#grand.central.org:root.cell", 0); fsconfig(fd, FSCONFIG_CMD_CREATE, NULL, NULL, 0); mfd = fsmount(sfd, 0, MS_NODEV); move_mount(mfd, "", sfd, AT_FDCWD, "/mnt", MOVE_MOUNT_F_EMPTY_PATH);
If an error is reported at any step, an error message may be available to be read() back (ENODATA will be reported if there isn't an error available) in the form:
"e <subsys>:<problem>" "e SELinux:Mount on mountpoint not permitted"
Once fsmount() has been called, further fsconfig() calls will incur EBUSY, even if the fsmount() fails. read() is still possible to retrieve error information.
The fsopen() syscall creates a mount context and hangs it of the fd that it returns.
Netlink is not used because it is optional and would make the core VFS dependent on the networking layer and also potentially add network namespace issues.
Note that, for the moment, the caller must have SYS_CAP_ADMIN to use fsopen().
Signed-off-by: David Howells <dhowells@redhat.com> cc: linux-api@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
show more ...
|
#
e7582e16 |
| 01-Nov-2018 |
David Howells <dhowells@redhat.com> |
vfs: Implement logging through fs_context
Implement the ability for filesystems to log error, warning and informational messages through the fs_context. In the future, these will be extractable by
vfs: Implement logging through fs_context
Implement the ability for filesystems to log error, warning and informational messages through the fs_context. In the future, these will be extractable by userspace by reading from an fd created by the fsopen() syscall.
Error messages are prefixed with "e ", warnings with "w " and informational messages with "i ".
In the future, inside the kernel, formatted messages will be malloc'd but unformatted messages will not copied if they're either in the core .rodata section or in the .rodata section of the filesystem module pinned by fs_context::fs_type. The messages will only be good till the fs_type is released.
Note that the logging object will be shared between duplicated fs_context structures. This is so that such as NFS which do a mount within a mount can get at least some of the errors from the inner mount.
Five logging functions are provided for this:
(1) void logfc(struct fs_context *fc, const char *fmt, ...);
This logs a message into the context. If the buffer is full, the earliest message is discarded.
(2) void errorf(fc, fmt, ...);
This wraps logfc() to log an error.
(3) void invalf(fc, fmt, ...);
This wraps errorf() and returns -EINVAL for convenience.
(4) void warnf(fc, fmt, ...);
This wraps logfc() to log a warning.
(5) void infof(fc, fmt, ...);
This wraps logfc() to log an informational message.
Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
show more ...
|
#
0b52075e |
| 23-Dec-2018 |
Al Viro <viro@zeniv.linux.org.uk> |
introduce cloning of fs_context
new primitive: vfs_dup_fs_context(). Comes with fs_context method (->dup()) for copying the filesystem-specific parts of fs_context, along with LSM one (->fs_context
introduce cloning of fs_context
new primitive: vfs_dup_fs_context(). Comes with fs_context method (->dup()) for copying the filesystem-specific parts of fs_context, along with LSM one (->fs_context_dup()) for doing the same to LSM parts.
[needs better commit message, and change of Author:, anyway]
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
show more ...
|
#
3e1aeb00 |
| 01-Nov-2018 |
David Howells <dhowells@redhat.com> |
vfs: Implement a filesystem superblock creation/configuration context
[AV - unfuck kern_mount_data(); we want non-NULL ->mnt_ns on long-living mounts] [AV - reordering fs/namespace.c is badly overdu
vfs: Implement a filesystem superblock creation/configuration context
[AV - unfuck kern_mount_data(); we want non-NULL ->mnt_ns on long-living mounts] [AV - reordering fs/namespace.c is badly overdue, but let's keep it separate from that series] [AV - drop simple_pin_fs() change] [AV - clean vfs_kern_mount() failure exits up]
Implement a filesystem context concept to be used during superblock creation for mount and superblock reconfiguration for remount.
The mounting procedure then becomes:
(1) Allocate new fs_context context.
(2) Configure the context.
(3) Create superblock.
(4) Query the superblock.
(5) Create a mount for the superblock.
(6) Destroy the context.
Rather than calling fs_type->mount(), an fs_context struct is created and fs_type->init_fs_context() is called to set it up. Pointers exist for the filesystem and LSM to hang their private data off.
A set of operations has to be set by ->init_fs_context() to provide freeing, duplication, option parsing, binary data parsing, validation, mounting and superblock filling.
Legacy filesystems are supported by the provision of a set of legacy fs_context operations that build up a list of mount options and then invoke fs_type->mount() from within the fs_context ->get_tree() operation. This allows all filesystems to be accessed using fs_context.
It should be noted that, whilst this patch adds a lot of lines of code, there is quite a bit of duplication with existing code that can be eliminated should all filesystems be converted over.
Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
show more ...
|