#
4b2cfbda |
| 23-Jan-2024 |
Christian Brauner <brauner@kernel.org> |
nfs: port block device access to files
Link: https://lore.kernel.org/r/20240123-vfs-bdev-file-v2-24-adbd023e19cc@kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <bra
nfs: port block device access to files
Link: https://lore.kernel.org/r/20240123-vfs-bdev-file-v2-24-adbd023e19cc@kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
show more ...
|
#
3fe5d9fb |
| 27-Sep-2023 |
Jan Kara <jack@suse.cz> |
nfs/blocklayout: Convert to use bdev_open_by_dev/path()
Convert block device handling to use bdev_open_by_dev/path() and pass the handle around.
CC: linux-nfs@vger.kernel.org CC: Trond Myklebust <t
nfs/blocklayout: Convert to use bdev_open_by_dev/path()
Convert block device handling to use bdev_open_by_dev/path() and pass the handle around.
CC: linux-nfs@vger.kernel.org CC: Trond Myklebust <trond.myklebust@hammerspace.com> CC: Anna Schumaker <anna@kernel.org> Acked-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230927093442.25915-25-jack@suse.cz Signed-off-by: Christian Brauner <brauner@kernel.org>
show more ...
|
#
b3960475 |
| 25-Jan-2018 |
Benjamin Coddington <bcodding@redhat.com> |
pnfs/blocklayout: pnfs_block_dev_map uses bytes, not sectors
Fixup the field types to match their use.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond
pnfs/blocklayout: pnfs_block_dev_map uses bytes, not sectors
Fixup the field types to match their use.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trondmy@gmail.com>
show more ...
|
#
41963c10 |
| 22-Aug-2016 |
Benjamin Coddington <bcodding@redhat.com> |
pnfs/blocklayout: update last_write_offset atomically with extents
Block/SCSI layout write completion may add committable extents to the extent tree before updating the layout's last-written byte un
pnfs/blocklayout: update last_write_offset atomically with extents
Block/SCSI layout write completion may add committable extents to the extent tree before updating the layout's last-written byte under the inode lock. If a sync happens before this value is updated, then prepare_layoutcommit may find and encode these extents which would produce a LAYOUTCOMMIT request whose encoded extents are larger than the request's loca_length.
Fix this by using a last-written byte value that is updated atomically with the extent tree so that commitable extents always match.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
show more ...
|
#
09cbfeaf |
| 01-Apr-2016 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time ago with promise that one day it will be possible to impleme
mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time ago with promise that one day it will be possible to implement page cache with bigger chunks than PAGE_SIZE.
This promise never materialized. And unlikely will.
We have many places where PAGE_CACHE_SIZE assumed to be equal to PAGE_SIZE. And it's constant source of confusion on whether PAGE_CACHE_* or PAGE_* constant should be used in a particular case, especially on the border between fs and mm.
Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much breakage to be doable.
Let's stop pretending that pages in page cache are special. They are not.
The changes are pretty straight-forward:
- <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
- page_cache_get() -> get_page();
- page_cache_release() -> put_page();
This patch contains automated changes generated with coccinelle using script below. For some reason, coccinelle doesn't patch header files. I've called spatch for them manually.
The only adjustment after coccinelle is revert of changes to PAGE_CAHCE_ALIGN definition: we are going to drop it later.
There are few places in the code where coccinelle didn't reach. I'll fix them manually in a separate patch. Comments and documentation also will be addressed with the separate patch.
virtual patch
@@ expression E; @@ - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E
@@ expression E; @@ - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E
@@ @@ - PAGE_CACHE_SHIFT + PAGE_SHIFT
@@ @@ - PAGE_CACHE_SIZE + PAGE_SIZE
@@ @@ - PAGE_CACHE_MASK + PAGE_MASK
@@ expression E; @@ - PAGE_CACHE_ALIGN(E) + PAGE_ALIGN(E)
@@ expression E; @@ - page_cache_get(E) + get_page(E)
@@ expression E; @@ - page_cache_release(E) + put_page(E)
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
#
d9186c03 |
| 04-Mar-2016 |
Christoph Hellwig <hch@lst.de> |
nfs/blocklayout: add SCSI layout support
This is a trivial extension to the block layout driver to support the new SCSI layouts draft. There are three changes:
- device identifcation through the
nfs/blocklayout: add SCSI layout support
This is a trivial extension to the block layout driver to support the new SCSI layouts draft. There are three changes:
- device identifcation through the SCSI VPD page. This allows us to directly use the udev generated persistent device names instead of requiring an expensive lookup by crawling every block device node in /dev and reading a signature for it. - use of SCSI persistent reservations to protect device access and allow for robust fencing. On the client sides this just means registering and unregistering a server supplied key. - an optimized LAYOUTCOMMIT payload that doesn't send unessecary fields to the server.
Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
show more ...
|
#
8bb28975 |
| 17-Aug-2015 |
Christoph Hellwig <hch@lst.de> |
pnfs: move common blocklayout XDR defintions to nfs4.h
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
5c83746a |
| 11-Sep-2014 |
Christoph Hellwig <hch@lst.de> |
pnfs/blocklayout: in-kernel GETDEVICEINFO XDR parsing
This patches moves parsing of the GETDEVICEINFO XDR to kernel space, as well as the management of complex devices. The reason for that is we mi
pnfs/blocklayout: in-kernel GETDEVICEINFO XDR parsing
This patches moves parsing of the GETDEVICEINFO XDR to kernel space, as well as the management of complex devices. The reason for that is we might have multiple outstanding complex devices after a NOTIFY_DEVICEID4_CHANGE, which device mapper or md can't handle as they claim devices exclusively.
But as is turns out simple striping / concatenation is fairly trivial to implement anyway, so we make our life simpler by reducing the reliance on blkmapd. For now we still use blkmapd by feeding it synthetic SIMPLE device XDR to translate device signatures to device numbers, but in the long runs I have plans to eliminate it entirely.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
show more ...
|
#
871760ce |
| 11-Sep-2014 |
Christoph Hellwig <hch@lst.de> |
pnfs/blocklayout: move all rpc_pipefs related code into a single file
Create a file to house all the rpc_pipefs boilerplate code instead of sprinkling it over a few files.
Signed-off-by: Christoph
pnfs/blocklayout: move all rpc_pipefs related code into a single file
Create a file to house all the rpc_pipefs boilerplate code instead of sprinkling it over a few files.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
show more ...
|
#
9cc47541 |
| 11-Sep-2014 |
Christoph Hellwig <hch@lst.de> |
pnfs/blocklayout: move extent processing to blocklayout.c
This isn't device(id) related, so move it into the main file. Simple move for now, the next commit will clean it up a bit.
Signed-off-by:
pnfs/blocklayout: move extent processing to blocklayout.c
This isn't device(id) related, so move it into the main file. Simple move for now, the next commit will clean it up a bit.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
show more ...
|
#
34dc93c2 |
| 11-Sep-2014 |
Christoph Hellwig <hch@lst.de> |
pnfs/blocklayout: allocate separate pages for the layoutcommit payload
Instead of overflowing the XDR send buffer with our extent list allocate pages and pre-encode the layoutupdate payload into the
pnfs/blocklayout: allocate separate pages for the layoutcommit payload
Instead of overflowing the XDR send buffer with our extent list allocate pages and pre-encode the layoutupdate payload into them. We optimistically allocate a single page use alloc_page and only switch to vmalloc when we have more extents outstanding. Currently there is only a single testcase (xfstests generic/113) which can reproduce large enough extent lists for this to occur.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
show more ...
|
#
20d655d6 |
| 03-Sep-2014 |
Christoph Hellwig <hch@lst.de> |
pnfs/blocklayout: use the device id cache
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
#
8067253c |
| 10-Sep-2014 |
Christoph Hellwig <hch@lst.de> |
pnfs/blocklayout: rewrite extent tracking
Currently the block layout driver tracks extents in three separate data structures:
- the two list of pnfs_block_extent structures returned by the server
pnfs/blocklayout: rewrite extent tracking
Currently the block layout driver tracks extents in three separate data structures:
- the two list of pnfs_block_extent structures returned by the server - the list of sectors that were in invalid state but have been written to - a list of pnfs_block_short_extent structures for LAYOUTCOMMIT
All of these share the property that they are not only highly inefficient data structures, but also that operations on them are even more inefficient than nessecary.
In addition there are various implementation defects like:
- using an int to track sectors, causing corruption for large offsets - incorrect normalization of page or block granularity ranges - insufficient error handling - incorrect synchronization as extents can be modified while they are in use
This patch replace all three data with a single unified rbtree structure tracking all extents, as well as their in-memory state, although we still need to instance for read-only and read-write extent due to the arcane client side COW feature in the block layouts spec.
To fix the problem of extent possibly being modified while in use we make sure to return a copy of the extent for use in the write path - the extent can only be invalidated by a layout recall or return which has to wait until the I/O operations finished due to refcounts on the layout segment.
The new extent tree work similar to the schemes used by block based filesystems like XFS or ext4.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
show more ...
|
#
694e096f |
| 13-Nov-2013 |
Anna Schumaker <bjschuma@netapp.com> |
NFS: Enabling v4.2 should not recompile nfsd and lockd
When CONFIG_NFS_V4_2 is toggled nfsd and lockd will be recompiled, instead of only the nfs client. This patch moves a small amount of code int
NFS: Enabling v4.2 should not recompile nfsd and lockd
When CONFIG_NFS_V4_2 is toggled nfsd and lockd will be recompiled, instead of only the nfs client. This patch moves a small amount of code into the client directory to avoid unnecessary recompiles.
Signed-off-by: Anna Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
show more ...
|
#
4385bab1 |
| 06-May-2013 |
Al Viro <viro@zeniv.linux.org.uk> |
make blkdev_put() return void
same story as with the previous patches - note that return value of blkdev_close() is lost, since there's nowhere the caller (__fput()) could return it to.
Signed-off-
make blkdev_put() return void
same story as with the previous patches - note that return value of blkdev_close() is lost, since there's nowhere the caller (__fput()) could return it to.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
show more ...
|
#
af283885 |
| 25-Sep-2012 |
Peng Tao <bergwolf@gmail.com> |
pnfsblock: cleanup nfs4_blkdev_get
It is not needed at all and it is messing with return values...
Reported-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Peng Tao <tao.peng@emc.com
pnfsblock: cleanup nfs4_blkdev_get
It is not needed at all and it is messing with return values...
Reported-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Peng Tao <tao.peng@emc.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
show more ...
|
#
fe6e1e8d |
| 23-Aug-2012 |
Peng Tao <bergwolf@gmail.com> |
pnfsblock: fix partial page buffer wirte
If applications use flock to protect its write range, generic NFS will not do read-modify-write cycle at page cache level. Therefore LD should know how to ha
pnfsblock: fix partial page buffer wirte
If applications use flock to protect its write range, generic NFS will not do read-modify-write cycle at page cache level. Therefore LD should know how to handle non-sector aligned writes. Otherwise there will be data corruption.
Cc: stable <stable@vger.kernel.org> Signed-off-by: Peng Tao <tao.peng@emc.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
show more ...
|
#
5ffaf855 |
| 11-Mar-2012 |
Stanislav Kinsbursky <skinsbursky@parallels.com> |
NFS: replace global bl_wq with per-net one
This queue is used for sleeping in kernel and it have to be per-net since we don't want to wake any other waiters except in out network nemespace. BTW, mov
NFS: replace global bl_wq with per-net one
This queue is used for sleeping in kernel and it have to be per-net since we don't want to wake any other waiters except in out network nemespace. BTW, move wq to per-net data is easy. But some way to handle upcall timeouts have to be provided. On message destroy in case of timeout, tasks, waiting for message to be delivered, should be awakened. Thus, some data required to located the right wait queue. Chosen solution replaces rpc_pipe_msg object with new introduced bl_pipe_msg object, containing rpc_pipe_msg and proper wq.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
show more ...
|
#
cb9c1c4a |
| 11-Mar-2012 |
Stanislav Kinsbursky <skinsbursky@parallels.com> |
NFS: replace global bl_mount_reply with per-net one
This global variable is used for blocklayout downcall and thus can be corrupted if case of existence of multiple networks namespaces.
Signed-off-
NFS: replace global bl_mount_reply with per-net one
This global variable is used for blocklayout downcall and thus can be corrupted if case of existence of multiple networks namespaces.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
show more ...
|
#
9e2e74db |
| 10-Jan-2012 |
Stanislav Kinsbursky <skinsbursky@parallels.com> |
NFS: blocklayout pipe creation per network namespace context introduced
This patch implements blocklayout pipe creation and registration per each existent network namespace. This was achived by regi
NFS: blocklayout pipe creation per network namespace context introduced
This patch implements blocklayout pipe creation and registration per each existent network namespace. This was achived by registering NFS per-net operations, responsible for blocklayout pipe allocation/register and unregister/destruction instead of initialization and destruction of static "bl_device_pipe" pipe (this one was removed). Note, than pointer to network blocklayout pipe is stored in per-net "nfs_net" structure, because allocating of one more per-net structure for blocklayout module looks redundant. This patch also changes dev_remove() function prototype (and all it's callers, where it' requied) by adding network namespace pointer parameter, which is used to discover proper blocklayout pipe for rpc_queue_upcall() call.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
show more ...
|
#
c239d83b |
| 26-Dec-2011 |
Stanislav Kinsbursky <skinsbursky@parallels.com> |
SUNRPC: split SUNPRC PipeFS dentry and private pipe data creation
This patch is a final step towards to removing PipeFS inode references from kernel code other than PipeFS itself. It makes all kerne
SUNRPC: split SUNPRC PipeFS dentry and private pipe data creation
This patch is a final step towards to removing PipeFS inode references from kernel code other than PipeFS itself. It makes all kernel SUNRPC PipeFS users depends on pipe private data, which state depend on their specific operations, etc. This patch completes SUNRPC PipeFS preparations and allows to create pipe private data and PipeFS dentries independently. Next step will be making SUNPRC PipeFS dentries allocated by SUNRPC PipeFS network namespace aware routines.
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
show more ...
|
#
7c5465d6 |
| 12-Jan-2012 |
Peng Tao <bergwolf@gmail.com> |
pnfsblock: alloc short extent before submit bio
As discussed earlier, it is better for block client to allocate memory for tracking extents state before submitting bio. So the patch does it by alloc
pnfsblock: alloc short extent before submit bio
As discussed earlier, it is better for block client to allocate memory for tracking extents state before submitting bio. So the patch does it by allocating a short_extent for every INVALID extent touched by write pagelist and for every zeroing page we created, saving them in layout header. Then in end_io we can just use them to create commit list items and avoid memory allocation there.
Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
show more ...
|
#
60c52e3a |
| 12-Jan-2012 |
Peng Tao <bergwolf@gmail.com> |
pnfsblock: cleanup bl_mark_sectors_init
It does not need to manipulate on partial initialized blocks. Writeback code takes care of it.
Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Benn
pnfsblock: cleanup bl_mark_sectors_init
It does not need to manipulate on partial initialized blocks. Writeback code takes care of it.
Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
show more ...
|
#
c1225158 |
| 23-Sep-2011 |
Peng Tao <bergwolf@gmail.com> |
SUNRPC/NFS: make rpc pipe upcall generic
The same function is used by idmap, gss and blocklayout code. Make it generic.
Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Jim Rees <rees@umic
SUNRPC/NFS: make rpc pipe upcall generic
The same function is used by idmap, gss and blocklayout code. Make it generic.
Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Jim Rees <rees@umich.edu> Cc: stable@kernel.org [3.0] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
show more ...
|
#
fdc17abb |
| 23-Sep-2011 |
Jim Rees <rees@umich.edu> |
pnfsblock: fix size of upcall message
Make the status field explicitly 32 bits. "...it's unlikely that the kernel and userspace would differ on the size of an int here, but it might be a good idea
pnfsblock: fix size of upcall message
Make the status field explicitly 32 bits. "...it's unlikely that the kernel and userspace would differ on the size of an int here, but it might be a good idea to go ahead and make that explicitly 32 bits in case we end up dealing with more exotic arches at some point in the future."
Suggested-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Benny Halevy <bhalevy@tonian.com> Cc: stable@kernel.org [3.0] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
show more ...
|