039a2e80 | 25-Apr-2024 | Jens Axboe <axboe@kernel.dk>
io_uring/rw: reinstate thread check for retries
Allowing retries for everything is arguably the right thing to do, now that every command type is async read from the start. But it's exposed a few issues around a missing check for whether a retry is possible (which cca6571381a0 exposed), and the fixup commit for that isn't necessarily 100% sound in terms of iov_iter state.
For now, just revert these two commits. This unfortunately re-opens the possibility of -EAGAIN being bubbled up to userspace for some cases where the kernel could very well just sanely retry them. But until we have all the conditions around that covered, we cannot safely enable retries there.
This reverts commit df604d2ad480fcf7b39767280c9093e13b1de952.
This reverts commit cca6571381a0bdc88021a1f7a4c2349df21279f7.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
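The guard being reinstated can be pictured with a small userspace sketch (all names and the tid plumbing here are illustrative, not the kernel's): a retry is only attempted inline while still on the issuing thread, since that is the only context where the iov_iter state can be trusted; anything else gets punted.

#include <stdbool.h>
#include <stdio.h>

struct req {
    long issuer_tid;  /* thread that originally issued the request */
};

static bool same_thread(const struct req *r, long current_tid)
{
    return r->issuer_tid == current_tid;
}

static const char *on_eagain(const struct req *r, long current_tid)
{
    /* iov_iter state is only trustworthy in the issuing task's context */
    if (same_thread(r, current_tid))
        return "retry inline";
    return "punt to io-wq";
}

int main(void)
{
    struct req r = { .issuer_tid = 1 };
    printf("%s\n", on_eagain(&r, 1));   /* retry inline */
    printf("%s\n", on_eagain(&r, 42));  /* punt to io-wq */
    return 0;
}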
|
df604d2a | 17-Apr-2024 | Jens Axboe <axboe@kernel.dk>
io_uring/rw: ensure retry condition isn't lost
A previous commit removed the checking on whether or not it was possible to retry a request, since it's now possible to retry any of them. This would previously have caused the request to have been ended with an error, but now the retry condition can simply get lost instead.
Clean up the retry handling and always just punt it to task_work, which will queue it with io-wq appropriately.
Reported-by: Changhui Zhong <czhong@redhat.com>
Tested-by: Ming Lei <ming.lei@redhat.com>
Fixes: cca6571381a0 ("io_uring/rw: cleanup retry path")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
7c98f7cb | 28-Aug-2023 | Miklos Szeredi <mszeredi@redhat.com>
remove call_{read,write}_iter() functions
These have no clear purpose. This is effectively a revert of commit bb7462b6fd64 ("vfs: use helpers for calling f_op->{read,write}_iter()").
The patch was created with the help of a coccinelle script.
Fixes: bb7462b6fd64 ("vfs: use helpers for calling f_op->{read,write}_iter()")
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
414d0f45 | 20-Mar-2024 | Jens Axboe <axboe@kernel.dk>
io_uring/alloc_cache: switch to array based caching
Currently lists are being used to manage this, but best practice is usually to have these in an array instead, as that is cheaper to manage.
Outside of that detail, games are also played with KASAN as the list is inside the cached entry itself.
Finally, all users of this need a struct io_cache_entry embedded in their struct, which is union'ized with something else in there that isn't used across the free -> realloc cycle.
Get rid of all of that, and simply have it be an array. This will not change the memory used, as we're just trading an 8-byte member entry for the per-elem array size.
This reduces the overhead of the recycled allocations, and it reduces the amount of code needed to support recycling to about half of what it currently is.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
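As a rough model of the array-based approach (all names here are made up; the kernel's io_alloc_cache differs in detail), allocation pops the most recently freed element off the array and freeing pushes it back until the cache is full:

#include <stdlib.h>

struct obj_cache {
    void **entries;   /* array of cached pointers */
    unsigned nr;      /* how many are currently cached */
    unsigned max;     /* capacity of the array */
    size_t elem_size; /* size of one cached object */
};

static int cache_init(struct obj_cache *c, unsigned max, size_t elem_size)
{
    c->entries = calloc(max, sizeof(void *));
    if (!c->entries)
        return -1;
    c->nr = 0;
    c->max = max;
    c->elem_size = elem_size;
    return 0;
}

/* Allocation: pop the last cached entry, or fall back to malloc(). */
static void *cache_alloc(struct obj_cache *c)
{
    if (c->nr)
        return c->entries[--c->nr];
    return malloc(c->elem_size);
}

/* Free: push back into the array if there is room, else really free. */
static void cache_free(struct obj_cache *c, void *obj)
{
    if (c->nr < c->max)
        c->entries[c->nr++] = obj;
    else
        free(obj);
}

Compared to a list threaded through the cached entries themselves, the array needs no per-entry link member, which is what makes the embedded io_cache_entry and the KASAN games described above unnecessary.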
|
d6f911a6 | 18-Mar-2024 | Jens Axboe <axboe@kernel.dk>
io_uring/rw: add iovec recycling
Let the io_async_rw hold on to the iovec and reuse it, rather than always allocate and free them.
Also enables KASAN for the iovec entries, so that reuse can be detected even while they are in the cache.
While doing so, shrink io_async_rw by getting rid of the bigger embedded fast iovec. Since iovecs are now recycled, shrink the inline fast iovec from 8 entries to 1. This reduces the io_async_rw size from 264 to 160 bytes, a 40% reduction.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
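A minimal sketch of the recycling idea (field names are invented for illustration, not the kernel's io_async_rw): keep the allocated iovec in the per-request state and only reallocate when a request needs more segments than the last one did:

#include <stdlib.h>
#include <sys/uio.h>

struct async_rw {
    struct iovec *iov;     /* recycled across requests */
    int iov_cap;           /* segments currently allocated */
    struct iovec fast_iov; /* single inline segment for the common case */
};

static struct iovec *get_iov(struct async_rw *rw, int nr_segs)
{
    if (nr_segs == 1)
        return &rw->fast_iov;   /* no allocation at all */
    if (nr_segs <= rw->iov_cap)
        return rw->iov;         /* reuse the previous allocation */
    free(rw->iov);
    rw->iov = calloc(nr_segs, sizeof(struct iovec));
    rw->iov_cap = rw->iov ? nr_segs : 0;
    return rw->iov;
}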
|
cca65713 | 23-Mar-2024 | Jens Axboe <axboe@kernel.dk>
io_uring/rw: cleanup retry path
We no longer need to gate a potential retry on whether or not the context matches our original task, as all read/write operations have been fully prepared upfront. This means there's never any re-import needed, and hence we can always retry requests.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
0d10bd77 | 18-Mar-2024 | Jens Axboe <axboe@kernel.dk>
io_uring: get rid of struct io_rw_state
A separate state struct is not needed anymore, just fold it in with io_async_rw.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
a9165b83 | 18-Mar-2024 | Jens Axboe <axboe@kernel.dk>
io_uring/rw: always setup io_async_rw for read/write requests
Read/write requests try to put everything on the stack, and then allocate and copy if a retry is needed. This necessitates a bunch of nasty code that deals with intermediate state.
Get rid of this, and have the prep side setup everything that is needed upfront, which greatly simplifies the opcode handlers.
This includes adding an alloc cache for io_async_rw, to make it cheap to handle.
In terms of cost, this should be basically free and transparent. For the worst case of {READ,WRITE}_FIXED which didn't need it before, performance is unaffected in the normal peak workload that is being used to test that. Still runs at 122M IOPS.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
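The shape of the change, reduced to a toy model (all names hypothetical): prep unconditionally sets up the async state, so the issue path never has to snapshot stack state when it hits -EAGAIN:

#include <errno.h>
#include <stdlib.h>

struct rw_state {
    long bytes_done;
    void *iov;
};

struct request {
    struct rw_state *async_data;
};

static int rw_prep(struct request *req)
{
    /* Always present from prep onwards, whether or not we ever retry. */
    req->async_data = calloc(1, sizeof(struct rw_state));
    return req->async_data ? 0 : -ENOMEM;
}

static int rw_issue(struct request *req)
{
    /* No "copy stack state to the heap" dance on -EAGAIN any more:
     * the state already lives in req->async_data. */
    (void)req;
    return 0;
}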
|
e5c12945 | 18-Mar-2024 | Pavel Begunkov <asml.silence@gmail.com>
io_uring: refactor io_fill_cqe_req_aux
The restriction on multishot execution context disallowing io-wq is driven by rules of io_fill_cqe_req_aux(), it should only be called in the master task context, either from the syscall path or in task_work. Since task_work now always takes the ctx lock implying IO_URING_F_COMPLETE_DEFER, we can just assume that the function is always called with its defer argument set to true.
Kill the argument. Also rename the function for consistency: "fill" in CQE-related functions has usually meant raw interfaces that only copy data into the CQ, without the locking, user wakeups, and other accounting that "post" functions take care of.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Tested-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/93423d106c33116c7d06bf277f651aa68b427328.1710799188.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
8e5b3b89 | 18-Mar-2024 | Pavel Begunkov <asml.silence@gmail.com>
io_uring: remove struct io_tw_state::locked
ctx is always locked for task_work now, so get rid of struct io_tw_state::locked. Note I'm stopping one step short of removing io_tw_state altogether, even though it is now empty, because it still serves the purpose of indicating which functions are tw callbacks and forcing users not to invoke them carelessly out of a wrong context. The removal can always be done later.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Tested-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/e95e1ea116d0bfa54b656076e6a977bc221392a4.1710799188.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
6e6b8c62 | 18-Mar-2024 | Pavel Begunkov <asml.silence@gmail.com>
io_uring/rw: avoid punting to io-wq directly
kiocb_done() shouldn't be in the business of specifically redirecting requests to io-wq. Remove the hop to task_work just to queue io-wq work; return -EAGAIN instead, and let the core io_uring code handle the offload.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Tested-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/413564e550fe23744a970e1783dfa566291b0e6f.1710799188.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
210a03c9 | 28-Mar-2024 | Christian Brauner <brauner@kernel.org>
fs: claw back a few FMODE_* bits
There's a bunch of flags that are purely based on what the file operations support while also never being conditionally set or unset. IOW, they're not subject to change for individual files. Imho, such flags don't need to live in f_mode; they might as well live in the fops struct itself.
And the fops struct already has that lonely mmap_supported_flags member. We might as well turn that into a generic fop_flags member and move a few flags from FMODE_* space into FOP_* space. That gets us four FMODE_* bits back, plus the ability for new static flags that are about file ops to live in their own FOP_* space rather than in FMODE_* space.
It's not the most beautiful thing ever but it gets the job done. Yes, there'll be an additional pointer chase, but hopefully that won't matter for these flags.
I suspect there's a few more we can move into there and that we can also redirect a bunch of new flag suggestions that follow this pattern into the fop_flags field instead of f_mode.
Link: https://lore.kernel.org/r/20240328-gewendet-spargel-aa60a030ef74@brauner
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Christian Brauner <brauner@kernel.org>
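A sketch of the layout this describes (simplified types; the flag names are for illustration): the static capability bit lives once in the shared file_operations rather than costing a bit in every file's f_mode:

#include <stdbool.h>

#define FOP_BUFFER_RASYNC (1u << 0)   /* illustrative flag names */
#define FOP_BUFFER_WASYNC (1u << 1)

struct file_operations {
    unsigned int fop_flags;           /* static, per file_operations */
    /* ... methods ... */
};

struct file {
    unsigned int f_mode;              /* per-open-file, dynamic flags */
    const struct file_operations *f_op;
};

/* One extra pointer chase, but no per-file f_mode bit consumed. */
static bool supports_async_buffered_reads(const struct file *f)
{
    return f->f_op->fop_flags & FOP_BUFFER_RASYNC;
}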
|
2a975d42 | 01-Apr-2024 | Jens Axboe <axboe@kernel.dk>
io_uring/rw: don't allow multishot reads without NOWAIT support
Supporting multishot reads requires support for NOWAIT, as the alternative would be to always have io-wq execute the work item whenever the poll readiness triggered. Any fast file type will have NOWAIT support (e.g. it understands both O_NONBLOCK and IOCB_NOWAIT). If the given file type does not, then simply resort to single-shot execution.
Cc: stable@vger.kernel.org
Fixes: fc68fcda04910 ("io_uring/rw: add support for IORING_OP_READ_MULTISHOT")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
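The fallback policy reads roughly like this toy check (illustrative names only, not the kernel's helpers):

#include <stdbool.h>

enum read_mode { SINGLE_SHOT, MULTISHOT };

struct file_caps {
    bool nowait;    /* understands O_NONBLOCK / IOCB_NOWAIT */
    bool pollable;  /* has a poll handler */
};

static enum read_mode read_mode(const struct file_caps *f, bool multishot_req)
{
    if (multishot_req && f->nowait && f->pollable)
        return MULTISHOT;
    return SINGLE_SHOT;  /* degrade rather than bounce every read to io-wq */
}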
|
0a3737db | 12-Mar-2024 | Jens Axboe <axboe@kernel.dk>
io_uring/rw: return IOU_ISSUE_SKIP_COMPLETE for multishot retry
If read multishot is being invoked from the poll retry handler, then we should return IOU_ISSUE_SKIP_COMPLETE rather than -EAGAIN. If not, then a CQE will be posted with -EAGAIN rather than triggering the retry when the file is flagged as readable again.
Cc: stable@vger.kernel.org
Reported-by: Sargun Dhillon <sargun@meta.com>
Fixes: fc68fcda04910 ("io_uring/rw: add support for IORING_OP_READ_MULTISHOT")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
e0e4ab52 | 08-Mar-2024 | Pavel Begunkov <asml.silence@gmail.com>
io_uring: refactor DEFER_TASKRUN multishot checks
We disallow DEFER_TASKRUN multishots from running by io-wq, which is checked by individual opcodes in the issue path. We can consolidate it all in io_wq_submit_work(), at the same time moving the checks out of the hot path.
Suggested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e492f0f11588bb5aa11d7d24e6f53b7c7628afdb.1709905727.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
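A self-contained sketch of the consolidation (names invented): one check at the io-wq entry point replaces a copy in every multishot opcode handler, and stays entirely off the non-io-wq submission path:

#include <stdio.h>

#define CTX_DEFER_TASKRUN (1u << 0)
#define REQ_MULTISHOT     (1u << 1)

struct req {
    unsigned ctx_flags;
    unsigned flags;
};

static void issue(struct req *r) { (void)r; /* opcode handlers, check-free */ }

static void wq_submit_work(struct req *r)
{
    /* Single choke point for "multishot can't run from io-wq". */
    if ((r->ctx_flags & CTX_DEFER_TASKRUN) && (r->flags & REQ_MULTISHOT)) {
        puts("fail request: multishot not allowed from io-wq");
        return;
    }
    issue(r);
}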
|
186daf23 | 07-Mar-2024 | Jens Axboe <axboe@kernel.dk>
io_uring/kbuf: rename REQ_F_PARTIAL_IO to REQ_F_BL_NO_RECYCLE
We only use the flag for this purpose, so rename it accordingly. This further prevents various other use cases of it, keeping it clean and consistent. Then we can also check it in one spot, when recycling is attempted, and remove some dead code in io_kbuf_recycle_ring().
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
70581dcd | 06-Mar-2024 | Pavel Begunkov <asml.silence@gmail.com>
io_uring: fix mshot read defer taskrun cqe posting
We can't post CQEs from io-wq with DEFER_TASKRUN set. Normal completions are handled, but aux CQEs should be explicitly disallowed by opcode handlers.
Cc: stable@vger.kernel.org
Fixes: fc68fcda04910 ("io_uring/rw: add support for IORING_OP_READ_MULTISHOT")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/6fb7cba6f5366da25f4d3eb95273f062309d97fa.1709740837.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
3fb1764c | 12-Feb-2024 | Kuniyuki Iwashima <kuniyu@amazon.com>
io_uring: Don't include af_unix.h.
Changes to AF_UNIX trigger rebuild of io_uring, but io_uring does not use AF_UNIX anymore.
Let's not include af_unix.h and instead include necessary headers.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20240212234236.63714-1-kuniyu@amazon.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
949249e2 | 29-Jan-2024 | Jens Axboe <axboe@kernel.dk>
io_uring/rw: remove dead file == NULL check
Any read/write opcode has needs_file == true, which means that we would've failed the request long before reaching the issue stage if we didn't successfully assign a file. This check has been dead forever, and is really a leftover from generic code.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
95041b93 | 29-Jan-2024 | Jens Axboe <axboe@kernel.dk>
io_uring: add io_file_can_poll() helper
This adds a flag to avoid having to dereference file and then f_op to figure out if the file has a poll handler defined or not. We generally call this at least twice for networked workloads, and if using ring-provided buffers, we do it on every buffer selection. Particularly the latter is troublesome, as it's otherwise a very fast operation.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
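A userspace model of the flag-caching idea (the real helper sets the flag when the file is assigned; this lazy variant and all the names are illustrative): the pointer chase through file->f_op happens once, and every later call only tests a bit:

#include <stdbool.h>

struct file_ops { int (*poll)(void); };
struct file { const struct file_ops *f_op; };

#define REQ_F_CAN_POLL    (1u << 0)
#define REQ_F_POLL_CACHED (1u << 1)

struct req {
    unsigned flags;
    struct file *file;
};

static bool req_file_can_poll(struct req *req)
{
    if (!(req->flags & REQ_F_POLL_CACHED)) {
        /* compute "does f_op have a poll handler?" exactly once */
        if (req->file->f_op->poll)
            req->flags |= REQ_F_CAN_POLL;
        req->flags |= REQ_F_POLL_CACHED;
    }
    return req->flags & REQ_F_CAN_POLL;
}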
|
c79f52f0 | 27-Jan-2024 | Jens Axboe <axboe@kernel.dk>
io_uring/rw: ensure poll based multishot read retries appropriately
io_read_mshot() always relies on poll triggering retries, and this works fine as long as we do a retry per size of the buffer being read. The buffer size is given by the size of the buffer(s) in the given buffer group ID.
But if we're reading less than what is available, then we don't always get to read everything that is available. For example, if the buffers available are 32 bytes and we have 64 bytes to read, then we'll correctly read the first 32 bytes and then wait for another poll trigger before we attempt the next read. This next poll trigger may never happen, in which case we just sit forever and never make progress, or it may trigger at some point in the future, and now we're just delivering the available data much later than we should have.
io_read_mshot() could do retries itself, but that is wasteful as we'll be going through all of __io_read() again, and most likely in vain. Rather than do that, bump our poll reference count and have io_poll_check_events() do one more loop and check with vfs_poll() if we have more data to read. If we do, io_read_mshot() will get invoked again directly and we'll read the next chunk.
io_poll_multishot_retry() must only get called from inside io_poll_issue(), which is our multishot retry handler, as we know we already "own" the request at this point.
Cc: stable@vger.kernel.org
Link: https://github.com/axboe/liburing/issues/1041
Fixes: fc68fcda0491 ("io_uring/rw: add support for IORING_OP_READ_MULTISHOT")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
fe80eb15 | 10-Jan-2024 | Jens Axboe <axboe@kernel.dk>
io_uring/rw: cleanup io_rw_done()
This originally came from the aio side, and it's laid out rather oddly. The common case here is that we either get -EIOCBQUEUED from submitting an async request, or that we complete the request correctly with the given number of bytes. Handling the odd internal restart error codes is not a common operation.
Lay it out in a way that better explains the normal flow, and switch to avoiding the indirect call completely, as this is our kiocb and we know the completion handler can only be one of two possible variants. While at it, move it to where it belongs in the file, with fellow end-IO helpers.
Outside of being easier to read, this also reduces the text size of the function by 24 bytes for me on arm64.
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
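The devirtualization can be modeled like this (illustrative, not the kernel's io_rw_done()): since the completion handler is known to be one of two functions, compare and call directly instead of jumping through the function pointer:

typedef void (*complete_fn)(int res);

static void complete_buffered(int res) { (void)res; /* ... */ }
static void complete_polled(int res)   { (void)res; /* ... */ }

static void rw_done(complete_fn cb, int res)
{
    /* The kiocb is ours, so cb can only be one of two variants; a
     * compare-and-direct-call avoids the indirect branch. */
    if (cb == complete_buffered)
        complete_buffered(res);
    else
        complete_polled(res);
}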
|
0a535edd | 21-Dec-2023 | Jens Axboe <axboe@kernel.dk>
io_uring/rw: ensure io->bytes_done is always initialized
If IOSQE_ASYNC is set and we fail importing an iovec for a readv or writev request, then we leave ->bytes_done uninitialized and hence the eventual failure CQE posted can potentially have a random res value rather than the expected -EINVAL.
Setup ->bytes_done before potentially failing, so we have a consistent value if we fail the request early.
Cc: stable@vger.kernel.org
Reported-by: xingwei lee <xrivendell7@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
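The ordering fix, reduced to a sketch with invented names: initialize state before the fallible step, so an early failure still completes with a well-defined value instead of whatever was in the allocation:

#include <errno.h>

struct async_rw {
    long bytes_done;
};

static int import_iovec_stub(int should_fail)
{
    return should_fail ? -EINVAL : 0;
}

static int rw_prep(struct async_rw *rw, int should_fail)
{
    rw->bytes_done = 0;                    /* before, not after */
    return import_iovec_stub(should_fail); /* failure leaves sane state */
}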
|
b66509b8 | 01-Dec-2023 | Pavel Begunkov <asml.silence@gmail.com>
io_uring: split out cmd api into a separate header
linux/io_uring.h is slowly becoming a rubbish bin where we put anything exposed to other subsystems. For instance, the task exit hooks and io_uring cmd infra are completely orthogonal and don't need each other's definitions. Start cleaning it up by splitting out all command bits into a new header file.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/7ec50bae6e21f371d3850796e716917fc141225a.1701391955.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
e5375929 | 06-Nov-2023 | Dylan Yudaken <dyudaken@gmail.com>
io_uring: do not clamp read length for multishot read
When doing a multishot read, the code path reuses the old read paths. However this breaks an assumption built into those paths, namely that struct io_rw::len is available for reuse by __io_import_iovec.
For multishot this results in len being set for the first receive call, and then subsequent calls are clamped to that buffer length incorrectly.
Instead keep len as zero after recycling buffers, to reuse the full buffer size of the next selected buffer.
Fixes: fc68fcda0491 ("io_uring/rw: add support for IORING_OP_READ_MULTISHOT")
Signed-off-by: Dylan Yudaken <dyudaken@gmail.com>
Link: https://lore.kernel.org/r/20231106203909.197089-4-dyudaken@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
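A toy model of the clamping behavior described above (illustrative only): a nonzero recorded length from the first import truncates later, larger buffers, while a zero length means "use the selected buffer's full size":

#include <stddef.h>
#include <stdio.h>

static size_t import_len(size_t recorded_len, size_t buf_size)
{
    /* recorded_len != 0 clamps the import to the first buffer's size */
    if (recorded_len)
        return recorded_len < buf_size ? recorded_len : buf_size;
    return buf_size;
}

int main(void)
{
    printf("%zu\n", import_len(32, 64)); /* buggy clamp: only 32 of 64 */
    printf("%zu\n", import_len(0, 64));  /* fixed: full 64 */
    return 0;
}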
|