039a2e80 | 25-Apr-2024 | Jens Axboe <axboe@kernel.dk>
io_uring/rw: reinstate thread check for retries
Allowing retries for everything is arguably the right thing to do, now that every command type is async read from the start. But it's exposed a few issues around a missing check for whether a retry is possible (which cca6571381a0 exposed), and the fixup commit for that isn't necessarily 100% sound in terms of iov_iter state.
For now, just revert these two commits. This unfortunately re-opens the possibility of -EAGAIN being bubbled up to userspace for some cases where the kernel could very well just sanely retry them. But until we have all the conditions around that covered, we cannot safely enable retries there.
This reverts commit df604d2ad480fcf7b39767280c9093e13b1de952.
This reverts commit cca6571381a0bdc88021a1f7a4c2349df21279f7.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
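The guard being reinstated can be pictured with a small userspace sketch (all names and the tid plumbing here are illustrative, not the kernel's): a retry is only attempted inline while still on the issuing thread, since that is the only context where the iov_iter state can be trusted; anything else gets punted.

#include <stdbool.h>
#include <stdio.h>

struct req {
    long issuer_tid;  /* thread that originally issued the request */
};

static bool same_thread(const struct req *r, long current_tid)
{
    return r->issuer_tid == current_tid;
}

static const char *on_eagain(const struct req *r, long current_tid)
{
    /* iov_iter state is only trustworthy in the issuing task's context */
    if (same_thread(r, current_tid))
        return "retry inline";
    return "punt to io-wq";
}

int main(void)
{
    struct req r = { .issuer_tid = 1 };
    printf("%s\n", on_eagain(&r, 1));   /* retry inline */
    printf("%s\n", on_eagain(&r, 42));  /* punt to io-wq */
    return 0;
}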
|
df604d2a | 17-Apr-2024 | Jens Axboe <axboe@kernel.dk>
io_uring/rw: ensure retry condition isn't lost
A previous commit removed the checking on whether or not it was possible to retry a request, since it's now possible to retry any of them. This would previously have caused the request to have been ended with an error, but now the retry condition can simply get lost instead.
Clean up the retry handling and always just punt it to task_work, which will queue it with io-wq appropriately.
Reported-by: Changhui Zhong <czhong@redhat.com>
Tested-by: Ming Lei <ming.lei@redhat.com>
Fixes: cca6571381a0 ("io_uring/rw: cleanup retry path")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
7c98f7cb | 28-Aug-2023 | Miklos Szeredi <mszeredi@redhat.com>
remove call_{read,write}_iter() functions
These have no clear purpose. This is effectively a revert of commit bb7462b6fd64 ("vfs: use helpers for calling f_op->{read,write}_iter()").
The patch was created with the help of a coccinelle script.
Fixes: bb7462b6fd64 ("vfs: use helpers for calling f_op->{read,write}_iter()")
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
414d0f45 | 20-Mar-2024 | Jens Axboe <axboe@kernel.dk>
io_uring/alloc_cache: switch to array based caching
Currently lists are being used to manage this, but best practice is usually to have these in an array instead, as that is cheaper to manage.
Outside of that detail, games are also played with KASAN as the list is inside the cached entry itself.
Finally, all users of this need a struct io_cache_entry embedded in their struct, which is union'ized with something else in there that isn't used across the free -> realloc cycle.
Get rid of all of that, and simply have it be an array. This will not change the memory used, as we're just trading an 8-byte member entry for the per-elem array size.
This reduces the overhead of the recycled allocations, and it reduces the amount of code needed to support recycling to about half of what it currently is.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
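As a rough model of the array-based approach (all names here are made up; the kernel's io_alloc_cache differs in detail), allocation pops the most recently freed element off the array and freeing pushes it back until the cache is full:

#include <stdlib.h>

struct obj_cache {
    void **entries;   /* array of cached pointers */
    unsigned nr;      /* how many are currently cached */
    unsigned max;     /* capacity of the array */
    size_t elem_size; /* size of one cached object */
};

static int cache_init(struct obj_cache *c, unsigned max, size_t elem_size)
{
    c->entries = calloc(max, sizeof(void *));
    if (!c->entries)
        return -1;
    c->nr = 0;
    c->max = max;
    c->elem_size = elem_size;
    return 0;
}

/* Allocation: pop the last cached entry, or fall back to malloc(). */
static void *cache_alloc(struct obj_cache *c)
{
    if (c->nr)
        return c->entries[--c->nr];
    return malloc(c->elem_size);
}

/* Free: push back into the array if there is room, else really free. */
static void cache_free(struct obj_cache *c, void *obj)
{
    if (c->nr < c->max)
        c->entries[c->nr++] = obj;
    else
        free(obj);
}

Compared to a list threaded through the cached entries themselves, the array needs no per-entry link member, which is what makes the embedded io_cache_entry and the KASAN games described above unnecessary.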
|
d6f911a6 | 18-Mar-2024 | Jens Axboe <axboe@kernel.dk>
io_uring/rw: add iovec recycling
Let the io_async_rw hold on to the iovec and reuse it, rather than always allocate and free them.
Also enables KASAN for the iovec entries, so that reuse can be detected even while they are in the cache.
While doing so, shrink io_async_rw by getting rid of the bigger embedded fast iovec. Since iovecs are now recycled, shrink the inline fast iovec from 8 entries to 1. This reduces the io_async_rw size from 264 to 160 bytes, a 40% reduction.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
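A minimal sketch of the recycling idea (field names are invented for illustration, not the kernel's io_async_rw): keep the allocated iovec in the per-request state and only reallocate when a request needs more segments than the last one did:

#include <stdlib.h>
#include <sys/uio.h>

struct async_rw {
    struct iovec *iov;     /* recycled across requests */
    int iov_cap;           /* segments currently allocated */
    struct iovec fast_iov; /* single inline segment for the common case */
};

static struct iovec *get_iov(struct async_rw *rw, int nr_segs)
{
    if (nr_segs == 1)
        return &rw->fast_iov;   /* no allocation at all */
    if (nr_segs <= rw->iov_cap)
        return rw->iov;         /* reuse the previous allocation */
    free(rw->iov);
    rw->iov = calloc(nr_segs, sizeof(struct iovec));
    rw->iov_cap = rw->iov ? nr_segs : 0;
    return rw->iov;
}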
|
cca65713 | 23-Mar-2024 | Jens Axboe <axboe@kernel.dk>
io_uring/rw: cleanup retry path
We no longer need to gate a potential retry on whether or not the context matches our original task, as all read/write operations have been fully prepared upfront. This means there's never any re-import needed, and hence we can always retry requests.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
0d10bd77 | 18-Mar-2024 | Jens Axboe <axboe@kernel.dk>
io_uring: get rid of struct io_rw_state
A separate state struct is not needed anymore, just fold it in with io_async_rw.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
a9165b83 | 18-Mar-2024 | Jens Axboe <axboe@kernel.dk>
io_uring/rw: always setup io_async_rw for read/write requests
Read/write requests try to put everything on the stack, and then allocate and copy if a retry is needed. This necessitates a bunch of nasty code that deals with intermediate state.
Get rid of this, and have the prep side setup everything that is needed upfront, which greatly simplifies the opcode handlers.
This includes adding an alloc cache for io_async_rw, to make it cheap to handle.
In terms of cost, this should be basically free and transparent. For the worst case of {READ,WRITE}_FIXED which didn't need it before, performance is unaffected in the normal peak workload that is being used to test that. Still runs at 122M IOPS.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
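The shape of the change, reduced to a toy model (all names hypothetical): prep unconditionally sets up the async state, so the issue path never has to snapshot stack state when it hits -EAGAIN:

#include <errno.h>
#include <stdlib.h>

struct rw_state {
    long bytes_done;
    void *iov;
};

struct request {
    struct rw_state *async_data;
};

static int rw_prep(struct request *req)
{
    /* Always present from prep onwards, whether or not we ever retry. */
    req->async_data = calloc(1, sizeof(struct rw_state));
    return req->async_data ? 0 : -ENOMEM;
}

static int rw_issue(struct request *req)
{
    /* No "copy stack state to the heap" dance on -EAGAIN any more:
     * the state already lives in req->async_data. */
    (void)req;
    return 0;
}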
|
e5c12945 | 18-Mar-2024 | Pavel Begunkov <asml.silence@gmail.com>
io_uring: refactor io_fill_cqe_req_aux
The restriction on multishot execution context disallowing io-wq is driven by rules of io_fill_cqe_req_aux(), it should only be called in the master task context, either from the syscall path or in task_work. Since task_work now always takes the ctx lock implying IO_URING_F_COMPLETE_DEFER, we can just assume that the function is always called with its defer argument set to true.
Kill the argument. Also rename the function for consistency: "fill" in CQE-related functions has usually meant raw interfaces that only copy data into the CQ, without the locking, user wakeups, and other accounting that "post" functions take care of.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Tested-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/93423d106c33116c7d06bf277f651aa68b427328.1710799188.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
8e5b3b89 | 18-Mar-2024 | Pavel Begunkov <asml.silence@gmail.com>
io_uring: remove struct io_tw_state::locked
ctx is always locked for task_work now, so get rid of struct io_tw_state::locked. Note I'm stopping one step short of removing io_tw_state altogether, even though it is now empty, because it still serves the purpose of indicating which functions are tw callbacks and forcing users not to invoke them carelessly out of a wrong context. The removal can always be done later.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Tested-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/e95e1ea116d0bfa54b656076e6a977bc221392a4.1710799188.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
6e6b8c62 | 18-Mar-2024 | Pavel Begunkov <asml.silence@gmail.com>
io_uring/rw: avoid punting to io-wq directly
kiocb_done() shouldn't be in the business of specifically redirecting requests to io-wq. Remove the hop to task_work just to queue io-wq work; return -EAGAIN instead, and let the core io_uring code handle the offload.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Tested-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/413564e550fe23744a970e1783dfa566291b0e6f.1710799188.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
210a03c9 | 28-Mar-2024 | Christian Brauner <brauner@kernel.org>
fs: claw back a few FMODE_* bits
There's a bunch of flags that are purely based on what the file operations support while also never being conditionally set or unset. IOW, they're not subject to change for individual files. Imho, such flags don't need to live in f_mode; they might as well live in the fops struct itself.
And the fops struct already has that lonely mmap_supported_flags member. We might as well turn that into a generic fop_flags member and move a few flags from FMODE_* space into FOP_* space. That gets us four FMODE_* bits back, plus the ability for new static flags that are about file ops to live in their own FOP_* space rather than in FMODE_* space.
It's not the most beautiful thing ever but it gets the job done. Yes, there'll be an additional pointer chase, but hopefully that won't matter for these flags.
I suspect there's a few more we can move into there and that we can also redirect a bunch of new flag suggestions that follow this pattern into the fop_flags field instead of f_mode.
Link: https://lore.kernel.org/r/20240328-gewendet-spargel-aa60a030ef74@brauner
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Christian Brauner <brauner@kernel.org>
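A sketch of the layout this describes (simplified types; the flag names are for illustration): the static capability bit lives once in the shared file_operations rather than costing a bit in every file's f_mode:

#include <stdbool.h>

#define FOP_BUFFER_RASYNC (1u << 0)   /* illustrative flag names */
#define FOP_BUFFER_WASYNC (1u << 1)

struct file_operations {
    unsigned int fop_flags;           /* static, per file_operations */
    /* ... methods ... */
};

struct file {
    unsigned int f_mode;              /* per-open-file, dynamic flags */
    const struct file_operations *f_op;
};

/* One extra pointer chase, but no per-file f_mode bit consumed. */
static bool supports_async_buffered_reads(const struct file *f)
{
    return f->f_op->fop_flags & FOP_BUFFER_RASYNC;
}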
|
2a975d42 | 01-Apr-2024 | Jens Axboe <axboe@kernel.dk>
io_uring/rw: don't allow multishot reads without NOWAIT support
Supporting multishot reads requires support for NOWAIT, as the alternative would be to always have io-wq execute the work item whenever the poll readiness triggered. Any fast file type will have NOWAIT support (e.g. it understands both O_NONBLOCK and IOCB_NOWAIT). If the given file type does not, then simply resort to single-shot execution.
Cc: stable@vger.kernel.org
Fixes: fc68fcda04910 ("io_uring/rw: add support for IORING_OP_READ_MULTISHOT")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
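The fallback policy reads roughly like this toy check (illustrative names only, not the kernel's helpers):

#include <stdbool.h>

enum read_mode { SINGLE_SHOT, MULTISHOT };

struct file_caps {
    bool nowait;    /* understands O_NONBLOCK / IOCB_NOWAIT */
    bool pollable;  /* has a poll handler */
};

static enum read_mode read_mode(const struct file_caps *f, bool multishot_req)
{
    if (multishot_req && f->nowait && f->pollable)
        return MULTISHOT;
    return SINGLE_SHOT;  /* degrade rather than bounce every read to io-wq */
}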
|
0a3737db | 12-Mar-2024 | Jens Axboe <axboe@kernel.dk>
io_uring/rw: return IOU_ISSUE_SKIP_COMPLETE for multishot retry
If read multishot is being invoked from the poll retry handler, then we should return IOU_ISSUE_SKIP_COMPLETE rather than -EAGAIN. If not, then a CQE will be posted with -EAGAIN rather than triggering the retry when the file is flagged as readable again.
Cc: stable@vger.kernel.org
Reported-by: Sargun Dhillon <sargun@meta.com>
Fixes: fc68fcda04910 ("io_uring/rw: add support for IORING_OP_READ_MULTISHOT")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
e0e4ab52 | 08-Mar-2024 | Pavel Begunkov <asml.silence@gmail.com>
io_uring: refactor DEFER_TASKRUN multishot checks
We disallow DEFER_TASKRUN multishots from running by io-wq, which is checked by individual opcodes in the issue path. We can consolidate it all in io_wq_submit_work(), at the same time moving the checks out of the hot path.
Suggested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e492f0f11588bb5aa11d7d24e6f53b7c7628afdb.1709905727.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
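A self-contained sketch of the consolidation (names invented): one check at the io-wq entry point replaces a copy in every multishot opcode handler, and stays entirely off the non-io-wq submission path:

#include <stdio.h>

#define CTX_DEFER_TASKRUN (1u << 0)
#define REQ_MULTISHOT     (1u << 1)

struct req {
    unsigned ctx_flags;
    unsigned flags;
};

static void issue(struct req *r) { (void)r; /* opcode handlers, check-free */ }

static void wq_submit_work(struct req *r)
{
    /* Single choke point for "multishot can't run from io-wq". */
    if ((r->ctx_flags & CTX_DEFER_TASKRUN) && (r->flags & REQ_MULTISHOT)) {
        puts("fail request: multishot not allowed from io-wq");
        return;
    }
    issue(r);
}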
|
186daf23 | 07-Mar-2024 | Jens Axboe <axboe@kernel.dk>
io_uring/kbuf: rename REQ_F_PARTIAL_IO to REQ_F_BL_NO_RECYCLE
We only use the flag for this purpose, so rename it accordingly. This further prevents various other use cases of it, keeping it clean and consistent. Then we can also check it in one spot, when recycling is attempted, and remove some dead code in io_kbuf_recycle_ring().
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
70581dcd | 06-Mar-2024 | Pavel Begunkov <asml.silence@gmail.com>
io_uring: fix mshot read defer taskrun cqe posting
We can't post CQEs from io-wq with DEFER_TASKRUN set. Normal completions are handled, but aux CQEs should be explicitly disallowed by opcode handlers.
Cc: stable@vger.kernel.org
Fixes: fc68fcda04910 ("io_uring/rw: add support for IORING_OP_READ_MULTISHOT")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/6fb7cba6f5366da25f4d3eb95273f062309d97fa.1709740837.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
3fb1764c | 12-Feb-2024 | Kuniyuki Iwashima <kuniyu@amazon.com>
io_uring: Don't include af_unix.h.
Changes to AF_UNIX trigger rebuild of io_uring, but io_uring does not use AF_UNIX anymore.
Let's not include af_unix.h and instead include necessary headers.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20240212234236.63714-1-kuniyu@amazon.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
949249e2 | 29-Jan-2024 | Jens Axboe <axboe@kernel.dk>
io_uring/rw: remove dead file == NULL check
Any read/write opcode has needs_file == true, which means that we would've failed the request long before reaching the issue stage if we didn't successfully assign a file. This check has been dead forever, and is really a leftover from generic code.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
95041b93 | 29-Jan-2024 | Jens Axboe <axboe@kernel.dk>
io_uring: add io_file_can_poll() helper
This adds a flag to avoid having to dereference file and then f_op to figure out if the file has a poll handler defined or not. We generally call this at least twice for networked workloads, and if using ring-provided buffers, we do it on every buffer selection. Particularly the latter is troublesome, as it's otherwise a very fast operation.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
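A userspace model of the flag-caching idea (the real helper sets the flag when the file is assigned; this lazy variant and all the names are illustrative): the pointer chase through file->f_op happens once, and every later call only tests a bit:

#include <stdbool.h>

struct file_ops { int (*poll)(void); };
struct file { const struct file_ops *f_op; };

#define REQ_F_CAN_POLL    (1u << 0)
#define REQ_F_POLL_CACHED (1u << 1)

struct req {
    unsigned flags;
    struct file *file;
};

static bool req_file_can_poll(struct req *req)
{
    if (!(req->flags & REQ_F_POLL_CACHED)) {
        /* compute "does f_op have a poll handler?" exactly once */
        if (req->file->f_op->poll)
            req->flags |= REQ_F_CAN_POLL;
        req->flags |= REQ_F_POLL_CACHED;
    }
    return req->flags & REQ_F_CAN_POLL;
}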
|
c79f52f0 | 27-Jan-2024 | Jens Axboe <axboe@kernel.dk>
io_uring/rw: ensure poll based multishot read retries appropriately
io_read_mshot() always relies on poll triggering retries, and this works fine as long as we do a retry per size of the buffer being read. The buffer size is given by the size of the buffer(s) in the given buffer group ID.
But if we're reading less than what is available, then we don't always get to read everything that is available. For example, if the buffers available are 32 bytes and we have 64 bytes to read, then we'll correctly read the first 32 bytes and then wait for another poll trigger before we attempt the next read. This next poll trigger may never happen, in which case we just sit forever and never make progress, or it may trigger at some point in the future, and now we're just delivering the available data much later than we should have.
io_read_mshot() could do retries itself, but that is wasteful as we'll be going through all of __io_read() again, and most likely in vain. Rather than do that, bump our poll reference count and have io_poll_check_events() do one more loop and check with vfs_poll() if we have more data to read. If we do, io_read_mshot() will get invoked again directly and we'll read the next chunk.
io_poll_multishot_retry() must only get called from inside io_poll_issue(), which is our multishot retry handler, as we know we already "own" the request at this point.
Cc: stable@vger.kernel.org
Link: https://github.com/axboe/liburing/issues/1041
Fixes: fc68fcda0491 ("io_uring/rw: add support for IORING_OP_READ_MULTISHOT")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
fe80eb15 | 10-Jan-2024 | Jens Axboe <axboe@kernel.dk>
io_uring/rw: cleanup io_rw_done()
This originally came from the aio side, and it's laid out rather oddly. The common case here is that we either get -EIOCBQUEUED from submitting an async request, or that we complete the request correctly with the given number of bytes. Handling the odd internal restart error codes is not a common operation.
Lay it out in a way that better explains the normal flow, and switch to avoiding the indirect call completely, as this is our kiocb and we know the completion handler can only be one of two possible variants. While at it, move it to where it belongs in the file, with fellow end-IO helpers.
Outside of being easier to read, this also reduces the text size of the function by 24 bytes for me on arm64.
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
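The devirtualization can be modeled like this (illustrative, not the kernel's io_rw_done()): since the completion handler is known to be one of two functions, compare and call directly instead of jumping through the function pointer:

typedef void (*complete_fn)(int res);

static void complete_buffered(int res) { (void)res; /* ... */ }
static void complete_polled(int res)   { (void)res; /* ... */ }

static void rw_done(complete_fn cb, int res)
{
    /* The kiocb is ours, so cb can only be one of two variants; a
     * compare-and-direct-call avoids the indirect branch. */
    if (cb == complete_buffered)
        complete_buffered(res);
    else
        complete_polled(res);
}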
|
0a535edd | 21-Dec-2023 | Jens Axboe <axboe@kernel.dk>
io_uring/rw: ensure io->bytes_done is always initialized
If IOSQE_ASYNC is set and we fail importing an iovec for a readv or writev request, then we leave ->bytes_done uninitialized and hence the eventual failure CQE posted can potentially have a random res value rather than the expected -EINVAL.
Setup ->bytes_done before potentially failing, so we have a consistent value if we fail the request early.
Cc: stable@vger.kernel.org
Reported-by: xingwei lee <xrivendell7@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
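The ordering fix, reduced to a sketch with invented names: initialize state before the fallible step, so an early failure still completes with a well-defined value instead of whatever was in the allocation:

#include <errno.h>

struct async_rw {
    long bytes_done;
};

static int import_iovec_stub(int should_fail)
{
    return should_fail ? -EINVAL : 0;
}

static int rw_prep(struct async_rw *rw, int should_fail)
{
    rw->bytes_done = 0;                    /* before, not after */
    return import_iovec_stub(should_fail); /* failure leaves sane state */
}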
|
b66509b8 | 01-Dec-2023 | Pavel Begunkov <asml.silence@gmail.com>
io_uring: split out cmd api into a separate header
linux/io_uring.h is slowly becoming a rubbish bin where we put anything exposed to other subsystems. For instance, the task exit hooks and io_uring cmd infra are completely orthogonal and don't need each other's definitions. Start cleaning it up by splitting out all command bits into a new header file.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/7ec50bae6e21f371d3850796e716917fc141225a.1701391955.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
e5375929 | 06-Nov-2023 | Dylan Yudaken <dyudaken@gmail.com>
io_uring: do not clamp read length for multishot read
When doing a multishot read, the code path reuses the old read paths. However this breaks an assumption built into those paths, namely that struct io_rw::len is available for reuse by __io_import_iovec.
For multishot this results in len being set for the first receive call, and then subsequent calls are clamped to that buffer length incorrectly.
Instead keep len as zero after recycling buffers, to reuse the full buffer size of the next selected buffer.
Fixes: fc68fcda0491 ("io_uring/rw: add support for IORING_OP_READ_MULTISHOT")
Signed-off-by: Dylan Yudaken <dyudaken@gmail.com>
Link: https://lore.kernel.org/r/20231106203909.197089-4-dyudaken@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
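A toy model of the clamping behavior described above (illustrative only): a nonzero recorded length from the first import truncates later, larger buffers, while a zero length means "use the selected buffer's full size":

#include <stddef.h>
#include <stdio.h>

static size_t import_len(size_t recorded_len, size_t buf_size)
{
    /* recorded_len != 0 clamps the import to the first buffer's size */
    if (recorded_len)
        return recorded_len < buf_size ? recorded_len : buf_size;
    return buf_size;
}

int main(void)
{
    printf("%zu\n", import_len(32, 64)); /* buggy clamp: only 32 of 64 */
    printf("%zu\n", import_len(0, 64));  /* fixed: full 64 */
    return 0;
}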
|