#
8b1d7239 |
| 20-Jan-2024 |
Helge Deller <deller@gmx.de> |
parisc: Fix random data corruption from exception handler
The current exception handler implementation, which assists when accessing user space memory, may exhibit random data corruption if the comp
parisc: Fix random data corruption from exception handler
The current exception handler implementation, which assists when accessing user space memory, may exhibit random data corruption if the compiler decides to use a different register than the specified register %r29 (defined in ASM_EXCEPTIONTABLE_REG) for the error code. If the compiler choose another register, the fault handler will nevertheless store -EFAULT into %r29 and thus trash whatever this register is used for. Looking at the assembly I found that this happens sometimes in emulate_ldd().
To solve the issue, the easiest solution would be if it somehow is possible to tell the fault handler which register is used to hold the error code. Using %0 or %1 in the inline assembly is not posssible as it will show up as e.g. %r29 (with the "%r" prefix), which the GNU assembler can not convert to an integer.
This patch takes another, better and more flexible approach: We extend the __ex_table (which is out of the execution path) by one 32-word. In this word we tell the compiler to insert the assembler instruction "or %r0,%r0,%reg", where %reg references the register which the compiler choosed for the error return code. In case of an access failure, the fault handler finds the __ex_table entry and can examine the opcode. The used register is encoded in the lowest 5 bits, and the fault handler can then store -EFAULT into this register.
Since we extend the __ex_table to 3 words we can't use the BUILDTIME_TABLE_SORT config option any longer.
Signed-off-by: Helge Deller <deller@gmx.de> Cc: <stable@vger.kernel.org> # v6.0+
show more ...
|
#
aa1bb8b6 |
| 10-Aug-2023 |
Helge Deller <deller@gmx.de> |
parisc: fault: Use C99 arrary initializers
Sparse wants C99 array initializers.
Signed-off-by: Helge Deller <deller@gmx.de>
|
#
ea3f8272 |
| 30-Jun-2023 |
Linus Torvalds <torvalds@linux-foundation.org> |
parisc: fix expand_stack() conversion
In commit 8d7071af8907 ("mm: always expand the stack with the mmap write lock held") I tried to deal with the remaining odd page fault handling cases. The odde
parisc: fix expand_stack() conversion
In commit 8d7071af8907 ("mm: always expand the stack with the mmap write lock held") I tried to deal with the remaining odd page fault handling cases. The oddest one is ia64, which has stacks that grow both up and down. And because ia64 was _so_ odd, I asked people to verify the end result.
But a close second oddity is parisc, which is the only one that has a main stack growing up (our "CONFIG_STACK_GROWSUP" config option). But it looked obvious enough that I didn't worry about it.
I should have worried a bit more. Not because it was particularly complex, but because I just used the wrong variable name.
The previous vma isn't called "prev", it's called "prev_vma". Blush.
Fixes: 8d7071af8907 ("mm: always expand the stack with the mmap write lock held") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
#
8d7071af |
| 24-Jun-2023 |
Linus Torvalds <torvalds@linux-foundation.org> |
mm: always expand the stack with the mmap write lock held
This finishes the job of always holding the mmap write lock when extending the user stack vma, and removes the 'write_locked' argument from
mm: always expand the stack with the mmap write lock held
This finishes the job of always holding the mmap write lock when extending the user stack vma, and removes the 'write_locked' argument from the vm helper functions again.
For some cases, we just avoid expanding the stack at all: drivers and page pinning really shouldn't be extending any stacks. Let's see if any strange users really wanted that.
It's worth noting that architectures that weren't converted to the new lock_mm_and_find_vma() helper function are left using the legacy "expand_stack()" function, but it has been changed to drop the mmap_lock and take it for writing while expanding the vma. This makes it fairly straightforward to convert the remaining architectures.
As a result of dropping and re-taking the lock, the calling conventions for this function have also changed, since the old vma may no longer be valid. So it will now return the new vma if successful, and NULL - and the lock dropped - if the area could not be extended.
Tested-by: Vegard Nossum <vegard.nossum@oracle.com> Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> # ia64 Tested-by: Frank Scheiner <frank.scheiner@web.de> # ia64 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
#
15261678 |
| 31-Jan-2023 |
Al Viro <viro@zeniv.linux.org.uk> |
parisc: fix livelock in uaccess
parisc equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling" If e.g. get_user() triggers a page fault and a fatal signal is caught, we might end u
parisc: fix livelock in uaccess
parisc equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling" If e.g. get_user() triggers a page fault and a fatal signal is caught, we might end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything to page tables. In such case we must *not* return to the faulting insn - that would repeat the entire thing without making any progress; what we need instead is to treat that as failed (user) memory access.
Tested-by: Helge Deller <deller@gmx.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
show more ...
|
#
16bf37bf |
| 15-Jul-2022 |
Jason Wang <wangborong@cdjrlc.com> |
parisc: Fix comment typo in fault.c
The double `the' is duplicated in line 41, remove one.
Signed-off-by: Jason Wang <wangborong@cdjrlc.com> Signed-off-by: Helge Deller <deller@gmx.de>
|
#
d9272525 |
| 30-May-2022 |
Peter Xu <peterx@redhat.com> |
mm: avoid unnecessary page fault retires on shared memory types
I observed that for each of the shared file-backed page faults, we're very likely to retry one more time for the 1st write fault upon
mm: avoid unnecessary page fault retires on shared memory types
I observed that for each of the shared file-backed page faults, we're very likely to retry one more time for the 1st write fault upon no page. It's because we'll need to release the mmap lock for dirty rate limit purpose with balance_dirty_pages_ratelimited() (in fault_dirty_shared_page()).
Then after that throttling we return VM_FAULT_RETRY.
We did that probably because VM_FAULT_RETRY is the only way we can return to the fault handler at that time telling it we've released the mmap lock.
However that's not ideal because it's very likely the fault does not need to be retried at all since the pgtable was well installed before the throttling, so the next continuous fault (including taking mmap read lock, walk the pgtable, etc.) could be in most cases unnecessary.
It's not only slowing down page faults for shared file-backed, but also add more mmap lock contention which is in most cases not needed at all.
To observe this, one could try to write to some shmem page and look at "pgfault" value in /proc/vmstat, then we should expect 2 counts for each shmem write simply because we retried, and vm event "pgfault" will capture that.
To make it more efficient, add a new VM_FAULT_COMPLETED return code just to show that we've completed the whole fault and released the lock. It's also a hint that we should very possibly not need another fault immediately on this page because we've just completed it.
This patch provides a ~12% perf boost on my aarch64 test VM with a simple program sequentially dirtying 400MB shmem file being mmap()ed and these are the time it needs:
Before: 650.980 ms (+-1.94%) After: 569.396 ms (+-1.38%)
I believe it could help more than that.
We need some special care on GUP and the s390 pgfault handler (for gmap code before returning from pgfault), the rest changes in the page fault handlers should be relatively straightforward.
Another thing to mention is that mm_account_fault() does take this new fault as a generic fault to be accounted, unlike VM_FAULT_RETRY.
I explicitly didn't touch hmm_vma_fault() and break_ksm() because they do not handle VM_FAULT_RETRY even with existing code, so I'm literally keeping them as-is.
Link: https://lkml.kernel.org/r/20220530183450.42886-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Vineet Gupta <vgupta@kernel.org> Acked-by: Guo Ren <guoren@kernel.org> Acked-by: Max Filippov <jcmvbkbc@gmail.com> Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Alistair Popple <apopple@nvidia.com> Reviewed-by: Ingo Molnar <mingo@kernel.org> Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> [arm part] Acked-by: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Stafford Horne <shorne@gmail.com> Cc: David S. Miller <davem@davemloft.net> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Brian Cain <bcain@quicinc.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Richard Weinberger <richard@nod.at> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Janosch Frank <frankja@linux.ibm.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Jonas Bonn <jonas@southpole.se> Cc: Will Deacon <will@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Michal Simek <monstr@monstr.eu> Cc: Matt Turner <mattst88@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: David Hildenbrand <david@redhat.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Chris Zankel <chris@zankel.net> Cc: Hugh Dickins <hughd@google.com> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Rich Felker <dalias@libc.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Helge Deller <deller@gmx.de> Cc: Yoshinori Sato <ysato@users.osdn.me> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
#
67c35a3b |
| 16-May-2022 |
John David Anglin <dave.anglin@bell.net> |
parisc: Disable debug code regarding cache flushes in handle_nadtlb_fault()
Change the "BUG" to "WARNING" and disable the message because it triggers occasionally in spite of the check in flush_cach
parisc: Disable debug code regarding cache flushes in handle_nadtlb_fault()
Change the "BUG" to "WARNING" and disable the message because it triggers occasionally in spite of the check in flush_cache_page_if_present.
The pte value extracted for the "from" page in copy_user_highpage is racy and occasionally the pte is cleared before the flush is complete. I assume that the page is simultaneously flushed by flush_cache_mm before the pte is cleared as nullifying the fdc doesn't seem to cause problems.
I investigated various locking scenarios but I wasn't able to find a way to sequence the flushes. This code is called for every COW break and locks impact performance.
This patch is related to the bigger cache flush patch because we need the pte on PA8800/PA8900 to flush using the vma context. I have also seen this from copy_to_user_page and copy_from_user_page.
The messages appear infrequently when enabled.
Signed-off-by: John David Anglin <dave.anglin@bell.net> Signed-off-by: Helge Deller <deller@gmx.de>
show more ...
|
#
e00b0a2a |
| 09-Mar-2022 |
John David Anglin <dave.anglin@bell.net> |
parisc: Fix handling off probe non-access faults
Currently, the parisc kernel does not fully support non-access TLB fault handling for probe instructions. In the fast path, we set the target registe
parisc: Fix handling off probe non-access faults
Currently, the parisc kernel does not fully support non-access TLB fault handling for probe instructions. In the fast path, we set the target register to zero if it is not a shadowed register. The slow path is not implemented, so we call do_page_fault. The architecture indicates that non-access faults should not cause a page fault from disk.
This change adds to code to provide non-access fault support for probe instructions. It also modifies the handling of faults on userspace so that if the address lies in a valid VMA and the access type matches that for the VMA, the probe target register is set to one. Otherwise, the target register is set to zero.
This was done to make probe instructions more useful for userspace. Probe instructions are not very useful if they set the target register to zero whenever a page is not present in memory. Nominally, the purpose of the probe instruction is determine whether read or write access to a given address is allowed.
This fixes a problem in function pointer comparison noticed in the glibc testsuite (stdio-common/tst-vfprintf-user-type). The same problem is likely in glibc (_dl_lookup_address).
V2 adds flush and lpa instruction support to handle_nadtlb_fault.
Signed-off-by: John David Anglin <dave.anglin@bell.net> Signed-off-by: Helge Deller <deller@gmx.de>
show more ...
|
#
36ef159f |
| 14-Jan-2022 |
Qi Zheng <zhengqi.arch@bytedance.com> |
mm: remove redundant check about FAULT_FLAG_ALLOW_RETRY bit
Since commit 4064b9827063 ("mm: allow VM_FAULT_RETRY for multiple times") allowed VM_FAULT_RETRY for multiple times, the FAULT_FLAG_ALLOW_
mm: remove redundant check about FAULT_FLAG_ALLOW_RETRY bit
Since commit 4064b9827063 ("mm: allow VM_FAULT_RETRY for multiple times") allowed VM_FAULT_RETRY for multiple times, the FAULT_FLAG_ALLOW_RETRY bit of fault_flag will not be changed in the page fault path, so the following check is no longer needed:
flags & FAULT_FLAG_ALLOW_RETRY
So just remove it.
[akpm@linux-foundation.org: coding style fixes]
Link: https://lkml.kernel.org/r/20211110123358.36511-1-zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: Kirill Shutemov <kirill@shutemov.name> Cc: Peter Xu <peterx@redhat.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
#
20dda87b |
| 04-Jan-2022 |
John David Anglin <dave.anglin@bell.net> |
parisc: Enhance page fault termination message
In debugging kernel panics, I believe it is useful to know what type of page fault caused the termination. "Bad Address" is too vague.
Signed-off-by:
parisc: Enhance page fault termination message
In debugging kernel panics, I believe it is useful to know what type of page fault caused the termination. "Bad Address" is too vague.
Signed-off-by: John David Anglin <dave.anglin@bell.net> Signed-off-by: Helge Deller <deller@gmx.de>
show more ...
|
#
9d90a908 |
| 04-Jan-2022 |
John David Anglin <dave.anglin@bell.net> |
parisc: Don't call faulthandler_disabled() in do_page_fault()
It is dangerous to call faulthandler_disabled() when user_mode(regs) is true. The task pagefault_disabled counter is racy and it is not
parisc: Don't call faulthandler_disabled() in do_page_fault()
It is dangerous to call faulthandler_disabled() when user_mode(regs) is true. The task pagefault_disabled counter is racy and it is not updated atomically on parisc. As a result, calling faulthandler_disabled() may cause erroneous termination.
We now handle execption fixups and termination when user_mode(regs) is false in handle_interruption(). Thus, we can just remove the faulthandler_disabled() check from do_page_fault().
Signed-off-by: John David Anglin <dave.anglin@bell.net> Signed-off-by: Helge Deller <deller@gmx.de>
show more ...
|
#
4b9d2a73 |
| 23-Dec-2021 |
Helge Deller <deller@gmx.de> |
parisc: Switch user access functions to signal errors in r29 instead of r8
Use register r29 instead of register r8 to signal faults when accessing user memory. In case of faults, the fixup routine w
parisc: Switch user access functions to signal errors in r29 instead of r8
Use register r29 instead of register r8 to signal faults when accessing user memory. In case of faults, the fixup routine will store -EFAULT in this register.
This change saves up to 752 bytes on a 32bit kernel, partly because the compiler doesn't need to save and restore the old r8 value on the stack.
bloat-o-meter results for usage with r29 register: add/remove: 0/0 grow/shrink: 23/86 up/down: 228/-980 (-752)
bloat-o-meter results for usage with r28 register: add/remove: 0/0 grow/shrink: 28/83 up/down: 296/-956 (-660)
Signed-off-by: Helge Deller <deller@gmx.de>
show more ...
|
#
a348eab3 |
| 04-May-2021 |
Helge Deller <deller@gmx.de> |
parisc: make parisc_acctyp() available outside of faults.c
When adding kfence support, we need to tell kfence_handle_page_fault() if the interrupted assembler statement is a read or write operation.
parisc: make parisc_acctyp() available outside of faults.c
When adding kfence support, we need to tell kfence_handle_page_fault() if the interrupted assembler statement is a read or write operation.
Signed-off-by: Helge Deller <deller@gmx.de>
show more ...
|
#
df561f66 |
| 23-Aug-2020 |
Gustavo A. R. Silva <gustavoars@kernel.org> |
treewide: Use fallthrough pseudo-keyword
Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through mar
treewide: Use fallthrough pseudo-keyword
Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case.
[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
show more ...
|
#
af8a7926 |
| 12-Aug-2020 |
Peter Xu <peterx@redhat.com> |
mm/parisc: use general page fault accounting
Use the general page fault accounting by passing regs into handle_mm_fault(). It naturally solve the issue of multiple page fault accounting when page f
mm/parisc: use general page fault accounting
Use the general page fault accounting by passing regs into handle_mm_fault(). It naturally solve the issue of multiple page fault accounting when page fault retry happened.
Add the missing PERF_COUNT_SW_PAGE_FAULTS perf events too. Note, the other two perf events (PERF_COUNT_SW_PAGE_FAULTS_[MAJ|MIN]) were done in handle_mm_fault().
Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com> Cc: Helge Deller <deller@gmx.de> Link: http://lkml.kernel.org/r/20200707225021.200906-16-peterx@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
#
bce617ed |
| 12-Aug-2020 |
Peter Xu <peterx@redhat.com> |
mm: do page fault accounting in handle_mm_fault
Patch series "mm: Page fault accounting cleanups", v5.
This is v5 of the pf accounting cleanup series. It originates from Gerald Schaefer's report o
mm: do page fault accounting in handle_mm_fault
Patch series "mm: Page fault accounting cleanups", v5.
This is v5 of the pf accounting cleanup series. It originates from Gerald Schaefer's report on an issue a week ago regarding to incorrect page fault accountings for retried page fault after commit 4064b9827063 ("mm: allow VM_FAULT_RETRY for multiple times"):
https://lore.kernel.org/lkml/20200610174811.44b94525@thinkpad/
What this series did:
- Correct page fault accounting: we do accounting for a page fault (no matter whether it's from #PF handling, or gup, or anything else) only with the one that completed the fault. For example, page fault retries should not be counted in page fault counters. Same to the perf events.
- Unify definition of PERF_COUNT_SW_PAGE_FAULTS: currently this perf event is used in an adhoc way across different archs.
Case (1): for many archs it's done at the entry of a page fault handler, so that it will also cover e.g. errornous faults.
Case (2): for some other archs, it is only accounted when the page fault is resolved successfully.
Case (3): there're still quite some archs that have not enabled this perf event.
Since this series will touch merely all the archs, we unify this perf event to always follow case (1), which is the one that makes most sense. And since we moved the accounting into handle_mm_fault, the other two MAJ/MIN perf events are well taken care of naturally.
- Unify definition of "major faults": the definition of "major fault" is slightly changed when used in accounting (not VM_FAULT_MAJOR). More information in patch 1.
- Always account the page fault onto the one that triggered the page fault. This does not matter much for #PF handlings, but mostly for gup. More information on this in patch 25.
Patchset layout:
Patch 1: Introduced the accounting in handle_mm_fault(), not enabled. Patch 2-23: Enable the new accounting for arch #PF handlers one by one. Patch 24: Enable the new accounting for the rest outliers (gup, iommu, etc.) Patch 25: Cleanup GUP task_struct pointer since it's not needed any more
This patch (of 25):
This is a preparation patch to move page fault accountings into the general code in handle_mm_fault(). This includes both the per task flt_maj/flt_min counters, and the major/minor page fault perf events. To do this, the pt_regs pointer is passed into handle_mm_fault().
PERF_COUNT_SW_PAGE_FAULTS should still be kept in per-arch page fault handlers.
So far, all the pt_regs pointer that passed into handle_mm_fault() is NULL, which means this patch should have no intented functional change.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Cain <bcain@codeaurora.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Chris Zankel <chris@zankel.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David S. Miller <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com> Cc: Greentime Hu <green.hu@gmail.com> Cc: Guo Ren <guoren@kernel.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Jonas Bonn <jonas@southpole.se> Cc: Ley Foon Tan <ley.foon.tan@intel.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Nick Hu <nickhu@andestech.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vincent Chen <deanbo422@gmail.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: http://lkml.kernel.org/r/20200707225021.200906-1-peterx@redhat.com Link: http://lkml.kernel.org/r/20200707225021.200906-2-peterx@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
#
3e4e28c5 |
| 09-Jun-2020 |
Michel Lespinasse <walken@google.com> |
mmap locking API: convert mmap_sem API comments
Convert comments that reference old mmap_sem APIs to reference corresponding new mmap locking APIs instead.
Signed-off-by: Michel Lespinasse <walken@
mmap locking API: convert mmap_sem API comments
Convert comments that reference old mmap_sem APIs to reference corresponding new mmap locking APIs instead.
Signed-off-by: Michel Lespinasse <walken@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Davidlohr Bueso <dbueso@suse.de> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ying Han <yinghan@google.com> Link: http://lkml.kernel.org/r/20200520052908.204642-12-walken@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
#
d8ed45c5 |
| 09-Jun-2020 |
Michel Lespinasse <walken@google.com> |
mmap locking API: use coccinelle to convert mmap_sem rwsem call sites
This change converts the existing mmap_sem rwsem calls to use the new mmap locking API instead.
The change is generated using c
mmap locking API: use coccinelle to convert mmap_sem rwsem call sites
This change converts the existing mmap_sem rwsem calls to use the new mmap locking API instead.
The change is generated using coccinelle with the following rule:
// spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir .
@@ expression mm; @@ ( -init_rwsem +mmap_init_lock | -down_write +mmap_write_lock | -down_write_killable +mmap_write_lock_killable | -down_write_trylock +mmap_write_trylock | -up_write +mmap_write_unlock | -downgrade_write +mmap_write_downgrade | -down_read +mmap_read_lock | -down_read_killable +mmap_read_lock_killable | -down_read_trylock +mmap_read_trylock | -up_read +mmap_read_unlock ) -(&mm->mmap_sem) +(mm)
Signed-off-by: Michel Lespinasse <walken@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ying Han <yinghan@google.com> Link: http://lkml.kernel.org/r/20200520052908.204642-5-walken@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
#
4064b982 |
| 02-Apr-2020 |
Peter Xu <peterx@redhat.com> |
mm: allow VM_FAULT_RETRY for multiple times
The idea comes from a discussion between Linus and Andrea [1].
Before this patch we only allow a page fault to retry once. We achieved this by clearing
mm: allow VM_FAULT_RETRY for multiple times
The idea comes from a discussion between Linus and Andrea [1].
Before this patch we only allow a page fault to retry once. We achieved this by clearing the FAULT_FLAG_ALLOW_RETRY flag when doing handle_mm_fault() the second time. This was majorly used to avoid unexpected starvation of the system by looping over forever to handle the page fault on a single page. However that should hardly happen, and after all for each code path to return a VM_FAULT_RETRY we'll first wait for a condition (during which time we should possibly yield the cpu) to happen before VM_FAULT_RETRY is really returned.
This patch removes the restriction by keeping the FAULT_FLAG_ALLOW_RETRY flag when we receive VM_FAULT_RETRY. It means that the page fault handler now can retry the page fault for multiple times if necessary without the need to generate another page fault event. Meanwhile we still keep the FAULT_FLAG_TRIED flag so page fault handler can still identify whether a page fault is the first attempt or not.
Then we'll have these combinations of fault flags (only considering ALLOW_RETRY flag and TRIED flag):
- ALLOW_RETRY and !TRIED: this means the page fault allows to retry, and this is the first try
- ALLOW_RETRY and TRIED: this means the page fault allows to retry, and this is not the first try
- !ALLOW_RETRY and !TRIED: this means the page fault does not allow to retry at all
- !ALLOW_RETRY and TRIED: this is forbidden and should never be used
In existing code we have multiple places that has taken special care of the first condition above by checking against (fault_flags & FAULT_FLAG_ALLOW_RETRY). This patch introduces a simple helper to detect the first retry of a page fault by checking against both (fault_flags & FAULT_FLAG_ALLOW_RETRY) and !(fault_flag & FAULT_FLAG_TRIED) because now even the 2nd try will have the ALLOW_RETRY set, then use that helper in all existing special paths. One example is in __lock_page_or_retry(), now we'll drop the mmap_sem only in the first attempt of page fault and we'll keep it in follow up retries, so old locking behavior will be retained.
This will be a nice enhancement for current code [2] at the same time a supporting material for the future userfaultfd-writeprotect work, since in that work there will always be an explicit userfault writeprotect retry for protected pages, and if that cannot resolve the page fault (e.g., when userfaultfd-writeprotect is used in conjunction with swapped pages) then we'll possibly need a 3rd retry of the page fault. It might also benefit other potential users who will have similar requirement like userfault write-protection.
GUP code is not touched yet and will be covered in follow up patch.
Please read the thread below for more information.
[1] https://lore.kernel.org/lkml/20171102193644.GB22686@redhat.com/ [2] https://lore.kernel.org/lkml/20181230154648.GB9832@redhat.com/
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Suggested-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Brian Geffon <bgeffon@google.com> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Martin Cracauer <cracauer@cons.org> Cc: Marty McFadden <mcfadden8@llnl.gov> Cc: Matthew Wilcox <willy@infradead.org> Cc: Maya Gokhale <gokhale2@llnl.gov> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Link: http://lkml.kernel.org/r/20200220160246.9790-1-peterx@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
#
dde16072 |
| 02-Apr-2020 |
Peter Xu <peterx@redhat.com> |
mm: introduce FAULT_FLAG_DEFAULT
Although there're tons of arch-specific page fault handlers, most of them are still sharing the same initial value of the page fault flags. Say, merely all of the p
mm: introduce FAULT_FLAG_DEFAULT
Although there're tons of arch-specific page fault handlers, most of them are still sharing the same initial value of the page fault flags. Say, merely all of the page fault handlers would allow the fault to be retried, and they also allow the fault to respond to SIGKILL.
Let's define a default value for the fault flags to replace those initial page fault flags that were copied over. With this, it'll be far easier to introduce new fault flag that can be used by all the architectures instead of touching all the archs.
Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Brian Geffon <bgeffon@google.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Martin Cracauer <cracauer@cons.org> Cc: Marty McFadden <mcfadden8@llnl.gov> Cc: Matthew Wilcox <willy@infradead.org> Cc: Maya Gokhale <gokhale2@llnl.gov> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Link: http://lkml.kernel.org/r/20200220160238.9694-1-peterx@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
#
4ef87322 |
| 02-Apr-2020 |
Peter Xu <peterx@redhat.com> |
mm: introduce fault_signal_pending()
For most architectures, we've got a quick path to detect fatal signal after a handle_mm_fault(). Introduce a helper for that quick path.
It cleans the current
mm: introduce fault_signal_pending()
For most architectures, we've got a quick path to detect fatal signal after a handle_mm_fault(). Introduce a helper for that quick path.
It cleans the current codes a bit so we don't need to duplicate the same check across archs. More importantly, this will be an unified place that we handle the signal immediately right after an interrupted page fault, so it'll be much easier for us if we want to change the behavior of handling signals later on for all the archs.
Note that currently only part of the archs are using this new helper, because some archs have their own way to handle signals. In the follow up patches, we'll try to apply this helper to all the rest of archs.
Another note is that the "regs" parameter in the new helper is not used yet. It'll be used very soon. Now we kept it in this patch only to avoid touching all the archs again in the follow up patches.
[peterx@redhat.com: fix sparse warnings] Link: http://lkml.kernel.org/r/20200311145921.GD479302@xz-x1 Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Brian Geffon <bgeffon@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Martin Cracauer <cracauer@cons.org> Cc: Marty McFadden <mcfadden8@llnl.gov> Cc: Matthew Wilcox <willy@infradead.org> Cc: Maya Gokhale <gokhale2@llnl.gov> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Link: http://lkml.kernel.org/r/20200220155353.8676-4-peterx@redhat.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
show more ...
|
#
12d1402c |
| 31-Jul-2019 |
Helge Deller <deller@gmx.de> |
parisc: Mark expected switch fall-throughs in fault.c
Fix a fall-through warning in fault.c.
Fixes: a035d552a93b ("Makefile: Globally enable fall-through warning") Signed-off-by: Helge Deller <dell
parisc: Mark expected switch fall-throughs in fault.c
Fix a fall-through warning in fault.c.
Fixes: a035d552a93b ("Makefile: Globally enable fall-through warning") Signed-off-by: Helge Deller <deller@gmx.de>
show more ...
|
#
2e1661d2 |
| 23-May-2019 |
Eric W. Biederman <ebiederm@xmission.com> |
signal: Remove the task parameter from force_sig_fault
As synchronous exceptions really only make sense against the current task (otherwise how are you synchronous) remove the task parameter from fr
signal: Remove the task parameter from force_sig_fault
As synchronous exceptions really only make sense against the current task (otherwise how are you synchronous) remove the task parameter from from force_sig_fault to make it explicit that is what is going on.
The two known exceptions that deliver a synchronous exception to a stopped ptraced task have already been changed to force_sig_fault_to_task.
The callers have been changed with the following emacs regular expression (with obvious variations on the architectures that take more arguments) to avoid typos:
force_sig_fault[(]\([^,]+\)[,]\([^,]+\)[,]\([^,]+\)[,]\W+current[)] -> force_sig_fault(\1,\2,\3)
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
show more ...
|
#
f8eac901 |
| 06-Feb-2019 |
Eric W. Biederman <ebiederm@xmission.com> |
signal: Remove task parameter from force_sig_mceerr
All of the callers pass current into force_sig_mceer so remove the task parameter to make this obvious.
This also makes it clear that force_sig_m
signal: Remove task parameter from force_sig_mceerr
All of the callers pass current into force_sig_mceer so remove the task parameter to make this obvious.
This also makes it clear that force_sig_mceerr passes current into force_sig_info.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
show more ...
|