History log of /linux/arch/parisc/mm/fault.c (Results 1 – 25 of 67)
Revision Date Author Comments
# 8b1d7239 20-Jan-2024 Helge Deller <deller@gmx.de>

parisc: Fix random data corruption from exception handler

The current exception handler implementation, which assists when accessing
user space memory, may exhibit random data corruption if the comp

parisc: Fix random data corruption from exception handler

The current exception handler implementation, which assists when accessing
user space memory, may exhibit random data corruption if the compiler decides
to use a different register than the specified register %r29 (defined in
ASM_EXCEPTIONTABLE_REG) for the error code. If the compiler choose another
register, the fault handler will nevertheless store -EFAULT into %r29 and thus
trash whatever this register is used for.
Looking at the assembly I found that this happens sometimes in emulate_ldd().

To solve the issue, the easiest solution would be if it somehow is
possible to tell the fault handler which register is used to hold the error
code. Using %0 or %1 in the inline assembly is not posssible as it will show
up as e.g. %r29 (with the "%r" prefix), which the GNU assembler can not
convert to an integer.

This patch takes another, better and more flexible approach:
We extend the __ex_table (which is out of the execution path) by one 32-word.
In this word we tell the compiler to insert the assembler instruction
"or %r0,%r0,%reg", where %reg references the register which the compiler
choosed for the error return code.
In case of an access failure, the fault handler finds the __ex_table entry and
can examine the opcode. The used register is encoded in the lowest 5 bits, and
the fault handler can then store -EFAULT into this register.

Since we extend the __ex_table to 3 words we can't use the BUILDTIME_TABLE_SORT
config option any longer.

Signed-off-by: Helge Deller <deller@gmx.de>
Cc: <stable@vger.kernel.org> # v6.0+

show more ...


# aa1bb8b6 10-Aug-2023 Helge Deller <deller@gmx.de>

parisc: fault: Use C99 arrary initializers

Sparse wants C99 array initializers.

Signed-off-by: Helge Deller <deller@gmx.de>


# ea3f8272 30-Jun-2023 Linus Torvalds <torvalds@linux-foundation.org>

parisc: fix expand_stack() conversion

In commit 8d7071af8907 ("mm: always expand the stack with the mmap write
lock held") I tried to deal with the remaining odd page fault handling
cases. The odde

parisc: fix expand_stack() conversion

In commit 8d7071af8907 ("mm: always expand the stack with the mmap write
lock held") I tried to deal with the remaining odd page fault handling
cases. The oddest one is ia64, which has stacks that grow both up and
down. And because ia64 was _so_ odd, I asked people to verify the end
result.

But a close second oddity is parisc, which is the only one that has a
main stack growing up (our "CONFIG_STACK_GROWSUP" config option). But
it looked obvious enough that I didn't worry about it.

I should have worried a bit more. Not because it was particularly
complex, but because I just used the wrong variable name.

The previous vma isn't called "prev", it's called "prev_vma". Blush.

Fixes: 8d7071af8907 ("mm: always expand the stack with the mmap write lock held")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

show more ...


# 8d7071af 24-Jun-2023 Linus Torvalds <torvalds@linux-foundation.org>

mm: always expand the stack with the mmap write lock held

This finishes the job of always holding the mmap write lock when
extending the user stack vma, and removes the 'write_locked' argument
from

mm: always expand the stack with the mmap write lock held

This finishes the job of always holding the mmap write lock when
extending the user stack vma, and removes the 'write_locked' argument
from the vm helper functions again.

For some cases, we just avoid expanding the stack at all: drivers and
page pinning really shouldn't be extending any stacks. Let's see if any
strange users really wanted that.

It's worth noting that architectures that weren't converted to the new
lock_mm_and_find_vma() helper function are left using the legacy
"expand_stack()" function, but it has been changed to drop the mmap_lock
and take it for writing while expanding the vma. This makes it fairly
straightforward to convert the remaining architectures.

As a result of dropping and re-taking the lock, the calling conventions
for this function have also changed, since the old vma may no longer be
valid. So it will now return the new vma if successful, and NULL - and
the lock dropped - if the area could not be extended.

Tested-by: Vegard Nossum <vegard.nossum@oracle.com>
Tested-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> # ia64
Tested-by: Frank Scheiner <frank.scheiner@web.de> # ia64
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

show more ...


# 15261678 31-Jan-2023 Al Viro <viro@zeniv.linux.org.uk>

parisc: fix livelock in uaccess

parisc equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling"
If e.g. get_user() triggers a page fault and a fatal signal is caught, we might
end u

parisc: fix livelock in uaccess

parisc equivalent of 26178ec11ef3 "x86: mm: consolidate VM_FAULT_RETRY handling"
If e.g. get_user() triggers a page fault and a fatal signal is caught, we might
end up with handle_mm_fault() returning VM_FAULT_RETRY and not doing anything
to page tables. In such case we must *not* return to the faulting insn -
that would repeat the entire thing without making any progress; what we need
instead is to treat that as failed (user) memory access.

Tested-by: Helge Deller <deller@gmx.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

show more ...


# 16bf37bf 15-Jul-2022 Jason Wang <wangborong@cdjrlc.com>

parisc: Fix comment typo in fault.c

The double `the' is duplicated in line 41, remove one.

Signed-off-by: Jason Wang <wangborong@cdjrlc.com>
Signed-off-by: Helge Deller <deller@gmx.de>


# d9272525 30-May-2022 Peter Xu <peterx@redhat.com>

mm: avoid unnecessary page fault retires on shared memory types

I observed that for each of the shared file-backed page faults, we're very
likely to retry one more time for the 1st write fault upon

mm: avoid unnecessary page fault retires on shared memory types

I observed that for each of the shared file-backed page faults, we're very
likely to retry one more time for the 1st write fault upon no page. It's
because we'll need to release the mmap lock for dirty rate limit purpose
with balance_dirty_pages_ratelimited() (in fault_dirty_shared_page()).

Then after that throttling we return VM_FAULT_RETRY.

We did that probably because VM_FAULT_RETRY is the only way we can return
to the fault handler at that time telling it we've released the mmap lock.

However that's not ideal because it's very likely the fault does not need
to be retried at all since the pgtable was well installed before the
throttling, so the next continuous fault (including taking mmap read lock,
walk the pgtable, etc.) could be in most cases unnecessary.

It's not only slowing down page faults for shared file-backed, but also add
more mmap lock contention which is in most cases not needed at all.

To observe this, one could try to write to some shmem page and look at
"pgfault" value in /proc/vmstat, then we should expect 2 counts for each
shmem write simply because we retried, and vm event "pgfault" will capture
that.

To make it more efficient, add a new VM_FAULT_COMPLETED return code just to
show that we've completed the whole fault and released the lock. It's also
a hint that we should very possibly not need another fault immediately on
this page because we've just completed it.

This patch provides a ~12% perf boost on my aarch64 test VM with a simple
program sequentially dirtying 400MB shmem file being mmap()ed and these are
the time it needs:

Before: 650.980 ms (+-1.94%)
After: 569.396 ms (+-1.38%)

I believe it could help more than that.

We need some special care on GUP and the s390 pgfault handler (for gmap
code before returning from pgfault), the rest changes in the page fault
handlers should be relatively straightforward.

Another thing to mention is that mm_account_fault() does take this new
fault as a generic fault to be accounted, unlike VM_FAULT_RETRY.

I explicitly didn't touch hmm_vma_fault() and break_ksm() because they do
not handle VM_FAULT_RETRY even with existing code, so I'm literally keeping
them as-is.

Link: https://lkml.kernel.org/r/20220530183450.42886-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vineet Gupta <vgupta@kernel.org>
Acked-by: Guo Ren <guoren@kernel.org>
Acked-by: Max Filippov <jcmvbkbc@gmail.com>
Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Alistair Popple <apopple@nvidia.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> [arm part]
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Stafford Horne <shorne@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Brian Cain <bcain@quicinc.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Richard Weinberger <richard@nod.at>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Will Deacon <will@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Chris Zankel <chris@zankel.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Rich Felker <dalias@libc.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Helge Deller <deller@gmx.de>
Cc: Yoshinori Sato <ysato@users.osdn.me>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

show more ...


# 67c35a3b 16-May-2022 John David Anglin <dave.anglin@bell.net>

parisc: Disable debug code regarding cache flushes in handle_nadtlb_fault()

Change the "BUG" to "WARNING" and disable the message because it triggers
occasionally in spite of the check in flush_cach

parisc: Disable debug code regarding cache flushes in handle_nadtlb_fault()

Change the "BUG" to "WARNING" and disable the message because it triggers
occasionally in spite of the check in flush_cache_page_if_present.

The pte value extracted for the "from" page in copy_user_highpage is racy and
occasionally the pte is cleared before the flush is complete. I assume that
the page is simultaneously flushed by flush_cache_mm before the pte is cleared
as nullifying the fdc doesn't seem to cause problems.

I investigated various locking scenarios but I wasn't able to find a way to
sequence the flushes. This code is called for every COW break and locks impact
performance.

This patch is related to the bigger cache flush patch because we need the pte
on PA8800/PA8900 to flush using the vma context.
I have also seen this from copy_to_user_page and copy_from_user_page.

The messages appear infrequently when enabled.

Signed-off-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>

show more ...


# e00b0a2a 09-Mar-2022 John David Anglin <dave.anglin@bell.net>

parisc: Fix handling off probe non-access faults

Currently, the parisc kernel does not fully support non-access TLB
fault handling for probe instructions. In the fast path, we set the
target registe

parisc: Fix handling off probe non-access faults

Currently, the parisc kernel does not fully support non-access TLB
fault handling for probe instructions. In the fast path, we set the
target register to zero if it is not a shadowed register. The slow
path is not implemented, so we call do_page_fault. The architecture
indicates that non-access faults should not cause a page fault from
disk.

This change adds to code to provide non-access fault support for
probe instructions. It also modifies the handling of faults on
userspace so that if the address lies in a valid VMA and the access
type matches that for the VMA, the probe target register is set to
one. Otherwise, the target register is set to zero.

This was done to make probe instructions more useful for userspace.
Probe instructions are not very useful if they set the target register
to zero whenever a page is not present in memory. Nominally, the
purpose of the probe instruction is determine whether read or write
access to a given address is allowed.

This fixes a problem in function pointer comparison noticed in the
glibc testsuite (stdio-common/tst-vfprintf-user-type). The same
problem is likely in glibc (_dl_lookup_address).

V2 adds flush and lpa instruction support to handle_nadtlb_fault.

Signed-off-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>

show more ...


# 36ef159f 14-Jan-2022 Qi Zheng <zhengqi.arch@bytedance.com>

mm: remove redundant check about FAULT_FLAG_ALLOW_RETRY bit

Since commit 4064b9827063 ("mm: allow VM_FAULT_RETRY for multiple
times") allowed VM_FAULT_RETRY for multiple times, the
FAULT_FLAG_ALLOW_

mm: remove redundant check about FAULT_FLAG_ALLOW_RETRY bit

Since commit 4064b9827063 ("mm: allow VM_FAULT_RETRY for multiple
times") allowed VM_FAULT_RETRY for multiple times, the
FAULT_FLAG_ALLOW_RETRY bit of fault_flag will not be changed in the page
fault path, so the following check is no longer needed:

flags & FAULT_FLAG_ALLOW_RETRY

So just remove it.

[akpm@linux-foundation.org: coding style fixes]

Link: https://lkml.kernel.org/r/20211110123358.36511-1-zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Kirill Shutemov <kirill@shutemov.name>
Cc: Peter Xu <peterx@redhat.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

show more ...


# 20dda87b 04-Jan-2022 John David Anglin <dave.anglin@bell.net>

parisc: Enhance page fault termination message

In debugging kernel panics, I believe it is useful to know what type
of page fault caused the termination. "Bad Address" is too vague.

Signed-off-by:

parisc: Enhance page fault termination message

In debugging kernel panics, I believe it is useful to know what type
of page fault caused the termination. "Bad Address" is too vague.

Signed-off-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>

show more ...


# 9d90a908 04-Jan-2022 John David Anglin <dave.anglin@bell.net>

parisc: Don't call faulthandler_disabled() in do_page_fault()

It is dangerous to call faulthandler_disabled() when user_mode(regs)
is true. The task pagefault_disabled counter is racy and it is not

parisc: Don't call faulthandler_disabled() in do_page_fault()

It is dangerous to call faulthandler_disabled() when user_mode(regs)
is true. The task pagefault_disabled counter is racy and it is not
updated atomically on parisc. As a result, calling faulthandler_disabled()
may cause erroneous termination.

We now handle execption fixups and termination when user_mode(regs) is
false in handle_interruption(). Thus, we can just remove the
faulthandler_disabled() check from do_page_fault().

Signed-off-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>

show more ...


# 4b9d2a73 23-Dec-2021 Helge Deller <deller@gmx.de>

parisc: Switch user access functions to signal errors in r29 instead of r8

Use register r29 instead of register r8 to signal faults when accessing
user memory. In case of faults, the fixup routine w

parisc: Switch user access functions to signal errors in r29 instead of r8

Use register r29 instead of register r8 to signal faults when accessing
user memory. In case of faults, the fixup routine will store -EFAULT in
this register.

This change saves up to 752 bytes on a 32bit kernel, partly because the
compiler doesn't need to save and restore the old r8 value on the stack.

bloat-o-meter results for usage with r29 register:
add/remove: 0/0 grow/shrink: 23/86 up/down: 228/-980 (-752)

bloat-o-meter results for usage with r28 register:
add/remove: 0/0 grow/shrink: 28/83 up/down: 296/-956 (-660)

Signed-off-by: Helge Deller <deller@gmx.de>

show more ...


# a348eab3 04-May-2021 Helge Deller <deller@gmx.de>

parisc: make parisc_acctyp() available outside of faults.c

When adding kfence support, we need to tell kfence_handle_page_fault()
if the interrupted assembler statement is a read or write operation.

parisc: make parisc_acctyp() available outside of faults.c

When adding kfence support, we need to tell kfence_handle_page_fault()
if the interrupted assembler statement is a read or write operation.

Signed-off-by: Helge Deller <deller@gmx.de>

show more ...


# df561f66 23-Aug-2020 Gustavo A. R. Silva <gustavoars@kernel.org>

treewide: Use fallthrough pseudo-keyword

Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through mar

treewide: Use fallthrough pseudo-keyword

Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.

[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>

show more ...


# af8a7926 12-Aug-2020 Peter Xu <peterx@redhat.com>

mm/parisc: use general page fault accounting

Use the general page fault accounting by passing regs into
handle_mm_fault(). It naturally solve the issue of multiple page fault
accounting when page f

mm/parisc: use general page fault accounting

Use the general page fault accounting by passing regs into
handle_mm_fault(). It naturally solve the issue of multiple page fault
accounting when page fault retry happened.

Add the missing PERF_COUNT_SW_PAGE_FAULTS perf events too. Note, the
other two perf events (PERF_COUNT_SW_PAGE_FAULTS_[MAJ|MIN]) were done in
handle_mm_fault().

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Helge Deller <deller@gmx.de>
Link: http://lkml.kernel.org/r/20200707225021.200906-16-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

show more ...


# bce617ed 12-Aug-2020 Peter Xu <peterx@redhat.com>

mm: do page fault accounting in handle_mm_fault

Patch series "mm: Page fault accounting cleanups", v5.

This is v5 of the pf accounting cleanup series. It originates from Gerald
Schaefer's report o

mm: do page fault accounting in handle_mm_fault

Patch series "mm: Page fault accounting cleanups", v5.

This is v5 of the pf accounting cleanup series. It originates from Gerald
Schaefer's report on an issue a week ago regarding to incorrect page fault
accountings for retried page fault after commit 4064b9827063 ("mm: allow
VM_FAULT_RETRY for multiple times"):

https://lore.kernel.org/lkml/20200610174811.44b94525@thinkpad/

What this series did:

- Correct page fault accounting: we do accounting for a page fault
(no matter whether it's from #PF handling, or gup, or anything else)
only with the one that completed the fault. For example, page fault
retries should not be counted in page fault counters. Same to the
perf events.

- Unify definition of PERF_COUNT_SW_PAGE_FAULTS: currently this perf
event is used in an adhoc way across different archs.

Case (1): for many archs it's done at the entry of a page fault
handler, so that it will also cover e.g. errornous faults.

Case (2): for some other archs, it is only accounted when the page
fault is resolved successfully.

Case (3): there're still quite some archs that have not enabled
this perf event.

Since this series will touch merely all the archs, we unify this
perf event to always follow case (1), which is the one that makes most
sense. And since we moved the accounting into handle_mm_fault, the
other two MAJ/MIN perf events are well taken care of naturally.

- Unify definition of "major faults": the definition of "major
fault" is slightly changed when used in accounting (not
VM_FAULT_MAJOR). More information in patch 1.

- Always account the page fault onto the one that triggered the page
fault. This does not matter much for #PF handlings, but mostly for
gup. More information on this in patch 25.

Patchset layout:

Patch 1: Introduced the accounting in handle_mm_fault(), not enabled.
Patch 2-23: Enable the new accounting for arch #PF handlers one by one.
Patch 24: Enable the new accounting for the rest outliers (gup, iommu, etc.)
Patch 25: Cleanup GUP task_struct pointer since it's not needed any more

This patch (of 25):

This is a preparation patch to move page fault accountings into the
general code in handle_mm_fault(). This includes both the per task
flt_maj/flt_min counters, and the major/minor page fault perf events. To
do this, the pt_regs pointer is passed into handle_mm_fault().

PERF_COUNT_SW_PAGE_FAULTS should still be kept in per-arch page fault
handlers.

So far, all the pt_regs pointer that passed into handle_mm_fault() is
NULL, which means this patch should have no intented functional change.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/20200707225021.200906-1-peterx@redhat.com
Link: http://lkml.kernel.org/r/20200707225021.200906-2-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

show more ...


# 3e4e28c5 09-Jun-2020 Michel Lespinasse <walken@google.com>

mmap locking API: convert mmap_sem API comments

Convert comments that reference old mmap_sem APIs to reference
corresponding new mmap locking APIs instead.

Signed-off-by: Michel Lespinasse <walken@

mmap locking API: convert mmap_sem API comments

Convert comments that reference old mmap_sem APIs to reference
corresponding new mmap locking APIs instead.

Signed-off-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Davidlohr Bueso <dbueso@suse.de>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Liam Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ying Han <yinghan@google.com>
Link: http://lkml.kernel.org/r/20200520052908.204642-12-walken@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

show more ...


# d8ed45c5 09-Jun-2020 Michel Lespinasse <walken@google.com>

mmap locking API: use coccinelle to convert mmap_sem rwsem call sites

This change converts the existing mmap_sem rwsem calls to use the new mmap
locking API instead.

The change is generated using c

mmap locking API: use coccinelle to convert mmap_sem rwsem call sites

This change converts the existing mmap_sem rwsem calls to use the new mmap
locking API instead.

The change is generated using coccinelle with the following rule:

// spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir .

@@
expression mm;
@@
(
-init_rwsem
+mmap_init_lock
|
-down_write
+mmap_write_lock
|
-down_write_killable
+mmap_write_lock_killable
|
-down_write_trylock
+mmap_write_trylock
|
-up_write
+mmap_write_unlock
|
-downgrade_write
+mmap_write_downgrade
|
-down_read
+mmap_read_lock
|
-down_read_killable
+mmap_read_lock_killable
|
-down_read_trylock
+mmap_read_trylock
|
-up_read
+mmap_read_unlock
)
-(&mm->mmap_sem)
+(mm)

Signed-off-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Liam Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ying Han <yinghan@google.com>
Link: http://lkml.kernel.org/r/20200520052908.204642-5-walken@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

show more ...


# 4064b982 02-Apr-2020 Peter Xu <peterx@redhat.com>

mm: allow VM_FAULT_RETRY for multiple times

The idea comes from a discussion between Linus and Andrea [1].

Before this patch we only allow a page fault to retry once. We achieved
this by clearing

mm: allow VM_FAULT_RETRY for multiple times

The idea comes from a discussion between Linus and Andrea [1].

Before this patch we only allow a page fault to retry once. We achieved
this by clearing the FAULT_FLAG_ALLOW_RETRY flag when doing
handle_mm_fault() the second time. This was majorly used to avoid
unexpected starvation of the system by looping over forever to handle the
page fault on a single page. However that should hardly happen, and after
all for each code path to return a VM_FAULT_RETRY we'll first wait for a
condition (during which time we should possibly yield the cpu) to happen
before VM_FAULT_RETRY is really returned.

This patch removes the restriction by keeping the FAULT_FLAG_ALLOW_RETRY
flag when we receive VM_FAULT_RETRY. It means that the page fault handler
now can retry the page fault for multiple times if necessary without the
need to generate another page fault event. Meanwhile we still keep the
FAULT_FLAG_TRIED flag so page fault handler can still identify whether a
page fault is the first attempt or not.

Then we'll have these combinations of fault flags (only considering
ALLOW_RETRY flag and TRIED flag):

- ALLOW_RETRY and !TRIED: this means the page fault allows to
retry, and this is the first try

- ALLOW_RETRY and TRIED: this means the page fault allows to
retry, and this is not the first try

- !ALLOW_RETRY and !TRIED: this means the page fault does not allow
to retry at all

- !ALLOW_RETRY and TRIED: this is forbidden and should never be used

In existing code we have multiple places that has taken special care of
the first condition above by checking against (fault_flags &
FAULT_FLAG_ALLOW_RETRY). This patch introduces a simple helper to detect
the first retry of a page fault by checking against both (fault_flags &
FAULT_FLAG_ALLOW_RETRY) and !(fault_flag & FAULT_FLAG_TRIED) because now
even the 2nd try will have the ALLOW_RETRY set, then use that helper in
all existing special paths. One example is in __lock_page_or_retry(), now
we'll drop the mmap_sem only in the first attempt of page fault and we'll
keep it in follow up retries, so old locking behavior will be retained.

This will be a nice enhancement for current code [2] at the same time a
supporting material for the future userfaultfd-writeprotect work, since in
that work there will always be an explicit userfault writeprotect retry
for protected pages, and if that cannot resolve the page fault (e.g., when
userfaultfd-writeprotect is used in conjunction with swapped pages) then
we'll possibly need a 3rd retry of the page fault. It might also benefit
other potential users who will have similar requirement like userfault
write-protection.

GUP code is not touched yet and will be covered in follow up patch.

Please read the thread below for more information.

[1] https://lore.kernel.org/lkml/20171102193644.GB22686@redhat.com/
[2] https://lore.kernel.org/lkml/20181230154648.GB9832@redhat.com/

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Brian Geffon <bgeffon@google.com>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Link: http://lkml.kernel.org/r/20200220160246.9790-1-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

show more ...


# dde16072 02-Apr-2020 Peter Xu <peterx@redhat.com>

mm: introduce FAULT_FLAG_DEFAULT

Although there're tons of arch-specific page fault handlers, most of them
are still sharing the same initial value of the page fault flags. Say,
merely all of the p

mm: introduce FAULT_FLAG_DEFAULT

Although there're tons of arch-specific page fault handlers, most of them
are still sharing the same initial value of the page fault flags. Say,
merely all of the page fault handlers would allow the fault to be retried,
and they also allow the fault to respond to SIGKILL.

Let's define a default value for the fault flags to replace those initial
page fault flags that were copied over. With this, it'll be far easier to
introduce new fault flag that can be used by all the architectures instead
of touching all the archs.

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Brian Geffon <bgeffon@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Link: http://lkml.kernel.org/r/20200220160238.9694-1-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

show more ...


# 4ef87322 02-Apr-2020 Peter Xu <peterx@redhat.com>

mm: introduce fault_signal_pending()

For most architectures, we've got a quick path to detect fatal signal
after a handle_mm_fault(). Introduce a helper for that quick path.

It cleans the current

mm: introduce fault_signal_pending()

For most architectures, we've got a quick path to detect fatal signal
after a handle_mm_fault(). Introduce a helper for that quick path.

It cleans the current codes a bit so we don't need to duplicate the same
check across archs. More importantly, this will be an unified place that
we handle the signal immediately right after an interrupted page fault, so
it'll be much easier for us if we want to change the behavior of handling
signals later on for all the archs.

Note that currently only part of the archs are using this new helper,
because some archs have their own way to handle signals. In the follow up
patches, we'll try to apply this helper to all the rest of archs.

Another note is that the "regs" parameter in the new helper is not used
yet. It'll be used very soon. Now we kept it in this patch only to avoid
touching all the archs again in the follow up patches.

[peterx@redhat.com: fix sparse warnings]
Link: http://lkml.kernel.org/r/20200311145921.GD479302@xz-x1
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Brian Geffon <bgeffon@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Link: http://lkml.kernel.org/r/20200220155353.8676-4-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

show more ...


# 12d1402c 31-Jul-2019 Helge Deller <deller@gmx.de>

parisc: Mark expected switch fall-throughs in fault.c

Fix a fall-through warning in fault.c.

Fixes: a035d552a93b ("Makefile: Globally enable fall-through warning")
Signed-off-by: Helge Deller <dell

parisc: Mark expected switch fall-throughs in fault.c

Fix a fall-through warning in fault.c.

Fixes: a035d552a93b ("Makefile: Globally enable fall-through warning")
Signed-off-by: Helge Deller <deller@gmx.de>

show more ...


# 2e1661d2 23-May-2019 Eric W. Biederman <ebiederm@xmission.com>

signal: Remove the task parameter from force_sig_fault

As synchronous exceptions really only make sense against the current
task (otherwise how are you synchronous) remove the task parameter
from fr

signal: Remove the task parameter from force_sig_fault

As synchronous exceptions really only make sense against the current
task (otherwise how are you synchronous) remove the task parameter
from from force_sig_fault to make it explicit that is what is going
on.

The two known exceptions that deliver a synchronous exception to a
stopped ptraced task have already been changed to
force_sig_fault_to_task.

The callers have been changed with the following emacs regular expression
(with obvious variations on the architectures that take more arguments)
to avoid typos:

force_sig_fault[(]\([^,]+\)[,]\([^,]+\)[,]\([^,]+\)[,]\W+current[)]
->
force_sig_fault(\1,\2,\3)

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

show more ...


# f8eac901 06-Feb-2019 Eric W. Biederman <ebiederm@xmission.com>

signal: Remove task parameter from force_sig_mceerr

All of the callers pass current into force_sig_mceer so remove the
task parameter to make this obvious.

This also makes it clear that force_sig_m

signal: Remove task parameter from force_sig_mceerr

All of the callers pass current into force_sig_mceer so remove the
task parameter to make this obvious.

This also makes it clear that force_sig_mceerr passes current
into force_sig_info.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

show more ...


123