
Searched hist:c9de630f (Results 1 – 20 of 20) sorted by relevance

/openbsd/sys/arch/amd64/include/
proc.h  c9de630f  Tue Jun 05 06:39:10 GMT 2018  guenther <guenther@openbsd.org>

Switch from lazy FPU switching to semi-eager FPU switching: track whether
curproc's xstate ("extended state") is loaded in the CPU or not.
- context switch, sendsig(), vmm, and doing CPU crypto in the kernel all
  check the flag and, if set, save the old thread's state to the PCB,
  clear the flag, and then load the _blank_ state
- when returning to userspace, if the flag is clear then set it and restore
  the thread's state (see the sketch below)
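
To make the flag discipline above concrete, here is a minimal C sketch of the two paths, assuming a placeholder per-CPU flag and PCB save area; every identifier below (struct savefpu, struct pcb, struct cpu_info, ci_fpu_in_cpu, xsave_area(), xrstor_area(), fpu_blank_state, fpu_evict(), fpu_ensure_loaded()) is illustrative, not the name used in the tree.

/*
 * Sketch of semi-eager FPU switching: one flag per CPU records whether
 * the current thread's xstate is live in the registers.
 */
struct savefpu { char fp_area[832]; };           /* placeholder save-area size */
struct pcb { struct savefpu pcb_savefpu; };      /* per-thread save area */
struct cpu_info { int ci_fpu_in_cpu; };          /* the tracking flag */

extern void xsave_area(struct savefpu *);        /* xsave64/fxsave64 */
extern void xrstor_area(const struct savefpu *); /* xrstor64/fxrstor64 */
extern const struct savefpu fpu_blank_state;     /* clean initial state */

/* context switch, sendsig(), vmm entry, kernel crypto: give up the FPU */
void
fpu_evict(struct cpu_info *ci, struct pcb *pcb)
{
        if (ci->ci_fpu_in_cpu) {
                xsave_area(&pcb->pcb_savefpu);   /* save the old thread's state to the PCB */
                ci->ci_fpu_in_cpu = 0;
                xrstor_area(&fpu_blank_state);   /* load the blank state */
        }
}

/* return to userspace: make sure the thread's own state is loaded */
void
fpu_ensure_loaded(struct cpu_info *ci, struct pcb *pcb)
{
        if (!ci->ci_fpu_in_cpu) {
                xrstor_area(&pcb->pcb_savefpu);  /* may fault; see below */
                ci->ci_fpu_in_cpu = 1;
        }
}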

This simpler tracking also fixes the restoring of FPU state after nested
signal handlers.

With this, %cr0's TS flag is never set, the FPU #DNA trap can no
longer happen, and IPIs are no longer necessary for flushing or
syncing FPU state; on the other hand, restoring xstate while returning
to userspace means we have to handle xrstor faulting if we could
be loading an altered state. If that happens, reset the state,
fake a #GP fault (SIGBUS), and recheck for ASTs.
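
The recovery path can be sketched at the C level as follows, reusing the placeholder declarations from the sketch above; xrstor_user_trapsafe(), send_sigbus_as_gp(), and recheck_asts() are hypothetical helpers standing in for the fault-recovery handling that really lives around the xrstor in the return-to-userspace assembly.

struct proc;                                     /* opaque here */

/* returns nonzero if xrstor64 faulted on an inconsistent save area */
extern int  xrstor_user_trapsafe(const struct savefpu *);
extern void send_sigbus_as_gp(struct proc *);    /* deliver SIGBUS as if #GP */
extern void recheck_asts(struct proc *);         /* re-run the AST check */

void
fpu_restore_for_return(struct cpu_info *ci, struct proc *p, struct pcb *pcb)
{
        if (xrstor_user_trapsafe(&pcb->pcb_savefpu) != 0) {
                /* the altered xstate was rejected: reset to the clean state */
                pcb->pcb_savefpu = fpu_blank_state;
                xrstor_area(&pcb->pcb_savefpu);
                send_sigbus_as_gp(p);            /* fake the #GP as SIGBUS */
                recheck_asts(p);                 /* recheck for ASTs */
        }
        ci->ci_fpu_in_cpu = 1;
}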

While here, regularize fxsave/fxrstor vs xsave/xrstor handling, by
using codepatching to switch to xsave/xrstor when present in the
CPU. In addition, code-patch in the use of xsaveopt in most places
when the CPU supports it. Use the 64-bit-wide variants of the
instructions in all cases so that x87 instruction fault IPs are
reported correctly.
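
As an illustration of how such boot-time patching can work, the sketch below assumes NOP-padded instruction sites and generated site tables; struct patch_site, patch_apply(), fpu_patch_insns(), and the site arrays are invented names, not the interface provided by codepatch.h.

#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Each FPU save/restore site is assembled with the legacy fxsave64 or
 * fxrstor64 instruction, padded with NOPs to the size of the largest
 * replacement.  If CPUID reports XSAVE (and, where useful, XSAVEOPT),
 * the bytes are rewritten in place once at boot, so the hot path never
 * tests the CPU feature.
 */
struct patch_site {
        uint8_t *ps_addr;        /* start of the padded instruction */
        size_t   ps_room;        /* bytes reserved at the site */
};

/* 64-bit-wide forms with a (%rdi) operand */
static const uint8_t insn_xsave64[]    = { 0x48, 0x0f, 0xae, 0x27 };
static const uint8_t insn_xsaveopt64[] = { 0x48, 0x0f, 0xae, 0x37 };
static const uint8_t insn_xrstor64[]   = { 0x48, 0x0f, 0xae, 0x2f };

static void
patch_apply(struct patch_site *ps, const uint8_t *insn, size_t len)
{
        /* the real kernel also lifts write protection around the copy */
        memcpy(ps->ps_addr, insn, len);
        memset(ps->ps_addr + len, 0x90, ps->ps_room - len);     /* NOP fill */
}

/* site tables would be generated from labels emitted by the assembler */
extern struct patch_site fpu_save_sites[], fpu_restore_sites[];
extern size_t fpu_nsave_sites, fpu_nrestore_sites;

void
fpu_patch_insns(int has_xsave, int has_xsaveopt)
{
        const uint8_t *save = has_xsaveopt ? insn_xsaveopt64 : insn_xsave64;
        size_t i;

        if (!has_xsave)
                return;          /* keep fxsave64/fxrstor64 as assembled */
        for (i = 0; i < fpu_nsave_sites; i++)
                patch_apply(&fpu_save_sites[i], save, sizeof(insn_xsave64));
        for (i = 0; i < fpu_nrestore_sites; i++)
                patch_apply(&fpu_restore_sites[i], insn_xrstor64,
                    sizeof(insn_xrstor64));
}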

This change has three motivations:
1) with modern clang, SSE registers are used even in rcrt0.o, making
lazy FPU switching a smaller benefit vs trap costs
2) the Intel SDM warns that lazy FPU switching may increase power costs
3) post-Spectre rumors suggest that the %cr0 TS flag might not block
speculation, permitting leaking of information about FPU state
(AES keys?) across protection boundaries.

tested by many in snaps; prodding from deraadt@

pcb.h  c9de630f  Tue Jun 05 06:39:10 GMT 2018  guenther <guenther@openbsd.org>  Switch from lazy FPU switching to semi-eager FPU switching
fpu.h  c9de630f  Tue Jun 05 06:39:10 GMT 2018  guenther <guenther@openbsd.org>  Switch from lazy FPU switching to semi-eager FPU switching
intrdefs.h  c9de630f  Tue Jun 05 06:39:10 GMT 2018  guenther <guenther@openbsd.org>  Switch from lazy FPU switching to semi-eager FPU switching
codepatch.h  c9de630f  Tue Jun 05 06:39:10 GMT 2018  guenther <guenther@openbsd.org>  Switch from lazy FPU switching to semi-eager FPU switching
specialreg.h  c9de630f  Tue Jun 05 06:39:10 GMT 2018  guenther <guenther@openbsd.org>  Switch from lazy FPU switching to semi-eager FPU switching
cpu.h  c9de630f  Tue Jun 05 06:39:10 GMT 2018  guenther <guenther@openbsd.org>  Switch from lazy FPU switching to semi-eager FPU switching

/openbsd/sys/arch/amd64/amd64/
process_machdep.c  c9de630f  Tue Jun 05 06:39:10 GMT 2018  guenther <guenther@openbsd.org>  Switch from lazy FPU switching to semi-eager FPU switching
mptramp.S  c9de630f  Tue Jun 05 06:39:10 GMT 2018  guenther <guenther@openbsd.org>  Switch from lazy FPU switching to semi-eager FPU switching
via.c  c9de630f  Tue Jun 05 06:39:10 GMT 2018  guenther <guenther@openbsd.org>  Switch from lazy FPU switching to semi-eager FPU switching
fpu.c  c9de630f  Tue Jun 05 06:39:10 GMT 2018  guenther <guenther@openbsd.org>  Switch from lazy FPU switching to semi-eager FPU switching
ipifuncs.c  c9de630f  Tue Jun 05 06:39:10 GMT 2018  guenther <guenther@openbsd.org>  Switch from lazy FPU switching to semi-eager FPU switching
vm_machdep.c  c9de630f  Tue Jun 05 06:39:10 GMT 2018  guenther <guenther@openbsd.org>  Switch from lazy FPU switching to semi-eager FPU switching
genassym.cf  c9de630f  Tue Jun 05 06:39:10 GMT 2018  guenther <guenther@openbsd.org>  Switch from lazy FPU switching to semi-eager FPU switching
acpi_machdep.c  c9de630f  Tue Jun 05 06:39:10 GMT 2018  guenther <guenther@openbsd.org>  Switch from lazy FPU switching to semi-eager FPU switching
vector.S  c9de630f  Tue Jun 05 06:39:10 GMT 2018  guenther <guenther@openbsd.org>  Switch from lazy FPU switching to semi-eager FPU switching
trap.c  c9de630f  Tue Jun 05 06:39:10 GMT 2018  guenther <guenther@openbsd.org>  Switch from lazy FPU switching to semi-eager FPU switching
locore.S  c9de630f  Tue Jun 05 06:39:10 GMT 2018  guenther <guenther@openbsd.org>  Switch from lazy FPU switching to semi-eager FPU switching
cpu.c  c9de630f  Tue Jun 05 06:39:10 GMT 2018  guenther <guenther@openbsd.org>  Switch from lazy FPU switching to semi-eager FPU switching
machdep.c  c9de630f  Tue Jun 05 06:39:10 GMT 2018  guenther <guenther@openbsd.org>  Switch from lazy FPU switching to semi-eager FPU switching