kern_sched.c - OpenGrok history log for /openbsd/sys/kern/kern

Revision	Date	Author	Comments
# a09e9584	03-Jun-2024	claudio <claudio@openbsd.org>	Remove the now unused s argument to SCHED_LOCK and SCHED_UNLOCK. The SPL level is now tracked by the mutex and we no longer need to track this in the callers. OK miod@ mlarkin@ tb@ jca@
# 2286c11f	28-Feb-2024	mpi <mpi@openbsd.org>	No need to kick a CPU twice when putting a thread on its runqueue. From Christian Ludwig, ok claudio@
# 1d970828	24-Jan-2024	cheloha <cheloha@openbsd.org>	clockintr: switch from callee- to caller-allocated clockintr structs Currently, clockintr_establish() calls malloc(9) to allocate a clockintr struct on behalf of the caller. mpi@ says this behavior is incompatible with dt(4). In particular, calling malloc(9) during the initialization of a PCB outside of dt_pcb_alloc() is (a) awkward and (b) may conflict with future changes/optimizations to PCB allocation. To side-step the problem, this patch changes the clockintr subsystem to use caller-allocated clockintr structs instead of callee-allocated structs. clockintr_establish() is named after softintr_establish(), which uses malloc(9) internally to create softintr objects. The clockintr subsystem is no longer using malloc(9), so the "establish" naming is no longer apt. To avoid confusion, this patch also renames "clockintr_establish" to "clockintr_bind". Requested by mpi@. Tweaked by mpi@. Thread: https://marc.info/?l=openbsd-tech&m=170597126103504&w=2 ok claudio@ mlarkin@ mpi@
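The callee- versus caller-allocated distinction in the entry above can be illustrated with a minimal userland sketch. Every name here (`toy_timer`, `toy_timer_establish`, `toy_timer_bind`, `toy_timer_fire`, `toy_count`) is hypothetical and chosen only for illustration; this is not the clockintr API, only the general allocation pattern the commit message describes.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical timer object; not the kernel's struct clockintr. */
struct toy_timer {
	void (*fn)(void *);	/* callback to run */
	void *arg;		/* argument handed to the callback */
};

/* Callee-allocated style: the function calls malloc() for the caller. */
struct toy_timer *
toy_timer_establish(void (*fn)(void *), void *arg)
{
	struct toy_timer *t = malloc(sizeof(*t));

	if (t == NULL)
		return NULL;
	t->fn = fn;
	t->arg = arg;
	return t;
}

/* Caller-allocated style: the caller owns the storage, no allocation here. */
void
toy_timer_bind(struct toy_timer *t, void (*fn)(void *), void *arg)
{
	t->fn = fn;
	t->arg = arg;
}

/* Run the bound callback once. */
void
toy_timer_fire(struct toy_timer *t)
{
	t->fn(t->arg);
}

/* Example callback: increment the int that arg points to. */
void
toy_count(void *arg)
{
	(*(int *)arg)++;
}
```

With the caller-allocated form, the object can be embedded in a larger structure or placed on the stack, so initialization never has to allocate memory.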
# bb00e811	24-Oct-2023	claudio <claudio@openbsd.org>	Normally context switches happen in mi_switch() but there are 3 cases where a switch happens outside. Cleanup these code paths and make them machine independent. - when a process forks (fork, tfork, kthread), the new proc needs to somehow be scheduled for the first time. This is done by proc_trampoline. Since proc_trampoline is machine dependent assembler code change the MP specific proc_trampoline_mp() to proc_trampoline_mi() and make sure it is now always called. - cpu_hatch: when booting APs the code needs to jump to the first proc running on that CPU. This should be the idle thread for that CPU. - sched_exit: when a proc exits it needs to switch away from itself and then instruct the reaper to clean up the rest. This is done by switching to the idle loop. Since the last two cases require a context switch to the idle proc factor out the common code to sched_toidle() and use it in those places. Tested by many on all archs. OK miod@ mpi@ cheloha@
# 709f9596	19-Sep-2023	claudio <claudio@openbsd.org>	Add a KASSERT for p->p_wchan == NULL to setrunqueue() There is the same check in sched_chooseproc() but that is too late to know where the bad insertion into the runqueue was done. OK mpi@
# a332869a	14-Sep-2023	cheloha <cheloha@openbsd.org>	clockintr, scheduler: move statclock handle from clockintr_queue to schedstate_percpu Move the statclock handle from clockintr_queue.cq_statclock to schedstate_percpu.spc_statclock. Establish spc_statclock during sched_init_cpu() alongside the other scheduler clock interrupts. Thread: https://marc.info/?l=openbsd-tech&m=169428749720476&w=2
# a3464c93	10-Sep-2023	cheloha <cheloha@openbsd.org>	clockintr: support an arbitrary callback function argument Callers can now provide an argument pointer to clockintr_establish(). The pointer is kept in a new struct clockintr member, cl_arg. The pointer is passed as the third parameter to clockintr.cl_func when it is executed during clockintr_dispatch(). Like the callback function, the callback argument is immutable after the clockintr is established. At present, nothing uses this. All current clockintr_establish() callers pass a NULL arg pointer. However, I am confident that dt(4)'s profile provider will need this in the near future. Requested by dlg@ back in March.
# 529ac442	06-Sep-2023	cheloha <cheloha@openbsd.org>	clockintr: clockintr_establish: change first argument to a cpu_info pointer All CPUs control a single clockintr_queue. clockintr_establish() callers don't need to know about the underlying clockintr_queue. Accepting a cpu_info pointer as argument simplifies the API. From mpi@. ok mpi@
# 3cdedeae	31-Aug-2023	cheloha <cheloha@openbsd.org>	sched_cpu_init: remove unnecessary NULL-checks for clockintr pointers sched_cpu_init() is only run once per cpu_info struct, so we don't need these NULL-checks. The NULL-checks are a vestige of clockintr_cpu_init(), which runs more than once per CPU and uses the checks to avoid leaking clockintr handles. Thread: https://marc.info/?l=openbsd-tech&m=169349579804340&w=2 ok claudio@
# 94c38e45	29-Aug-2023	claudio <claudio@openbsd.org>	Remove p_rtime from struct proc and replace it by passing the timespec as argument to the tuagg_locked function. - Remove incorrect use of p_rtime in other parts of the tree. p_rtime was almost always 0 so including it in any sum did not alter the result. - In main() the update of time can be further simplified since at that time only the primary cpu is running. - Add missing nanouptime() call in cpu_hatch() for hppa - Rename tuagg_unlocked to tuagg_locked like it is done in the rest of the tree. OK cheloha@ dlg@
# 9b3d5a4a	14-Aug-2023	mpi <mpi@openbsd.org>	Extend scheduler tracepoints to follow CPU jumping. - Add two new tracepoints sched:fork & sched:steal - Include selected CPU number in sched:wakeup - Add sched:unsleep corresponding to sched:sleep which matches add/removal of threads on the sleep queue ok claudio@
# 9ac452c7	11-Aug-2023	cheloha <cheloha@openbsd.org>	hardclock(9), roundrobin: make roundrobin() an independent clock interrupt - Remove the roundrobin() call from hardclock(9). - Revise roundrobin() to make it a valid clock interrupt callback. It is still periodic and it still runs at one tenth of the hardclock frequency. - Account for multiple expirations in roundrobin(): if two or more roundrobin periods have elapsed, set SPCF_SHOULDYIELD on the running thread immediately to simulate normal behavior. - Each schedstate_percpu has its own roundrobin() handle, spc_roundrobin. spc_roundrobin is started/advanced during clockintr_cpu_init(). Intervals elapsed across suspend/resume are discarded. - rrticks_init and schedstate_percpu.spc_rrticks are now useless: delete them. Tweaked by mpi@. With input from mpi@ and claudio@. Thread: https://marc.info/?l=openbsd-tech&m=169127381314651&w=2 ok mpi@ claudio@
# 44e0cbf2	05-Aug-2023	cheloha <cheloha@openbsd.org>	hardclock(9): move setitimer(2) code into itimer_update() - Move the setitimer(2) code responsible for updating the ITIMER_VIRTUAL and ITIMER_PROF timers from hardclock(9) into a new clock interrupt routine, itimer_update(). itimer_update() is periodic and runs at the same frequency as the hardclock. + Revise itimerdecr() to run within itimer_mtx instead of entering and leaving it. - Each schedstate_percpu has its own itimer_update() handle, spc_itimer. A new scheduler flag, SPCF_ITIMER, indicates whether spc_itimer was started during the last mi_switch() and needs to be stopped during the next mi_switch() or sched_exit(). - A new per-process flag, PS_ITIMER, indicates whether ITIMER_VIRTUAL and/or ITIMER_PROF are running. Checking the flag is easier than entering itimer_mtx to check process.ps_timer[]. The flag is set and cleared in a new helper function, process_reset_itimer_flag(). - In setitimer(), call need_resched() when the state of ITIMER_VIRTUAL or ITIMER_PROF is changed to force an mi_switch() and update spc_itimer. claudio@ notes that ITIMER_PROF could be implemented as a high-res timer using the thread's execution time as a guide for when to interrupt the process and assert SIGPROF. This would probably work really well in single-threaded processes. ITIMER_VIRTUAL would be more difficult to make high-res, though, as you need to exclude time spent in the kernel. Tested on powerpc64 by gkoehler@. With input from claudio@. Thread: https://marc.info/?l=openbsd-tech&m=169038818517101&w=2 ok claudio@
# 1588c842	05-Aug-2023	claudio <claudio@openbsd.org>	Remove the P_WSLEEP specific KASSERT(). Not only procs in state SSTOP can be added to the run queue but also procs in state SRUN. The latter happens when schedcpu() kicks in before the proc had a chance to run. Problem spotted by gkoehler@ OK cheloha@
# 834cc80d	03-Aug-2023	claudio <claudio@openbsd.org>	Remove the per-cpu loadavg calculation. The current scheduler usage is highly questionable and probably not helpful. OK kettenis@ cheloha@ deraadt@
# 96496668	27-Jul-2023	cheloha <cheloha@openbsd.org>	sched_init_cpu: move profclock staggering to clockintr_cpu_init() initclocks() runs after sched_init_cpu() is called for secondary CPUs, so profclock_period is still zero and the clockintr_stagger() call for spc_profclock is useless. For now, just stagger spc_profclock during clockintr_cpu_init() along with everything else.
# 671537bf	25-Jul-2023	cheloha <cheloha@openbsd.org>	statclock: move profil(2), GPROF code to profclock(), gmonclock() This patch isolates profil(2) and GPROF from statclock(). Currently, statclock() implements both profil(2) and GPROF through a complex mechanism involving both platform code (setstatclockrate) and the scheduler (pscnt, psdiv, and psratio). We have a machine-independent interface to the clock interrupt hardware now, so we no longer need to do it this way. - Move profil(2)-specific code from statclock() to a new clock interrupt callback, profclock(), in subr_prof.c. Each schedstate_percpu has its own profclock handle. The profclock is enabled/disabled for a given CPU when it is needed by the running thread during mi_switch() and sched_exit(). - Move GPROF-specific code from statclock() to a new clock interrupt callback, gmonclock(), in subr_prof.c. Where available, each cpu_info has its own gmonclock handle. The gmonclock is enabled/disabled for a given CPU via sysctl(2) in prof_state_toggle(). - Both profclock() and gmonclock() have a fixed period, profclock_period, that is initialized during initclocks(). - Export clockintr_advance(), clockintr_cancel(), clockintr_establish(), and clockintr_stagger() via <sys/clockintr.h>. They have external callers now. - Delete pscnt, psdiv, psratio. From schedstate_percpu, also delete spc_pscnt and spc_psdiv. The statclock frequency is not dynamic anymore so these variables are now useless. - Delete code/state related to the dynamic statclock frequency from kern_clockintr.c. The statclock frequency can still be pseudo-random, so move the contents of clockintr_statvar_init() into clockintr_init(). With input from miod@, deraadt@, and claudio@. Early revisions cleaned up by claudio. Early revisions tested by claudio@. Tested by cheloha@ on amd64, arm64, macppc, octeon, and sparc64 (sun4v). Compile- and boot- tested on i386 by mlarkin@. riscv64 compilation bugs found by mlarkin@. Tested on riscv64 by jca@. Tested on powerpc64 by gkoehler@.
# f2e7dc09	14-Jul-2023	claudio <claudio@openbsd.org>	struct sleep_state is no longer used, remove it. Also remove the priority argument to sleep_finish() the code can use the p_flag P_SINTR flag to know if the signal check is needed or not. OK cheloha@ kettenis@ mpi@
# aa563902	11-Jul-2023	claudio <claudio@openbsd.org>	Rework sleep_setup()/sleep_finish() to no longer hold the scheduler lock between calls. Instead of forcing an atomic operation across multiple calls use a three step transaction. 1. setup sleep state by calling sleep_setup() 2. recheck sleep condition to ensure that the event did not fire before sleep_setup() registered the proc onto the sleep queue 3. call sleep_finish() to either sleep or keep on running based on the step 2 outcome and any possible signal delivery To make this work wakeup from signals, single thread api and wakeup(9) need to be aware if a process is between step 1 and step 3 so that the process is not enqueued back onto the runqueue while going to sleep. Introduce the p_flag P_WSLEEP to detect this situation. On top of this remove the spl dance in msleep() which is no longer required. It is ok to process interrupts between step 1 and 3. OK mpi@ cheloha@
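The three-step transaction in the entry above can be sketched as a toy, single-threaded model. It is not the kernel code: `toy_proc`, `toy_sleep_setup`, `toy_wakeup`, and `toy_sleep_finish` are hypothetical names, and the `in_setup` flag only stands in for the role the commit message gives to P_WSLEEP. Signal delivery and locking are left out entirely.

```c
#include <stdbool.h>

/* Toy model only; none of these names exist in the kernel. */
struct toy_proc {
	bool on_sleepq;	/* registered on the sleep queue */
	bool in_setup;	/* between step 1 and step 3 (role of P_WSLEEP) */
	bool on_runq;	/* enqueued on the run queue */
};

/* Step 1: register on the sleep queue and open the setup window. */
void
toy_sleep_setup(struct toy_proc *p)
{
	p->on_sleepq = true;
	p->in_setup = true;
}

/*
 * Wakeup: take the proc off the sleep queue. It is put back on the
 * run queue only if it is not still between step 1 and step 3.
 */
void
toy_wakeup(struct toy_proc *p)
{
	if (!p->on_sleepq)
		return;
	p->on_sleepq = false;
	if (!p->in_setup)
		p->on_runq = true;
}

/*
 * Step 3: do_sleep carries the outcome of the step 2 recheck.
 * Returns true if the proc would really go to sleep, false if it
 * keeps running because the event already fired.
 */
bool
toy_sleep_finish(struct toy_proc *p, bool do_sleep)
{
	p->in_setup = false;
	if (!do_sleep || !p->on_sleepq) {
		p->on_sleepq = false;
		return false;
	}
	return true;
}
```

In this model, a wakeup that arrives inside the window makes `toy_sleep_finish()` return false without the proc ever being placed on the run queue, which is the situation the flag is meant to detect.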
# b2536c64	28-Jun-2023	claudio <claudio@openbsd.org>	First step at removing struct sleep_state. Pass the timeout and sleep priority not only to sleep_setup() but also to sleep_finish(). With that sls_timeout and sls_catch can be removed from struct sleep_state. The timeout is now setup first thing in sleep_finish() and no longer as last thing in sleep_setup(). This should not cause a noticeable difference since the code run between sleep_setup() and sleep_finish() is minimal. OK kettenis@
# 2b46a8cb	05-Dec-2022	deraadt <deraadt@openbsd.org>	zap a pile of dangling tabs
# 0d280c5f	14-Aug-2022	jsg <jsg@openbsd.org>	remove unneeded includes in sys/kern ok mpi@ miod@
# 41d7544a	20-Jan-2022	bluhm <bluhm@openbsd.org>	Shifting signed integers left by 31 is undefined behavior in C. found by kubsan; joint work with tobhe@; OK miod@
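As a minimal illustration of the class of bug named in the entry above (not the actual change in that commit): with a 32-bit `int`, the expression `1 << 31` is undefined because the result is not representable in `int`, while the same shift on an unsigned operand is well defined. The helper name `bit_mask` is hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper: build the mask for bit n (0..31).
 * The operand is converted to uint32_t before shifting, so shifting
 * into bit 31 is well defined; writing 1 << 31 with a plain int
 * would be undefined behavior where int is 32 bits wide.
 */
uint32_t
bit_mask(unsigned int n)
{
	return (uint32_t)1 << n;
}
```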
# 9fde647a	09-Sep-2021	mpi <mpi@openbsd.org>	Add THREAD_PID_OFFSET to tracepoint arguments that pass a TID to userland. Bring these values in sync with the `tid' builtin which already include the offset. This is necessary to build script comparing them, like: tracepoint:sched:enqueue { @ts[arg0] = nsecs; } tracepoint:sched:on__cpu /@ts[tid]/ { latency = nsecs - @ts[tid]; } Discussed with and ok bluhm@
# d73de46f	06-Jul-2021	kettenis <kettenis@openbsd.org>	Introduce CPU_IS_RUNNING() and use it in scheduler-related code to prevent waiting on CPUs that didn't spin up. This will allow us to spin down CPUs in the future to save power as well. ok mpi@