.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>

=========================
System Suspend Code Flows
=========================

:Copyright: |copy| 2020 Intel Corporation

:Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

At least one global system-wide transition needs to be carried out for the
system to get from the working state into one of the supported
:doc:`sleep states <sleep-states>`. Hibernation requires more than one
transition to occur for this purpose, but the other sleep states, commonly
referred to as *system-wide suspend* (or simply *system suspend*) states, need
only one.

For those sleep states, the transition from the working state of the system into
the target sleep state is referred to as *system suspend* too (in the majority
of cases, whether this means a transition or a sleep state of the system should
be clear from the context) and the transition back from the sleep state into the
working state is referred to as *system resume*.

The kernel code flows associated with the suspend and resume transitions for
different sleep states of the system are quite similar, but there are some
significant differences between the :ref:`suspend-to-idle <s2idle>` code flows
and the code flows related to the :ref:`suspend-to-RAM <s2ram>` and
:ref:`standby <standby>` sleep states.

The :ref:`suspend-to-RAM <s2ram>` and :ref:`standby <standby>` sleep states
cannot be implemented without platform support, and the difference between them
boils down to the platform-specific actions carried out by the suspend and
resume hooks that need to be provided by the platform driver to make them
available. Apart from that, the suspend and resume code flows for these sleep
states are mostly identical, so both of them will be referred to collectively as
*platform-dependent suspend* states in what follows.


.. _s2idle_suspend:

Suspend-to-idle Suspend Code Flow
=================================

The following steps are taken in order to transition the system from the working
state to the :ref:`suspend-to-idle <s2idle>` sleep state:

 1. Invoking system-wide suspend notifiers.

    Kernel subsystems can register callbacks to be invoked when the suspend
    transition is about to occur and when the resume transition has finished.

    That allows them to prepare for the change of the system state and to clean
    up after getting back to the working state.

 2. Freezing tasks.

    Tasks are frozen primarily in order to avoid unchecked hardware accesses
    from user space through MMIO regions or I/O registers exposed directly to
    it and to prevent user space from entering the kernel while the next step
    of the transition is in progress (which could be problematic for various
    reasons).

    All user space tasks are intercepted as though they were sent a signal and
    put into uninterruptible sleep until the end of the subsequent system resume
    transition.

    Kernel threads that choose to be frozen during system suspend for specific
    reasons are frozen afterwards, but they are not intercepted. Instead, they
    are expected to periodically check whether or not they need to be frozen
    and to put themselves into uninterruptible sleep if so. [Note, however,
    that kernel threads can use locking and other concurrency controls
    available in kernel space to synchronize themselves with system suspend and
    resume, which can be much more precise than the freezing, so the latter is
    not a recommended option for kernel threads.]

 3. Suspending devices and reconfiguring IRQs.

    Devices are suspended in four phases called *prepare*, *suspend*,
    *late suspend* and *noirq suspend* (see :ref:`driverapi_pm_devices` for more
    information on what exactly happens in each phase).

    Every device is visited in each phase, but typically it is not physically
    accessed in more than two of them.

    The runtime PM API is disabled for every device during the *late* suspend
    phase and high-level ("action") interrupt handlers are prevented from being
    invoked before the *noirq* suspend phase.

    Interrupts are still handled after that, but they are only acknowledged to
    interrupt controllers without performing any device-specific actions that
    would be triggered in the working state of the system (those actions are
    deferred until the subsequent system resume transition as described
    `below <s2idle_resume_>`_).

    IRQs associated with system wakeup devices are "armed" so that the resume
    transition of the system is started when one of them signals an event.

 4. Freezing the scheduler tick and suspending timekeeping.

    When all devices have been suspended, CPUs enter the idle loop and are put
    into the deepest available idle state. While doing that, each of them
    "freezes" its own scheduler tick so that the timer events associated with
    the tick do not occur until the CPU is woken up by another interrupt source.

    The last CPU to enter the idle state also stops the timekeeping, which
    (among other things) prevents high-resolution timers from triggering until
    the first CPU that is woken up restarts the timekeeping. That allows the
    CPUs to stay in the deep idle state for a relatively long time in one go.

    From this point on, the CPUs can only be woken up by non-timer hardware
    interrupts. If that happens, they go back to the idle state unless the
    interrupt that woke up one of them comes from an IRQ that has been armed for
    system wakeup, in which case the system resume transition is started.


.. _s2idle_resume:

Suspend-to-idle Resume Code Flow
================================

The following steps are taken in order to transition the system from the
:ref:`suspend-to-idle <s2idle>` sleep state into the working state:

 1. Resuming timekeeping and unfreezing the scheduler tick.

    When one of the CPUs is woken up (by a non-timer hardware interrupt), it
    leaves the idle state entered in the last step of the preceding suspend
    transition, restarts the timekeeping (unless it has been restarted already
    by another CPU that woke up earlier) and the scheduler tick on that CPU is
    unfrozen.

    If the interrupt that has woken up the CPU was armed for system wakeup,
    the system resume transition begins.

 2. Resuming devices and restoring the working-state configuration of IRQs.

    Devices are resumed in four phases called *noirq resume*, *early resume*,
    *resume* and *complete* (see :ref:`driverapi_pm_devices` for more
    information on what exactly happens in each phase).

    Every device is visited in each phase, but typically it is not physically
    accessed in more than two of them.

    The working-state configuration of IRQs is restored after the *noirq* resume
    phase, and the runtime PM API is re-enabled during the *early* resume phase
    for every device whose driver supports it.

 3. Thawing tasks.

    Tasks frozen in step 2 of the preceding `suspend <s2idle_suspend_>`_
    transition are "thawed", which means that they are woken up from the
    uninterruptible sleep that they went into at that time and user space tasks
    are allowed to exit the kernel.

 4. Invoking system-wide resume notifiers.

    This is analogous to step 1 of the `suspend <s2idle_suspend_>`_ transition
    and the same set of callbacks is invoked at this point, but a different
    "notification type" parameter value is passed to them.


Platform-dependent Suspend Code Flow
====================================

The following steps are taken in order to transition the system from the working
state into a platform-dependent suspend state:

 1. Invoking system-wide suspend notifiers.

    This step is the same as step 1 of the suspend-to-idle suspend transition
    described `above <s2idle_suspend_>`_.

 2. Freezing tasks.

    This step is the same as step 2 of the suspend-to-idle suspend transition
    described `above <s2idle_suspend_>`_.

 3. Suspending devices and reconfiguring IRQs.

    This step is analogous to step 3 of the suspend-to-idle suspend transition
    described `above <s2idle_suspend_>`_, but the arming of IRQs for system
    wakeup generally does not have any effect on the platform.

    There are platforms that can go into a very deep low-power state internally
    when all CPUs in them are in sufficiently deep idle states and all I/O
    devices have been put into low-power states. On those platforms,
    suspend-to-idle can reduce system power very effectively.

    On other platforms, however, low-level components (like interrupt
    controllers) need to be turned off in a platform-specific way (implemented
    in the hooks provided by the platform driver) to achieve comparable power
    reduction.

    That usually prevents in-band hardware interrupts from waking up the system,
    so system wakeup must be signaled in a special platform-dependent way
    instead. In that case, the configuration of system wakeup sources usually
    starts when system wakeup devices are suspended and is finalized by the
    platform suspend hooks later on.

 4. Disabling non-boot CPUs.

    On some platforms, the suspend hooks mentioned above must run in a one-CPU
    configuration of the system (in particular, the hardware cannot be accessed
    by any code running in parallel with the platform suspend hooks that may,
    and often do, trap into the platform firmware in order to finalize the
    suspend transition).

    For this reason, the CPU offline/online (CPU hotplug) framework is used
    to take all of the CPUs in the system, except for one (the boot CPU),
    offline (typically, the CPUs that have been taken offline go into deep idle
    states).

    This means that all tasks are migrated away from those CPUs and all IRQs are
    rerouted to the only CPU that remains online.

 5. Suspending core system components.

    This prepares the core system components for possibly losing power and
    suspends the timekeeping.

 6. Platform-specific power removal.

    This is expected to remove power from all of the system components except
    for the memory controller and RAM (in order to preserve the contents of the
    latter) and some devices designated for system wakeup.

    In many cases, control is passed to the platform firmware, which is
    expected to finalize the suspend transition as needed.


Platform-dependent Resume Code Flow
===================================

The following steps are taken in order to transition the system from a
platform-dependent suspend state into the working state:

 1. Platform-specific system wakeup.

    The platform is woken up by a signal from one of the designated system
    wakeup devices (which need not be an in-band hardware interrupt) and
    control is passed back to the kernel (the working configuration of the
    platform may need to be restored by the platform firmware before the
    kernel gets control again).

 2. Resuming core system components.

    The suspend-time configuration of the core system components is restored and
    the timekeeping is resumed.

 3. Re-enabling non-boot CPUs.

    The CPUs disabled in step 4 of the preceding suspend transition are taken
    back online and their suspend-time configuration is restored.

 4. Resuming devices and restoring the working-state configuration of IRQs.

    This step is the same as step 2 of the suspend-to-idle resume transition
    described `above <s2idle_resume_>`_.

 5. Thawing tasks.

    This step is the same as step 3 of the suspend-to-idle resume transition
    described `above <s2idle_resume_>`_.

 6. Invoking system-wide resume notifiers.

    This step is the same as step 4 of the suspend-to-idle resume transition
    described `above <s2idle_resume_>`_.
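Each platform-dependent resume step undoes one suspend step, so the two flows
behave like a stack: resume runs the counterparts of the suspend steps in
reverse order. The user-space C sketch below models only that ordering. The
step strings, ``struct flow_log`` and ``flow_run()`` are hypothetical names, not
kernel interfaces, and the ``fail_at`` parameter is an assumption of the model:
it illustrates the usual unwinding pattern (only the completed steps are undone,
newest first) rather than describing the kernel's actual error paths.

```c
#include <assert.h>
#include <stddef.h>

/*
 * User-space model of the platform-dependent suspend and resume flows.
 * All names here are hypothetical; none of them are kernel interfaces.
 */
#define NSTEPS 6

/* Suspend steps 1-6, in the order in which they are carried out. */
static const char *const suspend_steps[NSTEPS] = {
	"suspend notifiers",
	"freeze tasks",
	"suspend devices",
	"disable non-boot CPUs",
	"suspend core components",
	"platform power removal",
};

/* resume_steps[i] undoes suspend_steps[i]. */
static const char *const resume_steps[NSTEPS] = {
	"resume notifiers",
	"thaw tasks",
	"resume devices",
	"enable non-boot CPUs",
	"resume core components",
	"platform wakeup",
};

/* Records the order in which the modeled steps run. */
struct flow_log {
	const char *entries[2 * NSTEPS];
	size_t len;
};

static void log_step(struct flow_log *trace, const char *name)
{
	assert(trace->len < 2 * NSTEPS);
	trace->entries[trace->len++] = name;
}

/*
 * Carries out the suspend steps in order. If fail_at is a valid step
 * index, that step is treated as failing and is not logged; otherwise all
 * steps complete and the modeled system is later woken up. Either way,
 * the completed steps are undone in reverse order. Returns the number of
 * suspend steps that completed.
 */
int flow_run(struct flow_log *trace, int fail_at)
{
	int done = 0;

	trace->len = 0;
	while (done < NSTEPS && done != fail_at) {
		log_step(trace, suspend_steps[done]);
		done++;
	}

	for (int i = done - 1; i >= 0; i--)
		log_step(trace, resume_steps[i]);

	return done;
}
```

For example, ``flow_run(&trace, -1)`` records all six suspend steps followed by
the six resume steps, starting with "platform wakeup" and ending with "resume
notifiers", while ``flow_run(&trace, 2)`` stops when suspending devices fails
and then only thaws tasks and invokes the resume notifiers.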
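The device-level rules given in the suspend-to-idle sections can also be
captured in a small state model: every device is visited in each of the four
suspend phases and the four resume phases, the runtime PM API is disabled during
the *late* suspend phase and re-enabled during the *early* resume phase for
devices whose drivers support it, and action interrupt handlers are prevented
from running from just before the *noirq* suspend phase until just after the
*noirq* resume phase. This C sketch is hypothetical: ``enum pm_phase``,
``struct model_dev``, ``struct model_sys`` and ``run_phases()`` are illustrative
names rather than the driver PM callbacks described in
:ref:`driverapi_pm_devices`, and the model counts visits, not physical device
accesses.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the device PM phases; not kernel code. */
enum pm_phase {
	PHASE_PREPARE,
	PHASE_SUSPEND,
	PHASE_SUSPEND_LATE,
	PHASE_SUSPEND_NOIRQ,
	PHASE_RESUME_NOIRQ,
	PHASE_RESUME_EARLY,
	PHASE_RESUME,
	PHASE_COMPLETE,
	NR_PHASES
};

struct model_dev {
	bool runtime_pm_supported; /* driver supports runtime PM */
	bool runtime_pm_enabled;   /* runtime PM API currently usable */
	int visits;                /* number of phases this device was visited in */
};

struct model_sys {
	bool action_handlers_enabled; /* high-level IRQ handlers may run */
};

static void run_phase(struct model_sys *sys, struct model_dev *devs, int n,
		      enum pm_phase phase)
{
	/* Action handlers are blocked before the noirq suspend phase starts. */
	if (phase == PHASE_SUSPEND_NOIRQ)
		sys->action_handlers_enabled = false;

	for (int i = 0; i < n; i++) {
		devs[i].visits++; /* every device is visited in every phase */

		if (phase == PHASE_SUSPEND_LATE)
			devs[i].runtime_pm_enabled = false;
		else if (phase == PHASE_RESUME_EARLY)
			devs[i].runtime_pm_enabled = devs[i].runtime_pm_supported;
	}

	/* The working-state IRQ configuration returns after noirq resume. */
	if (phase == PHASE_RESUME_NOIRQ)
		sys->action_handlers_enabled = true;
}

/* Runs phases first..last (inclusive) in order over all devices. */
void run_phases(struct model_sys *sys, struct model_dev *devs, int n,
		enum pm_phase first, enum pm_phase last)
{
	assert(first <= last && last < NR_PHASES);

	for (int p = first; p <= last; p++)
		run_phase(sys, devs, n, (enum pm_phase)p);
}
```

Running ``PHASE_PREPARE`` through ``PHASE_SUSPEND_NOIRQ`` leaves action handlers
blocked and runtime PM disabled for every device; running ``PHASE_RESUME_NOIRQ``
through ``PHASE_COMPLETE`` afterwards restores both (runtime PM only where the
driver supports it), and each device ends up visited eight times.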