.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>

=========================
System Suspend Code Flows
=========================

:Copyright: |copy| 2020 Intel Corporation

:Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

At least one global system-wide transition needs to be carried out for the
system to get from the working state into one of the supported
:doc:`sleep states <sleep-states>`.  Hibernation requires more than one
transition to occur for this purpose, but the other sleep states, commonly
referred to as *system-wide suspend* (or simply *system suspend*) states, need
only one.

For those sleep states, the transition from the working state of the system into
the target sleep state is referred to as *system suspend* too (in the majority
of cases, whether this means a transition or a sleep state of the system should
be clear from the context) and the transition back from the sleep state into the
working state is referred to as *system resume*.

The kernel code flows associated with the suspend and resume transitions for
different sleep states of the system are quite similar, but there are some
significant differences between the :ref:`suspend-to-idle <s2idle>` code flows
and the code flows related to the :ref:`suspend-to-RAM <s2ram>` and
:ref:`standby <standby>` sleep states.

The :ref:`suspend-to-RAM <s2ram>` and :ref:`standby <standby>` sleep states
cannot be implemented without platform support and the difference between them
boils down to the platform-specific actions carried out by the suspend and
resume hooks that need to be provided by the platform driver to make them
available.  Apart from that, the suspend and resume code flows for these sleep
states are mostly identical, so the two of them are referred to collectively as
*platform-dependent suspend* states in what follows.

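As an illustration of how this grouping shows up in practice, the sketch below
parses the contents of ``/sys/power/mem_sleep``, which lists the available
suspend variants as ``s2idle``, ``shallow`` (standby) and ``deep``
(suspend-to-RAM), with the currently selected one in square brackets.  The
function names and the lookup tables are assumptions made for this example
only; they are not part of any kernel interface.

.. code-block:: python

    # Labels used in /sys/power/mem_sleep, mapped to the sleep states they select.
    MEM_SLEEP_LABELS = {
        "s2idle": "suspend-to-idle",
        "shallow": "standby",
        "deep": "suspend-to-RAM",
    }

    # The two states grouped as "platform-dependent suspend" in this document.
    PLATFORM_DEPENDENT = {"standby", "suspend-to-RAM"}


    def parse_mem_sleep(text):
        """Return (available_states, selected_state) from mem_sleep contents."""
        available = []
        selected = None
        for token in text.split():
            label = token.strip("[]")
            state = MEM_SLEEP_LABELS.get(label)
            if state is None:
                continue  # ignore labels this sketch does not know about
            available.append(state)
            if token.startswith("[") and token.endswith("]"):
                selected = state
        return available, selected


    def uses_platform_hooks(state):
        """True if the state relies on platform suspend and resume hooks."""
        return state in PLATFORM_DEPENDENT

For example, ``parse_mem_sleep("s2idle [deep]")`` reports suspend-to-idle and
suspend-to-RAM as available, with suspend-to-RAM selected, and
``uses_platform_hooks()`` returns ``True`` for the latter.
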

.. _s2idle_suspend:

Suspend-to-idle Suspend Code Flow
=================================

The following steps are taken in order to transition the system from the working
state to the :ref:`suspend-to-idle <s2idle>` sleep state:

 1. Invoking system-wide suspend notifiers.

    Kernel subsystems can register callbacks to be invoked when the suspend
    transition is about to occur and when the resume transition has finished.

    That allows them to prepare for the change of the system state and to clean
    up after getting back to the working state.

 2. Freezing tasks.

    Tasks are frozen primarily in order to avoid unchecked hardware accesses
    from user space through MMIO regions or I/O registers exposed directly to
    it and to prevent user space from entering the kernel while the next step
    of the transition is in progress (which might be problematic for various
    reasons).

    All user space tasks are intercepted as though they were sent a signal and
    put into uninterruptible sleep until the end of the subsequent system resume
    transition.

    The kernel threads that choose to be frozen during system suspend for
    specific reasons are frozen subsequently, but they are not intercepted.
    Instead, they are expected to periodically check whether or not they need
    to be frozen and to put themselves into uninterruptible sleep if so.  [Note,
    however, that kernel threads can use locking and other concurrency controls
    available in kernel space to synchronize themselves with system suspend and
    resume, which can be much more precise than the freezing, so the latter is
    not a recommended option for kernel threads.]

 3. Suspending devices and reconfiguring IRQs.

    Devices are suspended in four phases called *prepare*, *suspend*,
    *late suspend* and *noirq suspend* (see :ref:`driverapi_pm_devices` for more
    information on what exactly happens in each phase).

    Every device is visited in each phase, but typically it is not physically
    accessed in more than two of them.

    The runtime PM API is disabled for every device during the *late* suspend
    phase and high-level ("action") interrupt handlers are prevented from being
    invoked before the *noirq* suspend phase.

    Interrupts are still handled after that, but they are only acknowledged to
    interrupt controllers without performing any device-specific actions that
    would be triggered in the working state of the system (those actions are
    deferred till the subsequent system resume transition as described
    `below <s2idle_resume_>`_).

    IRQs associated with system wakeup devices are "armed" so that the resume
    transition of the system is started when one of them signals an event.

 4. Freezing the scheduler tick and suspending timekeeping.

    When all devices have been suspended, CPUs enter the idle loop and are put
    into the deepest available idle state.  While doing that, each of them
    "freezes" its own scheduler tick so that the timer events associated with
    the tick do not occur until the CPU is woken up by another interrupt source.

    The last CPU to enter the idle state also stops the timekeeping, which
    (among other things) prevents high resolution timers from triggering going
    forward until the first CPU that is woken up restarts the timekeeping.
    That allows the CPUs to stay in the deep idle state for a relatively long
    time in one go.

    From this point on, the CPUs can only be woken up by non-timer hardware
    interrupts.  If that happens, they go back to the idle state unless the
    interrupt that woke up one of them comes from an IRQ that has been armed for
    system wakeup, in which case the system resume transition is started.

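The wakeup decision made in the last step above can be modeled in user space
roughly as follows.  This is only a sketch of the logic, not kernel code: the
function name, its arguments and the representation of interrupts as plain
integers are all assumptions made for illustration.

.. code-block:: python

    def s2idle_loop(interrupts, armed_irqs):
        """Model the final step of the suspend-to-idle suspend flow.

        interrupts: IRQ numbers in the order in which they wake up a CPU.
        armed_irqs: IRQs armed for system wakeup during device suspend.

        Returns (wakeup_irq, spurious), where wakeup_irq is the IRQ that
        started the resume transition (None if none did) and spurious counts
        the wakeups after which the CPU simply went back to the idle state.
        """
        spurious = 0
        for irq in interrupts:
            if irq in armed_irqs:
                return irq, spurious  # system resume starts here
            spurious += 1  # not a wakeup IRQ: back to the deep idle state
        return None, spurious

With IRQ 12 armed for system wakeup, the interrupt sequence ``[5, 9, 12, 9]``
causes two returns to the idle state before IRQ 12 starts the resume
transition; the trailing IRQ 9 is never seen by the loop.
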

.. _s2idle_resume:

Suspend-to-idle Resume Code Flow
================================

The following steps are taken in order to transition the system from the
:ref:`suspend-to-idle <s2idle>` sleep state into the working state:

 1. Resuming timekeeping and unfreezing the scheduler tick.

    When one of the CPUs is woken up (by a non-timer hardware interrupt), it
    leaves the idle state entered in the last step of the preceding suspend
    transition, restarts the timekeeping (unless it has been restarted already
    by another CPU that woke up earlier) and the scheduler tick on that CPU is
    unfrozen.

    If the interrupt that has woken up the CPU was armed for system wakeup,
    the system resume transition begins.

 2. Resuming devices and restoring the working-state configuration of IRQs.

    Devices are resumed in four phases called *noirq resume*, *early resume*,
    *resume* and *complete* (see :ref:`driverapi_pm_devices` for more
    information on what exactly happens in each phase).

    Every device is visited in each phase, but typically it is not physically
    accessed in more than two of them.

    The working-state configuration of IRQs is restored after the *noirq* resume
    phase and the runtime PM API is re-enabled for every device whose driver
    supports it during the *early* resume phase.

 3. Thawing tasks.

    Tasks frozen in step 2 of the preceding `suspend <s2idle_suspend_>`_
    transition are "thawed", which means that they are woken up from the
    uninterruptible sleep that they went into at that time and user space tasks
    are allowed to exit the kernel.

 4. Invoking system-wide resume notifiers.

    This is analogous to step 1 of the `suspend <s2idle_suspend_>`_ transition
    and the same set of callbacks is invoked at this point, but a different
    "notification type" parameter value is passed to them.

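The relationship between the suspend and resume notifiers can be sketched with
a small user-space model.  The two event names mirror the kernel's
``PM_SUSPEND_PREPARE`` and ``PM_POST_SUSPEND`` notification types, but they are
plain strings here; the ``NotifierChain`` class and ``run_cycle()`` are
hypothetical helpers that exist only in this example.

.. code-block:: python

    PM_SUSPEND_PREPARE = "PM_SUSPEND_PREPARE"  # passed before suspending
    PM_POST_SUSPEND = "PM_POST_SUSPEND"        # passed after resuming


    class NotifierChain:
        """User-space stand-in for the chain of suspend/resume notifiers."""

        def __init__(self):
            self._callbacks = []

        def register(self, callback):
            self._callbacks.append(callback)

        def call(self, event):
            # The same callbacks run every time; only the event value differs.
            for callback in self._callbacks:
                callback(event)


    def run_cycle(chain, log):
        """Model one suspend/resume cycle, recording only the notifier steps."""
        chain.call(PM_SUSPEND_PREPARE)  # step 1 of the suspend flow
        log.append("suspended")         # all of the other steps are elided
        chain.call(PM_POST_SUSPEND)     # step 4 of the resume flow

A callback registered once with ``register()`` is therefore invoked twice per
cycle, first with ``PM_SUSPEND_PREPARE`` and then with ``PM_POST_SUSPEND``.
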

Platform-dependent Suspend Code Flow
====================================

The following steps are taken in order to transition the system from the working
state to a platform-dependent suspend state:

 1. Invoking system-wide suspend notifiers.

    This step is the same as step 1 of the suspend-to-idle suspend transition
    described `above <s2idle_suspend_>`_.

 2. Freezing tasks.

    This step is the same as step 2 of the suspend-to-idle suspend transition
    described `above <s2idle_suspend_>`_.

 3. Suspending devices and reconfiguring IRQs.

    This step is analogous to step 3 of the suspend-to-idle suspend transition
    described `above <s2idle_suspend_>`_, but the arming of IRQs for system
    wakeup generally does not have any effect on the platform.

    There are platforms that can go into a very deep low-power state internally
    when all CPUs in them are in sufficiently deep idle states and all I/O
    devices have been put into low-power states.  On those platforms,
    suspend-to-idle can reduce system power very effectively.

    On the other platforms, however, low-level components (like interrupt
    controllers) need to be turned off in a platform-specific way (implemented
    in the hooks provided by the platform driver) to achieve comparable power
    reduction.

    That usually prevents in-band hardware interrupts from waking up the system,
    so system wakeup has to be arranged in a special platform-dependent way.
    In that case, the configuration of system wakeup sources usually starts when
    system wakeup devices are suspended and is finalized by the platform suspend
    hooks later on.

 4. Disabling non-boot CPUs.

    On some platforms the suspend hooks mentioned above must run in a one-CPU
    configuration of the system (in particular, the hardware cannot be accessed
    by any code running in parallel with the platform suspend hooks that may,
    and often do, trap into the platform firmware in order to finalize the
    suspend transition).

    For this reason, the CPU offline/online (CPU hotplug) framework is used
    to take all of the CPUs in the system, except for one (the boot CPU),
    offline (typically, the CPUs that have been taken offline go into deep idle
    states).

    This means that all tasks are migrated away from those CPUs and all IRQs are
    rerouted to the only CPU that remains online.

 5. Suspending core system components.

    This prepares the core system components for (possibly) losing power going
    forward and suspends the timekeeping.

 6. Platform-specific power removal.

    This is expected to remove power from all of the system components except
    for the memory controller and RAM (in order to preserve the contents of the
    latter) and some devices designated for system wakeup.

    In many cases control is passed to the platform firmware which is expected
    to finalize the suspend transition as needed.

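The effect of disabling the non-boot CPUs (step 4 above) on tasks and IRQs can
be modeled as follows.  This is a user-space sketch under simplifying
assumptions: CPUs, tasks and IRQs are represented by plain Python values and
the function name is hypothetical; the real work is done by the CPU hotplug
framework.

.. code-block:: python

    def disable_nonboot_cpus(online, tasks, irqs, boot_cpu=0):
        """Model taking every CPU except the boot CPU offline.

        online: list of CPU numbers that are currently online.
        tasks:  dict mapping task name -> CPU it currently runs on.
        irqs:   dict mapping IRQ number -> CPU it is currently routed to.

        Returns (offline, tasks, irqs) describing the one-CPU configuration.
        """
        offline = [cpu for cpu in online if cpu != boot_cpu]
        # Tasks are migrated away from the CPUs going offline.
        migrated = {
            name: boot_cpu if cpu in offline else cpu
            for name, cpu in tasks.items()
        }
        # IRQs are rerouted to the only CPU that remains online.
        rerouted = {
            irq: boot_cpu if cpu in offline else cpu
            for irq, cpu in irqs.items()
        }
        return offline, migrated, rerouted

After the call, every task and every IRQ in the returned dictionaries targets
the boot CPU, which is the configuration that the platform suspend hooks may
require.
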

Platform-dependent Resume Code Flow
===================================

The following steps are taken in order to transition the system from a
platform-dependent suspend state into the working state:

 1. Platform-specific system wakeup.

    The platform is woken up by a signal from one of the designated system
    wakeup devices (which need not be an in-band hardware interrupt) and
    control is passed back to the kernel (the working configuration of the
    platform may need to be restored by the platform firmware before the
    kernel gets control again).

 2. Resuming core system components.

    The suspend-time configuration of the core system components is restored and
    the timekeeping is resumed.

 3. Re-enabling non-boot CPUs.

    The CPUs disabled in step 4 of the preceding suspend transition are taken
    back online and their suspend-time configuration is restored.

 4. Resuming devices and restoring the working-state configuration of IRQs.

    This step is the same as step 2 of the suspend-to-idle resume transition
    described `above <s2idle_resume_>`_.

 5. Thawing tasks.

    This step is the same as step 3 of the suspend-to-idle resume transition
    described `above <s2idle_resume_>`_.

 6. Invoking system-wide resume notifiers.

    This step is the same as step 4 of the suspend-to-idle resume transition
    described `above <s2idle_resume_>`_.
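
The platform-dependent resume flow undoes the steps of the corresponding
suspend flow in reverse order.  The sketch below makes that symmetry explicit;
the step descriptions are paraphrased from the lists in this document and the
``UNDO`` table and ``resume_steps()`` are hypothetical names used only for
illustration.

.. code-block:: python

    SUSPEND_STEPS = [
        "invoke suspend notifiers",
        "freeze tasks",
        "suspend devices",
        "disable non-boot CPUs",
        "suspend core system components",
        "platform-specific power removal",
    ]

    # The resume step that undoes each suspend step.
    UNDO = {
        "invoke suspend notifiers": "invoke resume notifiers",
        "freeze tasks": "thaw tasks",
        "suspend devices": "resume devices",
        "disable non-boot CPUs": "re-enable non-boot CPUs",
        "suspend core system components": "resume core system components",
        "platform-specific power removal": "platform-specific system wakeup",
    }


    def resume_steps(suspend_steps):
        """Return the resume steps matching a list of suspend steps."""
        return [UNDO[step] for step in reversed(suspend_steps)]

Calling ``resume_steps(SUSPEND_STEPS)`` yields the six resume steps in the
order in which they are listed above, starting with the platform-specific
system wakeup and ending with the resume notifiers.
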