.. _skiboot-5.11:

skiboot-5.11
============
5
6skiboot v5.11 was released on Friday April 6th 2018. It is the first
7release of skiboot 5.11, which is now the new stable release
8of skiboot following the 5.10 release, first released February 23rd 2018.
9
We do *not* expect to keep the 5.11 branch around for long. Instead, we
expect to move quickly to 6.0, which will form the basis for op-build
v2.0 and will be required for POWER9 systems.
13
14It is expected that skiboot 6.0 will follow very shortly. Consider 5.11
15more of a beta release to 6.0 than anything. For POWER9 systems it should
16certainly be more solid than previous releases though.
17
skiboot v5.11 contains all bug fixes as of :ref:`skiboot-5.10.4`
and :ref:`skiboot-5.4.9` (the currently maintained stable releases). There
may be more 5.10.x stable releases; that will depend on demand.

For details on how the skiboot stable releases work, see :ref:`stable-rules`.
23
24Over skiboot-5.10, we have the following changes:
25
New Platforms
-------------
28
29- Add VESNIN platform support
30
  The Vesnin platform from YADRO is a 4 socket POWER8 system with up to 8TB
  of memory with 460GB/s of memory bandwidth in only 2U. Many kudos to the
  team from Yadro for submitting their code upstream!
34
New Features
------------
37
38- fast-reboot: enable by default for POWER9
39
40  - Fast reboot is disabled if NPU2 is present or CAPI2/OpenCAPI is used
41
42- PCI tunneled operations on PHB4
43
44  - phb4: set PBCQ Tunnel BAR for tunneled operations
45
46    P9 supports PCI tunneled operations (atomics and as_notify) that are
47    initiated by devices.
48
    A subset of the tunneled operations require a response, which must be
    sent back from the host to the device. For example, an atomic compare
    and swap will return the compare status, as the swap will only be
    performed in case of success.  Similarly, as_notify reports whether the
    target thread has been woken up or not, because the operation may fail.
54
55    To enable tunneled operations, a device driver must tell the host where
56    it expects tunneled operation responses, by setting the PBCQ Tunnel BAR
57    Response register with a specific value within the range of its BARs.
58
59    This register is currently initialized by enable_capi_mode(). But, as
60    tunneled operations may also operate in PCI mode, a new API is required
61    to set the PBCQ Tunnel BAR Response register, without switching to CAPI
62    mode.
63
64    This patch provides two new OPAL calls to get/set the PBCQ Tunnel
65    BAR Response register.
66
    Note: as there is only one PBCQ Tunnel BAR register, shared between
    all the devices connected to the same PHB, only one of these devices
    will be able to use tunneled operations at any time.

  - phb4: set PHB CMPM registers for tunneled operations
71
72    P9 supports PCI tunneled operations (atomics and as_notify) that require
73    setting the PHB ASN Compare/Mask register with a 16-bit indication.
74
75    This register is currently initialized by enable_capi_mode(). But, as
76    tunneled operations may also work in PCI mode, the ASN Compare/Mask
77    register should rather be initialized in phb4_init_ioda3().
78
79    This patch also adds "ibm,phb-indications" to the device tree, to tell
80    Linux the values of CAPI, ASN, and NBW indications, when supported.
81
    Tunneled operations were tested by IBM in CAPI mode and by Mellanox
    Technologies in PCI mode.
84
85- Tie tm-suspend fw-feature and opal_reinit_cpus() together
86
87  Currently opal_reinit_cpus(OPAL_REINIT_CPUS_TM_SUSPEND_DISABLED)
88  always returns OPAL_UNSUPPORTED.
89
90  This ties the tm suspend fw-feature to the
91  opal_reinit_cpus(OPAL_REINIT_CPUS_TM_SUSPEND_DISABLED) so that when tm
92  suspend is disabled, we correctly report it to the kernel.  For
93  backwards compatibility, it's assumed tm suspend is available if the
94  fw-feature is not present.
95
  Currently hostboot will clear fw-feature(TM_SUSPEND_ENABLED) on P9N
  DD2.1. P9N DD2.2 will set fw-feature(TM_SUSPEND_ENABLED).  DD2.0 and
  below have TM disabled completely (not just suspend).
99
100  We are using opal_reinit_cpus() to determine this setting (rather than
101  the device tree/HDAT) as some future firmware may let us change this
102  dynamically after boot. That is not the case currently though.
103
Power Management
----------------
106
107- SLW: Increase stop4-5 residency by 10x
108
  Using the DGEMM benchmark, we observed a 5-9% drop in throughput when
  stop4/5 was enabled, compared to when it was disabled. In this benchmark
  the GPU waits on the CPU to wake up and provide the subsequent data block
  to compute. The wakeup latency accumulates over the run and shows up as a
  performance drop.

  Linux enters stop4/5 too aggressively given its wakeup latency. Increasing
  the residency from 1ms to 10ms reduces the performance drop to less than 1%.

- occ: Set up OCC messaging even if we fail to setup pstates
117
118  This means that we no longer hit this bug if we fail to get valid pstates
119  from the OCC. ::
120
121    [console-pexpect]#echo 1 > //sys/firmware/opal/sensor_groups//occ-csm0/clear
122    echo 1 > //sys/firmware/opal/sensor_groups//occ-csm0/clear
123    [   94.019971181,5] CPU ATTEMPT TO RE-ENTER FIRMWARE! PIR=083d cpu @0x33cf4000 -> pir=083d token=8
124    [   94.020098392,5] CPU ATTEMPT TO RE-ENTER FIRMWARE! PIR=083d cpu @0x33cf4000 -> pir=083d token=8
125    [   10.318805] Disabling lock debugging due to kernel taint
126    [   10.318808] Severe Machine check interrupt [Not recovered]
127    [   10.318812]   NIP [000000003003e434]: 0x3003e434
128    [   10.318813]   Initiator: CPU
129    [   10.318815]   Error type: Real address [Load/Store (foreign)]
130    [   10.318817] opal: Hardware platform error: Unrecoverable Machine Check exception
131    [   10.318821] CPU: 117 PID: 2745 Comm: sh Tainted: G   M             4.15.9-openpower1 #3
132    [   10.318823] NIP:  000000003003e434 LR: 000000003003025c CTR: 0000000030030240
133    [   10.318825] REGS: c00000003fa7bd80 TRAP: 0200   Tainted: G   M              (4.15.9-openpower1)
134    [   10.318826] MSR:  9000000000201002 <SF,HV,ME,RI>  CR: 48002888  XER: 20040000
135    [   10.318831] CFAR: 0000000030030258 DAR: 394a00147d5a03a6 DSISR: 00000008 SOFTE: 1
136
137
138mbox based platforms
139^^^^^^^^^^^^^^^^^^^^
140
141For platforms using the mbox protocol for host flash access (all BMC based
142OpenPOWER systems, most OpenBMC based systems) there have been some hardening
143efforts in the event of the BMC being poorly behaved.
144
145- mbox: Reduce default BMC timeouts
146
  Rebooting a BMC can take 70 seconds. Skiboot cannot possibly spin for
  70 seconds waiting for a BMC to come back. This also makes the current
  default of 30 seconds a bit pointless: it is far too short to be a
  worst case wait time but too long to avoid hitting hard lockup detectors
  and wreaking havoc inside host Linux.

  Just change it to three seconds so that host Linux will survive. Reads
  and writes will fail, but at least the host stays up.

  Also refactored the waiting loop just a bit so that it's easier to read.

- mbox: Harden against BMC daemon errors
158
159  Bugs present in the BMC daemon mean that skiboot gets presented with
160  mbox windows of size zero. These windows cannot be valid and skiboot
161  already detects these conditions.
162
  Currently skiboot warns quite strongly about the occurrence of these
  problems. The problem for skiboot is that it doesn't take any action.
  Initially I wanted to avoid putting policy like this into skiboot, but
  since these bugs aren't going away and skiboot barfing is leading to
  lockups and ultimately the host going down, something needs to be done.
168
169  I propose that when we detect the problem we fail the mbox call and punt
170  the problem back up to Linux. I don't like it but at least it will cause
171  errors to cascade and won't bring the host down. I'm not sure how Linux
172  is supposed to detect this or what it can even do but this is better
173  than a crash.
174
  Diagnosing a failure to boot if skiboot itself fails to read flash may
  be marginally more difficult with this patch. This is because skiboot
177  will now only print one warning about the zero sized window rather than
178  continuously spitting it out.
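
The zero-sized window hardening described above can be sketched roughly as
follows. This is a minimal illustration, not skiboot's actual code: the
``sketch_`` names, the error codes, and the window structure are all
hypothetical, and only the behaviour (fail the call, warn once) comes from
the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical return codes for this sketch; not skiboot's real values. */
#define SKETCH_OK          0
#define SKETCH_ERR_WINDOW -1

/* Hypothetical description of a flash window reported by the BMC daemon. */
struct sketch_window {
	uint32_t offset;  /* start of the window in flash */
	uint32_t size;    /* window length in bytes */
};

/* Set once the zero-sized window warning has been printed. */
static bool sketch_warned;

/*
 * Validate a window reported by the BMC. A zero-sized window cannot be
 * valid, so fail the call and let the caller (ultimately Linux) see the
 * error. The warning is printed only the first time.
 */
static int sketch_check_window(const struct sketch_window *win)
{
	assert(win != NULL);

	if (win->size == 0) {
		if (!sketch_warned) {
			fprintf(stderr, "mbox: BMC returned a zero sized window\n");
			sketch_warned = true;
		}
		return SKETCH_ERR_WINDOW;
	}
	return SKETCH_OK;
}
```

A caller would run this check on every window it is handed and propagate
the error instead of retrying forever.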
179
Fast Reboot Improvements
------------------------
182
183Around fast-reboot we have made several improvements to harden the fast
184reboot code paths and resort to a full IPL if something doesn't look right.
185
186- core/fast-reboot: zero memory after fast reboot
187
188  This improves the security and predictability of the fast reboot
189  environment.
190
  There cannot be a secure fence between fast reboots, because a
192  malicious OS can modify the firmware itself. However a well-behaved
193  OS can have a reasonable expectation that OS memory regions it has
194  modified will be cleared upon fast reboot.
195
196  The memory is zeroed after all other CPUs come up from fast reboot,
197  just before the new kernel is loaded and booted into. This allows
198  image preloading to run concurrently, and will allow parallelisation
199  of the clearing in future.
200- core/fast-reboot: verify mem regions before fast reboot
201
202  Run the mem_region sanity checkers before proceeding with fast
203  reboot.
204
  This is the beginning of proactive sanity checks on opal data
  for fast reboot (which complements the reactive disable_fast_reboot
  cases). Re-using and sharing any kind of debug code and unit test
  code here is encouraged.

- fast-reboot: occ: Only delete /ibm,opal/power-mgt nodes if they exist
210- core/fast-reboot: disable fast reboot upon fundamental entry/exit/locking errors
211
212  This disables fast reboot in several more cases where serious errors
213  like lock corruption or call re-entrancy are detected.
214- capp: Disable fast-reboot whenever enable_capi_mode() is called
215
  This patch updates phb4_set_capi_mode() to disable fast-reboot
  whenever enable_capi_mode() is called, irrespective of its return
  value. This guards against the possibility of fast-reboot not being
  disabled if a change to enable_capi_mode() causes it to return an
  error while leaving CAPP in enabled mode.

- fast-reboot: occ: Delete OCC child nodes in /ibm,opal/power-mgt
222
223  Fast-reboot in P8 fails to re-init OCC data as there are chipwise OCC
224  nodes which are already present in the /ibm,opal/power-mgt node. These
225  per-chip nodes hold the voltage IDs for each pstate and these can be
226  changed on OCC pstate table biasing. So delete these before calling
227  the re-init code to re-parse and populate the pstate data.
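
As a rough illustration of the "zero memory after fast reboot" item above,
the sketch below clears every region that a previous OS could have written
to and leaves firmware regions alone. The region structure, the owner enum,
and the function name are assumptions made for this example; skiboot's real
mem_region handling is more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical region owners; skiboot's real mem_region types differ. */
enum sketch_owner { SKETCH_OWNER_FIRMWARE, SKETCH_OWNER_OS };

struct sketch_region {
	unsigned char *start;     /* first byte of the region */
	size_t len;               /* region length in bytes */
	enum sketch_owner owner;  /* who may have modified it */
};

/*
 * Clear every region the previous OS could have written to, leaving
 * firmware-owned regions untouched. Returns the number of bytes zeroed.
 */
static size_t sketch_clear_os_memory(struct sketch_region *regions, size_t count)
{
	size_t cleared = 0;

	assert(regions != NULL || count == 0);

	for (size_t i = 0; i < count; i++) {
		if (regions[i].owner != SKETCH_OWNER_OS)
			continue;
		memset(regions[i].start, 0, regions[i].len);
		cleared += regions[i].len;
	}
	return cleared;
}
```

Because each region is cleared independently, a loop like this is also easy
to split across CPUs, which is the future parallelisation the item mentions.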
228
Debugging/SRESET improvements
-----------------------------
231
232Since :ref:`skiboot-5.11-rc1`:
233
234- core/cpu: Prevent clobbering of stack guard for boot-cpu
235
  Commit 90d53934c2da ("core/cpu: discover stack region size before
  initialising memory regions") introduced memzero for struct cpu_thread
  in init_cpu_thread(). This has an unintended side effect of clobbering
  the stack-guard canary of the boot_cpu stack. This results in opal
  failing to init with this failure message: ::
241
242    CPU: P9 generation processor (max 4 threads/core)
243    CPU: Boot CPU PIR is 0x0004 PVR is 0x004e1200
244    Guard skip = 0
245    Stack corruption detected !
246    Aborting!
247    CPU 0004 Backtrace:
248     S: 0000000031c13ab0 R: 0000000030013b0c   .backtrace+0x5c
249     S: 0000000031c13b50 R: 000000003001bd18   ._abort+0x60
250     S: 0000000031c13be0 R: 0000000030013bbc   .__stack_chk_fail+0x54
251     S: 0000000031c13c60 R: 00000000300c5b70   .memset+0x12c
252     S: 0000000031c13d00 R: 0000000030019aa8   .init_cpu_thread+0x40
253     S: 0000000031c13d90 R: 000000003001b520   .init_boot_cpu+0x188
254     S: 0000000031c13e30 R: 0000000030015050   .main_cpu_entry+0xd0
255     S: 0000000031c13f00 R: 0000000030002700   boot_entry+0x1c0
256
  So the patch provides a fix by tweaking the memset() call in
  init_cpu_thread() to skip over the stack-guard canary.

- core/lock.c: ensure valid start value for lock spin duration warning
260
261  The previous fix in a8e6cc3f4 only addressed half of the problem, as
262  we could also get an invalid value for start, causing us to fail
263  in a weird way.
264
265  This was caught by the testcases.OpTestHMIHandling.HMI_TFMR_ERRORS
266  test in op-test-framework.
267
268  You'd get to this part of the test and get the erroneous lock
269  spinning warnings: ::
270
271    PATH=/usr/local/sbin:$PATH putscom -c 00000000 0x2b010a84 0003080000000000
272    0000080000000000
273    [  790.140976993,4] WARNING: Lock has been spinning for 790275ms
274    [  790.140976993,4] WARNING: Lock has been spinning for 790275ms
275    [  790.140976918,4] WARNING: Lock has been spinning for 790275ms
276
277  This patch checks the validity of timebase before setting start,
278  and only checks the lock timeout if we got a valid start value.
279
280
281Since :ref:`skiboot-5.10`:
282
283- core/opal: allow some re-entrant calls
284
285  This allows a small number of OPAL calls to succeed despite re-entering
286  the firmware, and rejects others rather than aborting.
287
  This allows a system reset interrupt that interrupts OPAL to do something
  useful: sreset other CPUs, use the console (which allows xmon to work or
  stack traces to be printed), or reboot the system.

  Use OPAL_INTERNAL_ERROR when rejecting, rather than OPAL_BUSY, which is
  used for many other things that do not indicate a serious permanent error.

- core/opal: abort in case of re-entrant OPAL call
295
296  The stack is already destroyed by the time we get here, so there
297  is not much point continuing.
298- core/lock: Add lock timeout warnings
299
300  There are currently no timeout warnings for locks in skiboot. We assume
301  that the lock will eventually become free, which may not always be the
302  case.
303
  This patch adds timeout warnings for locks. Any lock which spins for more
  than 5 seconds will throw a warning and stacktrace for that thread. This is
  useful for debugging situations where a thread hangs waiting for a lock
  to be freed.

- core/lock: Add deadlock detection
309
310  This adds simple deadlock detection. The detection looks for circular
311  dependencies in the lock requests. It will abort and display a stack trace
312  when a deadlock occurs.
313  The detection is enabled by DEBUG_LOCKS (enabled by default).
314  While the detection may have a slight performance overhead, as there are
315  not a huge number of locks in skiboot this overhead isn't significant.
316- core/hmi: report processor recovery reason from core FIR bits on P9
317
  When an error is encountered that causes processor recovery, an HMI is
  generated if the recovery was successful. The reason is recorded in
  the core FIR, which gets copied into the WOF.
321
322  In this case dump the WOF register and an error string into the OPAL
323  msglog.
324
325  A broken init setting led to HMIs reported in Linux as: ::
326
327    [    3.591547] Harmless Hypervisor Maintenance interrupt [Recovered]
328    [    3.591648]  Error detail: Processor Recovery done
329    [    3.591714]  HMER: 2040000000000000
330
331  This patch would have been useful because it tells us exactly that
332  the problem is in the d-side ERAT: ::
333
334    [  414.489690798,7] HMI: Received HMI interrupt: HMER = 0x2040000000000000
335    [  414.489693339,7] HMI: [Loc: UOPWR.0000000-Node0-Proc0]: P:0 C:1 T:1: Processor recovery occurred.
336    [  414.489699837,7] HMI: Core WOF = 0x0000000410000000 recovered error:
337    [  414.489701543,7] HMI: LSU - SRAM (DCACHE parity, etc)
338    [  414.489702341,7] HMI: LSU - ERAT multi hit
339
340  In future it will be good to unify this reporting, so Linux could
341  print something more useful. Until then, this gives some good data.
342
NPU2/NVLink2 Fixes
------------------

- npu2: Add performance tuning SCOM inits
346
347  Peer-to-peer GPU bandwidth latency testing has produced some tunable
348  values that improve performance. Add them to our device initialization.
349
350  File these under things that need to be cleaned up with nice #defines
351  for the register names and bitfields when we get time.
352
353  A few of the settings are dependent on the system's particular NVLink
354  topology, so introduce a helper to determine how many links go to a
355  single GPU.
356- hw/npu2: Assign a unique LPARSHORTID per GPU
357
358  This gets used elsewhere to index items in the XTS tables.
359- NPU2: dump NPU2 registers on npu2 HMI
360
361  Due to the nature of debugging npu2 issues, folk are wanting the
362  full list of NPU2 registers dumped when there's a problem.
363- npu2: Remove DD1 support
364
365  Major changes in the NPU between DD1 and DD2 necessitated a fair bit of
366  revision-specific code.
367
368  Now that all our lab machines are DD2, we no longer test anything on DD1
369  and it's time to get rid of it.
370
371  Remove DD1-specific code and abort probe if we're running on a DD1 machine.
372- npu2: Disable fast reboot
373
374  Fast reboot does not yet work right with the NPU. It's been disabled on
375  NVLink and OpenCAPI machines. Do the same for NVLink2.
376
377  This amounts to a port of 3e4577939bbf ("npu: Fix broken fast reset")
378  from the npu code to npu2.
379- npu2: Use unfiltered mode in XTS tables
380
381  The XTS_PID context table is limited to 256 possible pids/contexts. To
382  relieve this limitation, make use of "unfiltered mode" instead.
383
  If an entry in the XTS_BDF table has the bit for unfiltered mode set, we
  can just use one context for that entire bdf/lpar, regardless of pid.
  Instead of searching the XTS_PID table, the NMMU checkout request
  will simply use the entry indexed by lparshort id instead.
388
389  Change opal_npu_init_context() to create these lparshort-indexed
390  wildcard entries (0-15) instead of allocating one for each pid. Check
391  that multiple calls for the same bdf all specify the same msr value.
392
393  In opal_npu_destroy_context(), continue validating the bdf argument,
394  ensuring that it actually maps to an lpar, but no longer remove anything
395  from the XTS_PID table. If/when we start supporting virtualized GPUs, we
396  might consider actually removing these wildcard entries by keeping a
397  refcount, but keep things simple for now.
398
CAPI/OpenCAPI
-------------
401
402Since :ref:`skiboot-5.11-rc1`:
403
404- capi: Poll Err/Status register during CAPP recovery
405
  This patch updates do_capp_recovery_scoms() to poll the CAPP
  Err/Status control register, check for CAPP-Recovery to complete/fail
  based on indications of BITS-1,5,9 and then proceed with the
  CAPP-Recovery scoms only if recovery completed successfully. This
  prevents cases where we bring up the PCIe link while the recovery
  sequencer on CAPP is still busy casting out cache lines.
412
413  In case CAPP-Recovery didn't complete successfully an error is returned
414  from do_capp_recovery_scoms() asking phb4_creset() to keep the phb4
415  fenced and mark it as broken.
416
417  The loop that implements polling of Err/Status register will also log
418  an error on the PHB when it continues for more than 168ms which is the
419  max time to failure for CAPP-Recovery.
420
421Since :ref:`skiboot-5.10`:
422
423- npu2-opencapi: Add OpenCAPI OPAL API calls
424
425  Add three OPAL API calls that are required by the ocxl driver.
426
427  - OPAL_NPU_SPA_SETUP
428
429    The Shared Process Area (SPA) is a table containing one entry (a
430    "Process Element") per memory context which can be accessed by the
431    OpenCAPI device.
432
433  - OPAL_NPU_SPA_CLEAR_CACHE
434
435    The NPU keeps a cache of recently accessed memory contexts. When a
436    Process Element is removed from the SPA, the cache for the link must be
437    cleared.
438
439  - OPAL_NPU_TL_SET
440
441    The Transaction Layer specification defines several templates for
442    messages to be exchanged on the link. During link setup, the host and
443    device must negotiate what templates are supported on both sides and at
444    what rates those messages can be sent.
445- npu2-opencapi: Train OpenCAPI links and setup devices
446
447  Scan the OpenCAPI links under the NPU, and for each link, reset the card,
448  set up a device, train the link and register a PHB.
449
450  Implement the necessary operations for the OpenCAPI PHB type.
451
452  For bringup, test and debug purposes, we allow an NVRAM setting,
453  "opencapi-link-training" that can be set to either disable link training
454  completely or to use the prbs31 test pattern.
455
456  To disable link training: ::
457
458    nvram -p ibm,skiboot --update-config opencapi-link-training=none
459
460  To use prbs31: ::
461
462    nvram -p ibm,skiboot --update-config opencapi-link-training=prbs31
463- npu2-hw-procedures: Add support for OpenCAPI PHY link training
464
465  Unlike NVLink, which uses the pci-virt framework to fake a PCI
466  configuration space for NVLink devices, the OpenCAPI device model presents
467  us with a real configuration space handled by the device over the OpenCAPI
468  link.
469
470  As a result, we have to train the OpenCAPI link in skiboot before we do PCI
471  probing, so that config space can be accessed, rather than having link
472  training being triggered by the Linux driver.
473- npu2-opencapi: Configure NPU for OpenCAPI
474
475  Scan the device tree for NPUs with OpenCAPI links and configure the NPU per
476  the initialisation sequence in the NPU OpenCAPI workbook.
477- capp: Make error in capp timebase sync a non-fatal error
478
  Presently, when we encounter an error while synchronizing the capp
  timebase with chip-tod at the end of enable_capi_mode(), we return an
  error. This has two unintended consequences. First, it prevents
  fast-reboot from being disabled even though CAPP is already enabled by
  this point. Secondly, a failure during timebase sync is a non-fatal error
  for capp initialization, as CAPP/PSL can continue working after this and
  an AFU will only see an error when it tries to read the timebase value
  from PSL.

  So this patch updates enable_capi_mode() to not return an error in
  case the call to chiptod_capp_timebase_sync() fails. The function will now
  just log an error and continue with the capp init sequence. This
  aligns the current implementation with the one in the kernel 'cxl'
  driver, which also treats PSL timebase sync errors as non-fatal
  init errors.

- npu2-opencapi: Fix assert on link reset during init
495
496  We don't support resetting an opencapi link yet.
497
498  Commit fe6d86b9 ("pci: Make fast reboot creset PHBs in parallel")
499  tries resetting any PHB whose slot defines a 'run_sm' callback. It
500  raises an assert when applied to an opencapi PHB, as 'run_sm' calls
501  the 'freset' callback, which is not yet defined for opencapi.
502
503  Fix it for now by removing the currently useless definition of
504  'run_sm' on the opencapi slot. It will print a message in the skiboot
505  log because the PHB cannot be reset, which is correct. It will all go
506  away when we add support for resetting an opencapi link.
507- capp: Add lid definition for P9 DD-2.2
508
509  Update fsp_lid_map to include CAPP ucode lid for phb4-chipid ==
510  0x202d1 that corresponds to P9 DD-2.2 chip.
511- capp: Disable fast-reboot when capp is enabled
512
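The "opencapi-link-training" NVRAM setting described above takes the values
"none" and "prbs31". A minimal sketch of mapping such a value onto a
training mode is shown below. The enum, the function name, and the choice
to fall back to normal training for an absent or unrecognised value are
assumptions for illustration, not skiboot's actual parsing code.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical training modes for this sketch. */
enum sketch_training {
	SKETCH_TRAIN_DEFAULT,  /* setting absent or unrecognised: train normally */
	SKETCH_TRAIN_NONE,     /* "none": skip link training completely */
	SKETCH_TRAIN_PRBS31,   /* "prbs31": use the prbs31 test pattern */
};

/* Map the NVRAM string (or NULL when unset) onto a training mode. */
static enum sketch_training sketch_parse_training(const char *value)
{
	if (value == NULL)
		return SKETCH_TRAIN_DEFAULT;
	if (strcmp(value, "none") == 0)
		return SKETCH_TRAIN_NONE;
	if (strcmp(value, "prbs31") == 0)
		return SKETCH_TRAIN_PRBS31;
	return SKETCH_TRAIN_DEFAULT;
}
```

The string passed in would be whatever was stored with the ``nvram``
commands shown in the link training item.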
513
PCI
---
516
517Since :ref:`skiboot-5.11-rc1`:
518
519- phb4: Reset FIR/NFIR registers before PHB4 probe
520
  The function phb4_probe_stack() resets "ETU Reset Register" to
  unfreeze the PHB before it performs mmio access on the PHB. However, if
  the FIR/NFIR registers are set on entry to this function,
  the reset of "ETU Reset Register" won't unfreeze the PHB and it will
  remain fenced. This leads to failure during the initial CRESET of the PHB,
  as mmio access is still not enabled, and an error message of the form
  below is logged: ::
528
529     PHB#0000[0:0]: Initializing PHB4...
530     PHB#0000[0:0]: Default system config: 0xffffffffffffffff
531     PHB#0000[0:0]: New system config    : 0xffffffffffffffff
532     PHB#0000[0:0]: Initial PHB CRESET is 0xffffffffffffffff
533     PHB#0000[0:0]: Waiting for DLP PG reset to complete...
534     <snip>
535     PHB#0000[0:0]: Timeout waiting for DLP PG reset !
536     PHB#0000[0:0]: Initialization failed
537
  This is especially seen during the MPIPL flow, where the SBE
  quiesces and fences the PHB so that it doesn't stomp on main
  memory. However, when skiboot enters phb4_probe_stack() after MPIPL,
  the FIR/NFIR registers are set, forcing the PHB to re-enter fence after
  the ETU reset is done.
543
544  So to fix this issue the patch introduces new xscom writes to
545  phb4_probe_stack() to reset the FIR/NFIR registers before performing
546  ETU reset to enable mmio access to the PHB.
547
548Since :ref:`skiboot-5.10`:
549
550- pci: Reduce log level of error message
551
552  If a link doesn't train, we can end up with error messages like this: ::
553
554    [   63.027261959,3] PHB#0032[8:2]: LINK: Timeout waiting for electrical link
555    [   63.027265573,3] PHB#0032:00:00.0 Error -6 resetting
556
557  The first message is useful but the second message is just debug from
558  the core PCI code and is confusing to print to the console.
559
560  This reduces the second print to debug level so it's not seen by the
561  console by default.
562- Revert "platforms/astbmc/slots.c: Allow comparison of bus numbers when matching slots"
563
564  This reverts commit bda7cc4d0354eb3f66629d410b2afc08c79f795f.
565
  Ben says: It's on purpose that we do NOT compare the bus numbers; they
  are always 0 in the slot table. We do a hierarchical walk of the tree,
  matching only the devfns along the way, because the bus numbering isn't
  fixed. Comparing them breaks all slot naming etc. on anything using
  the "skiboot" slot tables (P8 opp typically).

- core/pci-dt-slot: Fix booting with no slot map
574
575  Currently if you don't have a slot map in the device tree in
576  /ibm,pcie-slots, you can crash with a back trace like this: ::
577
578    CPU 0034 Backtrace:
579     S: 0000000031cd3370 R: 000000003001362c   .backtrace+0x48
580     S: 0000000031cd3410 R: 0000000030019e38   ._abort+0x4c
581     S: 0000000031cd3490 R: 000000003002760c   .exception_entry+0x180
582     S: 0000000031cd3670 R: 0000000000001f10 *
583     S: 0000000031cd3850 R: 00000000300b4f3e * cpu_features_table+0x1d9e
584     S: 0000000031cd38e0 R: 000000003002682c   .dt_node_is_compatible+0x20
585     S: 0000000031cd3960 R: 0000000030030e08   .map_pci_dev_to_slot+0x16c
586     S: 0000000031cd3a30 R: 0000000030091054   .dt_slot_get_slot_info+0x28
587     S: 0000000031cd3ac0 R: 000000003001e27c   .pci_scan_one+0x2ac
588     S: 0000000031cd3ba0 R: 000000003001e588   .pci_scan_bus+0x70
589     S: 0000000031cd3cb0 R: 000000003001ee74   .pci_scan_phb+0x100
590     S: 0000000031cd3d40 R: 0000000030017ff0   .cpu_process_jobs+0xdc
591     S: 0000000031cd3e00 R: 0000000030014cb0   .__secondary_cpu_entry+0x44
592     S: 0000000031cd3e80 R: 0000000030014d04   .secondary_cpu_entry+0x34
593     S: 0000000031cd3f00 R: 0000000030002770   secondary_wait+0x8c
594    [   73.016947149,3] Fatal MCE at 0000000030026054   .dt_find_property+0x30
595    [   73.017073254,3] CFAR : 0000000030026040
596    [   73.017138048,3] SRR0 : 0000000030026054 SRR1 : 9000000000201000
597    [   73.017198375,3] HSRR0: 0000000000000000 HSRR1: 0000000000000000
598    [   73.017263210,3] DSISR: 00000008         DAR  : 7c7b1b7848002524
599    [   73.017352517,3] LR   : 000000003002602c CTR  : 000000003009102c
600    [   73.017419778,3] CR   : 20004204         XER  : 20040000
601    [   73.017502425,3] GPR00: 000000003002682c GPR16: 0000000000000000
602    [   73.017586924,3] GPR01: 0000000031c23670 GPR17: 0000000000000000
603    [   73.017643873,3] GPR02: 00000000300fd500 GPR18: 0000000000000000
604    [   73.017767091,3] GPR03: fffffffffffffff8 GPR19: 0000000000000000
605    [   73.017855707,3] GPR04: 00000000300b3dc6 GPR20: 0000000000000000
606    [   73.017943944,3] GPR05: 0000000000000000 GPR21: 00000000300bb6d2
607    [   73.018024709,3] GPR06: 0000000031c23910 GPR22: 0000000000000000
608    [   73.018117716,3] GPR07: 0000000031c23930 GPR23: 0000000000000000
609    [   73.018195974,3] GPR08: 0000000000000000 GPR24: 0000000000000000
610    [   73.018278350,3] GPR09: 0000000000000000 GPR25: 0000000000000000
611    [   73.018353795,3] GPR10: 0000000000000028 GPR26: 00000000300be6fb
612    [   73.018424362,3] GPR11: 0000000000000000 GPR27: 0000000000000000
613    [   73.018533159,3] GPR12: 0000000020004208 GPR28: 0000000030767d38
614    [   73.018642725,3] GPR13: 0000000031c20000 GPR29: 00000000300b3dc6
615    [   73.018737925,3] GPR14: 0000000000000000 GPR30: 0000000000000010
616    [   73.018794428,3] GPR15: 0000000000000000 GPR31: 7c7b1b7848002514
617
618  This has been seen in the lab on a witherspoon using the device tree
619  entry point (ie. no HDAT).
620
621  This fixes the null pointer deref.
622
Bugs Fixed
----------

Since :ref:`skiboot-5.11-rc1`:
626
627- cpufeatures: Fix setting DARN and SCV HWCAP feature bits
628
  DARN and SCV have been assigned AT_HWCAP2 (32-63) bits: ::
630
631    #define PPC_FEATURE2_DARN               0x00200000 /* darn random number insn */
632    #define PPC_FEATURE2_SCV                0x00100000 /* scv syscall */
633
634  A cpufeatures-aware OS will not advertise these to userspace without
635  this patch.
636- xive: disable store EOI support
637
  Hardware has limitations which would require putting a sync after each
  store EOI to make sure the MMIO operations that change the ESB state
  are ordered. This is a killer for performance and the PHBs do not
  support the sync. So remove the store EOI for the moment, until
  hardware is improved.

  Also, while we are changing the XIVE source flags, let's fix the
  settings for the PHB4s, which should follow these rules:
646
647  - SHIFT_BUG    for DD10
648  - STORE_EOI    for DD20 and if enabled
649  - TRIGGER_PAGE for DDx0 and if not STORE_EOI
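
To make the "AT_HWCAP2 (32-63) bits" remark in the cpufeatures item above
concrete, the sketch below converts a single-bit AT_HWCAP2 mask into a bit
number in a combined 64-bit space. It assumes AT_HWCAP occupies bits 0-31
and AT_HWCAP2 bits 32-63, counted from the least significant bit; the
helper name is hypothetical, and the two masks are the ones quoted in the
item.

```c
#include <assert.h>
#include <stdint.h>

#define PPC_FEATURE2_DARN 0x00200000 /* darn random number insn */
#define PPC_FEATURE2_SCV  0x00100000 /* scv syscall */

/*
 * Map a single-bit AT_HWCAP2 mask to a combined bit number, assuming
 * AT_HWCAP occupies bits 0-31 and AT_HWCAP2 bits 32-63, LSB first.
 */
static int sketch_hwcap2_bit(uint32_t mask)
{
	int bit = 0;

	/* The mask must have exactly one bit set. */
	assert(mask != 0 && (mask & (mask - 1)) == 0);

	while ((mask >> bit) != 1)
		bit++;
	return 32 + bit;
}
```

Under that assumption, DARN lands on bit 53 and SCV on bit 52 of the
combined space.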
650
651Since :ref:`skiboot-5.10`:
652
653- xive: fix opal_xive_set_vp_info() error path
654
655  In case of error, opal_xive_set_vp_info() will return without
656  unlocking the xive object. This is most certainly a typo.
657- hw/imc: don't access homer memory if it was not initialised
658
659  This can happen under mambo, at least.
660- nvram: run nvram_validate() after nvram_reformat()
661
662  nvram_reformat() sets nvram_valid = true, but it does not set
663  skiboot_part_hdr. Call nvram_validate() instead, which sets
664  everything up properly.
665- dts: Zero struct to avoid using uninitialised value
666- hw/imc: Don't dereference possible NULL
667- libstb/create-container: munmap() signature file address
668- npu2-opencapi: Fix memory leak
669- npu2: Fix possible NULL dereference
670- occ-sensors: Remove NULL checks after dereference
671- core/ipmi-opal: Add interrupt-parent property for ipmi node on P9 and above.
672
  dtc complains with the warning below on newer 4.2+ kernels: ::
674
675    dts: Warning (interrupts_property): Missing interrupt-parent for /ibm,opal/ipmi
676
677  This fix adds interrupt-parent property under /ibm,opal/ipmi DT node on P9
678  and above, which allows ipmi-opal to properly use the OPAL irqchip.
679
Other fixes and improvements
----------------------------
682
683- core/cpu: discover stack region size before initialising memory regions
684
685  Stack allocation first allocates a memory region sized to hold stacks
686  for all possible CPUs up to the maximum PIR of the architecture, zeros
687  the region, then initialises all stacks. Max PIR is 32768 on POWER9,
688  which is 512MB for stacks.
689
690  The stack region is then shrunk after CPUs are discovered, but this is
691  a bit of a hack, and it leaves a hole in the memory allocation regions
692  as it's done after mem regions are initialised. ::
693
694      0x000000000000..00002fffffff : ibm,os-reserve - OS
695      0x000030000000..0000303fffff : ibm,firmware-code - OPAL
696      0x000030400000..000030ffffff : ibm,firmware-heap - OPAL
697      0x000031000000..000031bfffff : ibm,firmware-data - OPAL
698      0x000031c00000..000031c0ffff : ibm,firmware-stacks - OPAL
699      *** gap ***
700      0x000051c00000..000051d01fff : ibm,firmware-allocs-memory@0 - OPAL
701      0x000051d02000..00007fffffff : ibm,firmware-allocs-memory@0 - OS
702      0x000080000000..000080b3cdff : initramfs - OPAL
703      0x000080b3ce00..000080b7cdff : ibm,fake-nvram - OPAL
704      0x000080b7ce00..0000ffffffff : ibm,firmware-allocs-memory@0 - OS
705
706  This change moves zeroing into the per-cpu stack setup. The boot CPU
707  stack is set up based on the current PIR. Then the size of the stack
708  region is set, by discovering the maximum PIR of the system from the
709  device tree, before mem regions are intialised.
710
711  This results in all memory being accounted within memory regions,
712  and less memory fragmentation of OPAL allocations.
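
  The sizes quoted above can be checked with a small calculation. This is
  only a sketch: the 16KB per-stack size is derived from the figures in
  this entry (512MB divided by 32768), and the names are hypothetical: ::

      #include <stdint.h>

      /* 512MB / 32768 stacks = 16KB per stack, derived from the text above. */
      #define SKETCH_STACK_SIZE (16u * 1024u)

      /* Size in bytes of a region holding one stack per possible PIR. */
      static uint64_t sketch_stack_region_size(uint32_t nr_pirs)
      {
              return (uint64_t)nr_pirs * SKETCH_STACK_SIZE;
      }

  With 32768 possible PIRs this gives the 512MB mentioned above, while the
  64KB ibm,firmware-stacks region in the map would hold four stacks of
  that size.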
- Make gard display show that a record is cleared

  When clearing gard records, Hostboot only modifies the record_id
  portion to be 0xFFFFFFFF.  The remainder of the entry remains.
  Without this change it can be hard for users to tell that
  the record they are looking at is no longer valid.
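
  A minimal sketch of the check this implies. The structure layout and
  names are hypothetical and much simpler than a real gard record; only
  the 0xFFFFFFFF record_id value comes from the description above: ::

      #include <stdbool.h>
      #include <stdint.h>

      #define SKETCH_CLEARED_RECORD_ID 0xFFFFFFFFu

      /* Hypothetical, cut-down record: real entries carry more fields. */
      struct sketch_gard_record {
              uint32_t record_id;
              uint8_t  error_type;
      };

      /* A cleared record keeps its other fields but has record_id all ones. */
      static bool sketch_gard_is_cleared(const struct sketch_gard_record *r)
      {
              return r->record_id == SKETCH_CLEARED_RECORD_ID;
      }

  A display routine could use such a test to mark the entry as cleared
  instead of printing it as if it were still active.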
- Reserve OPAL API number for opal_handle_hmi2 function.
- dts: spl_wakeup: Remove all workarounds in the spl wakeup logic

  We coded a few workarounds in the special wakeup logic to handle
  buggy firmware. Now that it is fixed, remove them as they break the
  special wakeup protocol. As per the spec we should not de-assert
  before assert is complete. So follow this protocol.
- build: use thin archives rather than incremental linking

  This changes the build system to use thin archives rather than
  incremental linking for built-in.o, similar to a recent change to Linux.
  built-in.o is renamed to built-in.a, and is created as a thin archive
  with no index, for speed and size. All built-in.a are aggregated into
  a skiboot.tmp.a which is a thin archive built with an index, making it
  suitable for linking. This is input into the final link.

  The advantages of build size and linker code placement flexibility are
  not as great with skiboot as with a bigger project like Linux, but it's a
  conceptually better way to build, and is more compatible with link
  time optimisation in toolchains which might be interesting for skiboot
  particularly for size reductions.

  Size of build tree before this patch is 34.4MB, afterwards 23.1MB.
- core/init: Assert when kernel not found

  If the kernel doesn't load out of flash or there is nothing at
  KERNEL_LOAD_BASE, we end up with an esoteric message as we try to
  branch out of skiboot into nothing ::

      [    0.007197688,3] INIT: ELF header not found. Assuming raw binary.
      [    0.014035267,5] INIT: Starting kernel at 0x0, fdt at 0x3044ad90 13029
      [    0.014042254,3] ***********************************************
      [    0.014069947,3] Fatal Exception 0xe40 at 0000000000000000
      [    0.014085574,3] CFAR : 00000000300051c4
      [    0.014090118,3] SRR0 : 0000000000000000 SRR1 : 0000000000000000
      [    0.014096243,3] HSRR0: 0000000000000000 HSRR1: 9000000000001000
      [    0.014102546,3] DSISR: 00000000         DAR  : 0000000000000000
      [    0.014108538,3] LR   : 00000000300144c8 CTR  : 0000000000000000
      [    0.014114756,3] CR   : 40002202         XER  : 00000000
      [    0.014120301,3] GPR00: 000000003001447c GPR16: 0000000000000000

  This improves the message and asserts in this case: ::

    [    0.014042685,5] INIT: Starting kernel at 0x0, fdt at 0x3044ad90 13049 bytes)
    [    0.014049556,0] FATAL: Kernel is zeros, can't execute!
    [    0.014054237,0] Assert fail: core/init.c:566:0
    [    0.014060472,0] Aborting!
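
  A check of this kind might look like the sketch below. The function name
  is hypothetical, and how many bytes skiboot actually inspects is not
  stated here, so the length is left to the caller: ::

      #include <stdbool.h>
      #include <stddef.h>
      #include <stdint.h>

      /* Return true if the first len bytes at the entry point are all zero. */
      static bool sketch_kernel_is_zeros(const uint8_t *entry, size_t len)
      {
              for (size_t i = 0; i < len; i++) {
                      if (entry[i] != 0)
                              return false;
              }
              return true;
      }

  A caller would log a fatal message and assert when this returns true,
  rather than branching to the entry point.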
- core: Fix 'opal-runtime-size' property

  We are populating 'opal-runtime-size' before calculating the actual stack
  size. Hence we end up with the wrong runtime size (e.g. on P9 it shows
  ~540MB while the actual size is around ~40MB). Note that only the device
  tree property shows the wrong value; reserved-memory reflects the correct
  size.

  init_all_cpus() calculates and updates actual stack size. Hence move this
  function call before add_opal_node().

- mambo: Add fw-feature flags for security related settings

  Newer firmwares report some feature flags related to security
  settings via HDAT. On real hardware skiboot translates these into
  device tree properties. For testing purposes just create the
  properties manually in the tcl.

  These values don't exactly match any actual chip revision, but the
  code should not rely on any exact set of values anyway. We just define
  the most interesting flags, that if toggled to "disable" will change
  Linux behaviour. You can see the actual values in the hostboot source
  in src/usr/hdat/hdatiplparms.H.

  Also add an environment variable for easily toggling the top-level
  "security on" setting.
- direct-controls: mambo fix for multiple chips
- libflash/blocklevel: Correct miscalculation in blocklevel_smart_erase()

  If blocklevel_smart_erase() detects that the smart erase fits entirely in
  one erase block, it has an early bail path. In this path it miscalculates
  where in the buffer the backend needs to read from to perform the final
  write.
- libstb/secureboot: Fix logging of secure verify messages.

  Currently we are logging secure verify/enforce messages at PR_EMERG
  level even when secureboot mode is not enabled. So reduce the
  log level to PR_ERR when secureboot mode is OFF.
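
  The selection described above amounts to something like the following
  sketch. The function name and the numeric level values are illustrative
  assumptions (lower meaning more severe), not taken from skiboot's
  headers: ::

      #include <stdbool.h>

      /* Illustrative log levels: lower value means higher severity. */
      #define SKETCH_PR_EMERG 0
      #define SKETCH_PR_ERR   3

      /* Pick the level for secure verify/enforce messages. */
      static int sketch_verify_log_level(bool secure_mode)
      {
              return secure_mode ? SKETCH_PR_EMERG : SKETCH_PR_ERR;
      }

  Keeping the emergency level for the case where secure mode is on means
  genuine verification failures still stand out in the log.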

Testing / Code coverage improvements
------------------------------------

Improvements in gcov support include support for newer GCCs as well
as easily exporting the area of memory you need to dump to feed to
`extract-gcov`.

- cpu_idle_job: relax a bit

  This *dramatically* improves kernel boot time with GCOV builds,
  from ~3 minutes between loading the kernel and switching the HILE
  bit down to around 10 seconds.
- gcov: Another GCC, another gcov tweak
- Keep constructors with priorities

  Fixes GCOV builds with gcc7, which uses this.
- gcov: Add gcov data struct to sysfs

  Extracting the skiboot gcov data is currently a tedious process which
  involves taking a mem dump of skiboot and searching for the gcov_info
  struct.
  This patch adds the gcov struct to sysfs under /opal/exports, allowing the
  data to be copied directly into userspace and processed.
828
829