.. _skiboot-5.11:

skiboot-5.11
============

skiboot v5.11 was released on Friday April 6th 2018. It is the first
release of skiboot 5.11, which is now the new stable release
of skiboot following the 5.10 release, first released February 23rd 2018.

We do *not* expect to keep the 5.11 branch around for long. Instead, we
expect to move quickly to 6.0, which will mark the basis for op-build v2.0
and will be required for POWER9 systems.

It is expected that skiboot 6.0 will follow very shortly. Consider 5.11
more of a beta release to 6.0 than anything. For POWER9 systems it should
certainly be more solid than previous releases though.

skiboot v5.11 contains all bug fixes as of :ref:`skiboot-5.10.4`
and :ref:`skiboot-5.4.9` (the currently maintained stable releases). There
may be more 5.10.x stable releases, depending on demand.

For details on how the skiboot stable releases work, see :ref:`stable-rules`.

Over skiboot-5.10, we have the following changes:

New Platforms
-------------

- Add VESNIN platform support

  The Vesnin platform from YADRO is a 4 socket POWER8 system with up to 8TB
  of memory with 460GB/s of memory bandwidth in only 2U. Many kudos to the
  team from Yadro for submitting their code upstream!

New Features
------------

- fast-reboot: enable by default for POWER9

  - Fast reboot is disabled if NPU2 is present or CAPI2/OpenCAPI is used

- PCI tunneled operations on PHB4

  - phb4: set PBCQ Tunnel BAR for tunneled operations

    P9 supports PCI tunneled operations (atomics and as_notify) that are
    initiated by devices.

    A subset of the tunneled operations require a response, which must be
    sent back from the host to the device. For example, an atomic compare
    and swap will return the compare status, as the swap will only be
    performed in case of success. Similarly, as_notify reports if the target
    thread has been woken up or not, because the operation may fail.

    To enable tunneled operations, a device driver must tell the host where
    it expects tunneled operation responses, by setting the PBCQ Tunnel BAR
    Response register with a specific value within the range of its BARs.

    This register is currently initialized by enable_capi_mode(). But, as
    tunneled operations may also operate in PCI mode, a new API is required
    to set the PBCQ Tunnel BAR Response register, without switching to CAPI
    mode.

    This patch provides two new OPAL calls to get/set the PBCQ Tunnel
    BAR Response register.

    Note: as there is only one PBCQ Tunnel BAR register, shared between
    all the devices connected to the same PHB, only one of these devices
    will be able to use tunneled operations at any time.
  - phb4: set PHB CMPM registers for tunneled operations

    P9 supports PCI tunneled operations (atomics and as_notify) that require
    setting the PHB ASN Compare/Mask register with a 16-bit indication.

    This register is currently initialized by enable_capi_mode(). But, as
    tunneled operations may also work in PCI mode, the ASN Compare/Mask
    register should rather be initialized in phb4_init_ioda3().

    This patch also adds "ibm,phb-indications" to the device tree, to tell
    Linux the values of CAPI, ASN, and NBW indications, when supported.

    Tunneled operations were tested by IBM in CAPI mode and by Mellanox
    Technologies in PCI mode.

- Tie tm-suspend fw-feature and opal_reinit_cpus() together

  Currently opal_reinit_cpus(OPAL_REINIT_CPUS_TM_SUSPEND_DISABLED)
  always returns OPAL_UNSUPPORTED.

  This ties the tm suspend fw-feature to the
  opal_reinit_cpus(OPAL_REINIT_CPUS_TM_SUSPEND_DISABLED) so that when tm
  suspend is disabled, we correctly report it to the kernel. For
  backwards compatibility, it's assumed tm suspend is available if the
  fw-feature is not present.

  Currently hostboot will clear fw-feature(TM_SUSPEND_ENABLED) on P9N
  DD2.1. P9N DD2.2 will set fw-feature(TM_SUSPEND_ENABLED). DD2.0 and
  below have TM disabled completely (not just suspend).

  We are using opal_reinit_cpus() to determine this setting (rather than
  the device tree/HDAT) as some future firmware may let us change this
  dynamically after boot. That is not the case currently though.

Power Management
----------------

- SLW: Increase stop4-5 residency by 10x

  Using the DGEMM benchmark we observed a 5-9% difference in throughput
  between runs with and without stop4/5. In this benchmark the GPU waits
  on the CPU to wake up and provide the subsequent data block to compute.
  The wakeup latency accumulates over the run and shows up as a
  performance drop.

  Linux enters stop4/5 too aggressively given its wakeup latency. Increasing
  the residency from 1ms to 10ms makes the performance drop <1%.
- occ: Set up OCC messaging even if we fail to setup pstates

  This means that we no longer hit this bug if we fail to get valid pstates
  from the OCC. ::

    [console-pexpect]#echo 1 > //sys/firmware/opal/sensor_groups//occ-csm0/clear
    echo 1 > //sys/firmware/opal/sensor_groups//occ-csm0/clear
    [ 94.019971181,5] CPU ATTEMPT TO RE-ENTER FIRMWARE! PIR=083d cpu @0x33cf4000 -> pir=083d token=8
    [ 94.020098392,5] CPU ATTEMPT TO RE-ENTER FIRMWARE! PIR=083d cpu @0x33cf4000 -> pir=083d token=8
    [ 10.318805] Disabling lock debugging due to kernel taint
    [ 10.318808] Severe Machine check interrupt [Not recovered]
    [ 10.318812] NIP [000000003003e434]: 0x3003e434
    [ 10.318813] Initiator: CPU
    [ 10.318815] Error type: Real address [Load/Store (foreign)]
    [ 10.318817] opal: Hardware platform error: Unrecoverable Machine Check exception
    [ 10.318821] CPU: 117 PID: 2745 Comm: sh Tainted: G M 4.15.9-openpower1 #3
    [ 10.318823] NIP: 000000003003e434 LR: 000000003003025c CTR: 0000000030030240
    [ 10.318825] REGS: c00000003fa7bd80 TRAP: 0200 Tainted: G M (4.15.9-openpower1)
    [ 10.318826] MSR: 9000000000201002 <SF,HV,ME,RI> CR: 48002888 XER: 20040000
    [ 10.318831] CFAR: 0000000030030258 DAR: 394a00147d5a03a6 DSISR: 00000008 SOFTE: 1


mbox based platforms
^^^^^^^^^^^^^^^^^^^^

For platforms using the mbox protocol for host flash access (all BMC based
OpenPOWER systems, most OpenBMC based systems) there have been some hardening
efforts in the event of the BMC being poorly behaved.

- mbox: Reduce default BMC timeouts

  Rebooting a BMC can take 70 seconds. Skiboot cannot possibly spin for
  70 seconds waiting for a BMC to come back. This also makes the current
  default of 30 seconds a bit pointless: it is far too short to be a
  worst case wait time but too long to avoid hitting hardlockup detectors
  and wreaking havoc inside host Linux.

  Just change it to three seconds so that host Linux will survive: reads
  and writes will fail, but at least the host stays up.

  Also refactored the waiting loop just a bit so that it's easier to read.
- mbox: Harden against BMC daemon errors

  Bugs present in the BMC daemon mean that skiboot gets presented with
  mbox windows of size zero. These windows cannot be valid and skiboot
  already detects these conditions.

  Currently skiboot warns quite strongly about the occurrence of these
  problems. The problem for skiboot is that it doesn't take any action.
  Initially I wanted to avoid putting policy like this into skiboot, but
  since these bugs aren't going away and skiboot barfing is leading to
  lockups and ultimately the host going down, something needs to be done.

  I propose that when we detect the problem we fail the mbox call and punt
  the problem back up to Linux. I don't like it but at least it will cause
  errors to cascade and won't bring the host down. I'm not sure how Linux
  is supposed to detect this or what it can even do but this is better
  than a crash.

  Diagnosing a failure to boot if skiboot itself fails to read flash may
  be marginally more difficult with this patch. This is because skiboot
  will now only print one warning about the zero sized window rather than
  continuously spitting it out.

Fast Reboot Improvements
------------------------

Around fast-reboot we have made several improvements to harden the fast
reboot code paths and resort to a full IPL if something doesn't look right.

- core/fast-reboot: zero memory after fast reboot

  This improves the security and predictability of the fast reboot
  environment.

  There cannot be a secure fence between fast reboots, because a
  malicious OS can modify the firmware itself. However a well-behaved
  OS can have a reasonable expectation that OS memory regions it has
  modified will be cleared upon fast reboot.

  The memory is zeroed after all other CPUs come up from fast reboot,
  just before the new kernel is loaded and booted into. This allows
  image preloading to run concurrently, and will allow parallelisation
  of the clearing in future.
- core/fast-reboot: verify mem regions before fast reboot

  Run the mem_region sanity checkers before proceeding with fast
  reboot.

  This is the beginning of proactive sanity checks on OPAL data
  for fast reboot (which complements the reactive disable_fast_reboot
  cases). These checks are encouraged to re-use and share any kind of
  debug code and unit test code.
- fast-reboot: occ: Only delete /ibm,opal/power-mgt nodes if they exist
- core/fast-reboot: disable fast reboot upon fundamental entry/exit/locking errors

  This disables fast reboot in several more cases where serious errors
  like lock corruption or call re-entrancy are detected.
- capp: Disable fast-reboot whenever enable_capi_mode() is called

  This patch updates phb4_set_capi_mode() to disable fast-reboot
  whenever enable_capi_mode() is called, irrespective of its return
  value. This guards against the possibility of fast-reboot staying
  enabled when a change to enable_capi_mode() causes it to return an
  error while leaving CAPP in enabled mode.
- fast-reboot: occ: Delete OCC child nodes in /ibm,opal/power-mgt

  Fast-reboot in P8 fails to re-init OCC data as there are chipwise OCC
  nodes which are already present in the /ibm,opal/power-mgt node. These
  per-chip nodes hold the voltage IDs for each pstate and these can be
  changed on OCC pstate table biasing. So delete these before calling
  the re-init code to re-parse and populate the pstate data.

Debugging/SRESET improvements
-----------------------------

Since :ref:`skiboot-5.11-rc1`:

- core/cpu: Prevent clobbering of stack guard for boot-cpu

  Commit 90d53934c2da ("core/cpu: discover stack region size before
  initialising memory regions") introduced memzero for struct cpu_thread
  in init_cpu_thread(). This has an unintended side effect of clobbering
  the stack-guard canary of the boot_cpu stack. This results in OPAL
  failing to init with this failure message: ::

    CPU: P9 generation processor (max 4 threads/core)
    CPU: Boot CPU PIR is 0x0004 PVR is 0x004e1200
    Guard skip = 0
    Stack corruption detected !
    Aborting!
    CPU 0004 Backtrace:
     S: 0000000031c13ab0 R: 0000000030013b0c .backtrace+0x5c
     S: 0000000031c13b50 R: 000000003001bd18 ._abort+0x60
     S: 0000000031c13be0 R: 0000000030013bbc .__stack_chk_fail+0x54
     S: 0000000031c13c60 R: 00000000300c5b70 .memset+0x12c
     S: 0000000031c13d00 R: 0000000030019aa8 .init_cpu_thread+0x40
     S: 0000000031c13d90 R: 000000003001b520 .init_boot_cpu+0x188
     S: 0000000031c13e30 R: 0000000030015050 .main_cpu_entry+0xd0
     S: 0000000031c13f00 R: 0000000030002700 boot_entry+0x1c0

  So the patch provides a fix by tweaking the memset() call in
  init_cpu_thread() to skip over the stack-guard canary.
- core/lock.c: ensure valid start value for lock spin duration warning

  The previous fix in a8e6cc3f4 only addressed half of the problem, as
  we could also get an invalid value for start, causing us to fail
  in a weird way.

  This was caught by the testcases.OpTestHMIHandling.HMI_TFMR_ERRORS
  test in op-test-framework.

  You'd get to this part of the test and get the erroneous lock
  spinning warnings: ::

    PATH=/usr/local/sbin:$PATH putscom -c 00000000 0x2b010a84 0003080000000000
    0000080000000000
    [ 790.140976993,4] WARNING: Lock has been spinning for 790275ms
    [ 790.140976993,4] WARNING: Lock has been spinning for 790275ms
    [ 790.140976918,4] WARNING: Lock has been spinning for 790275ms

  This patch checks the validity of timebase before setting start,
  and only checks the lock timeout if we got a valid start value.
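
The start-value check described in the last item can be sketched in a few
lines of C. This is a minimal illustration, not skiboot's actual code: the
function name ``lock_spin_timed_out``, the millisecond arguments, the
convention that a value of 0 means "timebase not valid yet", and the 5000ms
threshold (borrowed from the 5 second figure quoted for the lock timeout
warnings) are all assumptions made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed warning threshold: 5 seconds, expressed in milliseconds. */
#define LOCK_TIMEOUT_MS 5000ULL

/*
 * Hypothetical helper, not a skiboot function.
 *
 * start_ms: time at which spinning began, or 0 if the timebase was not
 *           valid at that point (assumed convention for this sketch).
 * now_ms:   current time, or 0 if the timebase is still not valid.
 *
 * Returns true only when both samples are usable and the lock has been
 * spinning for longer than the threshold.
 */
static bool lock_spin_timed_out(uint64_t start_ms, uint64_t now_ms)
{
	/* Without a valid start value there is nothing to measure against. */
	if (start_ms == 0 || now_ms == 0)
		return false;

	/* A clock that appears to run backwards is treated as invalid. */
	if (now_ms < start_ms)
		return false;

	return (now_ms - start_ms) > LOCK_TIMEOUT_MS;
}
```

With this shape, a caller whose start value was never set reports no timeout
at all instead of computing a bogus duration. The 790275ms figure in the log
above, printed about 790 seconds after boot, is consistent with a duration
measured from a start value of 0.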


Since :ref:`skiboot-5.10`:

- core/opal: allow some re-entrant calls

  This allows a small number of OPAL calls to succeed despite re-entering
  the firmware, and rejects others rather than aborting.

  This allows a system reset interrupt that interrupts OPAL to do something
  useful: sreset other CPUs, use the console (which allows xmon to work or
  stack traces to be printed), or reboot the system.

  Use OPAL_INTERNAL_ERROR when rejecting, rather than OPAL_BUSY, which is
  used for many other things that do not mean a serious permanent error.
- core/opal: abort in case of re-entrant OPAL call

  The stack is already destroyed by the time we get here, so there
  is not much point continuing.
- core/lock: Add lock timeout warnings

  There are currently no timeout warnings for locks in skiboot. We assume
  that the lock will eventually become free, which may not always be the
  case.

  This patch adds timeout warnings for locks. Any lock which spins for more
  than 5 seconds will throw a warning and stacktrace for that thread. This is
  useful for debugging situations where a thread hangs waiting for a lock
  to be freed.
- core/lock: Add deadlock detection

  This adds simple deadlock detection. The detection looks for circular
  dependencies in the lock requests. It will abort and display a stack trace
  when a deadlock occurs.
  The detection is enabled by DEBUG_LOCKS (enabled by default).
  While the detection may have a slight performance overhead, as there are
  not a huge number of locks in skiboot this overhead isn't significant.
- core/hmi: report processor recovery reason from core FIR bits on P9

  When an error is encountered that causes processor recovery, an HMI is
  generated if the recovery was successful. The reason is recorded in
  the core FIR, which gets copied into the WOF.

  In this case dump the WOF register and an error string into the OPAL
  msglog.

  A broken init setting led to HMIs reported in Linux as: ::

    [ 3.591547] Harmless Hypervisor Maintenance interrupt [Recovered]
    [ 3.591648] Error detail: Processor Recovery done
    [ 3.591714] HMER: 2040000000000000

  This patch would have been useful because it tells us exactly that
  the problem is in the d-side ERAT: ::

    [ 414.489690798,7] HMI: Received HMI interrupt: HMER = 0x2040000000000000
    [ 414.489693339,7] HMI: [Loc: UOPWR.0000000-Node0-Proc0]: P:0 C:1 T:1: Processor recovery occurred.
    [ 414.489699837,7] HMI: Core WOF = 0x0000000410000000 recovered error:
    [ 414.489701543,7] HMI: LSU - SRAM (DCACHE parity, etc)
    [ 414.489702341,7] HMI: LSU - ERAT multi hit

  In future it will be good to unify this reporting, so Linux could
  print something more useful. Until then, this gives some good data.

NPU2/NVLink2 Fixes
------------------

- npu2: Add performance tuning SCOM inits

  Peer-to-peer GPU bandwidth latency testing has produced some tunable
  values that improve performance. Add them to our device initialization.

  File these under things that need to be cleaned up with nice #defines
  for the register names and bitfields when we get time.

  A few of the settings are dependent on the system's particular NVLink
  topology, so introduce a helper to determine how many links go to a
  single GPU.
- hw/npu2: Assign a unique LPARSHORTID per GPU

  This gets used elsewhere to index items in the XTS tables.
- NPU2: dump NPU2 registers on npu2 HMI

  Due to the nature of debugging npu2 issues, folk are wanting the
  full list of NPU2 registers dumped when there's a problem.
- npu2: Remove DD1 support

  Major changes in the NPU between DD1 and DD2 necessitated a fair bit of
  revision-specific code.

  Now that all our lab machines are DD2, we no longer test anything on DD1
  and it's time to get rid of it.

  Remove DD1-specific code and abort probe if we're running on a DD1 machine.
- npu2: Disable fast reboot

  Fast reboot does not yet work right with the NPU. It's been disabled on
  NVLink and OpenCAPI machines. Do the same for NVLink2.

  This amounts to a port of 3e4577939bbf ("npu: Fix broken fast reset")
  from the npu code to npu2.
- npu2: Use unfiltered mode in XTS tables

  The XTS_PID context table is limited to 256 possible pids/contexts. To
  relieve this limitation, make use of "unfiltered mode" instead.

  If an entry in the XTS_BDF table has the bit for unfiltered mode set, we
  can just use one context for that entire bdf/lpar, regardless of pid.
  Instead of searching the XTS_PID table, the NMMU checkout request
  will simply use the entry indexed by lparshort id instead.

  Change opal_npu_init_context() to create these lparshort-indexed
  wildcard entries (0-15) instead of allocating one for each pid. Check
  that multiple calls for the same bdf all specify the same msr value.

  In opal_npu_destroy_context(), continue validating the bdf argument,
  ensuring that it actually maps to an lpar, but no longer remove anything
  from the XTS_PID table. If/when we start supporting virtualized GPUs, we
  might consider actually removing these wildcard entries by keeping a
  refcount, but keep things simple for now.

CAPI/OpenCAPI
-------------

Since :ref:`skiboot-5.11-rc1`:

- capi: Poll Err/Status register during CAPP recovery

  This patch updates do_capp_recovery_scoms() to poll the CAPP
  Err/Status control register, check for CAPP-Recovery to complete/fail
  based on indications of BITS-1,5,9 and then proceed with the
  CAPP-Recovery scoms only if recovery completed successfully. This
  prevents cases where we bring up the PCIe link while the recovery
  sequencer on CAPP is still busy casting out cache lines.

  In case CAPP-Recovery didn't complete successfully an error is returned
  from do_capp_recovery_scoms() asking phb4_creset() to keep the phb4
  fenced and mark it as broken.

  The loop that implements polling of the Err/Status register will also log
  an error on the PHB when it continues for more than 168ms, which is the
  max time to failure for CAPP-Recovery.

Since :ref:`skiboot-5.10`:

- npu2-opencapi: Add OpenCAPI OPAL API calls

  Add three OPAL API calls that are required by the ocxl driver.

  - OPAL_NPU_SPA_SETUP

    The Shared Process Area (SPA) is a table containing one entry (a
    "Process Element") per memory context which can be accessed by the
    OpenCAPI device.

  - OPAL_NPU_SPA_CLEAR_CACHE

    The NPU keeps a cache of recently accessed memory contexts. When a
    Process Element is removed from the SPA, the cache for the link must be
    cleared.

  - OPAL_NPU_TL_SET

    The Transaction Layer specification defines several templates for
    messages to be exchanged on the link. During link setup, the host and
    device must negotiate what templates are supported on both sides and at
    what rates those messages can be sent.
- npu2-opencapi: Train OpenCAPI links and setup devices

  Scan the OpenCAPI links under the NPU, and for each link, reset the card,
  set up a device, train the link and register a PHB.

  Implement the necessary operations for the OpenCAPI PHB type.

  For bringup, test and debug purposes, we allow an NVRAM setting,
  "opencapi-link-training" that can be set to either disable link training
  completely or to use the prbs31 test pattern.

  To disable link training: ::

    nvram -p ibm,skiboot --update-config opencapi-link-training=none

  To use prbs31: ::

    nvram -p ibm,skiboot --update-config opencapi-link-training=prbs31
- npu2-hw-procedures: Add support for OpenCAPI PHY link training

  Unlike NVLink, which uses the pci-virt framework to fake a PCI
  configuration space for NVLink devices, the OpenCAPI device model presents
  us with a real configuration space handled by the device over the OpenCAPI
  link.

  As a result, we have to train the OpenCAPI link in skiboot before we do PCI
  probing, so that config space can be accessed, rather than having link
  training being triggered by the Linux driver.
- npu2-opencapi: Configure NPU for OpenCAPI

  Scan the device tree for NPUs with OpenCAPI links and configure the NPU per
  the initialisation sequence in the NPU OpenCAPI workbook.
- capp: Make error in capp timebase sync a non-fatal error

  Presently when we encounter an error while synchronizing capp timebase
  with chip-tod at the end of enable_capi_mode() we return an
  error. This has two unintended consequences. First, this will prevent
  disabling of fast-reboot even though CAPP is already enabled by this
  point. Secondly, failure during timebase sync is a non-fatal error for
  capp initialization as CAPP/PSL can continue working after this and an
  AFU will only see an error when it tries to read the timebase value
  from PSL.

  So this patch updates enable_capi_mode() to not return an error in
  case the call to chiptod_capp_timebase_sync() fails. The function will
  now just log an error and continue further with the capp init sequence.
  This makes the current implementation align with the one in the kernel
  'cxl' driver, which also treats PSL timebase sync errors as non-fatal
  init errors.
- npu2-opencapi: Fix assert on link reset during init

  We don't support resetting an opencapi link yet.

  Commit fe6d86b9 ("pci: Make fast reboot creset PHBs in parallel")
  tries resetting any PHB whose slot defines a 'run_sm' callback. It
  raises an assert when applied to an opencapi PHB, as 'run_sm' calls
  the 'freset' callback, which is not yet defined for opencapi.

  Fix it for now by removing the currently useless definition of
  'run_sm' on the opencapi slot. It will print a message in the skiboot
  log because the PHB cannot be reset, which is correct. It will all go
  away when we add support for resetting an opencapi link.
- capp: Add lid definition for P9 DD-2.2

  Update fsp_lid_map to include CAPP ucode lid for phb4-chipid ==
  0x202d1 that corresponds to P9 DD-2.2 chip.
- capp: Disable fast-reboot when capp is enabled


PCI
---

Since :ref:`skiboot-5.11-rc1`:

- phb4: Reset FIR/NFIR registers before PHB4 probe

  The function phb4_probe_stack() resets "ETU Reset Register" to
  unfreeze the PHB before it performs mmio access on the PHB. However in
  case the FIR/NFIR registers are set while entering this function,
  the reset of "ETU Reset Register" won't unfreeze the PHB and it will
  remain fenced. This leads to failure during initial CRESET of the PHB
  as mmio access is still not enabled and an error message of the form
  below is logged: ::

    PHB#0000[0:0]: Initializing PHB4...
    PHB#0000[0:0]: Default system config: 0xffffffffffffffff
    PHB#0000[0:0]: New system config : 0xffffffffffffffff
    PHB#0000[0:0]: Initial PHB CRESET is 0xffffffffffffffff
    PHB#0000[0:0]: Waiting for DLP PG reset to complete...
    <snip>
    PHB#0000[0:0]: Timeout waiting for DLP PG reset !
    PHB#0000[0:0]: Initialization failed

  This is especially seen during the MPIPL flow, where the SBE
  quiesces and fences the PHB so that it doesn't stomp on the main
  memory. However when skiboot enters phb4_probe_stack() after MPIPL,
  the FIR/NFIR registers are set, forcing the PHB to re-enter fence after
  ETU reset is done.

  So to fix this issue the patch introduces new xscom writes to
  phb4_probe_stack() to reset the FIR/NFIR registers before performing
  ETU reset to enable mmio access to the PHB.

Since :ref:`skiboot-5.10`:

- pci: Reduce log level of error message

  If a link doesn't train, we can end up with error messages like this: ::

    [ 63.027261959,3] PHB#0032[8:2]: LINK: Timeout waiting for electrical link
    [ 63.027265573,3] PHB#0032:00:00.0 Error -6 resetting

  The first message is useful but the second message is just debug from
  the core PCI code and is confusing to print to the console.

  This reduces the second print to debug level so it's not seen by the
  console by default.
- Revert "platforms/astbmc/slots.c: Allow comparison of bus numbers when matching slots"

  This reverts commit bda7cc4d0354eb3f66629d410b2afc08c79f795f.

  Ben says:
    It's on purpose that we do NOT compare the bus numbers,
    they are always 0 in the slot table
    we do a hierarchical walk of the tree, matching only the
    devfn's along the way bcs the bus numbering isn't fixed
    this breaks all slot naming etc... stuff on anything using
    the "skiboot" slot tables (P8 opp typically)
- core/pci-dt-slot: Fix booting with no slot map

  Currently if you don't have a slot map in the device tree in
  /ibm,pcie-slots, you can crash with a back trace like this: ::

    CPU 0034 Backtrace:
     S: 0000000031cd3370 R: 000000003001362c .backtrace+0x48
     S: 0000000031cd3410 R: 0000000030019e38 ._abort+0x4c
     S: 0000000031cd3490 R: 000000003002760c .exception_entry+0x180
     S: 0000000031cd3670 R: 0000000000001f10 *
     S: 0000000031cd3850 R: 00000000300b4f3e * cpu_features_table+0x1d9e
     S: 0000000031cd38e0 R: 000000003002682c .dt_node_is_compatible+0x20
     S: 0000000031cd3960 R: 0000000030030e08 .map_pci_dev_to_slot+0x16c
     S: 0000000031cd3a30 R: 0000000030091054 .dt_slot_get_slot_info+0x28
     S: 0000000031cd3ac0 R: 000000003001e27c .pci_scan_one+0x2ac
     S: 0000000031cd3ba0 R: 000000003001e588 .pci_scan_bus+0x70
     S: 0000000031cd3cb0 R: 000000003001ee74 .pci_scan_phb+0x100
     S: 0000000031cd3d40 R: 0000000030017ff0 .cpu_process_jobs+0xdc
     S: 0000000031cd3e00 R: 0000000030014cb0 .__secondary_cpu_entry+0x44
     S: 0000000031cd3e80 R: 0000000030014d04 .secondary_cpu_entry+0x34
     S: 0000000031cd3f00 R: 0000000030002770 secondary_wait+0x8c
    [ 73.016947149,3] Fatal MCE at 0000000030026054 .dt_find_property+0x30
    [ 73.017073254,3] CFAR : 0000000030026040
    [ 73.017138048,3] SRR0 : 0000000030026054 SRR1 : 9000000000201000
    [ 73.017198375,3] HSRR0: 0000000000000000 HSRR1: 0000000000000000
    [ 73.017263210,3] DSISR: 00000008 DAR : 7c7b1b7848002524
    [ 73.017352517,3] LR : 000000003002602c CTR : 000000003009102c
    [ 73.017419778,3] CR : 20004204 XER : 20040000
    [ 73.017502425,3] GPR00: 000000003002682c GPR16: 0000000000000000
    [ 73.017586924,3] GPR01: 0000000031c23670 GPR17: 0000000000000000
    [ 73.017643873,3] GPR02: 00000000300fd500 GPR18: 0000000000000000
    [ 73.017767091,3] GPR03: fffffffffffffff8 GPR19: 0000000000000000
    [ 73.017855707,3] GPR04: 00000000300b3dc6 GPR20: 0000000000000000
    [ 73.017943944,3] GPR05: 0000000000000000 GPR21: 00000000300bb6d2
    [ 73.018024709,3] GPR06: 0000000031c23910 GPR22: 0000000000000000
    [ 73.018117716,3] GPR07: 0000000031c23930 GPR23: 0000000000000000
    [ 73.018195974,3] GPR08: 0000000000000000 GPR24: 0000000000000000
    [ 73.018278350,3] GPR09: 0000000000000000 GPR25: 0000000000000000
    [ 73.018353795,3] GPR10: 0000000000000028 GPR26: 00000000300be6fb
    [ 73.018424362,3] GPR11: 0000000000000000 GPR27: 0000000000000000
    [ 73.018533159,3] GPR12: 0000000020004208 GPR28: 0000000030767d38
    [ 73.018642725,3] GPR13: 0000000031c20000 GPR29: 00000000300b3dc6
    [ 73.018737925,3] GPR14: 0000000000000000 GPR30: 0000000000000010
    [ 73.018794428,3] GPR15: 0000000000000000 GPR31: 7c7b1b7848002514

  This has been seen in the lab on a witherspoon using the device tree
  entry point (i.e. no HDAT).

  This fixes the null pointer deref.

Bugs Fixed
----------

Since :ref:`skiboot-5.11-rc1`:

- cpufeatures: Fix setting DARN and SCV HWCAP feature bits

  DARN and SCV have been assigned AT_HWCAP2 (32-63) bits: ::

    #define PPC_FEATURE2_DARN 0x00200000 /* darn random number insn */
    #define PPC_FEATURE2_SCV 0x00100000 /* scv syscall */

  A cpufeatures-aware OS will not advertise these to userspace without
  this patch.
- xive: disable store EOI support

  Hardware has limitations which would require putting a sync after each
  store EOI to make sure the MMIO operations that change the ESB state
  are ordered. This is a killer for performance and the PHBs do not
  support the sync. So remove the store EOI for the moment, until
  hardware is improved.

  Also, while we are changing the XIVE source flags, let's fix the
  settings for the PHB4s, which should follow these rules:

  - SHIFT_BUG for DD10
  - STORE_EOI for DD20 and if enabled
  - TRIGGER_PAGE for DDx0 and if not STORE_EOI

Since :ref:`skiboot-5.10`:

- xive: fix opal_xive_set_vp_info() error path

  In case of error, opal_xive_set_vp_info() will return without
  unlocking the xive object. This is most certainly a typo.
- hw/imc: don't access homer memory if it was not initialised

  This can happen under mambo, at least.
- nvram: run nvram_validate() after nvram_reformat()

  nvram_reformat() sets nvram_valid = true, but it does not set
  skiboot_part_hdr. Call nvram_validate() instead, which sets
  everything up properly.
- dts: Zero struct to avoid using uninitialised value
- hw/imc: Don't dereference possible NULL
- libstb/create-container: munmap() signature file address
- npu2-opencapi: Fix memory leak
- npu2: Fix possible NULL dereference
- occ-sensors: Remove NULL checks after dereference
- core/ipmi-opal: Add interrupt-parent property for ipmi node on P9 and above.

  dtc reports the warning below with newer 4.2+ kernels. ::

    dts: Warning (interrupts_property): Missing interrupt-parent for /ibm,opal/ipmi

  This fix adds the interrupt-parent property under the /ibm,opal/ipmi DT
  node on P9 and above, which allows ipmi-opal to properly use the OPAL
  irqchip.

Other fixes and improvements
----------------------------

- core/cpu: discover stack region size before initialising memory regions

  Stack allocation first allocates a memory region sized to hold stacks
  for all possible CPUs up to the maximum PIR of the architecture, zeros
  the region, then initialises all stacks. Max PIR is 32768 on POWER9,
  which is 512MB for stacks.

  The stack region is then shrunk after CPUs are discovered, but this is
  a bit of a hack, and it leaves a hole in the memory allocation regions
  as it's done after mem regions are initialised. ::

    0x000000000000..00002fffffff : ibm,os-reserve - OS
    0x000030000000..0000303fffff : ibm,firmware-code - OPAL
    0x000030400000..000030ffffff : ibm,firmware-heap - OPAL
    0x000031000000..000031bfffff : ibm,firmware-data - OPAL
    0x000031c00000..000031c0ffff : ibm,firmware-stacks - OPAL
    *** gap ***
    0x000051c00000..000051d01fff : ibm,firmware-allocs-memory@0 - OPAL
    0x000051d02000..00007fffffff : ibm,firmware-allocs-memory@0 - OS
    0x000080000000..000080b3cdff : initramfs - OPAL
    0x000080b3ce00..000080b7cdff : ibm,fake-nvram - OPAL
    0x000080b7ce00..0000ffffffff : ibm,firmware-allocs-memory@0 - OS

  This change moves zeroing into the per-cpu stack setup. The boot CPU
  stack is set up based on the current PIR. Then the size of the stack
  region is set, by discovering the maximum PIR of the system from the
  device tree, before mem regions are initialised.

  This results in all memory being accounted within memory regions,
  and less memory fragmentation of OPAL allocations.
- Make gard display show that a record is cleared

  When clearing gard records, Hostboot only modifies the record_id
  portion to be 0xFFFFFFFF. The remainder of the entry remains.
  Without this change it can be confusing to users to know that
  the record they are looking at is no longer valid.
- Reserve OPAL API number for opal_handle_hmi2 function.
- dts: spl_wakeup: Remove all workarounds in the spl wakeup logic

  We coded a few workarounds in the special wakeup logic to handle
  buggy firmware. Now that the firmware is fixed, remove them as they
  break the special wakeup protocol. As per the spec we should not
  de-assert before assert is complete. So follow this protocol.
- build: use thin archives rather than incremental linking

  This changes the build system to use thin archives rather than
  incremental linking for built-in.o, similar to a recent change to Linux.
  built-in.o is renamed to built-in.a, and is created as a thin archive
  with no index, for speed and size. All built-in.a are aggregated into
  a skiboot.tmp.a which is a thin archive built with an index, making it
  suitable for linking. This is input into the final link.

  The advantages of build size and linker code placement flexibility are
  not as great with skiboot as with a bigger project like Linux, but it's a
  conceptually better way to build, and is more compatible with link
  time optimisation in toolchains, which might be interesting for skiboot,
  particularly for size reductions.

  Size of build tree before this patch is 34.4MB, afterwards 23.1MB.
- core/init: Assert when kernel not found

  If the kernel doesn't load out of flash or there is nothing at
  KERNEL_LOAD_BASE, we end up with an esoteric message as we try to
  branch out of skiboot into nothing ::

    [ 0.007197688,3] INIT: ELF header not found. Assuming raw binary.
    [ 0.014035267,5] INIT: Starting kernel at 0x0, fdt at 0x3044ad90 13029
    [ 0.014042254,3] ***********************************************
    [ 0.014069947,3] Fatal Exception 0xe40 at 0000000000000000
    [ 0.014085574,3] CFAR : 00000000300051c4
    [ 0.014090118,3] SRR0 : 0000000000000000 SRR1 : 0000000000000000
    [ 0.014096243,3] HSRR0: 0000000000000000 HSRR1: 9000000000001000
    [ 0.014102546,3] DSISR: 00000000 DAR : 0000000000000000
    [ 0.014108538,3] LR : 00000000300144c8 CTR : 0000000000000000
    [ 0.014114756,3] CR : 40002202 XER : 00000000
    [ 0.014120301,3] GPR00: 000000003001447c GPR16: 0000000000000000

  This improves the message and asserts in this case: ::

    [ 0.014042685,5] INIT: Starting kernel at 0x0, fdt at 0x3044ad90 13049 bytes)
    [ 0.014049556,0] FATAL: Kernel is zeros, can't execute!
    [ 0.014054237,0] Assert fail: core/init.c:566:0
    [ 0.014060472,0] Aborting!
- core: Fix 'opal-runtime-size' property

  We are populating 'opal-runtime-size' before calculating the actual stack
  size. Hence we end up with the wrong runtime size (e.g. on P9 it shows
  ~540MB while the actual size is around 40MB). Note that only the device
  tree property shows the wrong value; reserved-memory reflects the correct
  size.

  init_all_cpus() calculates and updates the actual stack size. Hence move
  this function call before add_opal_node().

- mambo: Add fw-feature flags for security related settings

  Newer firmwares report some feature flags related to security
  settings via HDAT. On real hardware skiboot translates these into
  device tree properties. For testing purposes just create the
  properties manually in the tcl.

  These values don't exactly match any actual chip revision, but the
  code should not rely on any exact set of values anyway. We just define
  the most interesting flags, that if toggled to "disable" will change
  Linux behaviour.
  You can see the actual values in the hostboot source
  in src/usr/hdat/hdatiplparms.H.

  Also add an environment variable for easily toggling the top-level
  "security on" setting.
- direct-controls: mambo fix for multiple chips
- libflash/blocklevel: Correct miscalculation in blocklevel_smart_erase()

  If blocklevel_smart_erase() detects that the smart erase fits entirely in
  one erase block, it has an early bail path. In this path it miscalculates
  where in the buffer the backend needs to read from to perform the final
  write.
- libstb/secureboot: Fix logging of secure verify messages.

  Currently we are logging secure verify/enforce messages at PR_EMERG
  level even when secureboot mode is not enabled. So reduce the
  log level to PR_ERR when secureboot mode is OFF.

Testing / Code coverage improvements
------------------------------------

Improvements in gcov support include support for newer GCCs as well
as easily exporting the area of memory you need to dump to feed to
``extract-gcov``.

- cpu_idle_job: relax a bit

  This *dramatically* improves kernel boot time with GCOV builds,
  from ~3 minutes between loading the kernel and switching the HILE
  bit down to around 10 seconds.
- gcov: Another GCC, another gcov tweak
- Keep constructors with priorities

  Fixes GCOV builds with gcc7, which uses this.
- gcov: Add gcov data struct to sysfs

  Extracting the skiboot gcov data is currently a tedious process which
  involves taking a mem dump of skiboot and searching for the gcov_info
  struct.
  This patch adds the gcov struct to sysfs under /opal/exports, allowing the
  data to be copied directly into userspace and processed.
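
  As a rough sketch of what copying the data into userspace can look like,
  the snippet below copies the exported blob to a regular file so it can be
  processed offline. The full sysfs path (``/sys/firmware/opal/exports/gcov``),
  the output file name and the helper ``copy_gcov_export()`` are assumptions
  made for illustration; the entry above only states that the struct appears
  under /opal/exports.

  .. code-block:: python

     import shutil
     from pathlib import Path

     # Assumption: only "/opal/exports" under sysfs is documented; the mount
     # point and the "gcov" file name used here are illustrative guesses.
     GCOV_EXPORT = Path("/sys/firmware/opal/exports/gcov")


     def copy_gcov_export(src=GCOV_EXPORT, dst="skiboot-gcov.bin"):
         """Copy the raw gcov export to a regular file; return bytes written."""
         with open(src, "rb") as fin, open(dst, "wb") as fout:
             shutil.copyfileobj(fin, fout)
         # Both files are closed here, so the size on disk is final.
         return Path(dst).stat().st_size

  Calling ``copy_gcov_export()`` with no arguments on a GCOV build would
  leave the dump in ``skiboot-gcov.bin`` in the current directory, assuming
  the path guess holds and the caller is allowed to read the export.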