.. _skiboot-6.0:

skiboot-6.0
===========

skiboot v6.0 was released on Friday May 11th 2018. It is the first
release of skiboot 6.0, which is the new stable release of skiboot
following the 5.11 release, first released April 6th 2018.

Skiboot 6.0 is the basis for op-build v2.0 and is *required* for
POWER9 systems.

skiboot v6.0 contains all bug fixes as of :ref:`skiboot-5.11`,
:ref:`skiboot-5.10.5`, and :ref:`skiboot-5.4.9` (the currently maintained
stable releases). We do *not* expect any further stable releases in the
5.10.x series, nor in the 5.11.x series.

For how the skiboot stable releases work, see :ref:`stable-rules`.

Over skiboot-5.11, we have the following changes:


New Features
------------

Since 6.0-rc1:

- Update default stop-state-disable mask to cut only stop11

  Stability improvements in microcode for stop4/stop5 are
  available in upstream hcode images. Stop4 and stop5 can
  be safely enabled by default.

  Use ~0xE0000000 to cut all but stop0,1,2 in case there
  are any issues with stop4/5.

  example: ::

    nvram -p ibm,skiboot --update-config opal-stop-state-disable-mask=0x1FFFFFFF

  **Note**: DD2.1 chips that have a frequency <1867MHz possibly *need* to
  run a hcode image *different* than the default in op-build (set
  `BR2_HCODE_LATEST_VERSION=y` in your config)
- ibm,firmware-versions: add hcode to device tree

  op-build commit 736a08b996e292a449c4996edb264011dfe56a40
  added hcode to the VERSION partition, let's parse it out
  and let the user know.
- ipmi: Add BMC firmware version to device tree

  BMC Get device ID command gives BMC firmware version details. Let's add this
  to device tree. User space tools will use this information to display BMC
  version details.
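
The relationship between ``~0xE0000000`` and the ``0x1FFFFFFF`` value passed to
``nvram`` above can be checked with a few lines of arithmetic. The sketch below
is illustrative only: it assumes that bit n of the 32-bit mask, counted from the
most significant bit, stands for stop level n and that a set bit disables that
level, which is how the "all but stop0,1,2" description reads. The helper names
are hypothetical and are not part of skiboot.

```python
def stop_level_disabled(mask, level):
    """Return True if `level` is cut by a 32-bit disable mask.

    Assumption: bit n, counted from the most significant bit,
    stands for stop level n, and a set bit disables that level.
    """
    return bool(mask & (0x80000000 >> level))


def enabled_levels(mask, highest=11):
    """List the stop levels in 0..highest that the mask leaves enabled."""
    return [n for n in range(highest + 1) if not stop_level_disabled(mask, n)]


# ~0xE0000000 truncated to 32 bits is the value used in the nvram example.
ALL_BUT_STOP_0_1_2 = ~0xE0000000 & 0xFFFFFFFF

if __name__ == "__main__":
    print(hex(ALL_BUT_STOP_0_1_2))
    print(enabled_levels(ALL_BUT_STOP_0_1_2))
```

Under the same assumption, a mask that cuts only stop11 would have just the
``0x80000000 >> 11`` bit set.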

Since 5.11:

- Disable stop states from OPAL

  On ZZ, stop4,5,11 are enabled for PowerVM, even though doing
  so may cause problems with OPAL due to bugs in hcode.

  For other platforms, this isn't so much of an issue as
  we can just control stop states by the MRW. However the
  rebuild-the-world approach to changing values there is a bit
  annoying if you just want to rule out a specific stop state
  from being problematic.

  Provide an nvram option to override what's disabled in OPAL.

  The OPAL mask is currently ~0xE0000000 (i.e. all but stop 0,1,2)

  You can set an NVRAM override with: ::

    nvram -p ibm,skiboot --update-config opal-stop-state-disable-mask=0xFFFFFFF

  This nvram override will disable *all* stop states.
- interrupts: Create an "interrupts" property in the OPAL node

  Deprecate the old "opal-interrupts", it's still there, but the new
  property follows the standard and allows us to specify whether an
  interrupt is level or edge sensitive.

  Similarly create "interrupt-names" whose content is identical to
  "opal-interrupts-names".
- SBE: Add timer support on POWER9

  SBE on P9 provides a one-shot programmable timer facility. We can use this
  to implement OPAL timers and hence limit the reliance on the Linux
  heartbeat (similar to the HW timer facility provided by SLW on P8).
- Add SBE driver support

  SBE (Self Boot Engine) on P9 has two different jobs:

  - Boot the chip up to the point the core is functional
  - Provide various services like timer, scom, stash MPIPL, etc., at runtime

  We will use SBE for various purposes like timer, MPIPL, etc.

- opal:hmi: Add missing processor recovery reason string.

  With this patch we now see the reason string printed for the CORE_WOF[43] bit. ::

    [ 477.352234986,7] HMI: [Loc: U78D3.001.WZS004A-P1-C48]: P:8 C:22 T:3: Processor recovery occurred.
    [ 477.352240742,7] HMI: Core WOF = 0x0000000000100000 recovered error:
    [ 477.352242181,7] HMI: PC - Thread hang recovery

- Add DIMM actual speed to device tree

  Recent HDAT provides DIMM actual speed. Let's add this to device tree.
- Fix DIMM size property

  Today we parse the vpd blob to get DIMM size information. This is limited
  to FSP based systems. HDAT provides the DIMM size value. Let's use that to
  populate the device tree so that we can get size information on BMC based
  systems as well.

- PCI: Set slot power limit when supported

  The PCIe slot capability can be implemented in a root or switch
  downstream port to set the maximum power a card is allowed to draw
  from the system. This patch adds support for setting the power limit
  when the platform has defined one.
- hdata/spira: parse vpd to add part-number and serial-number to xscom@ node

  Expected by FWTS and associates our processor with the part/serial
  number, which is obviously a good thing for one's own sanity.


Improved HMI Handling
^^^^^^^^^^^^^^^^^^^^^

- opal/hmi: Add documentation for opal_handle_hmi2 call
- opal/hmi: Generate hmi event for recovered HDEC parity error.
- opal/hmi: check thread 0 tfmr to validate latched tfmr errors.

  Due to P9 errata, HDEC parity and TB residue errors are latched for
  non-zero threads 1-3 even if they are cleared. But these are not
  latched on thread 0. Hence, use xscom SCOMC/SCOMD to read thread 0 tfmr
  value and ignore them on non-zero threads if they are not present on
  thread 0.
- opal/hmi: Print additional debug information in rendezvous.
- opal/hmi: Fix handling of TFMR parity/corrupt error.

  While testing TFMR parity/corrupt error it has been observed that HMIs are
  delivered twice for this error:

  - First time HMI is delivered with HMER[4,5]=1 and TFMR[60]=1.
  - Second time HMI is delivered with HMER[4,5]=1 and TFMR[60]=0 with valid TB.

  On second HMI we end up throwing "HMI: TB invalid without core error
  reported" even though TB is in a valid state.
- opal/hmi: Stop flooding HMI event for TOD errors.

  Fix the issue where every thread on the chip sends an HMI event to the host
  for TOD errors. TOD errors are reported to all the cores/threads on the chip.
  Any one thread can fix the error and send the event. The rest of the threads
  don't need to send an HMI event unnecessarily.
- opal/hmi: Fix soft lockups during TOD errors

  There are some TOD errors which do not affect the working of TOD and TB. They
  stay in a valid state. Hence we don't need rendez-vous for TOD errors that
  do not affect TB working.

  TOD errors that affect TOD/TB will report a global error on TFMR[44]
  along with bit 51, and they will go in the rendez-vous path as expected.

  But the TOD errors that do not affect the TB register set only TFMR bit 51.
  The TFMR bit 51 is cleared when any single thread clears the TOD error.
  Once cleared, bit 51 is reflected to all the cores on that chip. Any
  thread that reads the TFMR register after the error is cleared will see
  TFMR bit 51 reset. Hence the threads that see TFMR[51]=1 fall through the
  rendez-vous path and threads that see TFMR[51]=0 return doing
  nothing. This ends up in soft lockups in the host kernel.

  This patch fixes this issue by not considering the TOD interrupt (TFMR[51])
  as a core-global error and hence avoiding the rendez-vous path completely.
  Instead threads that see TFMR[51]=1 will now take a different path that
  just does the TOD error recovery.
- opal/hmi: Do not send HMI event if no errors are found.

  For TOD errors, all the cores in the chip get HMIs. Any one thread from any
  core can fix the issue and TFMR will have error conditions cleared. The rest
  of the threads need not take any action if TOD errors are already cleared.
  Hence thread 0 of every core should get a fresh copy of TFMR before going
  ahead with the recovery path. Initialize recover = -1, so that if no errors
  are found that thread need not send an HMI event to Linux. This helps stop
  flooding the host with HMI events from every thread even when no errors are
  found.
- opal/hmi: Initialize the hmi event with old value of HMER.

  Do this before we check for TFAC errors. Otherwise the event at the host
  console shows no error reported in the HMER register.

  Without this patch the console event shows HMER with all zeros: ::

    [ 216.753417] Severe Hypervisor Maintenance interrupt [Recovered]
    [ 216.753498] Error detail: Timer facility experienced an error
    [ 216.753509] HMER: 0000000000000000
    [ 216.753518] TFMR: 3c12000870e04000

  After this patch it shows old HMER values on host console: ::

    [ 2237.652533] Severe Hypervisor Maintenance interrupt [Recovered]
    [ 2237.652651] Error detail: Timer facility experienced an error
    [ 2237.652766] HMER: 0840000000000000
    [ 2237.652837] TFMR: 3c12000870e04000

- opal/hmi: Rework HMI handling of TFAC errors

  This patch reworks the HMI handling for TFAC errors by introducing
  4 rendez-vous points to improve the thread synchronization while handling
  timebase errors that require all threads to clear dirty data from the
  TB/HDEC register before clearing the errors.
- opal/hmi: Don't bother passing HMER to pre-recovery cleanup

  The test for TFAC error is now redundant so we remove it and
  remove the HMER argument.
- opal/hmi: Move timer related error handling to a separate function

  Currently no functional change. This is a first step to completely
  rewriting how these things are handled.
- opal/hmi: Add a new opal_handle_hmi2 that returns direct info to Linux

  It returns a 64-bit flags mask currently set to provide info
  about which timer facilities were lost, and whether an event
  was generated.
- opal/hmi: Remove races in clearing HMER

  Writing to HMER acts as an "AND". The current code writes back the
  value we originally read with the bits we handled cleared. This is
  racy: if a new bit gets set in HW after the original read, we'll end
  up clearing it without handling it.

  Instead, use an all 1's mask with only the bit handled cleared.
- opal/hmi: Don't re-read HMER multiple times

  We want to make sure all reporting and actions are based
  upon the same snapshot of HMER in case bits get added
  by HW while we are in OPAL.

libflash and ffspart
^^^^^^^^^^^^^^^^^^^^

Many improvements to the `ffspart` utility and `libflash` have come
in this release, making `ffspart` suitable for building PNOR images
that are bit-identical to those built by the existing tooling used by
`op-build`. The plan is to switch `op-build` to use this infrastructure
in the not too distant future.

- libflash/blocklevel: Make read/write be ECC agnostic for callers

  The blocklevel abstraction allows for regions of the backing store to be
  marked as ECC protected so that blocklevel can decode/encode the ECC
  bytes into the buffer automatically without the caller having to be ECC
  aware.

  Unfortunately this abstraction is far from perfect: it is only useful
  if reads and writes are performed at the start of the ECC region or in
  some circumstances at an ECC aligned position - which requires the
  caller be aware of the ECC regions.

  The problem that has arisen is that the blocklevel abstraction is
  initialised somewhere but when it is later called the caller is unaware
  if ECC exists in the region it wants to arbitrarily read and write to.
  This should not have been a problem since blocklevel knows. Currently
  misaligned reads will fail ECC checks and misaligned writes will
  overwrite ECC bytes and the backing store will become corrupted.

  This patch adds the smarts to blocklevel_read() and blocklevel_write() to
  cope with the problem. Note that ECC can always be bypassed by calling
  blocklevel_raw_() functions.

  All this work means that the gard tool can safely call
  blocklevel_read() and blocklevel_write() and as long as the blocklevel
  knows of the presence of ECC then it will deal with all cases.

  This commit also removes code in the gard tool which compensated for
  inadequacies no longer present in blocklevel.
- libflash/blocklevel: Return region start from ecc_protected()

  Currently all ecc_protected() does is say if a region is ECC protected
  or not. Knowing a region is ECC protected is one thing but there isn't
  much that can be done afterwards if this is the only known fact. A lot
  more can be done if the caller is told where the ECC region begins.

  Knowing where the ECC region starts allows the caller to align its
  reads and writes. This allows for more flexibility calling read and write
  without knowing exactly how the backing store is organised.
- libflash/ecc: Add helpers to align a position within an ecc buffer

  As part of ongoing work to make ECC invisible to higher levels up the
  stack this function converts a 'position' which should be ECC agnostic
  to the equivalent position within an ECC region starting at a specified
  location.
- libflash/ecc: Add functions to deal with unaligned ECC memcpy
- external/ffspart: Improve error output
- libffs: Fix bad checks for partition overlap

  Not all TOCs are written at zero
- libflash/libffs: Allow caller to specify header partition

  An FFS TOC is comprised of two parts. A small header which has a magic
  and very minimal information about the TOC which will be common to all
  partitions, things like number of partitions, block sizes and the like.
  Following this small header are a series of entries. Importantly there
  is always an entry which encompasses the TOC itself, this is usually
  called the 'part' partition.

  Currently libffs always assumes that the 'part' partition is at zero.
  While there is always a TOC at zero there doesn't actually have to be.
  PNORs may have multiple TOCs within them, therefore libffs needs to be
  flexible enough to allow callers to specify TOCs not at zero.

  The 'part' partition is otherwise a regular partition which may have
  flags associated with it. libffs should allow the user to set the flags
  for the 'part' partition.

  This patch achieves both by allowing the caller to specify the 'part'
  partition. The caller can choose not to, and libffs will provide a
  sensible default.
- libflash/libffs: Refcount ffs entries

  Currently consumers can add a new ffs entry to multiple headers, this
  is fine but freeing any of the headers will cause the entry to be freed,
  this causes double free problems.

  Even if only one header is used, the consumer of the library still has a
  reference to the entry, which they may well reuse at some other point.

  libffs will now refcount entries and only free when there are no more
  references.

  This patch also removes the pointless return value of ffs_hdr_free()
- libflash/libffs: Switch to storing header entries in an array

  Since libffs no longer needs to sort the entries as they get added
  it makes little sense to have the complexity of a linked list when an
  array will suffice.
- libflash/libffs: Remove backup partition from TOC generation code

  It turns out this code was messy and not all that reliable. Doing it at
  the library level adds complexity to the library and restrictions to the
  caller.

  A simpler approach can be achieved by just instantiating multiple
  ffs_header structures pointing to different parts of the same file.
- libflash/libffs: Remove the 'sides' from the FFS TOC generation code

  It turns out this code was messy and not all that reliable. Doing it at
  the library level adds complexity to the library and restrictions to the
  caller.

  A simpler approach can be achieved by just instantiating multiple
  ffs_header structures pointing to different parts of the same file.
- libflash/libffs: Always add entries to the end of the TOC

  It turns out that sorted order isn't the best idea. This removes
  flexibility from the caller. If the user wants their partitions in
  sorted order, they should insert them in sorted order.
- external/ffspart: Remove side, order and backup options

  These options are currently flaky in libflash/libffs so there isn't
  much point to being able to use them in ffspart.

  Future reworks planned for libflash/libffs will render these options
  redundant anyway.
- libflash/libffs: ffs_close() should use ffs_hdr_free()
- libflash/libffs: Add setter for a partition's actual size
- pflash: Use ffs_entry_user_to_string() to standardise flag strings
- libffs: Standardise ffs partition flags

  It seems we've developed a character representation for ffs partition
  flags. Currently only pflash really prints them so it hasn't been a
  problem but now ffspart wants to read them in from user input.

  It is important that what libffs reads and what pflash prints remain
  consistent, we should move the code into libffs to avoid problems.
- external/ffspart: Allow # comments in input file

p9dsu Platform changes
----------------------

The p9dsu platform from SuperMicro (also known as 'Boston') has received
a number of updates, and the patches once carried by SuperMicro are now
upstream.

Since 6.0-rc1:

- p9dsu: timeout for variant detection, default to 2uess


Since 5.11:

- p9dsu: detect p9dsu variant even when hostboot doesn't tell us

  The SuperMicro BMC can tell us what riser type we have, which dictates
  the PCI slot tables. Usually, in an environment that a customer would
  experience, Hostboot will do the query with an SMC specific patch
  (not upstream as there's no platform specific code in hostboot)
  and skiboot knows what variant it is based on the compatible string.

  However, if you're using upstream hostboot, you only get the bare
  'p9dsu' compatible type. We can work around this by asking the BMC
  ourselves and setting the slot table appropriately. We do this
  synchronously in platform init so that we don't start probing
  PCI before we set up the slot table.
- p9dsu: add slot power limit.
- p9dsu: add pci slot table for Boston LC 1U/2U and Boston LA/ESS.
- p9dsu HACK: fix system-vpd eeprom
- p9dsu: change esel command from AMI to IBM 0x3a.

ZZ Platform Changes
-------------------

- hdata/i2c: Fix up pci hotplug labels

  These labels are used on the devices used to do PCIe slot power control
  for implementing PCIe hotplug. I'm not sure how they ended up as
  "eeprom-pgood" and "eeprom-controller" since that doesn't make any sense.
- hdata/i2c: Ignore multi-port I2C devices

  Recent FSP firmware builds add support for multi-port I2C devices such
  as the GPIO expanders used for the presence detect of OpenCAPI devices
  and the PCIe hotplug controllers used to power cycle PCIe slots on ZZ.

  The OpenCAPI driver inside of skiboot currently uses a platform-specific
  method to talk to the relevant I2C device rather than relying on HDAT
  since not all platforms correctly report the I2C devices (hello Zaius).
  Additionally the nature of multi-port devices requires that we have a
  device-specific handler so that we generate the correct DT bindings.
  Currently we don't and there is no immediate need for this support so just
  ignore the multi-port devices for now.
- hdata/i2c: Replace `i2c_` prefix with `dev_`

  The current naming scheme makes it easy to conflate "i2cm_port" and
  "i2c_port." The latter is used to describe multi-port I2C devices such
  as GPIO expanders and multi-channel PCIe hotplug controllers. Rename
  i2c_port to dev_port to make the two a bit more distinct.

  Also rename i2c_addr to dev_addr for consistency.
- hdata/i2c: Ignore CFAM I2C master

  Recent FSP firmware builds put in information about the CFAM I2C master
  in addition to the host I2C masters accessible via XSCOM. Odds are this
  information should not be there since there's no handshaking between the
  FSP/BMC and the host over who controls that I2C master, but it is so
  we need to deal with it.

  This patch adds filtering to the HDAT parser so it ignores the CFAM I2C
  master. Without this it will create a bogus i2cm@<addr> which might cause
  issues.
- ZZ: hw/imc: Add support to load imc catalog lid file

  Add support to load the imc catalog from a lid file packaged
  as part of the system firmware. Lid number allocated
  is 0x80f00103.lid.


Bugs Fixed
----------

Since 6.0-rc2:

- core/opal: Fix recursion check in opal_run_pollers()

  An earlier commit introduced a counter variable poller_recursion to
  limit the number of error messages shown when opal_pollers
  are run recursively. However the check for the counter value was
  placed in a way that the poller recursion was only detected the first 16
  times and then allowed afterwards.

  This patch fixes this by moving the check for the counter value inside
  the conditional branch with some re-factoring so that opal_poller
  recursion is not erroneously allowed after poll_recursion is detected
  the first 16 times.
- phb4: Print WOF registers on fence detect

  Without the WOF registers it's hard to figure out what went wrong first,
  so print those when we print the FIRs when a fence is detected.
- p9dsu: detect variant in init only if probe fails to find it.

  Currently the slot table init happens twice, in both the probe and init
  functions, due to the variant detection logic being called with an
  incorrect condition check.

Since 6.0-rc1:

- core/direct-controls: improve p9_stop_thread error handling

  p9_stop_thread should fail the operation if it finds the thread was
  already quiesced. This implies something else is doing direct controls
  on the thread (e.g., pdbg) or there is some exceptional condition we
  don't know how to deal with. Proceeding here would cause things to
  trample on each other, for example the hard lockup watchdog trying to
  send a sreset to the core while it is stopped for debugging with pdbg
  will end in tears.

  If p9_stop_thread times out waiting for the thread to quiesce, do
  not hit it with a core_start direct control, because we don't know
  what state things are in and doing more things at this point is worse
  than doing nothing. There is no good recipe described in the workbook
  to de-assert the core_stop control if it fails to quiesce the thread.
  After timing out here, the thread may eventually quiesce and get
  stuck, but that's simpler to debug than undefined behaviour.

- core/direct-controls: fix p9_cont_thread for stopped/inactive threads

  Firstly, p9_cont_thread should check that the thread actually was
  quiesced before it tries to resume it. Anything could happen if we
  try this from an arbitrary thread state.

  Then when resuming a quiesced thread that is inactive or stopped (in
  a stop idle state), we must not send a core_start direct control,
  clear_maint must be used in these cases.
- hmi: Clear unknown debug trigger

  On some systems, seeing hangs like this when Linux starts: ::

    [ 170.027252763,5] OCC: All Chip Rdy after 0 ms
    [ 170.062930145,5] INIT: Starting kernel at 0x20011000, fdt at 0x30ae0530 366247 bytes)
    [ 171.238270428,5] OPAL: Switch to little-endian OS

  If you look at the in memory skiboot console (or do `nvram -p
  ibm,skiboot --update-config log-level-driver=7`) we see the console get
  spammed with: ::

    [ 5209.109790675,7] HMI: Received HMI interrupt: HMER = 0x0000400000000000
    [ 5209.109792716,7] HMI: Received HMI interrupt: HMER = 0x0000400000000000
    [ 5209.109794695,7] HMI: Received HMI interrupt: HMER = 0x0000400000000000
    [ 5209.109796689,7] HMI: Received HMI interrupt: HMER = 0x0000400000000000

  We're taking the debug trigger (bit 17) early on, before the
  hmi_debug_trigger function in the kernel is set up.

  This clears the HMI in Skiboot and reports to the kernel instead of
  bringing down the machine.

- core/hmi: assign flags=0 in case nothing set by handle_hmi_exception

  Theoretically we could have returned junk to the OS in this parameter.

- SLW: Fix mambo boot to use stop states

  After commit 35c66b8ce5a2 ("SLW: Move MAMBO simulator checks to
  slw_init"), mambo boot no longer calls add_cpu_idle_state_properties()
  and as such we never enable stop states.

  After adding the call back, we get more testing coverage as well
  as faster mambo SMT boots.

- phb4: Hardware init updates

  CFG Write Request Timeout was incorrectly set to informational and not
  fatal for both non-CAPI and CAPI, so set it to fatal. This was a
  mistake in the specification. Correcting this fixes a niche bug in
  escalation (which is necessary on pre-DD2.2) that can cause a checkstop
  due to a NCU timeout.

  In addition, set the values in the timeout control registers to match.
  This fixes an extremely rare and unreproducible bug, though the current
  timings don't make sense since they're higher than the NCU timeout (16)
  which will checkstop the machine anyway.

- SLW: quieten 'Configuring self-restore' for DARN,NCU_SPEC_BAR and HRMOR

Since 5.11:

- core: Fix iteration condition to skip garded cpu
- uart: fix uart_opal_flush to take console lock over uart_con_flush

  This bug meant that OPAL_CONSOLE_FLUSH didn't take the appropriate locks.
  Luckily, this call is currently only used in the crash path.
- xive: fix missing unlock in error path
- OPAL_PCI_SET_POWER_STATE: fix locking in error paths

  Otherwise we could exit OPAL holding locks, potentially leading
  to all sorts of problems later on.
- hw/slw: Don't assert on an unknown chip

  For some reason skiboot populates nodes in /cpus/ for the cores on
  chips that are deconfigured. As a result Linux includes the threads
  of those cores in its set of possible CPUs in the system and attempts
  to set the SPR values that should be used when waking a thread from
  a deep sleep state.

  However, in the case where we have a deconfigured chip we don't create
  a xscom node for that chip and as a result we don't have a proc_chip
  structure for that chip either. In turn, this results in an assertion
  failure when calling opal_slw_set_reg() since it expects the chip
  structure to exist. Fix this up and print an error instead.
- opal/hmi: Generate one event per core for processor recovery.

  Processor recovery is a per-core error. All threads on that core receive
  the HMI. Not every thread needs to generate an HMI event for the same error.

  Let thread 0 only generate the event.
- sensors: Don't add DTS sensors when OCC inband sensors are available

  There are two sets of core temperature sensors today. One is DTS scom
  based core temperature sensors and the second group is the sensors
  provided by OCC. DTS is the highest temperature among the different
  temperature zones in the core while OCC core temperature sensors are
  the average temperature of the core. DTS sensors are read directly by
  the host by SCOMing the DTS sensors while OCC sensors are read and
  updated by OCC to main memory.

  Reading DTS sensors by SCOMing is a heavier and slower operation
  compared to reading OCC sensors, which is as good as reading memory.
  So don't add DTS sensors when OCC sensors are available.
- core/fast-reboot: Increase timeout for dctl sreset to 1sec

  Direct control xscom can take more time to complete. We seem to
  wait too little on Boston, failing fast-reboot for no good reason.

  Increase timeout to 1 sec as a reasonable value for sreset to be delivered
  and core to start executing instructions.
- occ: sensors-groups: Add DT properties to mark HWMON sensor groups

  Fix the sensor type to match HWMON sensor types. Add compatible flag
  to indicate the environmental sensor groups so that operations on
  these groups can be handled by the HWMON Linux interface.
- core: Correctly load initramfs in stb container

  Skiboot does not calculate the actual size and start location of the
  initramfs if it is wrapped by an STB container (for example if loading
  an initramfs from the ROOTFS partition).

  Check if the initramfs is in an STB container and determine the size and
  location correctly in the same manner as the kernel. Since
  load_initramfs() is called after load_kernel() move the call to
  trustedboot_exit_boot_services() into load_and_boot_kernel() so it is
  called after both of these.
- hdat/i2c.c: quieten "v2 found, parsing as v1"
- hw/imc: Check for pause_microcode_at_boot() return status

  pause_microcode_at_boot() loops through all the chips' ucode
  control blocks and pauses the ucode if it is in the running state.
  But it does not fail if any of the chips' ucode is not initialised.

  Add code to return a failure if ucode is not initialized in any
  of the chips. Since pause_microcode_at_boot() is called just before
  attaching the IMC device nodes in imc_init(), add code to check for
  the function return.


Slot location code fixes:

- npu2: Use ibm,loc-code rather than ibm,slot-label

  The ibm,slot-label property is to name the slot that appears under a
  PCIe bridge. In the past we (ab)used the slot tables to attach names
  to GPU devices and their corresponding NVLinks which resulted in npu2.c
  using slot-label as a location code rather than as a way to name slots.

  Fix this up since it's confusing.
- hdata/slots: Apply slot label to the parent slot

  Slot names only really make sense when applied to an actual slot rather
  than a device. On witherspoon the GPU devices have a name associated with
  the device rather than the slot for the GPUs. Add a hack that moves the
  slot label to the parent slot rather than on the device itself.
- pci-dt-slot: Big ol' cleanup

  The underlying data that we get from HDAT can only really describe a
  PCIe system. As such we can simplify the devicetree slot lookup code
  by only caring about the important cases, namely, root ports and switch
  downstream ports.

  This also fixes a bug where the root port didn't get a slot label applied
  which results in devices under that port not having ibm,loc-code set.
  This results in the EEH core being unable to report the location of
  EEHed devices under that port.

opal-prd
^^^^^^^^
- opal-prd: Insert powernv_flash module

  Explicitly load the powernv_flash module on BMC based systems so that we are
  sure that the flash device is created before starting the opal-prd daemon.

  Note that I have replaced the pnor_available() check with is_fsp_system(), as
  we want to load the module on BMC systems only. Also pnor_init has enough
  logic to detect the flash device. Hence pnor_available() becomes a redundant
  check.

NPU2/NVLINK2
^^^^^^^^^^^^
- npu2/hw-procedures: fence bricks on GPU reset

  The NPU workbook defines a way of fencing a brick and
  getting the brick out of fence state. We do have an implementation
  of bringing the brick out of fenced/quiesced state. We do
  the latter in our procedures, but to support run time reset
  we need to do the former.

  The fencing ensures that access to memory behind the links
  will not lead to HMIs, but instead SUEs will be populated
  in cache (in the case of speculation). The expectation is then
  that prior to and after reset, the operating system components
  will flush the cache for the region of memory behind the GPU.

  This patch does the following:

  1. Implements a npu2_dev_fence_brick() function to set/clear
     fence state
  2. Clears FIR bits prior to clearing the fence status
  3. Clears the fence status
  4. Takes the powerbus out of CQ fence much later now,
     in credits_check() which is the last hardware procedure
     called after link training.
- hw/npu2.c: Remove static configuration of NPU2 register

  The NPU_SM_CONFIG0 register currently needs to be configured in Skiboot to
  select NVLink mode, however Hostboot should configure other bits in this
  register.

  For some reason Skiboot was explicitly clearing bit-6
  (CONFIG_DISABLE_VG_NOT_SYS). It is unclear why this bit was getting cleared
  as recent Hostboot versions explicitly set it to the correct value based on
  the specific system configuration. Therefore Skiboot should not alter it.

  Bit-58 (CONFIG_NVLINK_MODE) selects if NVLink mode should be enabled or
  not. Hostboot does not configure this bit so Skiboot should continue to
  configure it.
- npu2: Improve log output of GPU-to-link mapping

  Debugging issues related to unconnected NVLinks can be a little less
  irritating if we use the NPU2DEV{DBG,INF}() macros instead of prlog().

  In short, change this: ::

    NPU2: comparing GPU 'GPU2' and NPU2 'GPU1'
    NPU2: comparing GPU 'GPU3' and NPU2 'GPU1'
    NPU2: comparing GPU 'GPU4' and NPU2 'GPU1'
    NPU2: comparing GPU 'GPU5' and NPU2 'GPU1'
      :
    npu2_dev_bind_pci_dev: No PCI device for NPU2 device 0006:00:01.0 to bind to. If you expect a GPU to be there, this is a problem.

  to this: ::

    NPU6:0:1.0 Comparing GPU 'GPU2' and NPU2 'GPU1'
    NPU6:0:1.0 Comparing GPU 'GPU3' and NPU2 'GPU1'
    NPU6:0:1.0 Comparing GPU 'GPU4' and NPU2 'GPU1'
    NPU6:0:1.0 Comparing GPU 'GPU5' and NPU2 'GPU1'
      :
    NPU6:0:1.0 No PCI device found for slot 'GPU1'
- npu2: Move NPU2_XTS_BDF_MAP_VALID assignment to context init

  A bad GPU or other condition may leave us with a subset of links that
  never get initialized. If an ATSD is sent to one of those bricks, it
  will never complete, leaving us waiting forever for a response: ::

    watchdog: BUG: soft lockup - CPU#23 stuck for 23s! [acos:2050]
    ...
    Modules linked in: nvidia_uvm(O) nvidia(O)
    CPU: 23 PID: 2050 Comm: acos Tainted: G W O 4.14.0 #2
    task: c0000000285cfc00 task.stack: c000001fea860000
    NIP: c0000000000abdf0 LR: c0000000000acc48 CTR: c0000000000ace60
    REGS: c000001fea863550 TRAP: 0901 Tainted: G W O (4.14.0)
    MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 28004484 XER: 20040000
    CFAR: c0000000000abdf4 SOFTE: 1
    GPR00: c0000000000acc48 c000001fea8637d0 c0000000011f7c00 c000001fea863820
    GPR04: 0000000002000000 0004100026000000 c0000000012778c8 c00000000127a560
    GPR08: 0000000000000001 0000000000000080 c000201cc7cb7750 ffffffffffffffff
    GPR12: 0000000000008000 c000000003167e80
    NIP [c0000000000abdf0] mmio_invalidate_wait+0x90/0xc0
    LR [c0000000000acc48] mmio_invalidate.isra.11+0x158/0x370


  ATSDs are only sent to bricks which have a valid entry in the XTS_BDF
  table. So to prevent the hang, don't set NPU2_XTS_BDF_MAP_VALID unless
  we make it all the way to creating a context for the BDF.

Secure and Trusted Boot
^^^^^^^^^^^^^^^^^^^^^^^
- hdata/tpmrel: detect tpm not present by looking up the stinfo->status

  Skiboot detects if tpm is present by checking if a secureboot_tpm_info
  entry exists. However, if a tpm is not present, hostboot also creates a
  secureboot_tpm_info entry. In this case, hostboot creates an empty
  entry, but sets the tpm_status field to TPM_NOT_PRESENT.

  This detects if tpm is not present by looking up the stinfo->status.

  This fixes the "TPMREL: TPM node not found for chip_id=0 (HB bug)"
  issue, reproduced when skiboot is running on a system that has no tpm.

PCI
^^^
- phb4: Restore bus numbers after CRS

  Currently we restore PCIe bus numbers right after the link is
  up. Unfortunately at this point we haven't done CRS so config space
  may not be accessible.

  This moves the bus number restore till after CRS has happened.
- romulus: Add a barebones slot table
- phb4: Quieten and improve "Timeout waiting for electrical link"

  This happens normally if a slot doesn't have a working HW presence
  detect and relies instead on inband presence detect.

  The message we display is scary and not very useful unless you
  are debugging, so quieten it and change it to something more
  meaningful.
- pcie-slot: Don't fail powering on an already on switch

  If the power state is already the required value, return
  OPAL_SUCCESS rather than OPAL_PARAMETER to avoid spurious
  errors during boot.

CAPI/OpenCAPI
^^^^^^^^^^^^^
- capi: Keep the current mmio windows in the mbt cache table.

  When the phb is used as a CAPI interface, the current mmio windows list
  is cleaned before adding the capi and the prefetchable memory (M64)
  windows, which implies that the non-prefetchable BAR is no longer
  configured.
  This patch sets only the mbt bar needed to pass the capi mmio window and
  keeps the other mmio values (M32 and M64) as defined.
- npu2-opencapi: Fix 'link internal error' FIR, take 2

  When setting up an opencapi link, we set the transport muxes first,
  then set the PHY training config register, which includes disabling
  nvlink mode for the bricks. That's the order of the init sequence, as
  found in the NPU workbook.

  In reality, doing so works, but it raises 2 FIR bits in the PowerBus
  OLL FIR Register for the 2 links when we configure the transport
  muxes. Presumably because nvlink is not disabled yet and we are
  configuring the transport muxes for opencapi.

  bit 60:
    link0 internal error
  bit 61:
    link1 internal error

  Overall the current setup ends up being correct and everything works,
  but we raise 2 FIR bits.

  So tweak the order of operations to disable nvlink before configuring
  the transport muxes.
  Incidentally, this is what the scripts from the
  opencapi enablement team were doing all along.
- npu2-opencapi: Fix 'link internal error' FIR, take 1

  When we set up a link, we always enable ODL0 and ODL1 at the same time
  in the PHY training config register, even though we are setting up
  only one OTL/ODL, so it raises a "link internal error" FIR bit in the
  PowerBus OLL FIR Register for the second link. The error is harmless,
  as we'll eventually set up the second link, but there's no reason to
  raise that FIR bit.

  The fix is simply to only enable the ODL we are using for the link.
- phb4: Do not set the PBCQ Tunnel BAR register when enabling capi mode.

  The cxl driver will set the capi value, like other drivers already do.
- phb4: set TVT1 for tunneled operations in capi mode

  The ASN indication is used for tunneled operations (as_notify and
  atomics). Tunneled operation messages can be sent in PCI mode as
  well as CAPI mode.

  The address field of as_notify messages is hijacked to encode the
  LPID/PID/TID of the target thread, so those messages should not go
  through address translation. Therefore bit 59 is part of the ASN
  indication.

  This patch sets TVT#1 in bypass mode when capi mode is enabled,
  to prevent as_notify messages from being dropped.

Debugging/Testing improvements
------------------------------

Since 6.0-rc1:

- mambo: Enable XER CA32 and OV32 bits on P9

  POWER9 adds 32 bit carry and overflow bits to the XER, but we need to
  set the relevant CTRL1 bit to enable them.
- Makefile: Fix building natively on ppc64le

  When on ppc64le and CROSS is not set by the environment, make assumes
  ppc64 and sets a default CROSS. Check for ppc64le as well, so that
  'make' works out of the box on ppc64le.
- Experimental support for building with Clang
- Improvements to testing and Travis CI

Since 5.11:

- core/stack: backtrace unwind basic OPAL call details

  Put OPAL callers' r1 into the stack back chain, and then use that to
  unwind back to the OPAL entry frame (as opposed to boot entry, which
  has a 0 back chain).

  From there, dump the OPAL call token and the caller's r1. A backtrace
  looks like this: ::

    CPU 0000 Backtrace:
    S: 0000000031c03ba0 R: 000000003001a548 ._abort+0x4c
    S: 0000000031c03c20 R: 000000003001baac .opal_run_pollers+0x3c
    S: 0000000031c03ca0 R: 000000003001bcbc .opal_poll_events+0xc4
    S: 0000000031c03d20 R: 00000000300051dc opal_entry+0x12c
    --- OPAL call entry token: 0xa caller R1: 0xc0000000006d3b90 ---

  This is pretty basic for the moment, but it does give you the bottom
  of the Linux stack. It will allow some interesting improvements in
  future.

  First, with the eframe, all the call's parameters can be printed out
  as well. The ___backtrace / ___print_backtrace API needs to be
  reworked in order to support this, but it's otherwise very simple
  (see opal_trace_entry()).

  Second, it will allow Linux's stack to be passed back to Linux via
  a debugging opal call. This will allow Linux's BUG() or xmon to
  also print the Linux back trace in case of a NMI or MCE or watchdog
  lockup that hits in OPAL.
- asm/head: implement quiescing without stack or clobbering regs

  Quiescing currently is implemented in C in opal_entry before the
  opal call handler is called. This works well enough for simple
  cases like fast reset when one CPU wants all others out of the way.

  Linux would like to use it to prevent an sreset IPI from
  interrupting firmware, which could lead to deadlocks when crash
  dumping or entering the debugger.
  Linux interrupts do not recover
  well when returning back to general OPAL code, due to r13 not being
  restored. OPAL also can't be re-entered, which may happen e.g.,
  from the debugger.

  So move the quiesce hold/reject to entry code, before the stack or
  r1 or r13 registers are switched. OPAL can be interrupted and
  returned to or re-entered during this period.

  This does not completely solve all such problems. OPAL will be
  interrupted with sreset if the quiesce times out, and it can be
  interrupted by MCEs as well. These still have the issues above.
- core/opal: Allow poller re-entry if OPAL was re-entered

  If an NMI interrupts the middle of running pollers and the OS
  invokes pollers again (e.g., for console output), the poller
  re-entrancy check will prevent it from running and spam the
  console.

  That check was designed to catch a poller calling opal_run_pollers;
  OPAL re-entrancy is something different and is detected elsewhere.
  Avoid the poller recursion check if OPAL has been re-entered. This
  is a best-effort attempt to cope with errors.
- core/opal: Emergency stack for re-entry

  This detects OPAL being re-entered by the OS, and switches to an
  emergency stack if it was. This protects the firmware's main stack
  from re-entrancy and allows the OS to use NMI facilities for crash
  / debug functionality.

  Further nested re-entry will destroy the previous emergency stack
  and prevent returning, but those should be rare cases.

  This stack is sized at 16kB, which doubles the size of CPU stacks,
  so as not to introduce a regression in primary stack size. The 16kB
  stack originally had a 4kB machine check stack at the top, which was
  removed by 80eee1946 ("opal: Remove machine check interrupt patching
  in OPAL."). So it is possible the size could be tightened again, but
  that would require further analysis.
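
The re-entry detection and stack switch described above can be pictured with a small C model. This is only a sketch under stated assumptions: the per-CPU structure, its fields, and both function names are hypothetical, and the real switch happens in skiboot's entry code rather than in C like this. Only the 16kB size and the behaviour (main stack on first entry, a single emergency stack reused on any re-entry) come from the text.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_STACK_SIZE 16384  /* 16kB, the size quoted above */

/* Hypothetical per-CPU state; the real skiboot layout differs. */
struct sketch_cpu {
	unsigned int in_opal_call;  /* nesting depth of OPAL calls */
	char main_stack[SKETCH_STACK_SIZE];
	char emergency_stack[SKETCH_STACK_SIZE];
};

/*
 * Pick the stack top for an incoming OPAL call.
 * The first entry gets the main stack. Any re-entry gets the
 * emergency stack, so a nested re-entry reuses (and overwrites)
 * the same emergency stack, as the text warns.
 */
char *sketch_opal_entry_stack(struct sketch_cpu *cpu)
{
	bool reentered = cpu->in_opal_call > 0;

	cpu->in_opal_call++;
	if (reentered)
		return cpu->emergency_stack + SKETCH_STACK_SIZE;
	return cpu->main_stack + SKETCH_STACK_SIZE;
}

/* Record that one OPAL call has returned to the OS. */
void sketch_opal_exit(struct sketch_cpu *cpu)
{
	if (cpu->in_opal_call > 0)
		cpu->in_opal_call--;
}
```

Keeping a nesting count rather than a single flag is a choice made for this sketch, so that a return from the re-entered call does not mark the interrupted call as finished.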

- hdat_to_dt: hash_prop the same on all platforms

  Fixes this unit test on ppc64le hosts.
- mambo: Add persistent memory disk support

  This adds support for mapping disk images using persistent
  memory. Disks can be added by setting this ENV variable: ::

    PMEM_DISK="/mydisks/disk1.img,/mydisks/disk2.img"

  These will show up in Linux as /dev/pmem0 and /dev/pmem1.

  This uses a new feature in mambo "mysim memory mmap .." which is only
  available since mambo commit 0131f0fc08 (from 24/4/2018).

  This also needs the of_pmem.c driver in Linux which is only available
  since v4.17. It works with powernv_defconfig + CONFIG_OF_PMEM.
- external/mambo: Add di command to decode instructions

  By default you get 16 instructions but you can specify the number you
  want, e.g. ::

    systemsim % di 0x100 4
    0x0000000000000100: Enc:0xA64BB17D : mtspr HSPRG1,r13
    0x0000000000000104: Enc:0xA64AB07D : mfspr r13,HSPRG0
    0x0000000000000108: Enc:0xF0092DF9 : std r9,0x9F0(r13)
    0x000000000000010C: Enc:0xA6E2207D : mfspr r9,PPR

  Using di since it's what xmon uses.
- mambo/mambo_utils.tcl: Inject an MCE at a specified address

  Currently we don't support injecting an MCE on a specific address.
  Doing so is useful for testing functionality like memcpy_mcsafe()
  (see https://patchwork.ozlabs.org/cover/893339/)

  The core of the functionality is a routine called
  inject_mce_ue_on_addr, which takes an addr argument and injects
  an MCE (load/store with UE) when the specified address is accessed
  by code. This functionality can easily be enhanced to cover
  instruction UEs as well.

  A sample use case to create an MCE on stack access would be ::

    set addr [mysim display gpr 1]
    inject_mce_ue_on_addr $addr

  This would cause an MCE on any r1 or r1 based access.
- external/mambo: improve helper for machine checks

  Improve workarounds for stop injection, because mambo often will
  trigger on 0x104/204 when injecting sreset/mces.

  This also adds a workaround to skip injecting on reservations to
  avoid infinite loops when doing inject_mce_step.
- travis: Enable ppc64le builds

  At least on the IBM Travis Enterprise instance, we can now do
  ppc64le builds!

  We can only build a subset of our matrix due to availability of
  ppc64le distros. The Dockerfiles need some tweaking to only
  attempt to install (x86_64 only) Mambo binaries, as well as the
  build scripts.
- external: Add "lpc" tool

  This is a little front-end to the lpc debugfs files to access
  the LPC bus from userspace on the host.
- core/test/run-trace: fix on ppc64el
