.. _skiboot-6.0-rc1:

skiboot-6.0-rc1
================

skiboot v6.0-rc1 was released on Tuesday May 1st 2018. It is the first
release candidate of skiboot 6.0, which will become the new stable release
of skiboot following the 5.11 release, first released April 6th 2018.

Skiboot 6.0 will mark the basis for op-build v2.0 and will be required for
POWER9 systems.

skiboot v6.0-rc1 contains all bug fixes as of :ref:`skiboot-5.11`,
:ref:`skiboot-5.10.5`, and :ref:`skiboot-5.4.9` (the currently maintained
stable releases). Once 6.0 is released, we do *not* expect any further
stable releases in the 5.10.x series, nor in the 5.11.x series.

For details of how the skiboot stable releases work, see :ref:`stable-rules`.

The current plan is to cut the final 6.0 in early May, with skiboot 6.0
being for all POWER8 and POWER9 platforms in op-build v2.0.

Over skiboot-5.11, we have the following changes:

New Features
------------
- Disable stop states from OPAL

  On ZZ, stop4,5,11 are enabled for PowerVM, even though doing
  so may cause problems with OPAL due to bugs in hcode.

  For other platforms, this isn't so much of an issue as
  we can just control stop states by the MRW. However the
  rebuild-the-world approach to changing values there is a bit
  annoying if you just want to rule out a specific stop state
  from being problematic.

  Provide an nvram option to override what's disabled in OPAL.

  The OPAL mask is currently ~0xE0000000 (i.e. all but stop 0,1,2)

  You can set an NVRAM override with: ::

      nvram -p ibm,skiboot --update-config opal-stop-state-disable-mask=0xFFFFFFF

  This nvram override will disable *all* stop states.
- interrupts: Create an "interrupts" property in the OPAL node

  Deprecate the old "opal-interrupts" property. It is still there, but the
  new property follows the standard and allows us to specify whether an
  interrupt is level or edge sensitive.

  Similarly create "interrupt-names" whose content is identical to
  "opal-interrupts-names".
- SBE: Add timer support on POWER9

  SBE on P9 provides a one-shot programmable timer facility. We can use this
  to implement OPAL timers and hence limit the reliance on the Linux
  heartbeat (similar to the HW timer facility provided by SLW on P8).
- Add SBE driver support

  SBE (Self Boot Engine) on P9 has two different jobs:

  - Boot the chip up to the point the core is functional
  - Provide various services like timer, scom, stash MPIPL, etc., at runtime

  We will use SBE for various purposes like timer, MPIPL, etc.

- opal:hmi: Add missing processor recovery reason string.

  With this patch we now see the reason string printed for the CORE_WOF[43] bit. ::

      [ 477.352234986,7] HMI: [Loc: U78D3.001.WZS004A-P1-C48]: P:8 C:22 T:3: Processor recovery occurred.
      [ 477.352240742,7] HMI: Core WOF = 0x0000000000100000 recovered error:
      [ 477.352242181,7] HMI: PC - Thread hang recovery
- Add DIMM actual speed to device tree

  Recent HDAT provides the actual DIMM speed. Let's add this to the device tree.
- Fix DIMM size property

  Today we parse the vpd blob to get DIMM size information. This is limited
  to FSP based systems. HDAT provides the DIMM size value, so let's use that
  to populate the device tree. That way we can get size information on BMC
  based systems as well.

- PCI: Set slot power limit when supported

  The PCIe slot capability can be implemented in a root or switch
  downstream port to set the maximum power a card is allowed to draw
  from the system. This patch adds support for setting the power limit
  when the platform has defined one.
- hdata/spira: parse vpd to add part-number and serial-number to xscom@ node

  This is expected by FWTS and associates our processor with the part/serial
  number, which is obviously a good thing for one's own sanity.
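
To make the stop-state disable mask from "Disable stop states from OPAL"
above easier to reason about, here is a minimal sketch that lists which stop
states a given mask value covers. It assumes that stop state *n* corresponds
to bit ``0x80000000 >> n``, a layout inferred only from the note that
``~0xE0000000`` means "all but stop 0,1,2". The function name, the constant
and the ``max_state`` cut-off of 11 are hypothetical and are not taken from
the skiboot source.

```python
STOP0_BIT = 0x80000000  # assumption: stop0 is the most significant bit


def disabled_stop_states(mask, max_state=11):
    """Return the stop state numbers covered by a 32-bit disable mask.

    Assumes stop state n maps to bit (0x80000000 >> n). This layout is
    inferred from the release note, not read from the skiboot source.
    """
    mask &= 0xFFFFFFFF  # treat the value as an unsigned 32-bit mask
    return [n for n in range(max_state + 1) if mask & (STOP0_BIT >> n)]


# Default described in the note: everything except stop 0, 1 and 2.
default_mask = ~0xE0000000 & 0xFFFFFFFF
print(disabled_stop_states(default_mask))  # [3, 4, 5, 6, 7, 8, 9, 10, 11]
```

Decoding a candidate value this way before writing it to NVRAM is a cheap
check that the override covers the stop states you intend to rule out.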


Improved HMI Handling
^^^^^^^^^^^^^^^^^^^^^

- opal/hmi: Add documentation for opal_handle_hmi2 call
- opal/hmi: Generate hmi event for recovered HDEC parity error.
- opal/hmi: check thread 0 tfmr to validate latched tfmr errors.

  Due to P9 errata, HDEC parity and TB residue errors are latched for
  non-zero threads 1-3 even if they are cleared. But these are not
  latched on thread 0. Hence, use xscom SCOMC/SCOMD to read thread 0 tfmr
  value and ignore them on non-zero threads if they are not present on
  thread 0.
- opal/hmi: Print additional debug information in rendezvous.
- opal/hmi: Fix handling of TFMR parity/corrupt error.

  While testing TFMR parity/corrupt error it has been observed that HMIs are
  delivered twice for this error:

  - First time HMI is delivered with HMER[4,5]=1 and TFMR[60]=1.
  - Second time HMI is delivered with HMER[4,5]=1 and TFMR[60]=0 with valid TB.

  On the second HMI we end up throwing "HMI: TB invalid without core error
  reported" even though TB is in a valid state.
- opal/hmi: Stop flooding HMI event for TOD errors.

  Fix the issue where every thread on the chip sends an HMI event to the host
  for TOD errors. TOD errors are reported to all the cores/threads on the chip.
  Any one thread can fix the error and send the event. The rest of the threads
  don't need to send an HMI event unnecessarily.
- opal/hmi: Fix soft lockups during TOD errors

  There are some TOD errors which do not affect the working of TOD and TB.
  They stay in a valid state. Hence we don't need a rendez-vous for TOD errors
  that do not affect the working of TB.

  TOD errors that affect TOD/TB will report a global error on TFMR[44]
  along with bit 51, and they will go into the rendez-vous path as expected.

  But the TOD errors that do not affect the TB register set only TFMR bit 51.
  The TFMR bit 51 is cleared when any single thread clears the TOD error.
  Once cleared, the bit 51 is reflected to all the cores on that chip. Any
  thread that reads the TFMR register after the error is cleared will see
  TFMR bit 51 reset. Hence the threads that see TFMR[51]=1 fall through to
  the rendez-vous path and the threads that see TFMR[51]=0 return doing
  nothing. This ends up in soft lockups in the host kernel.

  This patch fixes this issue by not considering the TOD interrupt (TFMR[51])
  as a core-global error and hence avoiding the rendez-vous path completely.
  Instead, threads that see TFMR[51]=1 will now take a different path that
  just does the TOD error recovery.
- opal/hmi: Do not send HMI event if no errors are found.

  For TOD errors, all the cores in the chip get HMIs. Any one thread from any
  core can fix the issue and TFMR will have error conditions cleared. The rest
  of the threads need not take any action if TOD errors are already cleared.
  Hence thread 0 of every core should get a fresh copy of TFMR before going
  ahead with the recovery path. Initialize recover = -1, so that if no errors
  are found that thread need not send an HMI event to Linux. This helps to
  stop every thread flooding the host with HMI events even when no errors
  are found.
- opal/hmi: Initialize the hmi event with old value of HMER.

  Do this before we check for TFAC errors. Otherwise the event at host console
  shows no error reported in HMER register.

  Without this patch the console event shows HMER with all zeros: ::

      [ 216.753417] Severe Hypervisor Maintenance interrupt [Recovered]
      [ 216.753498] Error detail: Timer facility experienced an error
      [ 216.753509] HMER: 0000000000000000
      [ 216.753518] TFMR: 3c12000870e04000

  After this patch it shows the old HMER values on the host console: ::

      [ 2237.652533] Severe Hypervisor Maintenance interrupt [Recovered]
      [ 2237.652651] Error detail: Timer facility experienced an error
      [ 2237.652766] HMER: 0840000000000000
      [ 2237.652837] TFMR: 3c12000870e04000
- opal/hmi: Rework HMI handling of TFAC errors

  This patch reworks the HMI handling for TFAC errors by introducing
  4 rendez-vous points to improve thread synchronization while handling
  timebase errors that require all threads to clear dirty data from the
  TB/HDEC registers before clearing the errors.
- opal/hmi: Don't bother passing HMER to pre-recovery cleanup

  The test for TFAC error is now redundant so we remove it and
  remove the HMER argument.
- opal/hmi: Move timer related error handling to a separate function

  Currently no functional change. This is a first step to completely
  rewriting how these things are handled.
- opal/hmi: Add a new opal_handle_hmi2 that returns direct info to Linux

  It returns a 64-bit flags mask currently set to provide info
  about which timer facilities were lost, and whether an event
  was generated.
- opal/hmi: Remove races in clearing HMER

  Writing to HMER acts as an "AND". The current code writes back the
  value we originally read with the bits we handled cleared. This is
  racy: if a new bit gets set in HW after the original read, we'll end
  up clearing it without handling it.

  Instead, use an all 1's mask with only the bit handled cleared.
- opal/hmi: Don't re-read HMER multiple times

  We want to make sure all reporting and actions are based
  upon the same snapshot of HMER in case bits get added
  by HW while we are in OPAL.

libflash and ffspart
^^^^^^^^^^^^^^^^^^^^

Many improvements to the `ffspart` utility and `libflash` have come
in this release, making `ffspart` suitable for building PNOR images that
are bit-identical to those from the existing tooling used by `op-build`.
The plan is to switch `op-build` to use this infrastructure in the not too
distant future.

- libflash/blocklevel: Make read/write be ECC agnostic for callers

  The blocklevel abstraction allows for regions of the backing store to be
  marked as ECC protected so that blocklevel can decode/encode the ECC
  bytes into the buffer automatically without the caller having to be ECC
  aware.

  Unfortunately this abstraction is far from perfect: it is only useful
  if reads and writes are performed at the start of the ECC region or in
  some circumstances at an ECC aligned position, which requires the
  caller to be aware of the ECC regions.

  The problem that has arisen is that the blocklevel abstraction is
  initialised somewhere but when it is later called the caller is unaware
  if ECC exists in the region it wants to arbitrarily read and write to.
  This should not have been a problem since blocklevel knows. Currently
  misaligned reads will fail ECC checks and misaligned writes will
  overwrite ECC bytes and the backing store will become corrupted.

  This patch adds the smarts to blocklevel_read() and blocklevel_write() to
  cope with the problem. Note that ECC can always be bypassed by calling
  blocklevel_raw_() functions.

  All this work means that the gard tool can safely call
  blocklevel_read() and blocklevel_write() and as long as the blocklevel
  knows of the presence of ECC then it will deal with all cases.

  This commit also removes code in the gard tool which compensated for
  inadequacies no longer present in blocklevel.
- libflash/blocklevel: Return region start from ecc_protected()

  Currently all ecc_protected() does is say if a region is ECC protected
  or not. Knowing a region is ECC protected is one thing but there isn't
  much that can be done afterwards if this is the only known fact. A lot
  more can be done if the caller is told where the ECC region begins.

  Knowing where the ECC region starts allows the caller to align its
  reads and writes. This allows for more flexibility calling read and write
  without knowing exactly how the backing store is organised.
- libflash/ecc: Add helpers to align a position within an ecc buffer

  As part of ongoing work to make ECC invisible to higher levels up the
  stack this function converts a 'position' which should be ECC agnostic
  to the equivalent position within an ECC region starting at a specified
  location.
- libflash/ecc: Add functions to deal with unaligned ECC memcpy
- external/ffspart: Improve error output
- libffs: Fix bad checks for partition overlap

  Not all TOCs are written at zero
- libflash/libffs: Allow caller to specify header partition

  An FFS TOC is comprised of two parts. A small header which has a magic
  and very minimal information about the TOC which will be common to all
  partitions, things like number of partitions, block sizes and the like.
  Following this small header are a series of entries. Importantly there
  is always an entry which encompasses the TOC itself, this is usually
  called the 'part' partition.

  Currently libffs always assumes that the 'part' partition is at zero.
  While there is always a TOC at zero there doesn't actually have to be.
  PNORs may have multiple TOCs within them, therefore libffs needs to be
  flexible enough to allow callers to specify TOCs not at zero.

  The 'part' partition is otherwise a regular partition which may have
  flags associated with it. libffs should allow the user to set the flags
  for the 'part' partition.

  This patch achieves both by allowing the caller to specify the 'part'
  partition. The caller can choose not to, and libffs will provide a
  sensible default.
- libflash/libffs: Refcount ffs entries

  Currently consumers can add a new ffs entry to multiple headers. This
  is fine, but freeing any of the headers will cause the entry to be freed,
  which causes double free problems.

  Even if only one header is used, the consumer of the library still has a
  reference to the entry, which they may well reuse at some other point.

  libffs will now refcount entries and only free when there are no more
  references.

  This patch also removes the pointless return value of ffs_hdr_free()
- libflash/libffs: Switch to storing header entries in an array

  Since libffs no longer needs to sort the entries as they get added
  it makes little sense to have the complexity of a linked list when an
  array will suffice.
- libflash/libffs: Remove backup partition from TOC generation code

  It turns out this code was messy and not all that reliable. Doing it at
  the library level adds complexity to the library and restrictions to the
  caller.

  A simpler approach can be achieved by just instantiating multiple
  ffs_header structures pointing to different parts of the same file.
- libflash/libffs: Remove the 'sides' from the FFS TOC generation code

  It turns out this code was messy and not all that reliable. Doing it at
  the library level adds complexity to the library and restrictions to the
  caller.

  A simpler approach can be achieved by just instantiating multiple
  ffs_header structures pointing to different parts of the same file.
- libflash/libffs: Always add entries to the end of the TOC

  It turns out that sorted order isn't the best idea. This removes
  flexibility from the caller. If the user wants their partitions in
  sorted order, they should insert them in sorted order.
- external/ffspart: Remove side, order and backup options

  These options are currently flaky in libflash/libffs so there isn't
  much point to being able to use them in ffspart.

  Future reworks planned for libflash/libffs will render these options
  redundant anyway.
- libflash/libffs: ffs_close() should use ffs_hdr_free()
- libflash/libffs: Add setter for a partition's actual size
- pflash: Use ffs_entry_user_to_string() to standardise flag strings
- libffs: Standardise ffs partition flags

  It seems we've developed a character representation for ffs partition
  flags. Currently only pflash really prints them so it hasn't been a
  problem but now ffspart wants to read them in from user input.

  It is important that what libffs reads and what pflash prints remain
  consistent, so we should move the code into libffs to avoid problems.
- external/ffspart: Allow # comments in input file

p9dsu Platform changes
----------------------

The p9dsu platform from SuperMicro (also known as 'Boston') has received
a number of updates, and the patches once carried by SuperMicro are now
upstream.

- p9dsu: detect p9dsu variant even when hostboot doesn't tell us

  The SuperMicro BMC can tell us what riser type we have, which dictates
  the PCI slot tables.
  Usually, in an environment that a customer would
  experience, Hostboot will do the query with an SMC specific patch
  (not upstream as there's no platform specific code in hostboot)
  and skiboot knows what variant it is based on the compatible string.

  However, if you're using upstream hostboot, you only get the bare
  'p9dsu' compatible type. We can work around this by asking the BMC
  ourselves and setting the slot table appropriately. We do this
  synchronously in platform init so that we don't start probing
  PCI before we set up the slot table.
- p9dsu: add slot power limit.
- p9dsu: add pci slot table for Boston LC 1U/2U and Boston LA/ESS.
- p9dsu HACK: fix system-vpd eeprom
- p9dsu: change esel command from AMI to IBM 0x3a.

ZZ Platform Changes
-------------------

- hdata/i2c: Fix up pci hotplug labels

  These labels are used on the devices used to do PCIe slot power control
  for implementing PCIe hotplug. I'm not sure how they ended up as
  "eeprom-pgood" and "eeprom-controller" since that doesn't make any sense.
- hdata/i2c: Ignore multi-port I2C devices

  Recent FSP firmware builds add support for multi-port I2C devices such
  as the GPIO expanders used for the presence detect of OpenCAPI devices
  and the PCIe hotplug controllers used to power cycle PCIe slots on ZZ.

  The OpenCAPI driver inside of skiboot currently uses a platform-specific
  method to talk to the relevant I2C device rather than relying on HDAT
  since not all platforms correctly report the I2C devices (hello Zaius).
  Additionally the nature of multi-port devices requires that we have a
  device-specific handler so that we generate the correct DT bindings.
  Currently we don't and there is no immediate need for this support so just
  ignore the multi-port devices for now.
- hdata/i2c: Replace `i2c_` prefix with `dev_`

  The current naming scheme makes it easy to conflate "i2cm_port" and
  "i2c_port." The latter is used to describe multi-port I2C devices such
  as GPIO expanders and multi-channel PCIe hotplug controllers. Rename
  i2c_port to dev_port to make the two a bit more distinct.

  Also rename i2c_addr to dev_addr for consistency.
- hdata/i2c: Ignore CFAM I2C master

  Recent FSP firmware builds put in information about the CFAM I2C master
  in addition to the host I2C masters accessible via XSCOM. Odds are this
  information should not be there since there's no handshaking between the
  FSP/BMC and the host over who controls that I2C master, but it is so
  we need to deal with it.

  This patch adds filtering to the HDAT parser so it ignores the CFAM I2C
  master. Without this it will create a bogus i2cm@<addr> which might cause
  issues.
- ZZ: hw/imc: Add support to load imc catalog lid file

  Add support to load the imc catalog from a lid file packaged
  as part of the system firmware. Lid number allocated
  is 0x80f00103.lid.


Bugs Fixed
----------
- core: Fix iteration condition to skip garded cpu
- uart: fix uart_opal_flush to take console lock over uart_con_flush

  This bug meant that OPAL_CONSOLE_FLUSH didn't take the appropriate locks.
  Luckily, this call is currently only used in the crash path.
- xive: fix missing unlock in error path
- OPAL_PCI_SET_POWER_STATE: fix locking in error paths

  Otherwise we could exit OPAL holding locks, potentially leading
  to all sorts of problems later on.
- hw/slw: Don't assert on an unknown chip

  For some reason skiboot populates nodes in /cpus/ for the cores on
  chips that are deconfigured.
  As a result Linux includes the threads
  of those cores in its set of possible CPUs in the system and attempts
  to set the SPR values that should be used when waking a thread from
  a deep sleep state.

  However, in the case where we have a deconfigured chip we don't create
  an xscom node for that chip and as a result we don't have a proc_chip
  structure for that chip either. In turn, this results in an assertion
  failure when calling opal_slw_set_reg() since it expects the chip
  structure to exist. Fix this up and print an error instead.
- opal/hmi: Generate one event per core for processor recovery.

  Processor recovery is a per-core error. All threads on that core receive
  the HMI. Not every thread needs to generate an HMI event for the same error.

  Let thread 0 only generate the event.
- sensors: Don't add DTS sensors when OCC inband sensors are available

  There are two sets of core temperature sensors today. One is the DTS scom
  based core temperature sensors and the second group is the sensors
  provided by OCC. DTS reports the highest temperature among the different
  temperature zones in the core while the OCC core temperature sensors
  report the average temperature of the core. DTS sensors are read directly
  by the host by SCOMing the DTS sensors while OCC sensors are read and
  updated by OCC to main memory.

  Reading DTS sensors by SCOMing is a heavier and slower operation
  compared to reading OCC sensors, which is as good as reading memory.
  So don't add DTS sensors when OCC sensors are available.
- core/fast-reboot: Increase timeout for dctl sreset to 1sec

  Direct control xscom can take more time to complete. We seem to
  wait too little on Boston, failing fast-reboot for no good reason.

  Increase timeout to 1 sec as a reasonable value for sreset to be delivered
  and core to start executing instructions.
- occ: sensors-groups: Add DT properties to mark HWMON sensor groups

  Fix the sensor type to match HWMON sensor types. Add a compatible flag
  to indicate the environmental sensor groups so that operations on
  these groups can be handled by the HWMON Linux interface.
- core: Correctly load initramfs in stb container

  Skiboot does not calculate the actual size and start location of the
  initramfs if it is wrapped by an STB container (for example if loading
  an initramfs from the ROOTFS partition).

  Check if the initramfs is in an STB container and determine the size and
  location correctly in the same manner as the kernel. Since
  load_initramfs() is called after load_kernel() move the call to
  trustedboot_exit_boot_services() into load_and_boot_kernel() so it is
  called after both of these.
- hdat/i2c.c: quieten "v2 found, parsing as v1"
- hw/imc: Check for pause_microcode_at_boot() return status

  pause_microcode_at_boot() loops through all the chips' ucode
  control blocks and pauses the ucode if it is in the running state.
  But it does not fail if any of the chips' ucode is not initialised.

  Add code to return a failure if ucode is not initialized in any
  of the chips. Since pause_microcode_at_boot() is called just before
  attaching the IMC device nodes in imc_init(), add code to check for
  the function return.


Slot location code fixes:

- npu2: Use ibm,loc-code rather than ibm,slot-label

  The ibm,slot-label property is to name the slot that appears under a
  PCIe bridge. In the past we (ab)used the slot tables to attach names
  to GPU devices and their corresponding NVLinks which resulted in npu2.c
  using slot-label as a location code rather than as a way to name slots.

  Fix this up since it's confusing.
- hdata/slots: Apply slot label to the parent slot

  Slot names only really make sense when applied to an actual slot rather
  than a device. On witherspoon the GPU devices have a name associated with
  the device rather than the slot for the GPUs. Add a hack that moves the
  slot label to the parent slot rather than on the device itself.
- pci-dt-slot: Big ol' cleanup

  The underlying data that we get from HDAT can only really describe a
  PCIe system. As such we can simplify the devicetree slot lookup code
  by only caring about the important cases, namely, root ports and switch
  downstream ports.

  This also fixes a bug where a root port didn't get a slot label applied,
  which results in devices under that port not having ibm,loc-code set.
  This results in the EEH core being unable to report the location of
  EEHed devices under that port.

opal-prd
^^^^^^^^
- opal-prd: Insert powernv_flash module

  Explicitly load the powernv_flash module on BMC based systems so that we
  are sure that the flash device is created before starting the opal-prd
  daemon.

  Note that I have replaced the pnor_available() check with is_fsp_system(),
  as we want to load the module on BMC systems only. Also pnor_init has enough
  logic to detect the flash device. Hence pnor_available() becomes a
  redundant check.

NPU2/NVLINK2
^^^^^^^^^^^^
- npu2/hw-procedures: fence bricks on GPU reset

  The NPU workbook defines a way of fencing a brick and
  getting the brick out of fence state. We do have an implementation
  of bringing the brick out of fenced/quiesced state. We do
  the latter in our procedures, but to support run time reset
  we need to do the former.

  The fencing ensures that access to memory behind the links
  will not lead to HMIs, but instead SUEs will be populated
  in cache (in the case of speculation).
  The expectation is then
  that prior to and after reset, the operating system components
  will flush the cache for the region of memory behind the GPU.

  This patch does the following:

  1. Implements a npu2_dev_fence_brick() function to set/clear
     fence state
  2. Clears FIR bits prior to clearing the fence status
  3. Clears the fence status
  4. Takes the powerbus out of CQ fence much later now,
     in credits_check() which is the last hardware procedure
     called after link training.
- hw/npu2.c: Remove static configuration of NPU2 register

  The NPU_SM_CONFIG0 register currently needs to be configured in Skiboot to
  select NVLink mode, however Hostboot should configure other bits in this
  register.

  For some reason Skiboot was explicitly clearing bit-6
  (CONFIG_DISABLE_VG_NOT_SYS). It is unclear why this bit was getting cleared
  as recent Hostboot versions explicitly set it to the correct value based on
  the specific system configuration. Therefore Skiboot should not alter it.

  Bit-58 (CONFIG_NVLINK_MODE) selects if NVLink mode should be enabled or
  not. Hostboot does not configure this bit so Skiboot should continue to
  configure it.
- npu2: Improve log output of GPU-to-link mapping

  Debugging issues related to unconnected NVLinks can be a little less
  irritating if we use the NPU2DEV{DBG,INF}() macros instead of prlog().

  In short, change this: ::

      NPU2: comparing GPU 'GPU2' and NPU2 'GPU1'
      NPU2: comparing GPU 'GPU3' and NPU2 'GPU1'
      NPU2: comparing GPU 'GPU4' and NPU2 'GPU1'
      NPU2: comparing GPU 'GPU5' and NPU2 'GPU1'
        :
      npu2_dev_bind_pci_dev: No PCI device for NPU2 device 0006:00:01.0 to bind to. If you expect a GPU to be there, this is a problem.

  to this: ::

      NPU6:0:1.0 Comparing GPU 'GPU2' and NPU2 'GPU1'
      NPU6:0:1.0 Comparing GPU 'GPU3' and NPU2 'GPU1'
      NPU6:0:1.0 Comparing GPU 'GPU4' and NPU2 'GPU1'
      NPU6:0:1.0 Comparing GPU 'GPU5' and NPU2 'GPU1'
        :
      NPU6:0:1.0 No PCI device found for slot 'GPU1'
- npu2: Move NPU2_XTS_BDF_MAP_VALID assignment to context init

  A bad GPU or other condition may leave us with a subset of links that
  never get initialized. If an ATSD is sent to one of those bricks, it
  will never complete, leaving us waiting forever for a response: ::

      watchdog: BUG: soft lockup - CPU#23 stuck for 23s! [acos:2050]
      ...
      Modules linked in: nvidia_uvm(O) nvidia(O)
      CPU: 23 PID: 2050 Comm: acos Tainted: G W O 4.14.0 #2
      task: c0000000285cfc00 task.stack: c000001fea860000
      NIP: c0000000000abdf0 LR: c0000000000acc48 CTR: c0000000000ace60
      REGS: c000001fea863550 TRAP: 0901 Tainted: G W O (4.14.0)
      MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 28004484 XER: 20040000
      CFAR: c0000000000abdf4 SOFTE: 1
      GPR00: c0000000000acc48 c000001fea8637d0 c0000000011f7c00 c000001fea863820
      GPR04: 0000000002000000 0004100026000000 c0000000012778c8 c00000000127a560
      GPR08: 0000000000000001 0000000000000080 c000201cc7cb7750 ffffffffffffffff
      GPR12: 0000000000008000 c000000003167e80
      NIP [c0000000000abdf0] mmio_invalidate_wait+0x90/0xc0
      LR [c0000000000acc48] mmio_invalidate.isra.11+0x158/0x370


  ATSDs are only sent to bricks which have a valid entry in the XTS_BDF
  table. So to prevent the hang, don't set NPU2_XTS_BDF_MAP_VALID unless
  we make it all the way to creating a context for the BDF.

Secure and Trusted Boot
^^^^^^^^^^^^^^^^^^^^^^^
- hdata/tpmrel: detect tpm not present by looking up the stinfo->status

  Skiboot detects if tpm is present by checking if a secureboot_tpm_info
  entry exists.
  However, if a tpm is not present, hostboot also creates a
  secureboot_tpm_info entry. In this case, hostboot creates an empty
  entry, but sets the field tpm_status to TPM_NOT_PRESENT.

  This detects if tpm is not present by looking up the stinfo->status.

  This fixes the "TPMREL: TPM node not found for chip_id=0 (HB bug)"
  issue, reproduced when skiboot is running on a system that has no tpm.

PCI
^^^
- phb4: Restore bus numbers after CRS

  Currently we restore PCIe bus numbers right after the link is
  up. Unfortunately at this point we haven't done CRS so config space
  may not be accessible.

  This moves the bus number restore till after CRS has happened.
- romulus: Add a barebones slot table
- phb4: Quieten and improve "Timeout waiting for electrical link"

  This happens normally if a slot doesn't have a working HW presence
  detect and relies instead on inband presence detect.

  The message we display is scary and not very useful unless you
  are debugging, so quieten it and change it to something more
  meaningful.
- pcie-slot: Don't fail powering on an already on switch

  If the power state is already the required value, return
  OPAL_SUCCESS rather than OPAL_PARAMETER to avoid spurious
  errors during boot.

CAPI/OpenCAPI
^^^^^^^^^^^^^
- capi: Keep the current mmio windows in the mbt cache table.

  When the phb is used as a CAPI interface, the current mmio windows list
  is cleaned before adding the capi and the prefetchable memory (M64)
  windows, which implies that the non-prefetchable BAR is no longer
  configured.
  This patch allows setting only the mbt bar to pass the capi mmio window
  while keeping, as defined, the other mmio values (M32 and M64).
- npu2-opencapi: Fix 'link internal error' FIR, take 2

  When setting up an opencapi link, we set the transport muxes first,
  then set the PHY training config register, which includes disabling
  nvlink mode for the bricks. That's the order of the init sequence, as
  found in the NPU workbook.

  In reality, doing so works, but it raises 2 FIR bits in the PowerBus
  OLL FIR Register for the 2 links when we configure the transport
  muxes. Presumably because nvlink is not disabled yet and we are
  configuring the transport muxes for opencapi.

  bit 60:
    link0 internal error
  bit 61:
    link1 internal error

  Overall the current setup ends up being correct and everything works,
  but we raise 2 FIR bits.

  So tweak the order of operations to disable nvlink before configuring
  the transport muxes. Incidentally, this is what the scripts from the
  opencapi enablement team were doing all along.
- npu2-opencapi: Fix 'link internal error' FIR, take 1

  When we set up a link, we always enable ODL0 and ODL1 at the same time
  in the PHY training config register, even though we are setting up
  only one OTL/ODL, so it raises a "link internal error" FIR bit in the
  PowerBus OLL FIR Register for the second link. The error is harmless,
  as we'll eventually set up the second link, but there's no reason to
  raise that FIR bit.

  The fix is simply to only enable the ODL we are using for the link.
- phb4: Do not set the PBCQ Tunnel BAR register when enabling capi mode.

  The cxl driver will set the capi value, like other drivers already do.
- phb4: set TVT1 for tunneled operations in capi mode

  The ASN indication is used for tunneled operations (as_notify and
  atomics). Tunneled operation messages can be sent in PCI mode as
  well as CAPI mode.

  The address field of as_notify messages is hijacked to encode the
  LPID/PID/TID of the target thread, so those messages should not go
  through address translation. Therefore bit 59 is part of the ASN
  indication.

  This patch sets TVT#1 in bypass mode when capi mode is enabled,
  to prevent as_notify messages from being dropped.

Debugging/Testing improvements
------------------------------
- core/stack: backtrace unwind basic OPAL call details

  Put OPAL callers' r1 into the stack back chain, and then use that to
  unwind back to the OPAL entry frame (as opposed to boot entry, which
  has a 0 back chain).

  From there, dump the OPAL call token and the caller's r1. A backtrace
  looks like this: ::

    CPU 0000 Backtrace:
     S: 0000000031c03ba0 R: 000000003001a548 ._abort+0x4c
     S: 0000000031c03c20 R: 000000003001baac .opal_run_pollers+0x3c
     S: 0000000031c03ca0 R: 000000003001bcbc .opal_poll_events+0xc4
     S: 0000000031c03d20 R: 00000000300051dc opal_entry+0x12c
     --- OPAL call entry token: 0xa caller R1: 0xc0000000006d3b90 ---

  This is pretty basic for the moment, but it does give you the bottom
  of the Linux stack. It will allow some interesting improvements in
  future.

  First, with the eframe, all the call's parameters can be printed out
  as well. The ___backtrace / ___print_backtrace API needs to be
  reworked in order to support this, but it's otherwise very simple
  (see opal_trace_entry()).

  Second, it will allow Linux's stack to be passed back to Linux via
  a debugging opal call. This will allow Linux's BUG() or xmon to
  also print the Linux back trace in case of an NMI or MCE or watchdog
  lockup that hits in OPAL.
- asm/head: implement quiescing without stack or clobbering regs

  Quiescing currently is implemented in C in opal_entry before the
  opal call handler is called.
  This works well enough for simple
  cases like fast reset when one CPU wants all others out of the way.

  Linux would like to use it to prevent an sreset IPI from
  interrupting firmware, which could lead to deadlocks when crash
  dumping or entering the debugger. Linux interrupts do not recover
  well when returning back to general OPAL code, due to r13 not being
  restored. OPAL also can't be re-entered, which may happen e.g.,
  from the debugger.

  So move the quiesce hold/reject to entry code, before the stack or
  r1 or r13 registers are switched. OPAL can be interrupted and
  returned to or re-entered during this period.

  This does not completely solve all such problems. OPAL will be
  interrupted with sreset if the quiesce times out, and it can be
  interrupted by MCEs as well. These still have the issues above.
- core/opal: Allow poller re-entry if OPAL was re-entered

  If an NMI interrupts the middle of running pollers and the OS
  invokes pollers again (e.g., for console output), the poller
  re-entrancy check will prevent it from running and spam the
  console.

  That check was designed to catch a poller calling opal_run_pollers;
  OPAL re-entrancy is something different and is detected elsewhere.
  Avoid the poller recursion check if OPAL has been re-entered. This
  is a best-effort attempt to cope with errors.
- core/opal: Emergency stack for re-entry

  This detects OPAL being re-entered by the OS, and switches to an
  emergency stack if it was. This protects the firmware's main stack
  from re-entrancy and allows the OS to use NMI facilities for crash
  / debug functionality.

  Further nested re-entry will destroy the previous emergency stack
  and prevent returning, but those should be rare cases.

  This stack is sized at 16kB, which doubles the size of CPU stacks,
  so as not to introduce a regression in primary stack size.
  The 16kB
  stack originally had a 4kB machine check stack at the top, which was
  removed by 80eee1946 ("opal: Remove machine check interrupt patching
  in OPAL."). So it is possible the size could be tightened again, but
  that would require further analysis.

- hdat_to_dt: hash_prop the same on all platforms

  Fixes this unit test on ppc64le hosts.
- mambo: Add persistent memory disk support

  This adds support for mapping disk images using persistent
  memory. Disks can be added by setting this ENV variable: ::

    PMEM_DISK="/mydisks/disk1.img,/mydisks/disk2.img"

  These will show up in Linux as /dev/pmem0 and /dev/pmem1.

  This uses a new feature in mambo "mysim memory mmap .." which is only
  available since mambo commit 0131f0fc08 (from 24/4/2018).

  This also needs the of_pmem.c driver in Linux which is only available
  since v4.17. It works with powernv_defconfig + CONFIG_OF_PMEM.
- external/mambo: Add di command to decode instructions

  By default you get 16 instructions but you can specify the number you
  want. For example: ::

    systemsim % di 0x100 4
    0x0000000000000100: Enc:0xA64BB17D : mtspr HSPRG1,r13
    0x0000000000000104: Enc:0xA64AB07D : mfspr r13,HSPRG0
    0x0000000000000108: Enc:0xF0092DF9 : std r9,0x9F0(r13)
    0x000000000000010C: Enc:0xA6E2207D : mfspr r9,PPR

  Using di since it's what xmon uses.
- mambo/mambo_utils.tcl: Inject an MCE at a specified address

  Currently we don't support injecting an MCE on a specific address.
  This is useful for testing functionality like memcpy_mcsafe()
  (see https://patchwork.ozlabs.org/cover/893339/)

  The core of the functionality is a routine called
  inject_mce_ue_on_addr, which takes an addr argument and injects
  an MCE (load/store with UE) when the specified address is accessed
  by code. This functionality can easily be enhanced to cover
  instruction UEs as well.

  A sample use case to create an MCE on stack access would be ::

    set addr [mysim display gpr 1]
    inject_mce_ue_on_addr $addr

  This would cause an MCE on any r1 or r1-based access.
- external/mambo: improve helper for machine checks

  Improve workarounds for stop injection, because mambo often will
  trigger on 0x104/204 when injecting sreset/mces.

  This also adds a workaround to skip injecting on reservations to
  avoid infinite loops when doing inject_mce_step.
- travis: Enable ppc64le builds

  At least on the IBM Travis Enterprise instance, we can now do
  ppc64le builds!

  We can only build a subset of our matrix due to availability of
  ppc64le distros. The Dockerfiles need some tweaking to only
  attempt to install (x86_64 only) Mambo binaries, as well as the
  build scripts.
- external: Add "lpc" tool

  This is a little front-end to the lpc debugfs files to access
  the LPC bus from userspace on the host.
- core/test/run-trace: fix on ppc64el