.. _skiboot-6.1:

skiboot-6.1
===========

skiboot v6.1 was released on Wednesday July 11th 2018. It is the first
release of skiboot 6.1, which is the new stable release of skiboot
following the 6.0 release, first released May 11th 2018.

Skiboot 6.1 is the basis for op-build v2.1 and contains all bug fixes as
of :ref:`skiboot-6.0.5`, and :ref:`skiboot-5.4.9` (the currently maintained
stable releases). We expect further stable releases in the 6.0.x and 5.4.x
series, while we do not expect to do any stable releases of 6.1.x.

This final 6.1 release follows a single release candidate release, as this
cycle we have been rather quiet, with mainly cleanup and bug fix patches
going in.

For how the skiboot stable releases work, see :ref:`stable-rules`.

Over skiboot-6.0, we have the following changes:

General changes and bug fixes
-----------------------------

Since :ref:`skiboot-6.1-rc1`:

- slw: Fix trivial typo in debug message
- vpd: Add vendor property to processor node

  Processor FRU vpd doesn't contain vendor detail. We have to parse
  module VPD to get vendor detail.

- vpd: Sanitize VPD data

  On OpenPower systems, the VPD keyword size tells us the maximum size of
  the data. But they fill the trailing end with spaces (0x20) instead of
  NULL. Also, the spec doesn't stop the user from having a space (0x20)
  within the actual data.

  This patch discards trailing spaces before populating device tree.
- core: always flush console before stopping

  This catches a few cases (e.g., fast reboot failure messages) that
  don't always make it to the console before the machine is rebooted.
- core/cpu: parallelise global CPU register setting jobs

  On a 176 thread system, before: ::

    [ 122.319923233,5] OPAL: Switch to big-endian OS
    [ 126.317897467,5] OPAL: Switch to little-endian OS

  after: ::

    [ 212.439299889,5] OPAL: Switch to big-endian OS
    [ 212.469323643,5] OPAL: Switch to little-endian OS

- init, occ: Initialise OCC earlier on BMC systems

  We need to use the OCC to obtain presence data for the SXM2 slots on
  Witherspoon systems. This is needed to determine device type for NVLink
  GPUs and OpenCAPI devices which can be plugged into the same slot. Support
  for this will be implemented in a future patch.

  Currently, OCC initialisation is done just before handing over to Linux,
  which is well after NPU probe. On FSP systems, OCC boot starts very late,
  so we wait until the last possible moment to initialise the skiboot side in
  order to give it the maximum time to boot. On BMC systems, OCC boot starts
  earlier, so there aren't any issues in moving it earlier in the skiboot
  init sequence.

  When running on a BMC machine, call occ_pstates_init() as early as
  possible in the init sequence. On FSP machines, continue to call it from
  its current location.

Since :ref:`skiboot-6.0`:

- GCC8 build fixes
- Add prepare_hbrt_update to hbrt interfaces

  Add placeholder support for prepare_hbrt_update call into
  hostboot runtime (opal-prd) code. This interface is only
  called as part of a concurrent code update on a FSP based
  system.
- cpu: Clear PCR SPR in opal_reinit_cpus()

  Currently if Linux boots with a non-zero PCR, things can go bad where
  some early userspace programs can take illegal instructions. This is
  being fixed in Linux, but in the meantime, we should clean up in
  skiboot also.
- pci: Fix PCI_DEVICE_ID()

  The vendor ID is 16 bits not 8. This error leaves the top of the vendor
  ID in the bottom bits of the device ID, which resulted in e.g. a failure
  to run the PCI quirk for the AST VGA device.
- Quieten console output on boot

  We print out a whole bunch of things on boot, most of which aren't
  interesting, so we should *not* print them instead.

  Printing things like what CPUs we found and what PCI devices we found
  *are* useful, so continue to do that. But we don't need to splat out
  a bunch of things that are always going to be true.
- core/console: fix deadlock when printing with console lock held

  Some debugging options will print while the console lock is held,
  which is why the console lock is taken as a recursive lock.
  However console_write calls __flush_console, which will drop and
  re-take the lock non-recursively in some cases.

  Just set con_need_flush and return from __flush_console if we are
  holding the console lock already.

  This stack usage message (taken with this patch applied) could lead
  to a deadlock without this: ::

    CPU 0000 lowest stack mark 11768 bytes left pc=300cb808 token=0
    CPU 0000 Backtrace:
    S: 0000000031c03370 R: 00000000300cb808 .list_check_node+0x1c
    S: 0000000031c03410 R: 00000000300cb910 .list_check+0x38
    S: 0000000031c034b0 R: 00000000300190ac .try_lock_caller+0xb8
    S: 0000000031c03540 R: 00000000300192e0 .lock_caller+0x80
    S: 0000000031c03600 R: 0000000030012c70 .__flush_console+0x134
    S: 0000000031c036d0 R: 00000000300130cc .console_write+0x68
    S: 0000000031c03780 R: 00000000300347bc .vprlog+0xc8
    S: 0000000031c03970 R: 0000000030034844 ._prlog+0x50
    S: 0000000031c03a00 R: 00000000300364a4 .log_simple_error+0x74
    S: 0000000031c03b90 R: 000000003004ab48 .occ_pstates_init+0x184
    S: 0000000031c03d50 R: 000000003001480c .load_and_boot_kernel+0x38c
    S: 0000000031c03e30 R: 000000003001571c .main_cpu_entry+0x62c
    S: 0000000031c03f00 R: 0000000030002700 boot_entry+0x1c0

- opal-prd: Do not error out on first failure for soft/hard offline.

  The memory errors (CEs and UEs) that are detected as part of background
  memory scrubbing are reported by PRD asynchronously to opal-prd along with
  affected memory ranges. hservice_memory_error() converts these ranges into
  page granularity before hooking them up to the soft/hard offline-ing
  infrastructure.

  But the current implementation of hservice_memory_error() does not hook up
  all the pages to soft/hard offline-ing if any of the page offline actions
  fails. E.g. hard offline can fail for:

  - Pages that are not part of buddy managed pool.
  - Pages that are reserved by kernel using memblock_reserved()
  - Pages that are in use by kernel.

  But for the pages that are in use by a user space application, the hard
  offline marks the page as hwpoison, sends a SIGBUS signal to kill the
  affected application as recovery action and returns success.

  Hence, it is possible that some of the pages in that memory range are in
  use by an application or free. By stopping on the first error we lose the
  opportunity to hwpoison the subsequent pages which may be free or in use by
  an application. This patch fixes this issue.
- libflash/blocklevel_write: Fix missing error handling

  Caught by scan-build, we seem to trap the errors in rc, but
  not take any recovery action during blocklevel_write.

I2C
^^^

- p8-i2c: fix wrong request status when a reset is needed

  If the bus is found in error state when starting a new request, the
  engine is reset and we enter recovery. However, once complete, the
  reset operation shows a status of complete in the status register. So
  any badly-timed call to check_status() will think the current top
  request is complete, even though it hasn't run yet.

  So don't update any request status while we are in recovery, as
  nothing useful for the request is supposed to happen in that state.
- p8-i2c: Remove force reset

  Force reset was added as an attempt to work around some issues with TPM
  devices locking up their I2C bus. In that particular case the problem
  was that the device would hold the SCL line down permanently due to a
  device firmware bug. The force reset doesn't actually do anything to
  alleviate the situation here, it just happens to reset the internal
  master state enough to make the I2C driver appear to work until
  something tries to access the bus again.

  On P9 systems with secure boot enabled there is the added problem
  of the "diagnostic mode" not being supported on I2C masters A, B, C and
  D. Diagnostic mode allows the SCL and SDA lines to be driven directly
  by software. Without this, force reset is impossible to implement.

  This patch removes the force reset functionality entirely since:

  a) it doesn't do what it's supposed to, and
  b) it's butt ugly code

  Additionally, turn p8_i2c_reset_engine() into p8_i2c_reset_port().
  There's no need to reset every port on a master in response to an
  error that occurred on a specific port.
- libstb/i2c-driver: Bump max timeout

  We have observed some TPMs clock stretching the I2C bus for significant
  amounts of time when processing commands. The same TPMs also have
  errata that can result in permanently locking up a bus in response to
  an I2C transaction they don't understand. Use an excessively long
  timeout to prevent this in the field.
- hdata: Add TPM timeout workaround

  Set the default timeout for any bus containing a TPM to one second. This
  is needed to work around a bug in the firmware of certain TPMs that will
  clock stretch the I2C port for up to a second. Additionally, when the
  TPM is clock stretching it responds to a STOP condition on the bus by
  bricking itself. Clearing this error requires a hard power cycle of the
  system since the TPM is powered by standby power.
- p8-i2c: Allow a per-port default timeout

  Add support for setting a default timeout for the I2C port to the
  device-tree. This is consumed by skiboot.

IPMI Watchdog
^^^^^^^^^^^^^

- ipmi-watchdog: Support handling re-initialization

  Watchdog resets can return an error code from the BMC indicating that
  the BMC watchdog was not initialized. Currently we abort skiboot due to
  a missing error handler. This patch implements handling
  re-initialization for the watchdog, automatically saving the last
  watchdog set values and re-issuing them if needed.
- ipmi-watchdog: The stop action should disable reset

  Otherwise it is possible for the reset timer to elapse and trigger the
  watchdog to wake back up. This doesn't affect the behavior of the
  system since we are providing a NONE action to the BMC. However we would
  like to avoid the action from taking place if possible.
- ipmi-watchdog: Add a flag to determine if we are still ticking

  This makes it easier for future changes to ensure that the watchdog
  stops ticking and doesn't requeue itself for execution in the
  background. This way it is safe for resets to be performed after the
  ticks are assumed to be stopped and it won't start the timer again.
- ipmi-watchdog: (prepare for) not disabling at shutdown

  The op-build linux kernel has been configured to support the ipmi
  watchdog. This driver will always handle the watchdog by either leaving
  it enabled if configured, or by disabling it during module load if no
  configuration is provided. This increases the coverage of the watchdog
  during the boot process. The watchdog should no longer be disabled at
  any point during skiboot execution.

  We're not enabling this by default yet as people can (and do, at least in
  development) mix and match old BOOTKERNEL with new skiboot and we don't
  want to break that too obviously.
- ipmi-watchdog: Don't reset the watchdog twice

  There is no clarification for why this change was needed, but presumably
  this is due to a buggy BMC implementation where the Watchdog Set command
  was processed concurrently or after the initial Watchdog Reset. This
  inversion would cause the watchdog to stop since the DONT_STOP bit was
  not set. Since we are now using the DONT_STOP bit during initialization,
  the watchdog should not be stopped even if an inversion occurs.
- ipmi-watchdog: Make it possible to set DONT_STOP

  The IPMI standard supports setting a DONT_STOP bit during a Watchdog
  Set operation. Most of the time we don't want to stop the Watchdog when
  updating the settings so we should be using this bit. This patch makes
  it possible for callers of set_wdt to prevent the watchdog from being
  stopped. This only changes the behavior of the watchdog during the
  initial settings update when initializing skiboot. The watchdog is no
  longer disabled and then immediately re-enabled.
- ipmi-watchdog: WD_POWER_CYCLE_ACTION -> WD_RESET_ACTION

  The IPMI specification denotes that action 0x1 is Host Reset and 0x3 is
  Host Power Cycle. Use the correct name for Reset in our watchdog code.


POWER8 platforms
----------------

- astbmc: Enable mbox depending on scratch reg

  P8 boxes can opt in for mbox pnor support if they set the scratch
  register bit to indicate it is supported.

Simulator platforms
-------------------

Since :ref:`skiboot-6.1-rc1`:

- pmem: volatile bindings for the poorly enabled

  PMEM_DISK bindings were added, but they rely on a rather
  recent mmap feature. This patch steals from those bindings
  to add volatile bindings. I've used these bindings with
  PMEM_VOLATILE to launch an instance with the publicly
  available systemsim-p9. The bindings are volatile and one
  should not expect any data to be saved/retrieved.

Since :ref:`skiboot-6.0`:

- plat/qemu: add PNOR support

  To access the PNOR, OPAL/skiboot drives the BMC SPI controller using
  the iLPC2AHB device of the BMC SuperIO controller and accesses the
  flash contents using the LPC FW address space on which the PNOR is
  remapped.

  The QEMU PowerNV machine now integrates such models (SuperIO
  controller, iLPC2AHB device) and also a pseudo Aspeed SoC AHB memory
  space populated with the SPI controller registers (same model as for
  ARM). The AHB window giving access to the contents of the BMC SPI
  controller flash modules is mapped on the LPC FW address space.

  The change should be compatible with machines without PNOR support.
- external/mambo: Add support for readline if it exists

  Add support for tclreadline package if it is present.
  This patch loads the package and uses it when the
  simulation stops for any reason.


FSP based platforms
-------------------

- Disable fast reboot on FSP IPL side change

  If FSP changes next IPL side, then disable fast reboot.

  sample output: ::

    [ 620.196442259,5] FSP: Got sysparam update, param ID 0xf0000007
    [ 620.196444501,5] CUPD: FW IPL side changed. Disable fast reboot
    [ 620.196445389,5] CUPD: Next IPL side : perm

- fsp/console: Always establish OPAL console API backend

  Currently we only call set_opal_console() to establish the backend
  used by the OPAL console API if we find at least one FSP serial
  port in HDAT.

  On systems where there is none (IPMI only), we fail to set it,
  causing the console code to try to use the dummy console causing
  an assertion failure during boot due to clashing on the device-tree
  node names.

  So always set it if an FSP is present.

AST BMC based platforms
-----------------------

- AMI BMC: use 0x3a as OEM command

  The 0x3a OEM command is for IBM commands, while 0x32 was for AMI ones.
  Sometime in the P8 timeframe, AMI BMCs were changed to listen for our
  commands on either 0x32 or 0x3a. Since 0x3a is the direction forward,
  we'll use that, as P9 machines with AMI BMCs probably also want these
  to work, and let's not bet that 0x32 will continue to be okay.
- astbmc: Set romulus BMC type to OpenBMC
- platform/astbmc: Do not delete compatible property

  P9 onwards OPAL is building device tree for BMC based system using
  HDAT. We are populating bmc/compatible node with bmc version. Hence
  do not delete this property.

Utilities
---------

- external/xscom-utils: Add python library for xscom access

  Patch adds a simple python library module for xscom access.
  It directly manipulates the '/access' file for scom read
  and write from debugfs 'scom' directory.

  Example on how to generate a getscom using this module:

  .. code-block:: python

    from adu_scoms import *
    getscom = GetSCom()
    getscom.parse_args()
    getscom.run_command()

  Sample output for above getscom.py:

  .. code-block:: console

    # ./getscom.py -l
    Chip ID  | Rev   | Chip type
    ---------|-------|-----------
    00000008 | DD2.0 | P9 (Nimbus) processor
    00000000 | DD2.0 | P9 (Nimbus) processor

- ffspart: Don't require user to create blank partitions manually

  Add '--allow-empty' which allows the filename for a given partition to
  be blank. If set ffspart will set that part of the PNOR file 'blank' and
  set ECC bits if required.
  Without this option behaviour is unchanged and ffspart will return an
  error if it can not find the partition file.
- pflash: Use correct prefix when installing

  pflash uses lowercase prefix when running make install in its
  directory, but uppercase PREFIX when running it in shared. Use
  lowercase everywhere.

  With this the OpenBMC bitbake recipe can drop an out of tree patch it's
  been carrying for years.


POWER9
------

Since :ref:`skiboot-6.1-rc1`:

- occ: sensors: Fix the size of the phandle array 'sensors' in DT

  Fixes: 99505c03f493 (present in v5.10-rc4)
- phb4: Delay training till after PERST is deasserted

  This helps some cards train on the second PERST (i.e. fast-reboot). It
  is not clear why, but it helps, so YOLO!

Since :ref:`skiboot-6.0`:

- occ-sensor: Avoid using uninitialised struct cpu_thread

  When adding the sensors in occ_sensors_init, if the type is not
  OCC_SENSOR_LOC_CORE, then the loop to find 'c' will not be executed.
  Then c->pir is used for both of the add_sensor_node calls below.

  This provides a default value of 0 instead.
- NX: Add NX coprocessor init opal call

  The read offset (4:11) in the Receive FIFO control register is
  incremented by the FIFO size whenever a CRB is read by NX. But the index
  in RxFIFO has to match the corresponding entry in the FIFO maintained by
  VAS in the kernel. The VAS entry is reset to 0 when opening the receive
  window during driver initialization. So when NX842 is reloaded or in
  kexec boot, there is a possibility of a mismatch between the RxFIFO
  control register and the VAS entries in the kernel.
  It could cause CRB failure / timeout from NX.

  This patch adds nx_coproc_init opal call for kernel to initialize
  readOffset (4:11) and Queued (15:23) in RxFIFO control register.
- SLW: Remove stop1_lite and stop2_lite

  stop1_lite has been removed since it adds no additional benefit
  over stop0_lite. stop2_lite has been removed since currently it adds
  minimal benefit over stop2. However, the benefit is eclipsed by the time
  required to ungate the clocks.

  Moreover, Lite states don't give up the SMT resources, and can
  potentially have a performance impact on sibling threads.

  Since current OSs (Linux) aren't smart enough to make good decisions
  with these stop states, we're (temporarily) removing them from what
  we expose to the OS, the idea being to bring them back in a new
  DT representation so that only an OS that knows what to do will
  do things with them.
- cpu: Use STOP1 on POWER9 for idle/sleep inside OPAL

  The current code requests STOP3, which means it gets STOP2 in practice.

  STOP2 has proven to occasionally be unreliable depending on FW
  version and chip revision, it also requires a functional CME,
  so instead, let's use STOP1. The difference is rather minimal
  for something that is only used a few seconds during boot.

NPU2 (NVLink2 and OpenCAPI)
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Since :ref:`skiboot-6.1-rc1`:

- capi: Select the correct IODA table entry for the mbt cache.

  With the current code, the capi mmio window is not correctly configured
  in the IODA table entry. The first entry (generally the non-prefetchable
  BAR) is overwritten.
  This patch sets the capi window bar at the right place.
- npu2/hw-procedures: Fence bricks via NTL instead of MISC

  There are a couple of places we can set/unset fence for a brick:

  1. MISC register: NPU2_MISC_FENCE_STATE
  2. NTL register for the brick: NPU2_NTL_MISC_CFG1(ndev)

  Recent testing of ATS in combination with GPU reset has exposed a side
  effect of using (1); if fence is set for all six bricks, it triggers a
  sticky nmmu latch which prevents the NPU from getting ATR responses.
  This manifests as a hang in the tests.

  We have npu2_dev_fence_brick() which uses (1), and only two calls to it.
  Replace the call which sets fence with a write to (2). Remove the
  corresponding unset call entirely. It's unneeded because the procedures
  already do a progression from full fence to half to idle using (2).

- phb4/capp: Calculate STQ/DMA read engines based on link-width for PEC

  Presently in CAPI mode the number of STQ/DMA-read engines allocated on
  PEC2 for CAPP is fixed to 6 and 0-30 respectively irrespective of the
  PCI link width. These values are only suitable for x8 cards and
  quickly run out if a x16 card is plugged to a PEC2 attached slot. This
  usually manifests as CAPP reporting TLBI timeout due to these messages
  getting stalled due to insufficient STQs.

  To fix this we update enable_capi_mode() to check if PEC2 chiplet is
  in x16 mode and if yes then we allocate 4/0-47 STQ/DMA-read engines
  for the CAPP traffic.

  Fixes: 37ea3cfdc852 (present in v5.7-rc1)
- npu2: Use same compatible string for NVLink and OpenCAPI link nodes in device tree

  Currently, we distinguish between NPU links for NVLink devices and OpenCAPI
  devices through the use of two different compatible strings - ibm,npu-link
  and ibm,npu-link-opencapi.

  As we move towards supporting configurations with both NVLink and OpenCAPI
  devices behind a single NPU, we need to detect the device type as part of
  presence detection, which can't happen until well after the point where the
  HDAT or platform code has created the NPU device tree nodes. Changing a
  node's compatible string after it's been created is a bit ugly, so instead
  we should move the device type to a new property which we can add to the
  node later on.

  Get rid of the ibm,npu-link-opencapi compatible string, add a new
  ibm,npu-link-type property, and a helper function to check the link type.
  Add an "unknown" device type in preparation for later patches to detect
  device type dynamically.

  These device tree bindings are entirely internal to skiboot and are not
  consumed directly by Linux, so this shouldn't break anything (other than
  internal BML lab environments).
- occ: Add support for GPU presence detection

  On the Witherspoon platform, we need to distinguish between NVLink GPUs and
  OpenCAPI accelerators. In order to do this, we first need to find out
  whether the SXM2 socket is populated.

  On Witherspoon, the SXM2 socket's presence detection pin is only visible
  via I2C from the APSS, and thus can only be exposed to the host via the
  OCC. The OCC, per OCC Firmware Interface Specification for POWER9 version
  0.22, now exposes this to skiboot through a field in the dynamic data
  shared memory.

  Add the necessary dynamic data changes required to read the version and
  GPU presence fields. Add a function, occ_get_gpu_presence(), that can be
  used to check GPU presence.

  If the OCC isn't reporting presence (old OCC firmware, or some other
  reason), we default to assuming there is a device present and wait for
  link training to fail.

  This will be used in later patches to fix up the NPU2 probe path for
  OpenCAPI support on Witherspoon.
- hw/npu2, core/hmi: Use NPU instead of NPU2 as log message prefix

  The NPU2{DBG,INF,ERR} macros use "NPU%d" as a prefix to identify messages
  relating to a particular NPU.

  It's slightly confusing to have per-NPU messages prefixed with "NPU0" or
  "NPU1" and NPU-generic messages prefixed with "NPU2". On some future system
  we could potentially have a NPU #2 in which case it'd be really confusing.

  Use NPU rather than NPU2 for NPU-generic log messages. There's no risk of
  confusion with the original npu.c code since that's only for P8.
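
As a rough illustration of the phb4/capp entry above, the link-width
dependent allocation can be pictured as a small lookup. This is only a
sketch in Python, not the skiboot code: the real decision is made in C
inside enable_capi_mode(), and the names ``CappEngines`` and
``capp_engines_for_link_width`` are hypothetical. Only the numbers (6 and
0-30 for the previous fixed allocation, 4 and 0-47 for x16) come from the
entry itself.

.. code-block:: python

  from typing import NamedTuple


  class CappEngines(NamedTuple):
      """Engines reserved for CAPP traffic on PEC2 (illustrative only)."""
      stq_engines: int      # number of STQ engines
      dma_read_first: int   # first DMA-read engine in the range
      dma_read_last: int    # last DMA-read engine in the range


  def capp_engines_for_link_width(link_width: int) -> CappEngines:
      """Return the allocation quoted in the release note for a link width.

      Hypothetical helper: any width other than x16 falls back to the
      previous fixed allocation, which the note says suits x8 cards.
      """
      if link_width == 16:
          return CappEngines(4, 0, 47)
      return CappEngines(6, 0, 30)


  print(capp_engines_for_link_width(16))
  print(capp_engines_for_link_width(8))

The fallback branch mirrors the "fixed" behaviour described in the entry, so
narrower links keep the allocation they had before the fix.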

Since :ref:`skiboot-6.0`:

- npu2: Reset NVLinks on hot reset

  This effectively fences GPU RAM on GPU reset so the host system
  does not have to crash every time we stop a KVM guest with a GPU
  passed through.
- npu2-opencapi: reduce number of retries to train the link

  We've been reliably training the opencapi link on the first attempt
  for quite a while. Furthermore, if it doesn't train on the first
  attempt, retries haven't been that useful. So let's reduce the number
  of attempts we do to train the link.

  2 retries = 3 attempts to train.

  Each (failed) training sequence costs about 3 seconds.
- opal/hmi: Display correct chip id while printing NPU FIRs.

  HMIs for NPU xstops are broadcast to all chips. All cores on all the
  chips receive the HMI. The HMI handler correctly identifies and extracts
  the NPU FIR details from the affected chip, but while printing FIR data it
  prints chip id and location code details of this_cpu()->chip_id which
  may not be correct. This patch fixes this issue.
- npu2-opencapi: Fix link state to report link down

  The PHB callback 'get_link_state' is always reporting the link width,
  irrespective of the link status and even when the link is down. It is
  causing too much work (and failures) when the PHB is probed during pci
  init.
  The fix is to look at the link status first and report the link as
  down when appropriate.
- npu2-opencapi: Cleanup traces printed during link training

  Now that links may train in parallel, traces shown during training can
  be all mixed up. So add a prefix to all the traces to clearly identify
  the chip and link the trace refers to: ::

    OCAPI[<chip id>:<link id>]: this is a very useful message

  The lower-level hardware procedures (npu2-hw-procedures.c) also print
  traces which would need work. But that code is being reworked to be
  better integrated with opencapi and nvidia, so leave it alone for now.
- npu2-opencapi: Train links on fundamental reset

  Reorder our link training steps so that they are executed on
  fundamental reset instead of during the initial setup. Skiboot always
  calls a fundamental reset on all the PHBs during pci init.

  It is done through a state machine, similarly to what is done for
  'real' PHBs.

  This is the first step for a longer term goal to be able to trigger an
  adapter reset from linux. We'll need the reset callbacks of the PHB to
  be defined. We have to handle the various delays differently, since a
  linux thread shouldn't stay stuck waiting in opal for too long.
- npu2-opencapi: Rework adapter reset

  Rework the code to reset the opencapi adapter a bit:

  - make clearer which i2c pin is resetting which device
  - break the reset operation in smaller chunks. This is really to
    prepare for a future patch.

  No functional changes.
- npu2-opencapi: Use presence detection

  Presence detection is not part of the opencapi specification. So each
  platform may choose to implement it the way it wants.

  All current platforms implement it through an i2c device where we can
  query a pin to know if a device is connected or not. ZZ and Zaius have
  a similar design and even use the same i2c information and pin
  numbers.
  However, presence detection on older ZZ planar (older than v4) doesn't
  work, so we don't activate it for now, until our lab systems are
  upgraded and it's better tested.

  Presence detection on witherspoon is still being worked on. It's
  shaping up to be quite different, so we may have to revisit the topic
  in a later patch.
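
To make the retry budget from the link training entry above concrete (2
retries = 3 attempts, roughly 3 seconds per failed attempt), here is a small
Python sketch. It is not taken from skiboot: ``train_link``,
``worst_case_seconds`` and the ``try_train`` callback are hypothetical names
standing in for one training sequence, and only the two constants come from
the entry.

.. code-block:: python

  from typing import Callable, Tuple

  MAX_RETRIES = 2             # from the release note: 2 retries = 3 attempts
  FAILED_ATTEMPT_SECONDS = 3  # approximate cost quoted per failed attempt


  def train_link(try_train: Callable[[], bool],
                 max_retries: int = MAX_RETRIES) -> Tuple[bool, int]:
      """Call try_train() until it succeeds or the attempts run out.

      Returns (trained, attempts_made). try_train is a hypothetical
      callback representing a single link training sequence.
      """
      attempts = 0
      for _ in range(max_retries + 1):
          attempts += 1
          if try_train():
              return True, attempts
      return False, attempts


  def worst_case_seconds(max_retries: int = MAX_RETRIES) -> int:
      """Approximate time spent when every attempt fails."""
      return (max_retries + 1) * FAILED_ATTEMPT_SECONDS


  print(train_link(lambda: False))
  print(worst_case_seconds())

With these figures a link that never trains costs on the order of nine
seconds, which is the motivation the entry gives for keeping the number of
attempts small.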

Testing and CI
--------------

Since :ref:`skiboot-6.1-rc1`:

- test/qemu: start building qemu again, and use our built qemu for tests

  We need to use QEMU_BIN rather than QEMU as the makefiles define
  QEMU already.
- opal-ci: qemu: Use the powernv-3.0 branch

  This is based off the current development version of Qemu, and
  importantly it contains the patch that allows skiboot and Linux to clear
  the PCR that we require to boot.