.. _skiboot-5.10.3:

==============
skiboot-5.10.3
==============

skiboot 5.10.3 was released on Thursday March 28th, 2018. It replaces
:ref:`skiboot-5.10.2` as the current stable release in the 5.10.x series.

It is recommended that 5.10.3 be used instead of any previous 5.10.x version
due to the bug fixes and debugging enhancements in it.

Over :ref:`skiboot-5.10.2`, we have a few improvements and bug fixes:

- NPU2: dump NPU2 registers on npu2 HMI

  Because of the nature of debugging npu2 issues, developers want the
  full list of NPU2 registers dumped when there's a problem.

  This differs from the solution introduced in 5.10.1, which dumped
  the registers in a way that triggered a FIR bit that would confuse PRD.
- npu2: Add performance tuning SCOM inits

  Peer-to-peer GPU bandwidth and latency testing has produced some tunable
  values that improve performance. Add them to our device initialization.

  File these under things that need to be cleaned up with nice #defines
  for the register names and bitfields when we get time.

  A few of the settings are dependent on the system's particular NVLink
  topology, so introduce a helper to determine how many links go to a
  single GPU.
- hw/npu2: Assign a unique LPARSHORTID per GPU

  This gets used elsewhere to index items in the XTS tables.
- occ: Set up OCC messaging even if we fail to set up pstates

  This means that we no longer hit this bug if we fail to get valid pstates
  from the OCC. ::

    [console-pexpect]#echo 1 > //sys/firmware/opal/sensor_groups//occ-csm0/clear
    echo 1 > //sys/firmware/opal/sensor_groups//occ-csm0/clear
    [ 94.019971181,5] CPU ATTEMPT TO RE-ENTER FIRMWARE! PIR=083d cpu @0x33cf4000 -> pir=083d token=8
    [ 94.020098392,5] CPU ATTEMPT TO RE-ENTER FIRMWARE! PIR=083d cpu @0x33cf4000 -> pir=083d token=8
    [ 10.318805] Disabling lock debugging due to kernel taint
    [ 10.318808] Severe Machine check interrupt [Not recovered]
    [ 10.318812] NIP [000000003003e434]: 0x3003e434
    [ 10.318813] Initiator: CPU
    [ 10.318815] Error type: Real address [Load/Store (foreign)]
    [ 10.318817] opal: Hardware platform error: Unrecoverable Machine Check exception
    [ 10.318821] CPU: 117 PID: 2745 Comm: sh Tainted: G M 4.15.9-openpower1 #3
    [ 10.318823] NIP: 000000003003e434 LR: 000000003003025c CTR: 0000000030030240
    [ 10.318825] REGS: c00000003fa7bd80 TRAP: 0200 Tainted: G M (4.15.9-openpower1)
    [ 10.318826] MSR: 9000000000201002 <SF,HV,ME,RI> CR: 48002888 XER: 20040000
    [ 10.318831] CFAR: 0000000030030258 DAR: 394a00147d5a03a6 DSISR: 00000008 SOFTE: 1

- core/fast-reboot: disable fast reboot upon fundamental entry/exit/locking errors

  This disables fast reboot in several more cases where serious errors
  like lock corruption or call re-entrancy are detected.
- core/opal: allow some re-entrant calls

  This allows a small number of OPAL calls to succeed despite re-entering
  the firmware, and rejects others rather than aborting.

  This allows a system reset interrupt that interrupts OPAL to do something
  useful: sreset other CPUs, use the console (which allows xmon to work or
  stack traces to be printed), or reboot the system.

  Use OPAL_INTERNAL_ERROR when rejecting, rather than OPAL_BUSY, which is
  used for many other things that do not indicate a serious permanent error.
- core/opal: abort in case of re-entrant OPAL call

  The stack is already destroyed by the time we get here, so there
  is not much point in continuing.
- npu2: Disable fast reboot

  Fast reboot does not yet work correctly with the NPU. It has been disabled
  on NVLink and OpenCAPI machines. Do the same for NVLink2.

  This amounts to a port of 3e4577939bbf ("npu: Fix broken fast reset")
  from the npu code to npu2.