.. _skiboot-5.7-rc2:

skiboot-5.7-rc2
===============

skiboot v5.7-rc2 was released on Thursday July 13th 2017. It is the second
release candidate of skiboot 5.7, which will become the new stable release
of skiboot following the 5.6 release, first released 24th May 2017.

skiboot v5.7-rc2 contains all bug fixes as of :ref:`skiboot-5.4.6`
and :ref:`skiboot-5.1.19` (the currently maintained stable releases). We
do not currently expect to do any 5.6.x stable releases.

For details of how the skiboot stable releases work, see :ref:`stable-rules`.

The current plan is to cut the final 5.7 in the next week or so, with skiboot
5.7 being used for all POWER8 and POWER9 platforms in op-build v1.18
(due July 12th, but will come *after* skiboot 5.7).

This is the second release using the new regular six-week release cycle,
similar to op-build, but slightly offset to allow for a short stabilisation
period. Expected release dates and contents are tracked using GitHub milestones
and issues: https://github.com/open-power/skiboot/milestones

Over :ref:`skiboot-5.7-rc1`, we have the following changes:

POWER9
------

There are many important changes for POWER9 DD1 and DD2 systems. POWER9 support
should be considered in development and skiboot 5.7 is certainly **NOT**
suitable for POWER9 production environments.

- HDAT: Add IPMI sensor data under /bmc node
- numa/associativity: Add a new level of NUMA for GPUs

  Today we have an issue where the NUMA nodes corresponding
  to GPUs have the same affinity/distance as normal memory
  nodes. Our reference-points currently support two levels:
  [0x4, 0x4] for normal systems and [0x4, 0x3] for Power8E
  systems. This patch adds a new level [0x4, X, 0x2] and
  uses the node-id at all levels for the GPU.
- xive: Enable memory backing of queues

  This dedicates 6x64k pages of memory permanently for the XIVE to
  use for internal queue overflow.
  This allows the XIVE to deal with some corner cases where the internal
  queues might prove insufficient.

- xive: Properly get rid of donated indirect pages during reset

  Otherwise they keep being used across kexec, causing memory
  corruption in subsequent kernels once KVM has been used.

- cpu: Better handle unknown flags in opal_reinit_cpus()

  At the moment, if we get passed flags we don't know about, we
  return OPAL_UNSUPPORTED but we still perform whatever actions
  were required by the flags we do support. Additionally, on P8,
  we attempt an SLW re-init which hasn't been supported since
  Murano DD2.0 and will crash your system.

  It's too late to fix this on existing systems, so Linux will have to
  be careful, at least on P8. To avoid future issues, this cleans the
  handling up and makes sure we only use slw_reinit() when HILE isn't
  supported.
- cpu: Unconditionally cleanup TLBs on P9 in opal_reinit_cpus()

  This can work around problems where Linux fails to properly
  clean up part or all of the TLB on kexec.

- Fix scom addresses for power9 nx checkstop hmi handling.

  SCOM addresses for NX status, DMA & ENGINE FIR and PBI FIR have changed
  for Power9. Fix those up while handling NX checkstop for Power9.
- Fix scom addresses for power9 core checkstop hmi handling.

  SCOM addresses for CORE FIR (Fault Isolation Register) and Malfunction
  Alert Register have changed for Power9. Fix those up while handling core
  checkstop for Power9.

  Without this change, the HMI handler fails to check for the correct
  reason for a core checkstop on Power9.

- core/mem_region: check return value of add_region

  The only sensible thing to do if this fails is to abort(), as we've
  likely just failed to reserve reserved memory regions, and nothing
  good comes from that.

PHB4
^^^^

- phb4: Do more retries on link training failures

  Currently we only retry once when we have a link training failure.
  This changes this to be 3 retries, as 1 retry is not giving us enough
  reliability.

  This will increase the boot time, especially on systems where we
  incorrectly detect a link presence when there really is nothing
  present. A follow-up patch is planned to optimise our timings and
  mitigate this.

- phb4: Workaround phy lockup by doing full PHB reset on retry

  For PHB4 it's possible that the PHY may end up in a bad state where it
  can no longer receive data. This can manifest as the link not
  retraining. A simple PERST will not clear this; the PHB must be
  completely reset.

  This changes the retry state to CRESET to do this.

  This issue may also manifest itself as the link training in a degraded
  state (lower speed or narrower width). This patch doesn't attempt to
  fix that (a fix will come later).
- pci: Add ability to trace timing

  PCI link training is responsible for a huge chunk of the skiboot boot
  time, so add the ability to trace the time spent waiting in the main
  state machine.
- pci: Print resetting PHB notice at higher log level

  Currently during boot there is a long delay while we wait for the PHBs
  to be reset and train. During this time, there is no output from skiboot
  and the last message doesn't give an indication of what's happening.

  This boosts the PHB reset message from info to notice so users can see
  what's happening during this long period of waiting.
- phb4: Only set one bit in nfir

  The MPIPL procedure says to only set bit 26 when forcing the PEC into
  freeze mode. Currently we set bits 24-27.

  This changes the code to follow the spec and only set bit 26.
- phb4: Fix order of pfir/nfir clearing in CRESET

  According to the workbook, the pfir must be cleared before the nfir.
  The way we have it now causes the nfir to not clear properly in some
  error circumstances.

  This swaps the order to match the workbook.
- phb4: Remove incorrect state transition

  When waiting in PHB4_SLOT_CRESET_WAIT_CQ for transactions to end, we
  incorrectly move onto the next state. Generally we don't hit this as
  the transactions have ended already anyway.

  This removes the incorrect state transition.
- phb4: Set default lane equalisation

  Set default lane equalisation if there is nothing in the device-tree.

  The default value is taken from HDAT and was confirmed by the hardware
  team. This also neatens the code up a bit.
- hdata: Fix phb4 lane-eq property generation

  The lane-eq data we get from HDAT is all 7s, but what we end up with in
  the device tree is: ::

    xscom@603fc00000000/pbcq@4010c00/stack@0/ibm,lane-eq
    00000000 31c339e0 00000000 0000000c
    00000000 00000000 00000000 00000000
    00000000 31c30000 77777777 77777777
    77777777 77777777 77777777 77777777

  This fixes grabbing the properties from HDAT and fixes the call to put
  them in the device tree.
- phb4: Fix PHB4 fence recovery.

  We had a few problems:

  - We used the wrong register to trigger the reset (spec bug)
  - We should clear the PFIR and NFIR while the reset is asserted
  - ... and in the right order!
  - We should only apply the DD1 workaround after the reset has
    been lifted.
  - We should ensure we use ASB whenever we are fenced or doing a
    CRESET
  - Make config ops write with ASB

- phb4: Verbose EEH options

  Enabled by setting ``pci-eeh-verbose=true`` in NVRAM: ::

    nvram -p ibm,skiboot --update-config pci-eeh-verbose=true

- phb4: Print more info when PHB fences

  For now, this is printed at PHBERR level. Unfortunately, we don't have
  room for this information in the diags data passed to Linux.
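
As a rough illustration of the retry policy in the two link-training entries above, here is a minimal C sketch. It covers up to 3 retries instead of 1, with each retry doing a full PHB reset (CRESET) rather than a PERST. All names in it (``slot_sketch``, ``on_link_result``, ``LINK_TRAIN_MAX_RETRIES``) are hypothetical and do not correspond to skiboot's actual PHB4 slot state machine.

.. code-block:: c

    #include <assert.h>
    #include <stdbool.h>

    /* Hypothetical constant mirroring the "3 retries" policy. */
    #define LINK_TRAIN_MAX_RETRIES 3

    enum slot_action {
        SLOT_LINK_UP,   /* link trained, carry on */
        SLOT_DO_CRESET, /* retry via a full PHB reset (CRESET) */
        SLOT_GIVE_UP    /* out of retries, report the failure */
    };

    /* Hypothetical per-slot retry bookkeeping. */
    struct slot_sketch {
        int retries_left;
        int creset_count; /* number of full PHB resets requested */
    };

    void slot_init(struct slot_sketch *s)
    {
        s->retries_left = LINK_TRAIN_MAX_RETRIES;
        s->creset_count = 0;
    }

    /*
     * Decide what to do after one link-training attempt. A failure is
     * retried with a full PHB reset (a PERST alone does not clear a
     * locked-up PHY) until the retry budget is exhausted.
     */
    enum slot_action on_link_result(struct slot_sketch *s, bool trained)
    {
        assert(s->retries_left >= 0);

        if (trained)
            return SLOT_LINK_UP;
        if (s->retries_left == 0)
            return SLOT_GIVE_UP;

        s->retries_left--;
        s->creset_count++;
        return SLOT_DO_CRESET;
    }

Under this policy, a slot whose link never trains costs one initial attempt plus three full resets before giving up. That is consistent with the boot-time increase noted above for slots where a link is detected but nothing is present.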
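
The ``phb4: Only set one bit in nfir`` entry above uses IBM (MSB-0) bit numbering, where bit 0 is the most significant bit of a 64-bit register. The sketch below contrasts the old mask (bits 24-27) with the new one (bit 26 only). The ``PPC_BIT``/``PPC_BITMASK`` definitions follow the style of skiboot's bit helpers but are reproduced here as an assumption. ``nfir_force_freeze`` is a hypothetical helper, not a skiboot function.

.. code-block:: c

    #include <assert.h>
    #include <stdint.h>

    /* IBM (MSB-0) bit numbering: bit 0 is the most significant bit. */
    #define PPC_BIT(bit)        (0x8000000000000000ULL >> (bit))
    #define PPC_BITMASK(bs, be) ((PPC_BIT(bs) - PPC_BIT(be)) | PPC_BIT(bs))

    /* Old behaviour: bits 24-27 were all set to force the freeze. */
    static_assert(PPC_BITMASK(24, 27) == 0x000000F000000000ULL,
                  "bits 24-27 in MSB-0 numbering");

    /* New behaviour per the MPIPL procedure: only bit 26. */
    static_assert(PPC_BIT(26) == 0x0000002000000000ULL,
                  "bit 26 in MSB-0 numbering");

    /* Hypothetical helper: the value to write to force freeze, leaving
     * every other nfir bit as it was. */
    uint64_t nfir_force_freeze(uint64_t nfir)
    {
        return nfir | PPC_BIT(26);
    }

Using ``PPC_BITMASK(24, 27)`` here instead would also set bits 24, 25 and 27, which corresponds to the old behaviour described in the entry.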

Testing/development
-------------------

- lpc: remove double LPC prefix from messages
- opal-ci/fetch-debian-jessie-installer: follow redirects

  Fixes some CI failures.
- test/qemu-jessie: bail out fast on kernel panic
- test/qemu-jessie: dump boot log on failure
- travis: add fedora26
- xz: add fallthrough annotations to silence GCC7 warning
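
The xz entry relates to GCC 7's ``-Wimplicit-fallthrough`` warning, which ``-Wextra`` enables. It fires when a ``switch`` case falls into the next one without an annotation. Below is a minimal sketch of the comment-style annotation that GCC 7 recognises, in a hypothetical function (``count_steps`` is illustrative only). GCC 7 also accepts ``__attribute__((fallthrough));``. This is not necessarily the exact form used in the xz patch.

.. code-block:: c

    #include <assert.h>

    /*
     * Hypothetical example: the cases deliberately fall through, so
     * n = 3 performs three increments, n = 2 two, and so on. Without
     * the comments, gcc -Wextra (GCC 7 or later) warns on each one.
     */
    int count_steps(int n)
    {
        int steps = 0;

        assert(n >= 0);

        switch (n) {
        case 3:
            steps++;
            /* fall through */
        case 2:
            steps++;
            /* fall through */
        case 1:
            steps++;
            break;
        default:
            break;
        }
        return steps;
    }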