.. _skiboot-6.0.18:

==============
skiboot-6.0.18
==============

skiboot 6.0.18 was released on Wednesday March 6th, 2019. It replaces
:ref:`skiboot-6.0.17` as the current stable release in the 6.0.x series.

It is recommended that 6.0.18 be used instead of any previous 6.0.x version
due to the bug fixes it contains.

Over :ref:`skiboot-6.0.17` we have several bug fixes, including important ones
for powercap, ipmi-hiomap and the BMC communication driver.

powercap
========
- powercap: occ: Fix the powercapping range allowed for user

  OCC provides two limits for the minimum powercap. One is the hard
  powercap minimum, which is guaranteed by OCC. The other is a soft
  powercap minimum, which is lower than hard-min and may or may not be
  asserted due to various power-thermal reasons. To allow users to
  access the entire powercap range, this patch exports the soft powercap
  minimum as the "powercap-min" DT property. It also adds a new DT
  property called "powercap-hard-min" to export the hard-min powercap
  limit.

IPMI-HIOMAP
===========
- ipmi-hiomap test case enhancements/fixes.

- libflash/ipmi-hiomap: Enforce message size for empty response

  The protocol defines the response to the associated messages as empty
  except for the command ID and sequence fields. If the BMC is returning
  extra data, consider the message malformed.

- libflash/ipmi-hiomap: Remove unused close handling

  Issuing a HIOMAP_C_CLOSE is not required by the protocol specification,
  rather a close can be implicit in a subsequent
  CREATE_{READ,WRITE}_WINDOW request. The implicit close provides an
  opportunity to reduce LPC traffic and the implementation takes up that
  optimisation, so remove the case from the IPMI callback handler.

- libflash/ipmi-hiomap: Overhaul event handling

  Reworking the event handling was inspired by a bug report by Vasant
  where the host would get wedged on multiple flash access attempts in the
  face of a persistent error state on the BMC-side. The cause of this bug
  was the early-exit based on ctx->update, which erroneously assumed that
  all events had been completely handled in prior calls to
  ipmi_hiomap_handle_events(). This is not true if e.g.
  HIOMAP_E_DAEMON_READY is clear in the prior calls.

  Regardless, there were other correctness and efficiency problems with
  the handling strategy:

  * Ack-able event state was not restored in the face of errors in the
    process of re-establishing protocol state

  * It forced needless window restoration with respect to the context in
    which ipmi_hiomap_handle_events() was called.

  * Tests for HIOMAP_E_DAEMON_READY and HIOMAP_E_FLASH_LOST were redundant
    with the overhauled error handling introduced in the previous patch

  Fix all of the above issues and add comments to explain the event
  handling flow.

  Tests for correctness follow later in the series.

- libflash/ipmi-hiomap: Overhaul error handling

  The aim is to improve the robustness with respect to absence of the
  BMC-side daemon. The current error handling roughly mirrors what was
  done for the mailbox implementation, but there's room for improvement.

  Errors are split into two classes, those that affect the transport state
  and those that affect the window validity. From here, we push the
  transport state error checks right to the bottom of the stack, to ensure
  the link is known to be in a good state before any message is sent.
  Window validity tests remain as they were in the hiomap_window_move()
  and ipmi_hiomap_read() functions. Validity tests are not necessary in
  the write and erase paths as we will receive an error response from the
  BMC when performing a dirty or flush on an invalid window.

  Recovery also remains as it was, done on entry to the blocklevel
  callbacks. If an error state is encountered in the middle of an
  operation, no attempt is made to recover it on the spot. Instead the
  error is returned up the stack and the caller can choose how it wishes
  to respond.

- libflash/ipmi-hiomap: Fix leak of msg in callback

BMC communication
=================
- core/ipmi: Add ipmi sync messages to top of the list

  In the ipmi_queue_msg_sync() path OPAL waits until it gets a response
  from the BMC. If we do not get a response in time we may end up with
  kernel hard lockups. Hence add sync messages to the top of the queue.
  This reduces the chance of hard lockups.

- hw/bt: Introduce separate list for synchronous messages

  The BT send logic always sends the top of the bt message list to the
  BMC. Once the BMC reads the message, it clears the interrupt and
  bt_idle() becomes true.

  bt_add_ipmi_msg_head() adds a message to the top of the list. If the bt
  message list is not empty then:

  - if bt_idle() is true then we end up sending a message to the BMC
    before getting the response from the BMC for the inflight message. On
    some BMC implementations this appears to result in a message timeout.
  - else we end up starting the message timer without actually sending the
    message to the BMC, which is not correct.

  This patch introduces a separate list to track synchronous messages.
  bt_add_ipmi_msg_head() will add messages to the tail of this new list. We
  will always process this queue before processing the normal queue.

  Finally, this patch introduces a new variable (inflight_bt_msg) to track
  the inflight message. It points to the current inflight message.
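
  The two-list scheme described above can be modelled roughly as
  follows. This is only an illustrative sketch in Python, not the skiboot
  code: the class ``BtQueues`` and its method names are hypothetical, and
  the sketch assumes that at most one message is outstanding at a time.

  .. code-block:: python

     from collections import deque


     class BtQueues:
         """Toy model of the two-list scheme; names are hypothetical."""

         def __init__(self):
             self.sync_msgs = deque()    # synchronous messages, FIFO
             self.normal_msgs = deque()  # all other messages, FIFO
             self.inflight = None        # sent, still awaiting a response

         def add_sync(self, msg):
             # Sync messages go to the tail of their own list.
             self.sync_msgs.append(msg)

         def add_normal(self, msg):
             self.normal_msgs.append(msg)

         def send_next(self):
             """Return the message to send now, or None if nothing can go."""
             if self.inflight is not None:
                 # Assumption: never send while a response is pending.
                 return None
             if self.sync_msgs:
                 # The sync list is always drained before the normal list.
                 self.inflight = self.sync_msgs.popleft()
             elif self.normal_msgs:
                 self.inflight = self.normal_msgs.popleft()
             return self.inflight

         def complete(self):
             """Record the response for the inflight message."""
             done, self.inflight = self.inflight, None
             return done

  In this model a synchronous message queued while another message is
  inflight is not sent early. It is sent as soon as the inflight message
  completes, ahead of anything already waiting on the normal list.
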

- hw/bt: Fix message retry handler

  In some corner cases (like BMC reboot), bt_send_and_unlock() starts the
  message timer, but won't send the message to the BMC as the driver is
  not free to send a message. The bt_expire_old_msg() function enables the
  H2B interrupt without actually sending the message.

  This patch fixes the above issue.

- ipmi/power: Fix system reboot issue

  The kernel makes a reboot/shutdown OPAL call for reboot/shutdown. Once
  the kernel gets a response from OPAL it runs opal_poll_events() until
  firmware handles the request.

  On BMC based systems, OPAL makes an IPMI call (IPMI_CHASSIS_CONTROL) to
  initiate system reboot/shutdown. At present OPAL queues IPMI messages
  and returns SUCCESS to the host. If the BMC is not ready to accept
  commands (for example during a BMC reboot), then these messages will
  fail. We then have to manually reboot/shutdown the system using the BMC
  interface.

  This patch adds logic to validate the message return value. If the
  message failed, then it is resent. At some stage the BMC will be ready
  to accept the message and handle it.

- hw/bt: Add backend interface to disable ipmi message retry option

  During boot OPAL makes an IPMI_GET_BT_CAPS call to the BMC to get the BT
  interface capabilities, which include the IPMI message max resend count,
  message timeout, etc. Most of the time OPAL gets a response from the BMC
  within the specified timeout. In some corner cases (like mboxd daemon
  reset in the BMC, BMC reboot, etc.) OPAL may not get a response within
  the timeout period. In such scenarios, OPAL resends the message until
  the max resend count is reached.

  OPAL uses synchronous IPMI messages (ipmi_queue_msg_sync()) for a few
  operations like flash read, write, etc. The thread waits in OPAL until
  it gets a response from the BMC. In some corner cases like BMC reboot,
  the thread may wait in OPAL for a long time (more than 20 seconds),
  which results in a kernel hard lockup.

  This patch introduces a new interface to disable the message resend
  option. We will disable the message resend option for synchronous
  messages. This will greatly reduce kernel hard lockup issues.

  This is a short term fix. The long term solution is to convert all
  synchronous messages to asynchronous ones.

PHB3
====
- hw/phb3/naples: Disable D-states

  Putting "Mellanox Technologies MT27700 Family [ConnectX-4] [15b3:1013]"
  (more precisely, the second of its 2 PCI functions, no matter in what
  order) into the D3 state causes EEH with the "PCT timeout" error.
  This has been noticed on garrison machines only and firestones do not
  seem to have this issue.

  This disables D-states changing for devices on root buses on Naples by
  installing a config space access filter (copied from PHB4).

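
As a rough illustration of what a config space access filter for D-states
can look like, the Python sketch below drops any write that overlaps the
Power Management Control/Status register (PMCSR), which sits 4 bytes into
the PCI Power Management capability and holds the power state in its low
two bits. The class ``ConfigSpace``, the capability offset ``0x40`` and the
choice to drop the whole write are assumptions made for this example, not
a description of the actual skiboot filter.

.. code-block:: python

   PM_CTRL_OFFSET = 4  # PMCSR offset within the PCI PM capability
   PM_CTRL_SIZE = 2    # PMCSR is a 16-bit register


   class ConfigSpace:
       """Toy config space with a write filter; names are hypothetical."""

       def __init__(self, size=256, pm_cap=0x40):
           self.regs = bytearray(size)
           self.pm_cap = pm_cap  # assumed location of the PM capability

       def _hits_pmcsr(self, offset, length):
           # True if [offset, offset + length) overlaps the PMCSR bytes.
           start = self.pm_cap + PM_CTRL_OFFSET
           return offset < start + PM_CTRL_SIZE and offset + length > start

       def write(self, offset, data):
           """Apply a write unless the filter drops it. Returns True if applied."""
           if self._hits_pmcsr(offset, len(data)):
               return False  # filtered: the device stays in its current state
           self.regs[offset:offset + len(data)] = data
           return True

       def read(self, offset, length):
           return bytes(self.regs[offset:offset + length])

With this model, an attempt to write D3hot (``0b11``) into PMCSR leaves the
register unchanged, while writes to unrelated registers go through.
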