.. _skiboot-6.0.18:

==============
skiboot-6.0.18
==============

skiboot 6.0.18 was released on Wednesday March 6th, 2019. It replaces
:ref:`skiboot-6.0.17` as the current stable release in the 6.0.x series.

It is recommended that 6.0.18 be used instead of any previous 6.0.x version
due to the bug fixes it contains.

Over :ref:`skiboot-6.0.17` we have several bug fixes, including important ones
for powercap, ipmi-hiomap and the BMC communication driver.

powercap
========
- powercap: occ: Fix the powercapping range allowed for user

  The OCC provides two minimum powercap limits: a hard minimum, which
  the OCC guarantees, and a soft minimum, which is lower than the hard
  minimum and may or may not be asserted due to various power-thermal
  reasons. To give users access to the entire powercap range, this
  patch exports the soft powercap minimum as the "powercap-min" DT
  property and adds a new DT property, "powercap-hard-min", to export
  the hard minimum powercap limit.
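
As an illustration of how a consumer might interpret the two minimum
properties described above, the following is a minimal sketch rather than
code from the patch. The structure, the enum and ``classify_powercap()`` are
hypothetical names, and the upper limit is assumed to be available from
elsewhere.

```c
#include <stdint.h>

/* Hypothetical container for limits read from the device tree. */
struct powercap_limits {
	uint32_t soft_min;	/* "powercap-min": may or may not be asserted */
	uint32_t hard_min;	/* "powercap-hard-min": guaranteed by the OCC */
	uint32_t max;		/* upper limit, assumed to be known separately */
};

enum powercap_class {
	POWERCAP_INVALID,	/* outside [soft_min, max] */
	POWERCAP_BEST_EFFORT,	/* in [soft_min, hard_min): not guaranteed */
	POWERCAP_GUARANTEED,	/* in [hard_min, max] */
};

/* Classify a requested powercap against the exported range. */
static enum powercap_class classify_powercap(const struct powercap_limits *l,
					     uint32_t request)
{
	if (request < l->soft_min || request > l->max)
		return POWERCAP_INVALID;
	if (request < l->hard_min)
		return POWERCAP_BEST_EFFORT;
	return POWERCAP_GUARANTEED;
}
```

A request between the soft and hard minimum is accepted under this sketch,
but the caller should expect that the cap may not be held.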

IPMI-HIOMAP
===========
- ipmi-hiomap test case enhancements/fixes.

- libflash/ipmi-hiomap: Enforce message size for empty response

  The protocol defines the response to the associated messages as empty
  except for the command ID and sequence fields. If the BMC returns
  extra data, consider the message malformed.

- libflash/ipmi-hiomap: Remove unused close handling

  Issuing a HIOMAP_C_CLOSE is not required by the protocol specification;
  rather, a close can be implicit in a subsequent
  CREATE_{READ,WRITE}_WINDOW request. The implicit close provides an
  opportunity to reduce LPC traffic and the implementation takes up that
  optimisation, so remove the case from the IPMI callback handler.

- libflash/ipmi-hiomap: Overhaul event handling

  Reworking the event handling was inspired by a bug report by Vasant
  where the host would get wedged on multiple flash access attempts in the
  face of a persistent error state on the BMC side. The cause of this bug
  was the early exit based on ctx->update, which erroneously assumed that
  all events had been completely handled in prior calls to
  ipmi_hiomap_handle_events(). This is not true if, e.g.,
  HIOMAP_E_DAEMON_READY is clear in the prior calls.

  Regardless, there were other correctness and efficiency problems with
  the handling strategy:

  * Ack-able event state was not restored in the face of errors in the
    process of re-establishing protocol state.

  * It forced needless window restoration with respect to the context in
    which ipmi_hiomap_handle_events() was called.

  * Tests for HIOMAP_E_DAEMON_READY and HIOMAP_E_FLASH_LOST were redundant
    with the overhauled error handling introduced in the previous patch.

  Fix all of the above issues and add comments to explain the event
  handling flow.

  Tests for correctness follow later in the series.

- libflash/ipmi-hiomap: Overhaul error handling

  The aim is to improve the robustness with respect to absence of the
  BMC-side daemon. The current error handling roughly mirrors what was
  done for the mailbox implementation, but there's room for improvement.

  Errors are split into two classes: those that affect the transport state
  and those that affect the window validity. From here, we push the
  transport state error checks right to the bottom of the stack, to ensure
  the link is known to be in a good state before any message is sent.
  Window validity tests remain as they were in the hiomap_window_move()
  and ipmi_hiomap_read() functions. Validity tests are not necessary in
  the write and erase paths as we will receive an error response from the
  BMC when performing a dirty or flush on an invalid window.

  Recovery also remains as it was, done on entry to the blocklevel
  callbacks. If an error state is encountered in the middle of an
  operation, no attempt is made to recover it on the spot; instead, the
  error is returned up the stack and the caller can choose how it wishes
  to respond.

- libflash/ipmi-hiomap: Fix leak of msg in callback
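
The size check described under "Enforce message size for empty response"
above can be sketched as follows. This is an illustration only, not the
libflash code: ``hiomap_empty_resp_ok()`` and ``HIOMAP_EMPTY_RESP_SIZE`` are
hypothetical names, and the command ID and sequence fields are assumed to be
one byte each, in that order.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: one byte of command ID followed by one byte of sequence. */
#define HIOMAP_EMPTY_RESP_SIZE	2

/*
 * Accept an "empty" response only if it carries exactly the command ID
 * and sequence fields, and both match the request that was sent.
 */
static bool hiomap_empty_resp_ok(const uint8_t *resp, size_t resp_size,
				 uint8_t expect_cmd, uint8_t expect_seq)
{
	/* Any extra (or missing) data means the message is malformed. */
	if (resp_size != HIOMAP_EMPTY_RESP_SIZE)
		return false;

	return resp[0] == expect_cmd && resp[1] == expect_seq;
}
```

Checking for an exact size, rather than a minimum, is what rejects a
response that carries trailing data.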

BMC communication
=================
- core/ipmi: Add ipmi sync messages to top of the list

  In the ipmi_queue_msg_sync() path, OPAL waits until it gets a response
  from the BMC. If the response does not arrive in time, we may end up
  with kernel hard lockups. Hence, add sync messages to the top of the
  queue. This reduces the chance of hard lockups.

- hw/bt: Introduce separate list for synchronous messages

  The BT send logic always sends the message at the top of the bt message
  list to the BMC. Once the BMC reads the message, it clears the interrupt
  and bt_idle() becomes true.

  bt_add_ipmi_msg_head() adds a message to the top of the list. If the bt
  message list is not empty then:

    - if bt_idle() is true, we end up sending the message to the BMC before
      getting the response for the inflight message. On some BMC
      implementations this appears to result in a message timeout.
    - otherwise, we end up starting the message timer without actually
      sending the message to the BMC, which is not correct.

  This patch introduces a separate list to track synchronous messages.
  bt_add_ipmi_msg_head() now adds messages to the tail of this new list.
  This queue is always processed before the normal queue.

  Finally, this patch introduces a new variable (inflight_bt_msg) that
  points to the current inflight message.

- hw/bt: Fix message retry handler

  In some corner cases (like a BMC reboot), bt_send_and_unlock() starts
  the message timer but does not send the message to the BMC because the
  driver is not free to send it. The bt_expire_old_msg() function enables
  the H2B interrupt without actually sending the message.

  This patch fixes the above issue.

- ipmi/power: Fix system reboot issue

  The kernel makes a reboot/shutdown OPAL call to reboot or shut down the
  system. Once the kernel gets a response from OPAL, it runs
  opal_poll_events() until firmware handles the request.

  On BMC-based systems, OPAL makes an IPMI call (IPMI_CHASSIS_CONTROL) to
  initiate the system reboot/shutdown. At present OPAL queues the IPMI
  message and returns SUCCESS to the host. If the BMC is not ready to
  accept the command (for example, during a BMC reboot), the message fails
  and the system has to be rebooted or shut down manually through the BMC
  interface.

  This patch adds logic to validate the message return value. If the
  message failed, it is resent. At some stage the BMC will be ready to
  accept the message and will handle it.

- hw/bt: Add backend interface to disable ipmi message retry option

  During boot, OPAL makes an IPMI_GET_BT_CAPS call to the BMC to get the
  BT interface capabilities, which include the maximum IPMI message resend
  count, the message timeout, etc. Most of the time OPAL gets a response
  from the BMC within the specified timeout. In some corner cases (like an
  mboxd daemon reset in the BMC, a BMC reboot, etc.) OPAL may not get a
  response within the timeout period. In such scenarios, OPAL resends the
  message until the maximum resend count is reached.

  OPAL uses synchronous IPMI messages (ipmi_queue_msg_sync()) for a few
  operations like flash read, write, etc. The thread waits in OPAL until
  it gets a response from the BMC. In some corner cases, like a BMC
  reboot, the thread may wait in OPAL for a long time (more than 20
  seconds), resulting in a kernel hard lockup.

  This patch introduces a new interface to disable the message resend
  option. The resend option is disabled for synchronous messages. This
  greatly reduces kernel hard lockup issues.

  This is a short-term fix. The long-term solution is to convert all
  synchronous messages to asynchronous ones.
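
The queueing scheme described under "Introduce separate list for synchronous
messages" above can be sketched as two FIFO lists plus a single inflight
pointer. This is a simplified illustration, not the hw/bt driver code: all
type and function names below are hypothetical, and locking, timers and the
actual hardware access are left out.

```c
#include <stddef.h>

struct msg {
	int id;
	struct msg *next;
};

struct msg_queue {
	struct msg *head;
	struct msg *tail;
};

struct bt_state {
	struct msg_queue sync_q;	/* synchronous messages */
	struct msg_queue normal_q;	/* everything else */
	struct msg *inflight;		/* message awaiting a response */
};

/* Append a message to the tail of a queue. */
static void queue_add_tail(struct msg_queue *q, struct msg *m)
{
	m->next = NULL;
	if (q->tail)
		q->tail->next = m;
	else
		q->head = m;
	q->tail = m;
}

/* Remove and return the head of a queue, or NULL if it is empty. */
static struct msg *queue_pop_head(struct msg_queue *q)
{
	struct msg *m = q->head;

	if (!m)
		return NULL;
	q->head = m->next;
	if (!q->head)
		q->tail = NULL;
	m->next = NULL;
	return m;
}

/*
 * Pick the next message to send. Nothing is sent while a message is
 * inflight; otherwise the sync queue is drained before the normal one.
 */
static struct msg *bt_send_next(struct bt_state *bt)
{
	if (bt->inflight)
		return NULL;

	bt->inflight = queue_pop_head(&bt->sync_q);
	if (!bt->inflight)
		bt->inflight = queue_pop_head(&bt->normal_q);
	return bt->inflight;
}

/* Called when the response for the inflight message has arrived. */
static void bt_complete(struct bt_state *bt)
{
	bt->inflight = NULL;
}
```

Because a synchronous message is never placed ahead of a message that has
already been sent, a new message cannot go out before the response to the
inflight one arrives.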

PHB3
====
- hw/phb3/naples: Disable D-states

  Putting "Mellanox Technologies MT27700 Family [ConnectX-4] [15b3:1013]"
  (more precisely, the second of its two PCI functions, regardless of
  order) into the D3 state causes EEH with the "PCT timeout" error.
  This has been noticed on garrison machines only; firestones do not
  seem to have this issue.

  This disables D-state changes for devices on root buses on Naples by
  installing a config space access filter (copied from PHB4).
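
To illustrate the kind of test such a config space filter could apply, the
sketch below detects a write that would move a device out of D0. It is not
the filter from the patch: ``is_dstate_change()`` is a hypothetical helper,
and the caller is assumed to have already located the device's PCI Power
Management capability. It relies only on the standard layout, where the
PMCSR register sits at offset 4 within that capability and its PowerState
field occupies bits 1:0.

```c
#include <stdbool.h>
#include <stdint.h>

#define PCI_PM_PMCSR_OFFSET	4	/* PMCSR offset within the PM capability */
#define PCI_PM_PMCSR_STATE_MASK	0x3	/* PowerState field, bits 1:0 */

/*
 * Return true if a config write of 'len' bytes at 'offset' carrying
 * 'data' would set a non-D0 power state. 'pm_cap' is the config space
 * offset of the PM capability, or 0 if the device has none.
 */
static bool is_dstate_change(uint32_t pm_cap, uint32_t offset,
			     uint32_t len, uint32_t data)
{
	uint32_t pmcsr = pm_cap + PCI_PM_PMCSR_OFFSET;

	if (pm_cap == 0)
		return false;

	/* Does the write cover the low byte of PMCSR? */
	if (pmcsr < offset || pmcsr >= offset + len)
		return false;

	/* Extract the PowerState bits from the matching byte of the data. */
	return ((data >> (8 * (pmcsr - offset))) & PCI_PM_PMCSR_STATE_MASK) != 0;
}
```

A filter built on this check would drop or fail the write when it returns
true and pass every other access through unchanged.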