.. _skiboot-6.3:

skiboot-6.3
===========
5
6skiboot v6.3 was released on Friday May 3rd 2019. It is the first
7release of skiboot 6.3, which becomes the new stable release
8of skiboot following the 6.2 release, first released December 14th 2018.
9
10Skiboot 6.3 will mark the basis for op-build v2.3.
11
12skiboot v6.3 contains all bug fixes as of :ref:`skiboot-6.0.20`,
13and :ref:`skiboot-6.2.3` (the currently maintained
14stable releases).
15
For details on how the skiboot stable releases work, see :ref:`stable-rules`.
17
18Over skiboot 6.2, we have the following changes:
19
.. _skiboot-6.3-new-features:

New Features
------------
24
25- hw/imc: Enable opal calls to init/start/stop IMC Trace mode
26
  New OPAL APIs for the In-Memory Collection Counter infrastructure (IMC),
  including a new device type called OPAL_IMC_COUNTERS_TRACE.
29- xive: Add calls to save/restore the queues and VPs HW state
30
  To be able to support migration of guests using the XIVE native
  exploitation mode (where the queue is effectively owned by the
  guest), KVM needs to be able to save and restore the HW-modified
  fields of the queue, such as the current queue producer pointer and
  generation bit, and to retrieve the modified thread context registers
  of the VP from the NVT structure: the VP interrupt pending bits.

  However, there is no need to set back the NVT structure on P9. P10
  should be the same.
40- witherspoon: Add nvlink2 interconnect information
41
  GPUs on Redbud and Sequoia platforms are interconnected in groups of
  2 or 3 GPUs. If the user decides to pass a single GPU from a group
  through to userspace, we need to ensure that the links between GPUs
  do not get enabled.
46
47  A V100 GPU provides a way to disable selected links. In order to only
48  disable links to peer GPUs, we need a topology map.
49
50  This adds an "ibm,nvlink-peers" property to a GPU DT node with phandles
51  of peer GPUs and NVLink2 bridges. The index in the property is a GPU link
52  number.
53- platforms/romulus: Also support talos
54
55  The two are similar enough and I'd like to have a slot table for our
56  Talos.
57- OpenCAPI support! (see :ref:`skiboot-6.3-OpenCAPI` section)
58- opal/hmi: set a flag to inform OS that TOD/TB has failed.
59
  Set a flag to inform the OS about TOD/TB failure as part of the new
  opal_handle_hmi2 handler. The OS can then use this flag to make sure
  that functions depending on the TB value (e.g. udelay()) are aware that
  the TB is not ticking.
64- astbmc: Enable IPMI HIOMAP for AMI platforms
65
66  Required for Habanero, Palmetto and Romulus.
67- power-mgmt : occ : Add 'freq-domain-mask' DT property
68
  Add a new device-tree property freq-domain-mask to define the group of
  CPUs which share the same frequency. This property has been added under
  the power-mgmt node. It is a bitmask.

  A bitwise AND is taken between this bitmask value and the PIR of a CPU.
  All the CPUs lying in the same frequency domain will have the same
  result for the AND.

  For example, for POWER9, 0xFFF0 indicates a quad-wide frequency domain.
  Taking the AND with the PIR of each CPU yields a frequency domain with
  a quad-wise distribution, as the last 4 bits, which represent the
  cores, have been masked.

  Similarly, 0xFFF8 will represent a core-wide frequency domain for P8.

  Also add a new device-tree property domain-runs-at which denotes the
  strategy the OCC is using to change the frequency of a frequency domain.
  There can be two strategies: FREQ_MOST_RECENTLY_SET and FREQ_MAX_IN_DOMAIN.

  FREQ_MOST_RECENTLY_SET: the OCC sets the frequency of the quad to the most
  recent frequency value requested by the CPUs in the quad.

  FREQ_MAX_IN_DOMAIN: the OCC sets the frequency of the CPUs in
  the quad to the maximum of the latest frequency requested by each of
  the component cores.
93- powercap: occ: Fix the powercapping range allowed for user
94
  The OCC provides two limits for the minimum powercap. One is the hard
  powercap minimum, which is guaranteed by the OCC, and the other is a soft
  powercap minimum, which is lower than hard-min and may or may not be
  asserted due to various power-thermal reasons. To allow users
  to access the entire powercap range, this patch exports the soft powercap
  minimum as the "powercap-min" DT property. It also adds a new
  DT property called "powercap-hard-min" to export the hard-min powercap
  limit.
103- Add NVDIMM support
104
  NVDIMMs are memory modules that use a battery backup system to allow the
  contents of RAM to be saved to non-volatile storage if system power goes
  away unexpectedly. This allows them to be used as a high-performance
  storage device, suitable for serving as a cache for SSDs and the like.
109
110  Configuration of NVDIMMs is handled by hostboot and communicated to OPAL
111  via the HDAT. We need to parse out the NVDIMM memory ranges and create
112  memory regions with the "pmem-region" compatible label to make them
113  available to the host.
114- core/exceptions: implement support for MCE interrupts in powersave
115
  The ISA specifies that MCE interrupts in power saving modes will enter
  at 0x200 with powersave bits in SRR1 set. This is not currently
  supported properly: the MCE is handled like a normal interrupt,
  but GPRs could be lost, which would lead to crashes (e.g. r1, r2, r13
  etc).
121
122  So check the power save bits similarly to the sreset vector, and
123  handle this properly.
124- core/exceptions: allow recoverable sreset exceptions
125
126  This requires implementing the MSR[RI] bit. Then just allow all
127  non-fatal sreset exceptions to recover.
128- core/exceptions: implement an exception handler for non-powersave sresets
129
130  Detect non-powersave sresets and send them to the normal exception
131  handler which prints registers and stack.
132- Add PVR_TYPE_P9P
133
134  Enable a new PVR to get us running on another p9 variant.
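
The ``freq-domain-mask`` grouping and the two ``domain-runs-at`` strategies
described in the power-mgmt item above can be modelled in a few lines. This
is a minimal sketch, not skiboot or kernel code: the mask values are the
ones quoted above, while the PIR values and the function names
``freq_domain()``, ``group_by_domain()`` and ``effective_freq()`` are made
up for illustration, and requests are tracked per CPU for simplicity.

```python
from collections import defaultdict

# Mask values quoted in the text above.
P9_FREQ_DOMAIN_MASK = 0xFFF0  # quad-wide frequency domain on POWER9
P8_FREQ_DOMAIN_MASK = 0xFFF8  # core-wide frequency domain on POWER8


def freq_domain(pir, mask):
    """CPUs with the same (PIR & mask) share a frequency domain."""
    return pir & mask


def group_by_domain(pirs, mask):
    """Map each frequency-domain id to the PIRs that belong to it."""
    domains = defaultdict(list)
    for pir in pirs:
        domains[freq_domain(pir, mask)].append(pir)
    return dict(domains)


def effective_freq(requests, strategy):
    """Model the two domain-runs-at strategies for one domain.

    `requests` is a time-ordered list of (pir, requested_frequency) pairs.
    """
    if strategy == "FREQ_MOST_RECENTLY_SET":
        return requests[-1][1]
    if strategy == "FREQ_MAX_IN_DOMAIN":
        latest = {}
        for pir, freq in requests:
            latest[pir] = freq  # keep only the latest request per CPU
        return max(latest.values())
    raise ValueError(f"unknown strategy: {strategy}")


# Hypothetical PIR values, chosen only to show the grouping.
example_pirs = [0x00, 0x04, 0x0C, 0x10, 0x14]
p9_domains = group_by_domain(example_pirs, P9_FREQ_DOMAIN_MASK)
```

With these made-up PIRs, the first three CPUs land in one quad-wide domain
and the last two in another, because only the bits above the low nibble
survive the AND.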
135
136Since v6.3-rc2:
137
138- Expose PNOR Flash partitions to host MTD driver via devicetree
139
140  This makes it possible for the host to directly address each
141  partition without requiring each application to directly parse
142  the FFS headers.  This has been in use for some time already to
143  allow BOOTKERNFW partition updates from the host.
144
145  All partitions except BOOTKERNFW are marked readonly.
146
  The BOOTKERNFW partition is currently used exclusively by the Talos II platform.
148
149- Write boot progress to LPC port 80h
150
151  This is an adaptation of what we currently do for op_display() on FSP
152  machines, inventing an encoding for what we can write into the single
153  byte at LPC port 80h.
154
155  Port 80h is often used on x86 systems to indicate boot progress/status
156  and dates back a decent amount of time. Since a byte isn't exactly very
157  expressive for everything that can go on (and wrong) during boot, it's
158  all about compromise.
159
  Some systems (such as Zaius/Barreleye G2) have a physical dual 7-segment
  display that displays these codes. So far, this has only been driven by
  hostboot (see hostboot commit 90ec2e65314c).
163
164- Write boot progress to LPC ports 81 and 82
165
166  There's a thought to write more extensive boot progress codes to LPC
167  ports 81 and 82 to supplement/replace any reliance on port 80.
168
169  We want to still emit port 80 for platforms like Zaius and Barreleye
170  that have the physical display. Ports 81 and 82 can be monitored by a
171  BMC though.
172
173- Add Talos II platform
174
175  Talos II has some hardware differences from Romulus, therefore
176  we cannot guarantee Talos II == Romulus in skiboot.  Copy and
177  slightly modify the Romulus files for Talos II.
178
179Since v6.3-rc1:
180
181- cpufeatures: Add tm-suspend-hypervisor-assist and tm-suspend-xer-so-bug node
182
  Add a tm-suspend-hypervisor-assist node for P9 >= DD2.2
  and a tm-suspend-xer-so-bug node for P9 DD2.2 only.
185
186  I also treat P9P as P9 DD2.3 and add a unit test for the cpufeatures
187  infrastructure.
188
189  Fixes: https://github.com/open-power/skiboot/issues/233
190
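The DD-level rules stated in the cpufeatures item above can be written down
as a small selection function. This is a sketch of the rule as described,
not the cpufeatures implementation: the function name ``tm_suspend_nodes()``
and its arguments are hypothetical.

```python
def tm_suspend_nodes(is_p9p, dd_major, dd_minor):
    """Return the TM-suspend cpufeatures nodes for a POWER9-family chip."""
    # Per the text above, P9P is treated as P9 DD2.3.
    level = (2, 3) if is_p9p else (dd_major, dd_minor)

    nodes = []
    if level >= (2, 2):
        # Present on DD2.2 and later.
        nodes.append("tm-suspend-hypervisor-assist")
    if level == (2, 2):
        # Present on DD2.2 only.
        nodes.append("tm-suspend-xer-so-bug")
    return nodes
```

Tuple comparison keeps the "greater than or equal to DD2.2" check readable
without packing the major and minor numbers into one integer.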
191
Deprecated/Removed Features
---------------------------
194
195- opal: Deprecate reading the PHB status
196
  The OPAL_PCI_EEH_FREEZE_STATUS call takes a bunch of parameters, one of
  which is @phb_status. It is defined as __be64* and is always NULL in
  the current Linux upstream, but if anyone ever decides to read that status,
  then the PHB3 handler will assume it is struct OpalIoPhb3ErrorData*
  (which is a lot bigger than 8 bytes) and zero it, causing stack
  corruption; p7ioc-phb has the same issue.
203
204  This removes @phb_status from all eeh_freeze_status() hooks and moves
205  the error message from PHB4 to the affected OPAL handlers.
206
207  As far as we can tell, nobody has ever used this and thus it's safe to remove.
208- Remove POWER9N DD1 support
209
210  This is not a shipping product and is no longer supported by Linux
211  or other firmware components.
212
213Since v6.3-rc3:
214
215- Disable fast-reset for POWER8
216
  There is a bug with fast-reset when CPU cores are busy, which can be
  reproduced by running ``stress`` and then trying ``reboot -ff`` (this is
  what the op-test test cases FastRebootHostStress and
  FastRebootHostStressTorture do). What happens is the cores lock up,
  which isn't the best thing in the world when you want them to start
  executing instructions again.

  A workaround is to use instruction ramming, which, while greatly
  increasing the reliability of fast-reset on p8, doesn't make it perfect.
226
227  Instruction ramming is what pdbg was modified to do in order to have the
228  sreset functionality work reliably on p8.
229  pdbg patches: https://patchwork.ozlabs.org/project/pdbg/list/?series=96593&state=*
230
231  Fixes: https://github.com/open-power/skiboot/issues/185
232
General
-------
235
236- core/i2c: Various bits of refactoring
237- refactor backtrace generation infrastructure
238- astbmc: Handle failure to initialise raw flash
239
  Initialising raw flash led to a dead assignment to rc. Check the return
  code and take the failure path as necessary. Both before and after the
  fix we see output along the lines of the following when flash_init()
  fails: ::
244
245    [   53.283182881,7] IRQ: Registering 0800..0ff7 ops @0x300d4b98 (data 0x3052b9d8)
246    [   53.283184335,7] IRQ: Registering 0ff8..0fff ops @0x300d4bc8 (data 0x3052b9d8)
247    [   53.283185513,7] PHB#0000: Initializing PHB...
248    [   53.288260827,4] FLASH: Can't load resource id:0. No system flash found
249    [   53.288354442,4] FLASH: Can't load resource id:1. No system flash found
250    [   53.342933439,3] CAPP: Error loading ucode lid. index=200ea
251    [   53.462749486,2] NVRAM: Failed to load
252    [   53.462819095,2] NVRAM: Failed to load
253    [   53.462894236,2] NVRAM: Failed to load
254    [   53.462967071,2] NVRAM: Failed to load
255    [   53.463033077,2] NVRAM: Failed to load
256    [   53.463144847,2] NVRAM: Failed to load
257
258  Eventually followed by: ::
259
260    [   57.216942479,5] INIT: platform wait for kernel load failed
261    [   57.217051132,5] INIT: Assuming kernel at 0x20000000
262    [   57.217127508,3] INIT: ELF header not found. Assuming raw binary.
263    [   57.217249886,2] NVRAM: Failed to load
264    [   57.221294487,0] FATAL: Kernel is zeros, can't execute!
265    [   57.221397429,0] Assert fail: core/init.c:615:0
266    [   57.221471414,0] Aborting!
267    CPU 0028 Backtrace:
268     S: 0000000031d43c60 R: 000000003001b274   ._abort+0x4c
269     S: 0000000031d43ce0 R: 000000003001b2f0   .assert_fail+0x34
270     S: 0000000031d43d60 R: 0000000030014814   .load_and_boot_kernel+0xae4
271     S: 0000000031d43e30 R: 0000000030015164   .main_cpu_entry+0x680
272     S: 0000000031d43f00 R: 0000000030002718   boot_entry+0x1c0
273     --- OPAL boot ---
274
  Analysis of the execution paths suggests we'll always "safely" end this
  way due to the setup sequence for the blocklevel callbacks in flash_init()
  and error handling in blocklevel_get_info(), and there's no current risk
  of executing from unexpected memory locations. As such the issue is
  reduced down to a fix for poor error hygiene in the original change
  and a resolution for a Coverity warning (famous last words etc).
281- core/flash: Retry requests as necessary in flash_load_resource()
282
  We would like to successfully boot if we have a dependency on the BMC
  for flash even if the BMC is not currently ready to service flash
  requests. On the assumption that it will become ready, retry for several
  minutes to cover a BMC reboot cycle and *eventually* rather than
  *immediately* crash out with: ::
288
289        [  269.549748] reboot: Restarting system
290        [  390.297462587,5] OPAL: Reboot request...
291        [  390.297737995,5] RESET: Initiating fast reboot 1...
292        [  391.074707590,5] Clearing unused memory:
293        [  391.075198880,5] PCI: Clearing all devices...
294        [  391.075201618,7] Clearing region 201ffe000000-201fff800000
295        [  391.086235699,5] PCI: Resetting PHBs and training links...
296        [  391.254089525,3] FFS: Error 17 reading flash header
297        [  391.254159668,3] FLASH: Can't open ffs handle: 17
298        [  392.307245135,5] PCI: Probing slots...
299        [  392.363723191,5] PCI Summary:
300        ...
301        [  393.423255262,5] OCC: All Chip Rdy after 0 ms
        [  393.453092828,5] INIT: Starting kernel at 0x20000000, fdt at 0x30800a88 390645 bytes
304        [  393.453202605,0] FATAL: Kernel is zeros, can't execute!
305        [  393.453247064,0] Assert fail: core/init.c:593:0
306        [  393.453289682,0] Aborting!
307        CPU 0040 Backtrace:
308         S: 0000000031e03ca0 R: 000000003001af60   ._abort+0x4c
309         S: 0000000031e03d20 R: 000000003001afdc   .assert_fail+0x34
310         S: 0000000031e03da0 R: 00000000300146d8   .load_and_boot_kernel+0xb30
311         S: 0000000031e03e70 R: 0000000030026cf0   .fast_reboot_entry+0x39c
312         S: 0000000031e03f00 R: 0000000030002a4c   fast_reset_entry+0x2c
313         --- OPAL boot ---
314
315  The OPAL flash API hooks directly into the blocklevel layer, so there's
316  no delay for e.g. the host kernel, just for asynchronously loaded
317  resources during boot.
318- fast-reboot: occ: Call occ_pstates_init() on fast-reset on all machines
319
  Commit 815417dcda2e ("init, occ: Initialise OCC earlier on BMC systems")
  conditionally invoked occ_pstates_init() only on FSP based systems in
  load_and_boot_kernel(). As a result, the pstate table is re-parsed on
  FSP systems and skipped on BMC systems during fast-reboot. This patch
  fixes this by invoking occ_pstates_init() on all machines during
  fast-reboot.
325- opal/hmi: Don't retry TOD recovery if it is already in failed state.
326
  On TOD failure, all cores/threads receive an HMI. The very first thread
  that gets the interrupt fixes the TOD, whereas the others just reset
  their respective HMER error bit and return. But when the TOD is
  unrecoverable, all the threads try to do TOD recovery one by one, causing
  threads to spend more time inside OPAL. Set a global flag when the TOD is
  unrecoverable so that the rest of the threads go back to Linux
  immediately, avoiding lockups in the system reboot/panic path.
334- hw/bt: Do not disable ipmi message retry during OPAL boot
335
  Currently OPAL doesn't know whether the BMC is functioning or not. If
  the BMC is down (e.g. during a BMC reboot), then we keep retrying to
  send the message to the BMC. So in some corner cases we may hit a hard
  lockup in the kernel.

  Ideally we should avoid using the synchronous path as much as possible.
  For now, commit 01f977c3 added an option to disable message retry in the
  synchronous path. But this fix is not required during boot. Hence do not
  disable IPMI message retry during OPAL boot.
344- hdata/memory: Fix warning message
345
  Even though we added the memory to the device tree, we were getting the warning below. ::
347
348    [   57.136949696,3] Unable to use memory range 0 from MSAREA 0
349    [   57.137049753,3] Unable to use memory range 0 from MSAREA 1
350    [   57.137152335,3] Unable to use memory range 0 from MSAREA 2
351    [   57.137251218,3] Unable to use memory range 0 from MSAREA 3
352- hw/bt: Add backend interface to disable ipmi message retry option
353
  During boot OPAL makes an IPMI_GET_BT_CAPS call to the BMC to get the BT
  interface capabilities, which include the IPMI message max resend count,
  message timeout, etc. Most of the time OPAL gets a response from the BMC
  within the specified timeout. In some corner cases (like an mboxd daemon
  reset in the BMC, a BMC reboot, etc.) OPAL may not get a response within
  the timeout period. In such scenarios, OPAL resends the message until
  the max resend count is reached.

  OPAL uses synchronous IPMI messages (ipmi_queue_msg_sync()) for a few
  operations like flash read, write, etc. The thread will wait in OPAL
  until it gets a response from the BMC. In some corner cases like a BMC
  reboot, the thread may wait in OPAL for a long time (more than 20
  seconds), which results in a kernel hard lockup.

  This patch introduces a new interface to disable the message resend
  option. We will disable the message resend option for synchronous
  messages. This will greatly reduce kernel hard lockup issues.

  This is a short term fix. The long term solution is to convert all
  synchronous messages to asynchronous ones.
373- ipmi/power: Fix system reboot issue
374
  The kernel makes a reboot/shutdown OPAL call for reboot/shutdown. Once
  the kernel gets a response from OPAL it runs opal_poll_events() until
  firmware handles the request.

  On BMC based systems, OPAL makes an IPMI call (IPMI_CHASSIS_CONTROL) to
  initiate system reboot/shutdown. At present OPAL queues the IPMI message
  and returns SUCCESS to the host. If the BMC is not ready to accept the
  command (e.g. during a BMC reboot), then the message will fail and we
  have to manually reboot/shut down the system using the BMC interface.

  This patch adds logic to validate the message return value. If the
  message failed, then it is resent. At some stage the BMC will be ready
  to accept and handle the IPMI message.
388- firmware-versions: Add test case for parsing VERSION
389
390  Also make it possible to use with afl-lop/afl-fuzz just to help make
391  *sure* we're all good.
392
  Additionally, if we hit an entry in VERSION that is larger than our
  buffer size, we skip over it gracefully rather than overwriting the
  stack. This is only a problem if VERSION isn't trusted, which as of
  4b8cc05a94513816d43fb8bd6178896b430af08f it is verified as part of
  Secure Boot.
398- core/fast-reboot: improve NMI handling during fast reset
399
  Improve sreset and MCE handling in fast reboot. Switch the HILE bit
  off before copying OPAL's exception vectors, so NMIs can be handled
  properly. Also disable MSR[ME] while the vectors are being overwritten.
403- core/cpu: HID update race
404
405  If the per-core HID register is updated concurrently by multiple
406  threads, updates can get lost. This has been observed during fast
407  reboot where the HILE bit does not get cleared on all cores, which
408  can cause machine check exception interrupts to crash.
409
410  Fix this by only updating HID on thread0.
411- SLW: Print verbose info on errors only
412
  Change the print level from debug to warning for reporting
  bad EC_PPM_SPECIAL_WKUP_* scom values. To reduce clutter
  in the log, print only on error.
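
The retry behaviour described for ``flash_load_resource()`` above amounts to
"keep trying until the resource loads or a time budget runs out". The
following is a rough sketch of that idea only, not the skiboot code: the
function names, the 300 second window, the 5 second interval and the use of
error code 17 (borrowed from the log above purely as an example) are all
assumptions.

```python
def load_with_retry(load, sleep, retry_window_s=300, interval_s=5):
    """Call `load()` until it returns 0 or the retry window is used up.

    `load` returns 0 on success and a non-zero code on failure; `sleep`
    is injected so the sketch can be exercised without real delays.
    """
    waited = 0
    while True:
        rc = load()
        if rc == 0:
            return 0
        if waited >= retry_window_s:
            return rc  # give up: the caller still fails, just later
        sleep(interval_s)
        waited += interval_s


def make_flaky_loader(failures, error=17):
    """Hypothetical loader that fails `failures` times, then succeeds."""
    state = {"left": failures}

    def load():
        if state["left"] > 0:
            state["left"] -= 1
            return error
        return 0

    return load
```

A loader that recovers after a few attempts succeeds without the caller
noticing, while one that never recovers still returns its error once the
window has elapsed, matching the "eventually rather than immediately"
wording above.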
416
417Since v6.3-rc2:
418
419- hw/xscom: add missing P9P chip name
420- asm/head: balance branches to avoid link stack predictor mispredicts
421
422  The Linux wrapper for OPAL call and return is arranged like this: ::
423
424      __opal_call:
425          mflr   r0
426          std    r0,PPC_STK_LROFF(r1)
427          LOAD_REG_ADDR(r11, opal_return)
428          mtlr   r11
429          hrfid  -> OPAL
430
431      opal_return:
432          ld     r0,PPC_STK_LROFF(r1)
433          mtlr   r0
434          blr
435
436  When skiboot returns to Linux, it branches to LR (i.e., opal_return)
437  with a blr. This unbalances the link stack predictor and will cause
438  mispredicts back up the return stack.
439- external/mambo: also invoke readline for the non-autorun case
440- asm/head.S: set POWER9 radix HID bit at entry
441
442  When running in virtual memory mode, the radix MMU hid bit should not
443  be changed, so set this in the initial boot SPR setup.
444
445  As a side effect, fast reboot also has HID0:RADIX bit set by the
446  shared spr init, so no need for an explicit call.
447- build: link with --orphan-handling=warn
448
449  The linker can warn when the linker script does not explicitly place
450  all sections. These orphan sections are placed according to
451  heuristics, which may not always be desirable. Enable this warning.
452- build: -fno-asynchronous-unwind-tables
453
  skiboot does not use unwind tables; this option saves about 100kB,
  mostly from .text.
456- opal/hmi: Initialize the hmi event with old value of TFMR.
457
  Do this before we fix TFAC errors. Otherwise the event on the host
  console shows no thread error reported in the TFMR register.

  Without this patch, the console event shows TFMR with no thread error
  (DEC parity error TFMR[59] injection): ::
463
464    [   53.737572] Severe Hypervisor Maintenance interrupt [Recovered]
465    [   53.737596]  Error detail: Timer facility experienced an error
466    [   53.737611]  HMER: 0840000000000000
467    [   53.737621]  TFMR: 3212000870e04000
468
  After this patch, it shows the old TFMR value on the host console: ::
470
471    [ 2302.267271] Severe Hypervisor Maintenance interrupt [Recovered]
472    [ 2302.267305]  Error detail: Timer facility experienced an error
473    [ 2302.267320]  HMER: 0840000000000000
474    [ 2302.267330]  TFMR: 3212000870e14010
475
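The cascade of mispredicts described in the asm/head item above can be
illustrated with a toy model of a link stack predictor. This is a sketch
under simplifying assumptions, not a description of the hardware or of the
actual skiboot change: the predictor is modelled as a plain stack that is
pushed by calls and popped by ``blr``, the call chain is invented, and the
"balanced" variant simply assumes that firmware returns to ``opal_return``
without consuming a link stack entry.

```python
class LinkStack:
    """Toy return-address predictor: calls push, returns pop and compare."""

    def __init__(self):
        self.stack = []
        self.mispredicts = 0

    def call(self, return_addr):
        self.stack.append(return_addr)

    def ret(self, actual_target):
        predicted = self.stack.pop() if self.stack else None
        if predicted != actual_target:
            self.mispredicts += 1


def opal_call_roundtrip(balanced):
    """Count mispredicts for one OPAL call made three calls deep."""
    ls = LinkStack()
    ls.call("ret_to_grandparent")  # grandparent calls parent
    ls.call("ret_to_parent")       # parent calls caller
    ls.call("ret_to_caller")       # caller does `bl __opal_call`
    # hrfid into OPAL is not a call, so nothing is pushed here.
    if not balanced:
        # Firmware returns with blr to LR (opal_return): this pops the
        # entry that belonged to the caller of __opal_call.
        ls.ret("opal_return")
    ls.ret("ret_to_caller")        # blr in opal_return
    ls.ret("ret_to_parent")
    ls.ret("ret_to_grandparent")
    return ls.mispredicts
```

In this model the single unmatched ``blr`` shifts every later prediction by
one entry, so each return further up the chain mispredicts as well.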
476
IBM FSP based platforms
-----------------------
479
480- platforms/firenze: Rework I2C controller fixups
481- platforms/zz: Re-enable LXVPD slot information parsing
482
  From memory this was disabled in the distant past since we were waiting
  for an update to the LXVPD format. It looks like that never happened
  so re-enable it for the ZZ platform so that we can get PCI slot location
  codes on ZZ.
487
HIOMAP
------
490- astbmc: Try IPMI HIOMAP for P8
491
492  The HIOMAP protocol was developed after the release of P8 in preparation
493  for P9. As a consequence P9 always uses it, but it has rarely been
494  enabled for P8. P8DTU has recently added IPMI HIOMAP support to its BMC
495  firmware, so enable its use in skiboot with P8 machines. Doing so
496  requires some rework to ensure fallback works correctly as in the past
497  the fallback was to mbox, which will only work for P9.
498- libflash/ipmi-hiomap: Enforce message size for empty response
499
500  The protocol defines the response to the associated messages as empty
501  except for the command ID and sequence fields. If the BMC is returning
502  extra data consider the message malformed.
503- libflash/ipmi-hiomap: Remove unused close handling
504
505  Issuing a HIOMAP_C_CLOSE is not required by the protocol specification,
506  rather a close can be implicit in a subsequent
507  CREATE_{READ,WRITE}_WINDOW request. The implicit close provides an
508  opportunity to reduce LPC traffic and the implementation takes up that
509  optimisation, so remove the case from the IPMI callback handler.
510- libflash/ipmi-hiomap: Overhaul event handling
511
  Reworking the event handling was inspired by a bug report by Vasant
  where the host would get wedged on multiple flash access attempts in the
  face of a persistent error state on the BMC-side. The cause of this bug
  was the early-exit based on ctx->update, which erroneously assumed that
  all events had been completely handled in prior calls to
  ipmi_hiomap_handle_events(). This is not true if e.g.
  HIOMAP_E_DAEMON_READY is clear in the prior calls.
519
520  Regardless, there were other correctness and efficiency problems with
521  the handling strategy:
522
523  * Ack-able event state was not restored in the face of errors in the
524    process of re-establishing protocol state
525  * It forced needless window restoration with respect to the context in
526    which ipmi_hiomap_handle_events() was called.
527  * Tests for HIOMAP_E_DAEMON_READY and HIOMAP_E_FLASH_LOST were redundant
528    with the overhauled error handling introduced in the previous patch
529
530  Fix all of the above issues and add comments to explain the event
531  handling flow.
532- libflash/ipmi-hiomap: Overhaul error handling
533
534  The aim is to improve the robustness with respect to absence of the
535  BMC-side daemon. The current error handling roughly mirrors what was
536  done for the mailbox implementation, but there's room for improvement.
537
538  Errors are split into two classes, those that affect the transport state
539  and those that affect the window validity. From here, we push the
540  transport state error checks right to the bottom of the stack, to ensure
541  the link is known to be in a good state before any message is sent.
542  Window validity tests remain as they were in the hiomap_window_move()
543  and ipmi_hiomap_read() functions. Validity tests are not necessary in
544  the write and erase paths as we will receive an error response from the
545  BMC when performing a dirty or flush on an invalid window.
546
547  Recovery also remains as it was, done on entry to the blocklevel
548  callbacks. If an error state is encountered in the middle of an
549  operation no attempt is made to recover it on the spot, instead the
550  error is returned up the stack and the caller can choose how it wishes
551  to respond.
552- libflash/ipmi-hiomap: Fix leak of msg in callback
553
554Since v6.3-rc1:
555
556- libflash/ipmi-hiomap: Fix blocks count issue
557
  We convert the data size to a block count and pass the block count to
  the BMC. If the data size is not block aligned then we end up sending a
  block count smaller than the actual data, and the BMC will write only
  partial data to flash memory.
561
562  Sample log ::
563
564    [  594.388458416,7] HIOMAP: Marked flash dirty at 0x42010 for 8
565    [  594.398756487,7] HIOMAP: Flushed writes
566    [  594.409596439,7] HIOMAP: Marked flash dirty at 0x42018 for 3970
567    [  594.419897507,7] HIOMAP: Flushed writes
568
569  In this case HIOMAP sent data with block count=0 and hence BMC didn't
570  flush data to flash.
571
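The block count problem above is a rounding issue: a truncating conversion
from bytes to blocks loses any partial block. The sketch below contrasts
that with a count that covers every block touched by the range. It assumes
a 4 KiB block size (a shift of 12), which is consistent with the log above
but not stated there, and the function names are illustrative rather than
taken from libflash.

```python
BLOCK_SHIFT = 12  # assumed 4 KiB blocks


def blocks_truncated(size, block_shift=BLOCK_SHIFT):
    """Size-only conversion that rounds down, as in the buggy case."""
    return size >> block_shift


def blocks_touched(offset, size, block_shift=BLOCK_SHIFT):
    """Number of blocks overlapped by the byte range [offset, offset + size)."""
    block_size = 1 << block_shift
    first = offset >> block_shift
    end = (offset + size + block_size - 1) >> block_shift  # round up
    return end - first


# Values taken from the sample log above.
truncated = blocks_truncated(3970)
touched = blocks_touched(0x42018, 3970)
```

For the second log entry the truncating conversion yields zero blocks, so
nothing is flushed, while the rounded-up count covers the one block that
the write actually touches.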
572
573
POWER8
------
576- hw/phb3/naples: Disable D-states
577
  Putting "Mellanox Technologies MT27700 Family [ConnectX-4] [15b3:1013]"
  (more precisely, the second of its 2 PCI functions, no matter in what
  order) into the D3 state causes EEH with the "PCT timeout" error.
  This has been noticed on garrison machines only and firestones do not
  seem to have this issue.

  This disables D-state changes for devices on root buses on Naples by
  installing a config space access filter (copied from PHB4).
586- cpufeatures: Always advertise POWER8NVL as DD2
587
  Despite the major version of PVR being 1 (0x004c0100) for POWER8NVL,
  these chips are functionally equivalent to P8/P8E DD2 levels.

  This advertises POWER8NVL as DD2. As a result, skiboot adds
  ibm,powerpc-cpu-features/processor-control-facility for such CPUs and
  the Linux kernel can use hypervisor doorbell messages to wake secondary
  threads; otherwise "KVM: CPU %d seems to be stuck" would appear because
  of missing LPCR_PECEDH.
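
The PVR value quoted for POWER8NVL above can be decoded to show where the
major version of 1 comes from. This is a minimal sketch, assuming a PVR
layout of a 16-bit processor type in the upper half, a 4-bit major version
in bits 8 to 11 and the minor version in the low byte; the helper names and
the ``advertised_dd_major()`` override are illustrative and are not
skiboot's code.

```python
PVR_TYPE_P8NVL = 0x004C  # type field of the PVR value quoted above


def pvr_type(pvr):
    return (pvr >> 16) & 0xFFFF


def pvr_vers_maj(pvr):
    return (pvr >> 8) & 0xF


def pvr_vers_min(pvr):
    return pvr & 0xFF


def advertised_dd_major(pvr):
    """DD major level to advertise, with POWER8NVL always reported as DD2."""
    if pvr_type(pvr) == PVR_TYPE_P8NVL:
        return 2
    return pvr_vers_maj(pvr)


nvl_pvr = 0x004C0100
```

Decoding ``0x004c0100`` under these assumptions gives a major version of 1,
which is why the override is needed for the chip to be treated like a DD2
part.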
596
p8dtu Platform
^^^^^^^^^^^^^^
599- p8dtu: Configure BMC graphics
600
  We can no longer read the values from the BMC in the way we have in the
  past. Values were provided by Eric Chen of SMC.
603- p8dtu: Enable HIOMAP support
604
Vesnin Platform
^^^^^^^^^^^^^^^
607- platforms/vesnin: Disable PCIe port bifurcation
608
609  PCIe ports connected to CPU1 and CPU3 now work as x16 instead of x8x8.
610
611- Fix hang in pnv_platform_error_reboot path due to TOD failure.
612
  On TOD failure, with the TB stuck, when Linux heads down the
  pnv_platform_error_reboot() path due to an unrecoverable HMI event, the
  panic CPU gets stuck in OPAL inside ipmi_queue_msg_sync(). At this time,
  all other CPUs are in smp_handle_nmi_ipi() waiting for the panic CPU to
  proceed. But with the panic CPU stuck inside OPAL, Linux never
  recovers/reboots. ::
618
619    p0 c1 t0
620    NIA : 0x000000003001dd3c <.time_wait+0x64>
621    CFAR : 0x000000003001dce4 <.time_wait+0xc>
622    MSR : 0x9000000002803002
623    LR : 0x000000003002ecf8 <.ipmi_queue_msg_sync+0xec>
624
625    STACK: SP NIA
626    0x0000000031c236e0 0x0000000031c23760 (big-endian)
627    0x0000000031c23760 0x000000003002ecf8 <.ipmi_queue_msg_sync+0xec>
628    0x0000000031c237f0 0x00000000300aa5f8 <.hiomap_queue_msg_sync+0x7c>
629    0x0000000031c23880 0x00000000300aaadc <.hiomap_window_move+0x150>
630    0x0000000031c23950 0x00000000300ab1d8 <.ipmi_hiomap_write+0xcc>
631    0x0000000031c23a90 0x00000000300a7b18 <.blocklevel_raw_write+0xbc>
632    0x0000000031c23b30 0x00000000300a7c34 <.blocklevel_write+0xfc>
633    0x0000000031c23bf0 0x0000000030030be0 <.flash_nvram_write+0xd4>
634    0x0000000031c23c90 0x000000003002c128 <.opal_write_nvram+0xd0>
635    0x0000000031c23d20 0x00000000300051e4 <opal_entry+0x134>
636    0xc000001fea6e7870 0xc0000000000a9060 <opal_nvram_write+0x80>
637    0xc000001fea6e78c0 0xc000000000030b84 <nvram_write_os_partition+0x94>
638    0xc000001fea6e7960 0xc0000000000310b0 <nvram_pstore_write+0xb0>
639    0xc000001fea6e7990 0xc0000000004792d4 <pstore_dump+0x1d4>
640    0xc000001fea6e7ad0 0xc00000000018a570 <kmsg_dump+0x140>
641    0xc000001fea6e7b40 0xc000000000028e5c <panic_flush_kmsg_end+0x2c>
642    0xc000001fea6e7b60 0xc0000000000a7168 <pnv_platform_error_reboot+0x68>
643    0xc000001fea6e7bd0 0xc0000000000ac9b8 <hmi_event_handler+0x1d8>
644    0xc000001fea6e7c80 0xc00000000012d6c8 <process_one_work+0x1b8>
645    0xc000001fea6e7d20 0xc00000000012da28 <worker_thread+0x88>
646    0xc000001fea6e7db0 0xc0000000001366f4 <kthread+0x164>
647    0xc000001fea6e7e20 0xc00000000000b65c <ret_from_kernel_thread+0x5c>
648
  This is because there is a while loop towards the end of
  ipmi_queue_msg_sync() which keeps looping for as long as "sync_msg"
  matches "msg". It loops over time_wait_ms() until the exit condition is
  met. In the normal scenario time_wait_ms() runs the pollers so that the
  ipmi backend gets a chance to check the ipmi response and set sync_msg
  to NULL. ::
654
655            while (sync_msg == msg)
656                    time_wait_ms(10);
657
  But when the TB is in a failed state, time_wait_ms()->time_wait_poll()
  returns immediately without calling the pollers and hence we end up
  looping forever. This patch fixes this hang by calling
  opal_run_pollers() in the TB failed state as well.
662
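The hang and its fix can be reproduced in a small model of the wait loop.
This is a toy sketch rather than skiboot's implementation: the class
``FirmwareModel``, the ``poll_when_tb_failed`` switch and the iteration cap
that stands in for "looping forever" are all invented for illustration.

```python
class FirmwareModel:
    """Toy model of the synchronous wait loop described above."""

    def __init__(self, tb_failed, poll_when_tb_failed, polls_until_response=3):
        self.tb_failed = tb_failed
        self.poll_when_tb_failed = poll_when_tb_failed
        self.polls_left = polls_until_response
        self.sync_msg = None

    def run_pollers(self):
        # The IPMI backend gets a chance to see the response and,
        # once it has arrived, clears sync_msg.
        self.polls_left -= 1
        if self.polls_left <= 0:
            self.sync_msg = None

    def time_wait_ms(self, ms):
        if self.tb_failed:
            # Without a ticking timebase there is nothing to wait on.
            if self.poll_when_tb_failed:
                self.run_pollers()
            return
        self.run_pollers()

    def queue_msg_sync(self, msg, max_iterations=1000):
        """Return True if the wait loop terminates, False if it is stuck."""
        self.sync_msg = msg
        for _ in range(max_iterations):
            if self.sync_msg != msg:
                return True
            self.time_wait_ms(10)
        return self.sync_msg != msg
```

With the timebase failed and no polling, ``sync_msg`` is never cleared and
the loop only ends because of the artificial iteration cap; running the
pollers in the failed state lets the loop exit as it does in the normal
case.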
663
.. _skiboot-6.3-power9:

POWER9
------
668
669- Retry link training at PCIe GEN1 if presence detected but training repeatedly failed
670
671  Certain older PCIe 1.0 devices will not train unless the training process starts at GEN1 speeds.
672  As a last resort when a device will not train, fall back to GEN1 speed for the last training attempt.
673
674  This is verified to fix devices based on the Conexant CX23888 on the Talos II platform.
675- hw/phb4: Drop FRESET_DEASSERT_DELAY state
676
  The delay between the ASSERT_DELAY and DEASSERT_DELAY states is set to
  one timebase tick. This state seems to have been a holdover from PHB3
  where it was used to add a 1s delay between de-asserting PERST and
  polling the link for the CAPI FPGA. There's no requirement for that here
  since the link polling on PHB4 is a bit smarter so we should be fine.
682- hw/phb4: Factor out PERST control
683
  Some time ago Mikey added some code to work around a bug we found where a
  certain RAID card wouldn't come back again after a fast-reboot. The
  workaround is to set the Link Disable bit before asserting PERST and
  clear it after de-asserting PERST.

  Currently we do this in the FRESET path, but not in the CRESET path.
  This patch moves the PERST control into its own function to reduce
  duplication and to ensure the workaround is applied in all circumstances.
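
The ordering that the shared helper enforces can be sketched as follows. This
is a Python model with hypothetical names (set_perst(), reset_cycle()); the
strings only record the order of the hardware steps and do not correspond to
real register accesses.

```python
def set_perst(log, assert_perst):
    """Append the ordered steps for asserting or de-asserting PERST."""
    if assert_perst:
        log.append("set Link Disable")    # workaround: disable the link first
        log.append("assert PERST")
    else:
        log.append("deassert PERST")
        log.append("clear Link Disable")  # workaround: re-enable it last
    return log


def reset_cycle():
    """One assert/de-assert cycle, as either reset path would perform it."""
    log = []
    set_perst(log, True)
    set_perst(log, False)
    return log
```

Because both reset paths go through the same helper, neither can skip the
Link Disable handling.
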
692- hw/phb4: Remove FRESET presence check
693
694  When we do an freset the first step is to check if a card is present in
695  the slot. However, this only occurs when we enter phb4_freset() with the
696  slot state set to SLOT_NORMAL. This occurs in:
697
698  a) The creset path, and
699  b) When the OS manually requests an FRESET via an OPAL call.
700
701  (a) is problematic because in the boot path the generic code will put the
702  slot into FRESET_START manually before calling into phb4_freset(). This
703  can result in a situation where a device is detected on boot, but not
704  after a CRESET.
705
706  I've noticed this occurring on systems where the PHB's slot presence
707  detect signal is not wired to an adapter. In this situation we can rely
708  on the in-band presence mechanism, but the presence check will make
709  us exit before that has a chance to work.
710
  Additionally, if we enter from the CRESET path this early exit leaves
  the slot's PERST signal asserted. This isn't currently an issue,
  but if we want to support hotplug of devices into the root port it will
  be.
715- hw/phb4: Skip FRESET PERST when coming from CRESET
716
717  PERST is asserted at the beginning of the CRESET process to prevent
718  the downstream device from interacting with the host while the PHB logic
719  is being reset and re-initialised. There is at least a 100ms wait during
720  the CRESET processing so it's not necessary to wait this time again
721  in the FRESET handler.
722
  This patch extends the delay after resetting the PHB logic to the
  250ms PERST wait period that we typically use and sets the
  skip_perst flag so that we don't wait this time again in the FRESET
  handler.
- hw/phb4: Look for the hub-id in the PBCQ node
728
729  The hub-id is stored in the PBCQ node rather than the stack node so we
730  never add it to the PHB node. This breaks the lxvpd slot lookup code
731  since the hub-id is encoded in the VPD record that we need to find the
732  slot information.
733- hdata/iohub: Look for IOVPD on P9
734
735  P8 and P9 use the same IO VPD setup, so we need to load the IOHUB VPD on
736  P9 systems too.
737
738Since v6.3-rc2:
739
740- hw/phb4: Squash the IO bridge window
741
742  The PCI-PCI bridge spec says that bridges that implement an IO window
743  should hardcode the IO base and limit registers to zero.
744  Unfortunately, these registers only define the upper bits of the IO
745  window and the low bits are assumed to be 0 for the base and 1 for the
746  limit address. As a result, setting both to zero can be mis-interpreted
747  as a 4K IO window.
748
749  This patch fixes the problem the same way PHB3 does. It sets the IO base
750  and limit values to 0xf000 and 0x1000 respectively which most software
751  interprets as a disabled window.
752
753  lspci before patch: ::
754
755    0000:00:00.0 PCI bridge: IBM Device 04c1 (prog-if 00 [Normal decode])
756            I/O behind bridge: 00000000-00000fff
757
758  lspci after patch: ::
759
760    0000:00:00.0 PCI bridge: IBM Device 04c1 (prog-if 00 [Normal decode])
761            I/O behind bridge: None
762
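
The two lspci results can be reproduced with a small decoder. This is a
sketch in Python that assumes the standard 8-bit I/O Base and I/O Limit
registers of a PCI-PCI bridge, where the top four bits hold address bits
15:12; it ignores the upper 16-bit registers, and io_window() is a name made
up for this example.

```python
def io_window(io_base_reg, io_limit_reg):
    """Decode a bridge's 8-bit I/O base and limit registers.

    The top four bits of each register hold address bits 15:12. The low
    12 bits are implied: 0x000 for the base and 0xfff for the limit.
    Returns (start, end), or None when the base is above the limit, which
    is how a disabled window is expressed.
    """
    start = (io_base_reg & 0xF0) << 8
    end = ((io_limit_reg & 0xF0) << 8) | 0xFFF
    if start > end:
        return None
    return (start, end)
```

With both registers at zero the result is the 4K window 0x0000-0x0fff seen
before the patch. With a base of 0xf000 and a limit of 0x1000 the base is
above the limit, so no window is decoded.
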
763- hw/xscom: Enable sw xstop by default on p9
764
  This was disabled at some point during bringup to make life easier for
  the lab folks trying to debug NVLink issues. This hack really should
  have never made it out into the wild though, so we now have the
  following situation occurring in the field:

  1) Something bad happens.
  2) The host kernel receives an unrecoverable HMI and calls into OPAL to
     request a platform reboot.
  3) OPAL rejects the reboot attempt and returns to the kernel with
     OPAL_PARAMETER.
  4) Kernel panics and attempts to kexec into a kdump kernel.

  A side effect of the HMI seems to be CPUs becoming stuck which results
  in the initialisation of the kdump kernel taking an extremely long time
  (6+ hours). It's also been observed that after performing a dump the
  kdump kernel then crashes itself because OPAL has ended up in a bad
  state as a side effect of the HMI.
782
783  All up, it's not very good so re-enable the software checkstop by
784  default. If people still want to turn it off they can using the nvram
785  override.
786
787
788CAPI2
789^^^^^
790- capp/phb4: Prevent HMI from getting triggered when disabling CAPP
791
  While disabling CAPP an HMI gets triggered as soon as the ETU is put in
  reset mode. This happens because, before we can disable CAPP, it detects
  the PHB link going down and triggers an HMI requesting OPAL to perform
  CAPP recovery. This has the unintended side effect of spamming the
  OPAL logs with malfunction alert messages and may also confuse the
  user.

  To prevent this we mask the CAPP FIR error 'PHB Link Down' Bit(31)
  when we are disabling CAPP, just before we put the ETU in reset in
  phb4_creset(). Since bringing down the PHB link no longer triggers
  an HMI and CAPP recovery, we manually set the PHB4_CAPP_RECOVERY flag
  on the phb to force recovery during creset.
804
805- phb4/capp: Implement sequence to disable CAPP and enable fast-reset
806
  We implement the h/w sequence to disable CAPP in disable_capi_mode() and
  with it also enable fast-reset for CAPI mode in phb4_set_capi_mode().

  The sequence to disable CAPP is executed in three phases. The first two
  phases are implemented in disable_capi_mode(), where we reset the CAPP
  registers followed by the PEC registers to their init values. The third
  and final phase is to reset the PHB CAPI Compare/Mask Register and
  is done in phb4_init_ioda3(). The reason to move the PHB reset to
  phb4_init_ioda3() is that by the time the OPAL PCI reset state machine
  reaches this function the PHB is already un-fenced and its
  configuration registers are accessible via mmio.
818- capp/phb4: Force CAPP to PCIe mode during kernel shutdown
819
820  This patch introduces a new opal syncer for PHB4 named
821  phb4_host_sync_reset(). We register this opal syncer when CAPP is
822  activated successfully in phb4_set_capi_mode() so that it will be
823  called at kernel shutdown during fast-reset.
824
  During kernel shutdown the function will then repeatedly call
  phb->ops->set_capi_mode() to switch CAPP to PCIe mode. If
  set_capi_mode() returns OPAL_BUSY, which indicates that CAPP is
  still transitioning to the new state, it calls slot->ops.run_sm() to
  ensure that the OPAL slot reset state machine makes forward progress.
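
The polling pattern used by the syncer can be sketched as below. This is a
Python model, not the skiboot implementation: host_sync_reset() here takes two
caller-supplied callables standing in for phb->ops->set_capi_mode() and
slot->ops.run_sm(), the max_polls cap is hypothetical, and the numeric return
codes are assumed values for the sketch.

```python
OPAL_SUCCESS = 0
OPAL_BUSY = -2  # value assumed for this sketch


def host_sync_reset(set_capi_mode, run_sm, max_polls=1000):
    """Keep requesting PCIe mode until the request stops reporting busy.

    Returns the final return code from set_capi_mode(), or OPAL_BUSY if
    the transition had not finished after max_polls requests.
    """
    for _ in range(max_polls):
        rc = set_capi_mode()
        if rc != OPAL_BUSY:
            return rc
        run_sm()  # let the slot reset state machine make forward progress
    return OPAL_BUSY
```

Each busy reply results in exactly one run of the state machine before the
mode switch is requested again.
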
830
831
832Witherspoon Platform
833^^^^^^^^^^^^^^^^^^^^
834- platforms/witherspoon: Make PCIe shared slot error message more informative
835
836  If we're missing chips for some reason, we print a warning when configuring
837  the PCIe shared slot.
838
839  The warning doesn't really make it clear what "shared slot" is, and if it's
840  printed, it'll come right after a bunch of messages about NPU setup, so
841  let's clarify the message to explicitly mention PCI.
842- witherspoon: Add nvlink2 interconnect information
843
844  See :ref:`skiboot-6.3-new-features` for details.
845
846Zaius Platform
847^^^^^^^^^^^^^^
848
849- zaius: Add BMC description
850
851  Frederic reported that Zaius was failing with a NULL dereference when
852  trying to initialise IPMI HIOMAP. It turns out that the BMC wasn't
853  described at all, so add a description.
854
p9dsu Platform
^^^^^^^^^^^^^^
857- p9dsu: Fix p9dsu default variant
858
  Add the default when no riser_id is returned from the ipmi query.

  Allow a little more time for the BMC reply and clean up some label strings.
862
863
864PCIe
865----
866
867See :ref:`skiboot-6.3-power9` for POWER9 specific PCIe changes.
868
869- core/pcie-slot: Don't bail early in the power on case
870
  Exiting early in the power off case makes sense since we can't disable
  slot power (or assert PERST) for surprise hotplug slots. However, we
  should not exit early in the power-on case since it's possible slot
  power may have been disabled (or just not enabled at boot time).
875- firenze-pci: Always init slot info from LXVPD
876
  We can get slot information from the LXVPD without having power control
  information about that slot. This patch changes the init path so that
  we always override the add_properties() call rather than only when we
  have power control information about the slot.
881- fsp/lxvpd: Print more LXVPD slot information
882
883  Useful to know since it changes the behaviour of the slot core.
884- core/pcie-slot: Set power state from the PWRCTL flag
885
886  For some reason we look at the power control indicator and use that to
887  determine if the slot is "off" rather than the power control flag that
888  is used to power down the slot.
889
890  While we're here change the default behaviour so that the slot is
891  assumed to be powered on if there's no slot capability, or if there's
892  no power control available.
893- core/pci: Increase the max slot string size
894
895  The maximum string length for the slot label / device location code in
896  the PCI summary is currently 32 characters. This results in some IBM
897  location codes being truncated due to their length, e.g. ::
898
899    PHB#0001:02:11.0 [SWDN]  SLOT=C11  x8
900    PHB#0001:13:00.0 [EP  ] *snip* LOC_CODE=U78D3.ND1.WZS004A-P1-C
901    PHB#0001:13:00.1 [EP  ] *snip* LOC_CODE=U78D3.ND1.WZS004A-P1-C
902    PHB#0001:13:00.2 [EP  ] *snip* LOC_CODE=U78D3.ND1.WZS004A-P1-C
903    PHB#0001:13:00.3 [EP  ] *snip* LOC_CODE=U78D3.ND1.WZS004A-P1-C
904
  This obscures the actual location of the card, and it looks bad. This
  patch increases the maximum length of the label string to 80 characters
  since that's the maximum length for a location code.
908
909
910Since v6.3-rc3:
911
912- pci: Try harder to add meaningful ibm,loc-code
913
  We keep the existing logic of looking to the parent for the slot-label or
  slot-location-code, but if all that fails we now look directly for the
  slot-location-code (as this should give us the correct loc code for
  things directly under the PHB), and otherwise we just look for a
  loc-code.
919
920  The applicable bit of PAPR here is:
921
      R1–12.1–1. Each instance of a hardware entity (FRU) has a platform
      unique location code and any node in the OF device tree that
      describes a part of a hardware entity must include the
      “ibm,loc-code” property with a value that represents the location
      code for that hardware entity.
927
928  which we weren't really fully obeying at any recent (ever?) point in
929  time. Now we should do okay, at least for PCI.
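
The lookup order can be sketched as follows. This is a Python model using
plain dicts in place of device tree nodes. The "ibm," prefixed property names
are assumed spellings of the properties named above, and having the final
loc-code lookup check the node itself is also an assumption, since the text
does not say which node is searched.

```python
def find_loc_code(node, parent):
    """Pick a location code for a PCI device node.

    node and parent are dicts mapping property names to string values;
    parent may be None. Returns the first match in priority order, or None.
    """
    candidates = [
        (parent, "ibm,slot-label"),          # existing logic: ask the parent
        (parent, "ibm,slot-location-code"),
        (node, "ibm,slot-location-code"),    # new: look at the node directly
        (node, "ibm,loc-code"),              # new: last resort
    ]
    for dt_node, prop in candidates:
        if dt_node and prop in dt_node:
            return dt_node[prop]
    return None
```

A device directly under the PHB has no useful parent slot properties, so it
falls through to its own slot-location-code.
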
930
Since v6.3-rc2:

- core/pci: Use PHB io-base-location by default for PHB slots
933
934  On witherspoon only the GPU slots and the three pluggable PCI slots
935  (SLOT0, 1, 2) have platform defined slot names. For builtin devices such
936  as the SATA controller or the PLX switch that fans out to the GPU slots
937  we have no location codes which some people consider an issue.
938
  This patch addresses the problem by making the ibm,slot-location-code for
  the root port device default to the ibm,io-base-location-code which is
  typically the location code for the system itself.
942
943  e.g. ::
944
945    pciex@600c3c0100000/ibm,loc-code
946                     "UOPWR.0000000-Node0-Proc0"
947
948    pciex@600c3c0100000/pci@0/ibm,loc-code
949                     "UOPWR.0000000-Node0-Proc0"
950
951    pciex@600c3c0100000/pci@0/usb-xhci@0/ibm,loc-code
952                     "UOPWR.0000000-Node0"
953
954  The PHB node, and the root complex nodes have a loc code of the
955  processor they are attached to, while the usb-xhci device under the
956  root port has a location code of the system itself.
957
958- hw/phb4: Read ibm,loc-code from PBCQ node
959
960  On P9 the PBCQs are subdivided by stacks which implement the PCI Express
961  logic. When phb4 was forked from phb3 most of the properties that were
962  in the pbcq node moved into the stack node, but ibm,loc-code was not one
963  of them. This patch fixes the phb4 init sequence to read the base
964  location code from the PBCQ node (parent of the stack node) rather than
965  the stack node itself.
966
967
968.. _skiboot-6.3-OpenCAPI:
969
970OpenCAPI
971--------
972- npu2/hw-procedures: Fix parallel zcal for opencapi
973
974  For opencapi, we currently do impedance calibration when initializing
975  the PHY for the device, which could run in parallel if we have
976  multiple opencapi devices. But if 2 devices are on the same
977  obus, the 2 calibration sequences could overlap, which likely yields
978  bad results and is useless anyway since it only needs to be done once
979  per obus.
980
981  This patch splits the opencapi PHY reset in 2 parts:
982
  - an 'init' part called serially at boot. That's when zcal is done. If
    we have 2 devices on the same socket, the zcal won't be redone,
    since we're called serially and we'll see it has already been done for
    the obus
987  - a 'reset' part called during fundamental reset as a prereq for link
988    training. It does the PHY setup for a set of lanes and the dccal.
989
990  The PHY team confirmed there's no dependency between zcal and the
991  other reset steps and it can be moved earlier.
992- npu2-hw-procedures: Fix zcal in mixed opencapi and nvlink mode
993
994  The zcal procedure needs to be run once per obus. We keep track of
995  which obus is already calibrated in an array indexed by the obus
996  number. However, the obus number is inferred from the brick index,
997  which works well for nvlink but not for opencapi.
998
999  Create an obus_index() function, which, from a device, returns the
1000  correct obus index, irrespective of the device type.
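
The "once per obus" bookkeeping from the two zcal fixes above can be sketched
like this. It is a Python model only: devices are plain dicts, the "obus" key
is a hypothetical stand-in for however the real obus_index() derives the obus
from a device, and a set replaces the array indexed by obus number.

```python
def obus_index(dev):
    """Return the obus a device sits on, irrespective of the device type."""
    return dev["obus"]


def phy_init(devices):
    """Initialise devices serially, running zcal at most once per obus.

    Returns the list of obus numbers on which zcal was actually run.
    """
    calibrated = set()
    zcal_runs = []
    for dev in devices:
        obus = obus_index(dev)
        if obus not in calibrated:
            zcal_runs.append(obus)  # stands in for the zcal procedure
            calibrated.add(obus)
    return zcal_runs
```

Two opencapi devices on the same obus trigger a single calibration, and an
nvlink device on another obus is tracked through the same lookup.
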
1001- npu2-opencapi: Fix adapter reset when using 2 adapters
1002
1003  If two opencapi adapters are on the same obus, we may try to train the
1004  two links in parallel at boot time, when all the PCI links are being
1005  trained. Both links use the same i2c controller to handle the reset
1006  signal, so some care is needed to make sure resetting one doesn't
1007  interfere with the reset of the other. We need to keep track of the
1008  current state of the i2c controller (and use locking).
1009
1010  This went mostly unnoticed as you need to have 2 opencapi cards on the
1011  same socket and links tended to train anyway because of the retries.
1012- npu2-opencapi: Extend delay after releasing reset on adapter
1013
1014  Give more time to the FPGA to process the reset signal. The previous
1015  delay, 5ms, is too short for newer adapters with bigger FPGAs. Extend
1016  it to 250ms.
1017  Ultimately, that delay will likely end up being added to the opencapi
1018  specification, but we are not there yet.
1019- npu2-opencapi: ODL should be in reset when enabled
1020
  We haven't hit any problem so far, but according to the ODL designer,
  the ODL should be in reset when it is enabled.
1023
1024  The ODL remains in reset until we start a fundamental reset to
1025  initiate link training. We still assert and deassert the ODL reset
1026  signal as part of the normal procedure just before training the
1027  link. Asserting is therefore useless at boot, since the ODL is already
1028  in reset, but we keep it as it's only a scom write and it's needed
1029  when we reset/retrain from the OS.
1030- npu2-opencapi: Keep ODL and adapter in reset at the same time
1031
1032  Split the function to assert and deassert the reset signal on the ODL,
1033  so that we can keep the ODL in reset while we reset the adapter,
1034  therefore having a window where both sides are in reset.
1035
1036  It is actually not required with our current DLx at boot time, but I
1037  need to split the ODL reset function for the following patch and it
1038  will become useful/required later when we introduce resetting an
1039  opencapi link from the OS.
1040- npu2-opencapi: Setup perf counters to detect CRC errors
1041
  It's possible to set up performance counters for the PLL to detect
  various conditions for the links in nvlink or opencapi mode. Since
  those counters are currently unused, let's configure them when an obus
  is in opencapi mode to detect CRC errors on the link. Each link has
  two counters:

  - CRC error detected by the host
  - CRC error detected by the DLx (NAK received by the host)
1049
1050  We also dump the counters shortly after the link trains, but they can
1051  be read multiple times through cronus, pdbg or linux. The counters are
1052  configured to be reset after each read.
1053
1054Since v6.3-rc1:
1055
1056- opal/hmi: Never trust a cow!
1057
1058  With opencapi, it's fairly common to trigger HMIs during AFU
1059  development on the FPGA, by not replying in time to an NPU command,
1060  for example. So shift the blame reported by that cow to avoid crowding
1061  my mailbox.
1062- hw/npu2: Dump (more) npu2 registers on link error and HMIs
1063
1064  We were already logging some NPU registers during an HMI. This patch
1065  cleans up a bit how it is done and separates what is global from what
1066  is specific to nvlink or opencapi.
1067
1068  Since we can now receive an error interrupt when an opencapi link goes
1069  down unexpectedly, we also dump the NPU state but we limit it to the
1070  registers of the brick which hit the error.
1071
1072  The list of registers to dump was worked out with the hw team to
1073  allow for proper debugging. For each register, we print the name as
1074  found in the NPU workbook, the scom address and the register value.
1075- hw/npu2: Report errors to the OS if an OpenCAPI brick is fenced
1076
1077  Now that the NPU may report interrupts due to the link going down
1078  unexpectedly, report those errors to the OS when queried by the
1079  'next_error' PHB callback.
1080
1081  The hardware doesn't support recovery of the link when it goes down
1082  unexpectedly. So we report the PHB as dead, so that the OS can log the
1083  proper message, notify the drivers and take the devices down.
1084- hw/npu2: Fix OpenCAPI PE assignment
1085
1086  When we support mixing NVLink and OpenCAPI devices on the same NPU, we're
1087  going to have to share the same range of 16 PE numbers between NVLink and
1088  OpenCAPI PHBs.
1089
1090  For OpenCAPI devices, PE assignment is only significant for determining
1091  which System Interrupt Log register is used for a particular brick - unlike
1092  NVLink, it doesn't play any role in determining how links are fenced.
1093
1094  Split the PE range into a lower half which is used for NVLink, and an upper
1095  half that is used for OpenCAPI, with a fixed PE number assigned per brick.
1096
1097  As the PE assignment for OpenCAPI devices is fixed, set the PE once
1098  during device init and then ignore calls to the set_pe() operation.
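
A sketch of the split, in Python. The text only states that the 16 PE numbers
are divided into a lower NVLink half and an upper OpenCAPI half with a fixed
PE per brick; the exact per-brick formula used here (base plus brick index)
and all names are assumptions for illustration.

```python
PE_COUNT = 16                    # shared range of PE numbers
OPENCAPI_PE_BASE = PE_COUNT // 2  # upper half starts here


def nvlink_pe_range():
    """PE numbers left for NVLink: the lower half of the range."""
    return range(0, OPENCAPI_PE_BASE)


def opencapi_pe(brick_index):
    """Fixed PE number for an OpenCAPI brick (assumed: base + brick index)."""
    pe = OPENCAPI_PE_BASE + brick_index
    if brick_index < 0 or pe >= PE_COUNT:
        raise ValueError("brick index out of range")
    return pe
```

Since the result depends only on the brick, it can be computed once at device
init, which is why later set_pe() calls can be ignored.
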
1099
1100- opal-api: Reserve 2 OPAL API calls for future OpenCAPI LPC use
1101
  OpenCAPI Lowest Point of Coherency (LPC) memory is going to require
  some extra OPAL calls to set up NPU BARs. These calls will most likely be
  named OPAL_NPU_LPC_ALLOC and OPAL_NPU_LPC_RELEASE, but we're not quite
  ready to upstream that code yet.
1106
1107
1108
1109NVLINK2
1110-------
1111- npu2: Allow ATSD for LPAR other than 0
1112
1113  Each XTS MMIO ATSD# register is accompanied by another register -
1114  XTS MMIO ATSD0 LPARID# - which controls LPID filtering for ATSD
1115  transactions.
1116
  When a host system passes a GPU through to a guest, we need to enable
  some ATSD for an LPAR. At the moment the host assigns one ATSD to
  an NVLink bridge and this maps it to an LPAR when a GPU is assigned to
  the LPAR. The link number is used as the ATSD index.

  ATSD6&7 stay mapped to the host (LPAR=0) all the time, which seems to be
  an acceptable price for the simplicity.
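
The resulting ATSD-to-LPAR mapping can be sketched as below. This is a Python
model with made-up names. It assumes eight ATSD registers (ATSD0 to ATSD7,
implied by the mention of ATSD6&7) and six links using ATSD0 to ATSD5.

```python
ATSD_COUNT = 8  # assumed: ATSD0..ATSD7
LINK_COUNT = 6  # assumed: links 0..5 use ATSD0..ATSD5


def atsd_lpar_map(link_to_lpar):
    """Return the LPAR each ATSD register is mapped to.

    link_to_lpar maps a link number to the LPAR its GPU is assigned to.
    Anything not listed stays with the host (LPAR 0).
    """
    mapping = [0] * ATSD_COUNT
    for link, lpar in link_to_lpar.items():
        if not 0 <= link < LINK_COUNT:
            raise ValueError("no per-link ATSD for link %d" % link)
        mapping[link] = lpar  # the link number is the ATSD index
    return mapping
```

Passing a GPU on links 0 and 1 to LPAR 5 remaps only ATSD0 and ATSD1; the
last two registers always stay with the host.
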
1124- npu2: Add XTS_BDF_MAP wildcard refcount
1125
  Currently the PID wildcard is programmed into the NPU once and never
  cleared. This works for bare metal as the MSR does not change while the
  host OS is running.

  However, with device virtualization, we need to keep track of wildcard
  entry use and clear the entries before switching a GPU from a host to
  a guest or vice versa.

  This adds a refcount to an NPU2, one counter per wildcard entry. The index
  is a short lparid (4 bits long) which is allocated in opal_npu_map_lpar()
  and should be smaller than NPU2_XTS_BDF_MAP_SIZE (defined as 16).
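
A minimal sketch of such a refcount, written in Python for brevity. The class
and method names are hypothetical, and the True/False return values are just
one way to signal when an entry would need to be programmed or cleared.

```python
XTS_BDF_MAP_SIZE = 16  # mirrors NPU2_XTS_BDF_MAP_SIZE from the text


class WildcardRefcount:
    """One counter per wildcard entry, indexed by the 4-bit short lparid."""

    def __init__(self):
        self.counts = [0] * XTS_BDF_MAP_SIZE

    def _check(self, lparid):
        if not 0 <= lparid < XTS_BDF_MAP_SIZE:
            raise ValueError("short lparid out of range")

    def get(self, lparid):
        """Take a reference; True means the entry must be programmed."""
        self._check(lparid)
        self.counts[lparid] += 1
        return self.counts[lparid] == 1

    def put(self, lparid):
        """Drop a reference; True means the entry can be cleared."""
        self._check(lparid)
        if self.counts[lparid] == 0:
            raise ValueError("unbalanced put")
        self.counts[lparid] -= 1
        return self.counts[lparid] == 0
```

Only the first user of an lparid programs the wildcard entry, and only the
last one to leave clears it.
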
1137
Since v6.3-rc2:

- npu2: Disable Probe-to-Invalid-Return-Modified-or-Owned snarfing by default
1140
  V100 GPUs are known to violate the NVLink2 protocol in some cases (one is
  when memory was accessed by the CPU and then by the GPU using so-called
  block linear mapping) and issue double probes to the NPU, which can cope
  with this problem only if CONFIG_ENABLE_SNARF_CPM ("disable/enable
  Probe.I.MO snarfing a cp_m") is not set in the CQ_SM Misc Config
  register #0. If the bit is set (which is the case today), the NPU issues
  a machine check stop.
1148
1149  The snarfing feature is designed to detect 2 probes in flight and combine
1150  them into one.
1151
1152  This adds a new "opal-npu2-snarf-cpm" nvram variable which controls
1153  CONFIG_ENABLE_SNARF_CPM for all NVLinks to prevent the machine check
1154  stop from happening.
1155
  This disables snarfing by default as otherwise a broken GPU driver can
  crash the entire box even when a GPU is passed through to a guest.
  This provides a dial to allow regression tests (which might be useful on
  bare metal). To enable snarfing, the user needs to run: ::
1160
1161    sudo nvram -p ibm,skiboot --update-config opal-npu2-snarf-cpm=enable
1162
1163  and reboot the host system.
1164
1165- hw/npu2: Show name of opencapi error interrupts
1166
1167
1168Debugging and simulation
1169------------------------
1170
1171- external/mambo: Error out if kernel is too large
1172
1173  If you're trying to boot a gigantic kernel in mambo (which you can
1174  reproduce by building a kernel with CONFIG_MODULES=n) you'll get
1175  misleading errors like: ::
1176
1177    WARNING: 0: (0): [0:0]: Invalid/unsupported instr 0x00000000[INVALID]
1178    WARNING: 0: (0):  PC(EA): 0x0000000030000010 PC(RA):0x0000000030000010 MSR: 0x9000000000000000 LR: 0x0000000000000000
1179    WARNING: 0: (0):  numInstructions = 0
1180    WARNING: 1: (1): [0:0]: Invalid/unsupported instr 0x00000000[INVALID]
1181    WARNING: 1: (1):  PC(EA): 0x0000000000000E40 PC(RA):0x0000000000000E40 MSR: 0x9000000000000000 LR: 0x0000000000000000
1182    WARNING: 1: (1):  numInstructions = 1
1183    WARNING: 1: (1): Interrupt to 0x0000000000000E40 from 0x0000000000000E40
1184    INFO: 1: (2): ** Execution stopped: Continuous Interrupt, Instruction caused exception,  **
1185
1186  So add an error to skiboot.tcl to warn the user before this happens.
1187  Making PAYLOAD_ADDR further back is one way to do this but if there's a
1188  less gross way to generally work around this very niche problem, I can
1189  suggest that instead.
1190- external/mambo: Populate kernel-base-address in the DT
1191
  skiboot.tcl defines PAYLOAD_ADDR as 0x20000000. This is also the default
  in skiboot unless kernel-base-address is set in the device tree.

  If you change PAYLOAD_ADDR to something else for mambo, skiboot won't
  see it because skiboot.tcl doesn't set that DT property, so fix it so
  that it does.
1198- external/mambo: allow CPU targeting for most debug utils
1199
  Debug util functions target CPU 0:0:0 by default. Some can be
  overridden explicitly per invocation, and others can't at all.
  Even for those that can be overridden, it is a pain to type
  them out when you're debugging a particular thread.

  Provide a new 'target' function that allows the default CPU
  target to be changed. Wire up that default to all other utils.
  Provide a new 'S' step command which only steps the target CPU.
1208- qemu: bt device isn't always hanging off /
1209
1210  Just use the normal for_each_compatible instead.
1211
1212  Otherwise in the qemu model as executed by op-test,
1213  we wouldn't go down the astbmc_init() path, thus not having flash.
1214- devicetree: Add p9-simics.dts
1215
1216  Add a p9-based devicetree that's suitable for use with Simics.
1217- devicetree: Move power9-phb4.dts
1218
1219  Clean up the formatting of power9-phb4.dts and move it to
1220  external/devicetree/p9.dts. This sets us up to include it as the basis
1221  for other trees.
1222- devicetree: Add nx node to power9-phb4.dts
1223
1224  A (non-qemu) p9 without an nx node will assert in p9_darn_init(): ::
1225
      dt_for_each_compatible(dt_root, nx, "ibm,power9-nx")
              break;
      if (!nx) {
              if (!dt_node_is_compatible(dt_root, "qemu,powernv"))
                      assert(nx);
              return;
      }
1233
1234  Since NX is this essential, add it to the device tree.
1235- devicetree: Fix typo in power9-phb4.dts
1236
1237  Change "impi" to "ipmi".
1238- devicetree: Fix syntax error in power9-phb4.dts
1239
1240  Remove the extra space causing this: ::
1241
1242      Error: power9-phb4.dts:156.15-16 syntax error
1243      FATAL ERROR: Unable to parse input tree
1244- core/init: enable machine check on secondaries
1245
  Secondary CPUs currently run with MSR[ME]=0 during boot, which means
  if they take a machine check, the system will checkstop.
1248
1249  Enable ME where possible and allow them to print registers.
1250
1251Utilities
1252---------
- pflash: Don't try to update RO ToC

  In the future it's likely the ToC will be marked as read-only. Don't
  error out by assuming it's writable.
1257- pflash: Support encoding/decoding ECC'd partitions
1258
1259  With the new --ecc option, pflash can add/remove ECC when
1260  reading/writing flash partitions protected by ECC.
1261
  This is *not* flawless with current PNORs out in the wild though, as
  they do not typically fill the whole partition with valid ECC data, so
  you have to know how big the valid ECC'd data is and specify the size
  manually. Note that for some partitions this is practically impossible
  without knowing the details of the content of the partition.
1267
1268  A future patch is likely to introduce an option to "stop reading data
1269  when ECC starts failing and assume everything is okay rather than error
1270  out" to support reading the "valid" data from existing PNOR images.
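
When working out sizes by hand, the relationship between data length and
on-flash length matters. The sketch below is in Python and rests on one loudly
stated assumption: that the layout stores one ECC byte after every eight data
bytes. The helper names are made up for this example and are not pflash
options or libflash functions.

```python
BYTES_PER_ECC = 8  # assumed: one ECC byte protects eight data bytes


def ecc_buffer_size(data_len):
    """Bytes occupied in flash by data_len bytes of data plus their ECC."""
    if data_len % BYTES_PER_ECC:
        raise ValueError("data length must be a multiple of 8 bytes")
    return data_len + data_len // BYTES_PER_ECC


def ecc_data_size(buffer_len):
    """Bytes of data held in buffer_len bytes of ECC-protected flash."""
    return (buffer_len // (BYTES_PER_ECC + 1)) * BYTES_PER_ECC
```

Under that assumption, 0x1000 bytes of data occupy 0x1200 bytes of flash, and
the reverse calculation recovers the data length from the protected length.
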
1271
1272Since v6.3-rc2:
1273
1274- opal-prd: Fix memory leak in is-fsp-system check
1275- opal-prd: Check malloc return value
1276