1.. _skiboot-6.1:
2
3skiboot-6.1
4===========
5
6skiboot v6.1 was released on Wednesday July 11th 2018. It is the first
7release of skiboot 6.1, which is the new stable release of skiboot
8following the 6.0 release, first released May 11th 2018.
9
10Skiboot 6.1 is the basis for op-build v2.1 and contains all bug fixes as
11of :ref:`skiboot-6.0.5`, and :ref:`skiboot-5.4.9` (the currently maintained
12stable releases). We expect further stable releases in the 6.0.x and 5.4.x
13series, while we do not expect to do any stable releases of 6.1.x.
14
This final 6.1 release follows a single release candidate, as this
cycle has been rather quiet, with mainly cleanup and bug fix patches
going in.
18
For details on how the skiboot stable releases work, see :ref:`stable-rules`.
20
21Over skiboot-6.0, we have the following changes:
22
23General changes and bug fixes
24-----------------------------
25
26Since :ref:`skiboot-6.1-rc1`:
27
28- slw: Fix trivial typo in debug message
29- vpd: Add vendor property to processor node
30
  Processor FRU VPD doesn't contain vendor details, so we have to parse the
  module VPD to get them.
33
34- vpd: Sanitize VPD data
35
  On OpenPower systems, the VPD keyword size tells us the maximum size of the
  data, but unused trailing bytes are filled with spaces (0x20) instead of NUL.
  The spec also doesn't stop users from having spaces (0x20) within the actual
  data.

  This patch discards trailing spaces before populating the device tree.
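
  The trimming can be sketched as follows. This is an illustration in Python
  rather than the skiboot C code, and ``sanitize_vpd_field`` is a hypothetical
  name. Only trailing spaces are dropped, so spaces inside the data survive:

  .. code-block:: python

     def sanitize_vpd_field(raw: bytes) -> bytes:
         """Drop trailing 0x20 padding from a fixed-size VPD keyword."""
         # rstrip() only removes bytes from the end; embedded spaces are kept.
         return raw.rstrip(b"\x20")


     # Hypothetical keyword value padded out to its maximum size.
     padded = b"IBM POWER   "
     print(sanitize_vpd_field(padded))  # b'IBM POWER'
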
41- core: always flush console before stopping
42
43  This catches a few cases (e.g., fast reboot failure messages) that
44  don't always make it to the console before the machine is rebooted.
45- core/cpu: parallelise global CPU register setting jobs
46
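  On a 176 thread system, the two messages below are about 4.0 seconds apart
  before this change and about 0.03 seconds apart after it. Before: ::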
48
49    [  122.319923233,5] OPAL: Switch to big-endian OS
50    [  126.317897467,5] OPAL: Switch to little-endian OS
51
52  after: ::
53
54    [  212.439299889,5] OPAL: Switch to big-endian OS
55    [  212.469323643,5] OPAL: Switch to little-endian OS
56- init, occ: Initialise OCC earlier on BMC systems
57
58  We need to use the OCC to obtain presence data for the SXM2 slots on
59  Witherspoon systems. This is needed to determine device type for NVLink
60  GPUs and OpenCAPI devices which can be plugged into the same slot. Support
61  for this will be implemented in a future patch.
62
63  Currently, OCC initialisation is done just before handing over to Linux,
64  which is well after NPU probe. On FSP systems, OCC boot starts very late,
65  so we wait until the last possible moment to initialise the skiboot side in
66  order to give it the maximum time to boot. On BMC systems, OCC boot starts
67  earlier, so there aren't any issues in moving it earlier in the skiboot
68  init sequence.
69
70  When running on a BMC machine, call occ_pstates_init() as early as
71  possible in the init sequence. On FSP machines, continue to call it from
72  its current location.
73
74Since :ref:`skiboot-6.0`:
75
76- GCC8 build fixes
77- Add prepare_hbrt_update to hbrt interfaces
78
79  Add placeholder support for prepare_hbrt_update call into
80  hostboot runtime (opal-prd) code.  This interface is only
81  called as part of a concurrent code update on a FSP based
82  system.
83- cpu: Clear PCR SPR in opal_reinit_cpus()
84
  Currently if Linux boots with a non-zero PCR, things can go bad where
  some early userspace programs can take illegal instructions. This is
  being fixed in Linux, but in the meantime we should clean up in
  skiboot as well.
89- pci: Fix PCI_DEVICE_ID()
90
91  The vendor ID is 16 bits not 8. This error leaves the top of the vendor
92  ID in the bottom bits of the device ID, which resulted in e.g. a failure
93  to run the PCI quirk for the AST VGA device.
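
  To illustrate the effect, the sketch below assumes the usual PCI layout of
  the 32-bit vendor/device word (vendor ID in bits 0-15, device ID in bits
  16-31) and assumes the old macro shifted by 8 instead of 16. It is written
  in Python for brevity, the function names are hypothetical, and 1a03:2000
  is used as the AST VGA vendor/device pair:

  .. code-block:: python

     def pci_vendor_id(vdid: int) -> int:
         """Vendor ID: low 16 bits of the vendor/device word."""
         return vdid & 0xFFFF


     def pci_device_id_buggy(vdid: int) -> int:
         """Assumed old behaviour: treats the vendor ID as 8 bits wide."""
         return (vdid >> 8) & 0xFFFF


     def pci_device_id(vdid: int) -> int:
         """Fixed behaviour: skip the full 16-bit vendor ID."""
         return (vdid >> 16) & 0xFFFF


     vdid = (0x2000 << 16) | 0x1A03
     print(hex(pci_device_id_buggy(vdid)))  # 0x1a (top byte of the vendor ID)
     print(hex(pci_device_id(vdid)))        # 0x2000

  With the wrong shift, a quirk keyed on device ID 0x2000 never matches.
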
94- Quieten console output on boot
95
  We print out a whole bunch of things on boot, most of which aren't
  interesting, so we should *not* print them.
98
99  Printing things like what CPUs we found and what PCI devices we found
100  *are* useful, so continue to do that. But we don't need to splat out
101  a bunch of things that are always going to be true.
102- core/console: fix deadlock when printing with console lock held
103
104  Some debugging options will print while the console lock is held,
105  which is why the console lock is taken as a recursive lock.
106  However console_write calls __flush_console, which will drop and
107  re-take the lock non-recursively in some cases.
108
109  Just set con_need_flush and return from __flush_console if we are
110  holding the console lock already.
111
112  This stack usage message (taken with this patch applied) could lead
113  to a deadlock without this: ::
114
115    CPU 0000 lowest stack mark 11768 bytes left pc=300cb808 token=0
116    CPU 0000 Backtrace:
117    S: 0000000031c03370 R: 00000000300cb808   .list_check_node+0x1c
118    S: 0000000031c03410 R: 00000000300cb910   .list_check+0x38
119    S: 0000000031c034b0 R: 00000000300190ac   .try_lock_caller+0xb8
120    S: 0000000031c03540 R: 00000000300192e0   .lock_caller+0x80
121    S: 0000000031c03600 R: 0000000030012c70   .__flush_console+0x134
122    S: 0000000031c036d0 R: 00000000300130cc   .console_write+0x68
123    S: 0000000031c03780 R: 00000000300347bc   .vprlog+0xc8
124    S: 0000000031c03970 R: 0000000030034844   ._prlog+0x50
125    S: 0000000031c03a00 R: 00000000300364a4   .log_simple_error+0x74
126    S: 0000000031c03b90 R: 000000003004ab48   .occ_pstates_init+0x184
127    S: 0000000031c03d50 R: 000000003001480c   .load_and_boot_kernel+0x38c
128    S: 0000000031c03e30 R: 000000003001571c   .main_cpu_entry+0x62c
129    S: 0000000031c03f00 R: 0000000030002700   boot_entry+0x1c0
130- opal-prd: Do not error out on first failure for soft/hard offline.
131
  The memory errors (CEs and UEs) that are detected as part of background
  memory scrubbing are reported by PRD asynchronously to opal-prd along with
  the affected memory ranges. hservice_memory_error() converts these ranges
  into page granularity before hooking them up to the soft/hard offline-ing
  infrastructure.

  But the current implementation of hservice_memory_error() does not hook up
  all the pages to soft/hard offline-ing if any of the page offline actions
  fails. For example, hard offline can fail for:
141
142  - Pages that are not part of buddy managed pool.
143  - Pages that are reserved by kernel using memblock_reserved()
144  - Pages that are in use by kernel.
145
146  But for the pages that are in use by user space application, the hard
147  offline marks the page as hwpoison, sends SIGBUS signal to kill the
148  affected application as recovery action and returns success.
149
  Hence, it is possible that some of the pages in that memory range are in
  use by an application or free. By stopping on the first error we lose the
  opportunity to hwpoison the subsequent pages, which may be free or in use
  by an application. This patch fixes this issue.
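
  The intended behaviour can be sketched roughly as below. This is a Python
  illustration, not the opal-prd C code: ``offline_range`` and the
  ``offline_page`` callback are hypothetical names, the range is assumed to
  be page aligned, and returning the first error seen is an assumption about
  how failures might be reported:

  .. code-block:: python

     def offline_range(start, end, page_size, offline_page):
         """Try to offline every page in [start, end).

         Returns 0 if all pages succeeded, else the first non-zero rc.
         """
         first_error = 0
         for addr in range(start, end, page_size):
             rc = offline_page(addr)
             if rc != 0 and first_error == 0:
                 # Remember the failure, but keep going so that later
                 # pages still get their chance to be offlined.
                 first_error = rc
         return first_error


     attempted = []


     def fake_offline(addr):
         """Stand-in for the real offline action; one page fails."""
         attempted.append(addr)
         return -1 if addr == 0x10000 else 0


     rc = offline_range(0x0, 0x30000, 0x10000, fake_offline)
     print(rc, [hex(a) for a in attempted])
     # -1 ['0x0', '0x10000', '0x20000']
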
154- libflash/blocklevel_write: Fix missing error handling
155
156  Caught by scan-build, we seem to trap the errors in rc, but
157  not take any recovery action during blocklevel_write.
158
159I2C
160^^^
161- p8-i2c: fix wrong request status when a reset is needed
162
163  If the bus is found in error state when starting a new request, the
164  engine is reset and we enter recovery. However, once complete, the
165  reset operation shows a status of complete in the status register. So
  any badly-timed call to check_status() will think the current top
167  request is complete, even though it hasn't run yet.
168
169  So don't update any request status while we are in recovery, as
170  nothing useful for the request is supposed to happen in that state.
171- p8-i2c: Remove force reset
172
173  Force reset was added as an attempt to work around some issues with TPM
174  devices locking up their I2C bus. In that particular case the problem
175  was that the device would hold the SCL line down permanently due to a
176  device firmware bug. The force reset doesn't actually do anything to
177  alleviate the situation here, it just happens to reset the internal
178  master state enough to make the I2C driver appear to work until
179  something tries to access the bus again.
180
  On P9 systems with secure boot enabled there is the added problem
  of the "diagnostic mode" not being supported on I2C masters A, B, C and
  D. Diagnostic mode allows the SCL and SDA lines to be driven directly
  by software. Without this, force reset is impossible to implement.
185
186  This patch removes the force reset functionality entirely since:
187
188  a) it doesn't do what it's supposed to, and
189  b) it's butt ugly code
190
191  Additionally, turn p8_i2c_reset_engine() into p8_i2c_reset_port().
192  There's no need to reset every port on a master in response to an
193  error that occurred on a specific port.
194- libstb/i2c-driver: Bump max timeout
195
  We have observed some TPMs clock stretching the I2C bus for significant
  amounts of time when processing commands. The same TPMs also have
  errata that can result in permanently locking up a bus in response to
  an I2C transaction they don't understand. Use an excessively long
  timeout to prevent this in the field.
201- hdata: Add TPM timeout workaround
202
203  Set the default timeout for any bus containing a TPM to one second. This
204  is needed to work around a bug in the firmware of certain TPMs that will
  clock stretch the I2C port for up to a second. Additionally, when the
  TPM is clock stretching it responds to a STOP condition on the bus by
207  bricking itself. Clearing this error requires a hard power cycle of the
208  system since the TPM is powered by standby power.
209- p8-i2c: Allow a per-port default timeout
210
211  Add support for setting a default timeout for the I2C port to the
212  device-tree. This is consumed by skiboot.
213
214IPMI Watchdog
215^^^^^^^^^^^^^
216- ipmi-watchdog: Support handling re-initialization
217
218  Watchdog resets can return an error code from the BMC indicating that
219  the BMC watchdog was not initialized. Currently we abort skiboot due to
220  a missing error handler. This patch implements handling
221  re-initialization for the watchdog, automatically saving the last
222  watchdog set values and re-issuing them if needed.
223- ipmi-watchdog: The stop action should disable reset
224
225  Otherwise it is possible for the reset timer to elapse and trigger the
226  watchdog to wake back up. This doesn't affect the behavior of the
227  system since we are providing a NONE action to the BMC. However we would
228  like to avoid the action from taking place if possible.
229- ipmi-watchdog: Add a flag to determine if we are still ticking
230
231  This makes it easier for future changes to ensure that the watchdog
232  stops ticking and doesn't requeue itself for execution in the
233  background. This way it is safe for resets to be performed after the
234  ticks are assumed to be stopped and it won't start the timer again.
235- ipmi-watchdog: (prepare for) not disabling at shutdown
236
237  The op-build linux kernel has been configured to support the ipmi
238  watchdog. This driver will always handle the watchdog by either leaving
239  it enabled if configured, or by disabling it during module load if no
240  configuration is provided. This increases the coverage of the watchdog
241  during the boot process. The watchdog should no longer be disabled at
242  any point during skiboot execution.
243
244  We're not enabling this by default yet as people can (and do, at least in
245  development) mix and match old BOOTKERNEL with new skiboot and we don't
246  want to break that too obviously.
247- ipmi-watchdog: Don't reset the watchdog twice
248
249  There is no clarification for why this change was needed, but presumably
250  this is due to a buggy BMC implementation where the Watchdog Set command
251  was processed concurrently or after the initial Watchdog Reset. This
252  inversion would cause the watchdog to stop since the DONT_STOP bit was
253  not set. Since we are now using the DONT_STOP bit during initialization,
254  the watchdog should not be stopped even if an inversion occurs.
255- ipmi-watchdog: Make it possible to set DONT_STOP
256
  The IPMI standard supports setting a DONT_STOP bit during a Watchdog
258  Set operation. Most of the time we don't want to stop the Watchdog when
259  updating the settings so we should be using this bit. This patch makes
260  it possible for callers of set_wdt to prevent the watchdog from being
261  stopped. This only changes the behavior of the watchdog during the
262  initial settings update when initializing skiboot. The watchdog is no
263  longer disabled and then immediately re-enabled.
264- ipmi-watchdog: WD_POWER_CYCLE_ACTION -> WD_RESET_ACTION
265
266  The IPMI specification denotes that action 0x1 is Host Reset and 0x3 is
267  Host Power Cycle. Use the correct name for Reset in our watchdog code.
268
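  As a small illustration of the naming, the two action values mentioned
  above can be written as an enumeration. This is a Python sketch rather
  than the skiboot C definitions, and the class name is hypothetical:

  .. code-block:: python

     from enum import IntEnum


     class WatchdogAction(IntEnum):
         """Timeout actions named in the text (values per the IPMI spec)."""
         WD_RESET_ACTION = 0x1        # Host Reset
         WD_POWER_CYCLE_ACTION = 0x3  # Host Power Cycle


     print(WatchdogAction(0x1).name)  # WD_RESET_ACTION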
269
270POWER8 platforms
271----------------
272
273- astbmc: Enable mbox depending on scratch reg
274
275  P8 boxes can opt in for mbox pnor support if they set the scratch
276  register bit to indicate it is supported.
277
278Simulator platforms
279-------------------
280
281Since :ref:`skiboot-6.1-rc1`:
282
283- pmem: volatile bindings for the poorly enabled
284
285  PMEM_DISK bindings were added, but they rely on a rather
286  recent mmap feature. This patch steals from those bindings
287  to add volatile bindings. I've used these bindings with
288  PMEM_VOLATILE to launch an instance with the publicly
289  available systemsim-p9. The bindings are volatile and one
290  should not expect any data to be saved/retrieved.
291
292Since :ref:`skiboot-6.0`:
293
294- plat/qemu: add PNOR support
295
296  To access the PNOR, OPAL/skiboot drives the BMC SPI controller using
297  the iLPC2AHB device of the BMC SuperIO controller and accesses the
298  flash contents using the LPC FW address space on which the PNOR is
299  remapped.
300
301  The QEMU PowerNV machine now integrates such models (SuperIO
302  controller, iLPC2AHB device) and also a pseudo Aspeed SoC AHB memory
303  space populated with the SPI controller registers (same model as for
304  ARM). The AHB window giving access to the contents of the BMC SPI
305  controller flash modules is mapped on the LPC FW address space.
306
  The change should be compatible with machines without PNOR support.
308- external/mambo: Add support for readline if it exists
309
310  Add support for tclreadline package if it is present.
311  This patch loads the package and uses it when the
312  simulation stops for any reason.
313
314
315FSP based platforms
316-------------------
317
318- Disable fast reboot on FSP IPL side change
319
  If the FSP changes the next IPL side, then disable fast reboot.

  Sample output: ::
323
324      [  620.196442259,5] FSP: Got sysparam update, param ID 0xf0000007
325      [  620.196444501,5] CUPD: FW IPL side changed. Disable fast reboot
326      [  620.196445389,5] CUPD: Next IPL side : perm
327- fsp/console: Always establish OPAL console API backend
328
329  Currently we only call set_opal_console() to establish the backend
330  used by the OPAL console API if we find at least one FSP serial
331  port in HDAT.
332
  On systems where there is none (IPMI only), we fail to set it. The
  console code then tries to use the dummy console, which causes an
  assertion failure during boot due to clashing device-tree node names.

  So always set it if an FSP is present.
339
340AST BMC based platforms
341-----------------------
342
343- AMI BMC: use 0x3a as OEM command
344
345  The 0x3a OEM command is for IBM commands, while 0x32 was for AMI ones.
346  Sometime in the P8 timeframe, AMI BMCs were changed to listen for our
347  commands on either 0x32 or 0x3a. Since 0x3a is the direction forward,
348  we'll use that, as P9 machines with AMI BMCs probably also want these
349  to work, and let's not bet that 0x32 will continue to be okay.
350- astbmc: Set romulus BMC type to OpenBMC
351- platform/astbmc: Do not delete compatible property
352
  From P9 onwards, OPAL builds the device tree for BMC based systems using
  HDAT, and we populate the bmc/compatible property with the BMC version.
  Hence, do not delete this property.
356
357Utilities
358---------
359- external/xscom-utils: Add python library for xscom access
360
  This patch adds a simple Python library module for xscom access.
  It directly manipulates the '/access' file in the debugfs 'scom'
  directory for scom reads and writes.

  Example of how to build a getscom using this module:
366
367  .. code-block:: python
368
369     from adu_scoms import *
370     getscom = GetSCom()
371     getscom.parse_args()
372     getscom.run_command()
373
374  Sample output for above getscom.py:
375
376  .. code-block:: console
377
378    # ./getscom.py -l
379    Chip ID  | Rev   | Chip type
380    ---------|-------|-----------
381    00000008 | DD2.0 | P9 (Nimbus) processor
382    00000000 | DD2.0 | P9 (Nimbus) processor
383- ffspart: Don't require user to create blank partitions manually
384
385  Add '--allow-empty' which allows the filename for a given partition to
386  be blank. If set ffspart will set that part of the PNOR file 'blank' and
387  set ECC bits if required.
  Without this option, behaviour is unchanged and ffspart will return an
  error if it cannot find the partition file.
390- pflash: Use correct prefix when installing
391
  pflash uses lowercase prefix when running make install in its
  directory, but uppercase PREFIX when running it in shared. Use
  lowercase everywhere.

  With this the OpenBMC bitbake recipe can drop an out-of-tree patch it's
  been carrying for years.
398
399
400POWER9
401------
402
403Since :ref:`skiboot-6.1-rc1`:
404
405- occ: sensors: Fix the size of the phandle array 'sensors' in DT
406
407  Fixes: 99505c03f493 (present in v5.10-rc4)
408- phb4: Delay training till after PERST is deasserted
409
  This helps some cards train on the second PERST (i.e. fast-reboot). It
  is not clear why, but it helps, so YOLO!
412
413Since :ref:`skiboot-6.0`:
414
415- occ-sensor: Avoid using uninitialised struct cpu_thread
416
417  When adding the sensors in occ_sensors_init, if the type is not
418  OCC_SENSOR_LOC_CORE, then the loop to find 'c' will not be executed.
  Then c->pir is used for both of the add_sensor_node calls below.
420
421  This provides a default value of 0 instead.
422- NX: Add NX coprocessor init opal call
423
  The read offset (4:11) in the Receive FIFO control register is incremented
  by the FIFO size whenever a CRB is read by NX. But the index in RxFIFO has to
  match the corresponding entry in the FIFO maintained by VAS in the kernel.
  The VAS entry is reset to 0 when opening the receive window during driver
  initialization. So when NX842 is reloaded, or on a kexec boot, there is a
  possibility of a mismatch between the RxFIFO control register and the VAS
  entries in the kernel. This could cause CRB failures or timeouts from NX.
431
432  This patch adds nx_coproc_init opal call for kernel to initialize
433  readOffset (4:11) and Queued (15:23) in RxFIFO control register.
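
  As a rough illustration of the bit ranges, the sketch below builds masks
  for the two fields. It assumes a 64-bit register described in IBM bit
  numbering (bit 0 is the most significant bit) and assumes that
  "initialize" means clearing both fields to zero. It is written in Python
  for brevity and all names are hypothetical:

  .. code-block:: python

     def ppc_bitmask(bs: int, be: int, width: int = 64) -> int:
         """Mask covering IBM-numbered bits bs..be (bit 0 is the MSB)."""
         return ((1 << (width - bs)) - 1) & ~((1 << (width - 1 - be)) - 1)


     READ_OFFSET = ppc_bitmask(4, 11)  # readOffset field
     QUEUED = ppc_bitmask(15, 23)      # Queued field


     def init_rx_fifo_ctrl(reg: int) -> int:
         """Clear readOffset and Queued, leaving all other bits alone."""
         return reg & ~(READ_OFFSET | QUEUED)


     print(f"{READ_OFFSET:#018x}")  # 0x0ff0000000000000
     print(f"{QUEUED:#018x}")       # 0x0001ff0000000000
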
434- SLW: Remove stop1_lite and stop2_lite
435
  stop1_lite has been removed since it adds no additional benefit
  over stop0_lite. stop2_lite has been removed since currently it adds
  minimal benefit over stop2. However, the benefit is eclipsed by the time
  required to ungate the clocks.

  Moreover, Lite states don't give up the SMT resources, so they can
  potentially have a performance impact on sibling threads.
443
444  Since current OSs (Linux) aren't smart enough to make good decisions
  with these stop states, we're (temporarily) removing them from what
446  we expose to the OS, the idea being to bring them back in a new
447  DT representation so that only an OS that knows what to do will
448  do things with them.
449- cpu: Use STOP1 on POWER9 for idle/sleep inside OPAL
450
451  The current code requests STOP3, which means it gets STOP2 in practice.
452
  STOP2 has proven to occasionally be unreliable depending on FW
  version and chip revision. It also requires a functional CME,
  so instead, let's use STOP1. The difference is rather minimal
  for something that is only used for a few seconds during boot.
457
458NPU2 (NVLink2 and OpenCAPI)
459^^^^^^^^^^^^^^^^^^^^^^^^^^^
460
461Since :ref:`skiboot-6.1-rc1`:
462
463- capi: Select the correct IODA table entry for the mbt cache.
464
465  With the current code, the capi mmio window is not correctly configured
466  in the IODA table entry. The first entry (generally the non-prefetchable
  BAR) is overwritten.
468  This patch sets the capi window bar at the right place.
469- npu2/hw-procedures: Fence bricks via NTL instead of MISC
470
471  There are a couple of places we can set/unset fence for a brick:
472
473  1. MISC register: NPU2_MISC_FENCE_STATE
474  2. NTL register for the brick: NPU2_NTL_MISC_CFG1(ndev)
475
476  Recent testing of ATS in combination with GPU reset has exposed a side
477  effect of using (1); if fence is set for all six bricks, it triggers a
478  sticky nmmu latch which prevents the NPU from getting ATR responses.
479  This manifests as a hang in the tests.
480
481  We have npu2_dev_fence_brick() which uses (1), and only two calls to it.
482  Replace the call which sets fence with a write to (2). Remove the
483  corresponding unset call entirely. It's unneeded because the procedures
484  already do a progression from full fence to half to idle using (2).
485
486- phb4/capp: Calculate STQ/DMA read engines based on link-width for PEC
487
488  Presently in CAPI mode the number of STQ/DMA-read engines allocated on
489  PEC2 for CAPP is fixed to 6 and 0-30 respectively irrespective of the
  PCI link width. These values are only suitable for x8 cards and
  quickly run out if an x16 card is plugged into a PEC2 attached slot. This
492  usually manifests as CAPP reporting TLBI timeout due to these messages
493  getting stalled due to insufficient STQs.
494
495  To fix this we update enable_capi_mode() to check if PEC2 chiplet is
496  in x16 mode and if yes then we allocate 4/0-47 STQ/DMA-read engines
497  for the CAPP traffic.
498
499  Fixes: 37ea3cfdc852 (present in v5.7-rc1)
500- npu2: Use same compatible string for NVLink and OpenCAPI link nodes in device tree
501
502  Currently, we distinguish between NPU links for NVLink devices and OpenCAPI
503  devices through the use of two different compatible strings - ibm,npu-link
504  and ibm,npu-link-opencapi.
505
506  As we move towards supporting configurations with both NVLink and OpenCAPI
507  devices behind a single NPU, we need to detect the device type as part of
508  presence detection, which can't happen until well after the point where the
509  HDAT or platform code has created the NPU device tree nodes. Changing a
510  node's compatible string after it's been created is a bit ugly, so instead
511  we should move the device type to a new property which we can add to the
512  node later on.
513
514  Get rid of the ibm,npu-link-opencapi compatible string, add a new
515  ibm,npu-link-type property, and a helper function to check the link type.
516  Add an "unknown" device type in preparation for later patches to detect
517  device type dynamically.
518
519  These device tree bindings are entirely internal to skiboot and are not
520  consumed directly by Linux, so this shouldn't break anything (other than
521  internal BML lab environments).
522- occ: Add support for GPU presence detection
523
524  On the Witherspoon platform, we need to distinguish between NVLink GPUs and
525  OpenCAPI accelerators. In order to do this, we first need to find out
526  whether the SXM2 socket is populated.
527
528  On Witherspoon, the SXM2 socket's presence detection pin is only visible
529  via I2C from the APSS, and thus can only be exposed to the host via the
530  OCC. The OCC, per OCC Firmware Interface Specification for POWER9 version
531  0.22, now exposes this to skiboot through a field in the dynamic data
532  shared memory.
533
534  Add the necessary dynamic data changes required to read the version and
535  GPU presence fields. Add a function, occ_get_gpu_presence(), that can be
536  used to check GPU presence.
537
  If the OCC isn't reporting presence (old OCC firmware, or some other
  reason), we default to assuming there is a device present and wait for
  link training to fail.
541
542  This will be used in later patches to fix up the NPU2 probe path for
543  OpenCAPI support on Witherspoon.
544- hw/npu2, core/hmi: Use NPU instead of NPU2 as log message prefix
545
546  The NPU2{DBG,INF,ERR} macros use "NPU%d" as a prefix to identify messages
547  relating to a particular NPU.
548
549  It's slightly confusing to have per-NPU messages prefixed with "NPU0" or
550  "NPU1" and NPU-generic messages prefixed with "NPU2". On some future system
551  we could potentially have a NPU #2 in which case it'd be really confusing.
552
553  Use NPU rather than NPU2 for NPU-generic log messages. There's no risk of
554  confusion with the original npu.c code since that's only for P8.
555
556Since :ref:`skiboot-6.0`:
557
558- npu2: Reset NVLinks on hot reset
559
560  This effectively fences GPU RAM on GPU reset so the host system
561  does not have to crash every time we stop a KVM guest with a GPU
562  passed through.
563- npu2-opencapi: reduce number of retries to train the link
564
565  We've been reliably training the opencapi link on the first attempt
566  for quite a while. Furthermore, if it doesn't train on the first
567  attempt, retries haven't been that useful. So let's reduce the number
568  of attempts we do to train the link.
569
570  2 retries = 3 attempts to train.
571
572  Each (failed) training sequence costs about 3 seconds.
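
  The arithmetic works out as in the sketch below. It is a Python
  illustration only: ``train_link`` and ``train_once`` are hypothetical
  names, and the 3 second figure is the approximate cost quoted above:

  .. code-block:: python

     RETRIES = 2
     SECONDS_PER_FAILED_ATTEMPT = 3  # approximate


     def train_link(train_once, retries=RETRIES):
         """Return the attempt number that succeeded, or None."""
         for attempt in range(1, retries + 2):  # first try plus retries
             if train_once():
                 return attempt
         return None


     attempts = RETRIES + 1
     worst_case = attempts * SECONDS_PER_FAILED_ATTEMPT
     print(attempts, worst_case)       # 3 9
     print(train_link(lambda: False))  # None

  So a link that never trains costs about 9 seconds in total.
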
573- opal/hmi: Display correct chip id while printing NPU FIRs.
574
  HMIs for NPU xstops are broadcast to all chips. All cores on all the
  chips receive the HMI. The HMI handler correctly identifies and extracts the
577  NPU FIR details from affected chip, but while printing FIR data it
578  prints chip id and location code details of this_cpu()->chip_id which
579  may not be correct. This patch fixes this issue.
580- npu2-opencapi: Fix link state to report link down
581
582  The PHB callback 'get_link_state' is always reporting the link width,
583  irrespective of the link status and even when the link is down. It is
584  causing too much work (and failures) when the PHB is probed during pci
585  init.
586  The fix is to look at the link status first and report the link as
587  down when appropriate.
588- npu2-opencapi: Cleanup traces printed during link training
589
590  Now that links may train in parallel, traces shown during training can
591  be all mixed up. So add a prefix to all the traces to clearly identify
592  the chip and link the trace refers to: ::
593
594    OCAPI[<chip id>:<link id>]: this is a very useful message
595
596  The lower-level hardware procedures (npu2-hw-procedures.c) also print
597  traces which would need work. But that code is being reworked to be
598  better integrated with opencapi and nvidia, so leave it alone for now.
599- npu2-opencapi: Train links on fundamental reset
600
601  Reorder our link training steps so that they are executed on
  fundamental reset instead of during the initial setup. Skiboot always
  calls a fundamental reset on all the PHBs during pci init.
604
605  It is done through a state machine, similarly to what is done for
606  'real' PHBs.
607
608  This is the first step for a longer term goal to be able to trigger an
609  adapter reset from linux. We'll need the reset callbacks of the PHB to
610  be defined. We have to handle the various delays differently, since a
611  linux thread shouldn't stay stuck waiting in opal for too long.
612- npu2-opencapi: Rework adapter reset
613
  Rework the code that resets the opencapi adapter a bit:

  - make clearer which i2c pin is resetting which device
  - break the reset operation into smaller chunks. This is really to
    prepare for a future patch.
619
620  No functional changes.
621- npu2-opencapi: Use presence detection
622
623  Presence detection is not part of the opencapi specification. So each
624  platform may choose to implement it the way it wants.
625
626  All current platforms implement it through an i2c device where we can
627  query a pin to know if a device is connected or not. ZZ and Zaius have
628  a similar design and even use the same i2c information and pin
629  numbers.
630  However, presence detection on older ZZ planar (older than v4) doesn't
631  work, so we don't activate it for now, until our lab systems are
632  upgraded and it's better tested.
633
634  Presence detection on witherspoon is still being worked on. It's
635  shaping up to be quite different, so we may have to revisit the topic
636  in a later patch.
637
638Testing and CI
639--------------
640
641Since :ref:`skiboot-6.1-rc1`:
642
643- test/qemu: start building qemu again, and use our built qemu for tests
644
645  We need to use QEMU_BIN rather than QEMU as the makefiles define
646  QEMU already.
647- opal-ci: qemu: Use the powernv-3.0 branch
648
649  This is based off the current development version of Qemu, and
650  importantly it contains the patch that allows skiboot and Linux to clear
651  the PCR that we require to boot.
652