.. _skiboot-6.0-rc1:

skiboot-6.0-rc1
================
5
6skiboot v6.0-rc1 was released on Tuesday May 1st 2018. It is the first
7release candidate of skiboot 6.0, which will become the new stable release
8of skiboot following the 5.11 release, first released April 6th 2018.
9
10Skiboot 6.0 will mark the basis for op-build v2.0 and will be required for
11POWER9 systems.
12
13skiboot v6.0-rc1 contains all bug fixes as of :ref:`skiboot-5.11`,
14:ref:`skiboot-5.10.5`, and :ref:`skiboot-5.4.9` (the currently maintained
15stable releases). Once 6.0 is released, we do *not* expect any further
16stable releases in the 5.10.x series, nor in the 5.11.x series.
17
For how the skiboot stable releases work, see :ref:`stable-rules`.
19
The current plan is to cut the final 6.0 in early May, with skiboot 6.0
being used for all POWER8 and POWER9 platforms in op-build v2.0.
22
23Over skiboot-5.11, we have the following changes:
24
New Features
------------
27- Disable stop states from OPAL
28
29  On ZZ, stop4,5,11 are enabled for PowerVM, even though doing
30  so may cause problems with OPAL due to bugs in hcode.
31
32  For other platforms, this isn't so much of an issue as
33  we can just control stop states by the MRW. However the
34  rebuild-the-world approach to changing values there is a bit
35  annoying if you just want to rule out a specific stop state
36  from being problematic.
37
38  Provide an nvram option to override what's disabled in OPAL.
39
  The OPAL mask is currently ~0xE0000000 (i.e. all but stop 0,1,2).
41
42  You can set an NVRAM override with: ::
43
44      nvram -p ibm,skiboot --update-config opal-stop-state-disable-mask=0xFFFFFFF
45
46  This nvram override will disable *all* stop states.
47- interrupts: Create an "interrupts" property in the OPAL node
48
  Deprecate the old "opal-interrupts" property. It is still there, but the
  new property follows the standard and allows us to specify whether an
  interrupt is level or edge sensitive.
52
53  Similarly create "interrupt-names" whose content is identical to
54  "opal-interrupts-names".
55- SBE: Add timer support on POWER9
56
  SBE on P9 provides a one-shot programmable timer facility. We can use this
  to implement OPAL timers and hence limit the reliance on the Linux
  heartbeat (similar to the HW timer facility provided by SLW on P8).
60- Add SBE driver support
61
  SBE (Self Boot Engine) on P9 has two different jobs:

  - Boot the chip up to the point the core is functional
  - Provide various services like timer, scom, stash MPIPL, etc., at runtime
65
66  We will use SBE for various purposes like timer, MPIPL, etc.
67
68- opal:hmi: Add missing processor recovery reason string.
69
  With this patch, the reason string is now printed for the CORE_WOF[43] bit: ::
71
72    [  477.352234986,7] HMI: [Loc: U78D3.001.WZS004A-P1-C48]: P:8 C:22 T:3: Processor recovery occurred.
73    [  477.352240742,7] HMI: Core WOF = 0x0000000000100000 recovered error:
74    [  477.352242181,7] HMI: PC - Thread hang recovery
75- Add DIMM actual speed to device tree
76
  Recent HDAT provides the actual DIMM speed. Let's add this to the device tree.
78- Fix DIMM size property
79
  Today we parse the VPD blob to get DIMM size information. This is limited
  to FSP based systems. HDAT provides the DIMM size value, so let's use that
  to populate the device tree. This way we get size information on BMC based
  systems as well.
84
85- PCI: Set slot power limit when supported
86
87  The PCIe slot capability can be implemented in a root or switch
88  downstream port to set the maximum power a card is allowed to draw
89  from the system. This patch adds support for setting the power limit
90  when the platform has defined one.
91- hdata/spira: parse vpd to add part-number and serial-number to xscom@ node
92
93  Expected by FWTS and associates our processor with the part/serial
94  number, which is obviously a good thing for one's own sanity.
95
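As a rough illustration of the stop-state disable mask from the "Disable stop
states from OPAL" item above, the sketch below decodes a 32-bit mask into
stop-state numbers. It assumes (this is not stated in these notes) that stop
state *n* maps to bit ``0x80000000 >> n``, which is at least consistent with
``~0xE0000000`` meaning "all but stop 0,1,2". The function name is
hypothetical and is not part of skiboot.

```python
MASK_BITS = 32  # assumption: the disable mask is a 32-bit value


def disabled_stop_states(mask):
    """Return the stop-state numbers disabled by ``mask``.

    Assumes stop state n corresponds to bit (0x80000000 >> n), i.e.
    MSB-first numbering. This matches "~0xE0000000 == all but stop 0,1,2".
    """
    mask &= (1 << MASK_BITS) - 1
    return [n for n in range(MASK_BITS) if mask & (0x80000000 >> n)]


# The default OPAL mask quoted above, truncated to 32 bits.
default_mask = ~0xE0000000 & 0xFFFFFFFF
print(hex(default_mask))                       # 0x1fffffff
print(disabled_stop_states(default_mask)[:3])  # [3, 4, 5]
```

Under that assumption, the default mask leaves stop 0, 1 and 2 enabled and
disables every deeper state.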
96
97Improved HMI Handling
98^^^^^^^^^^^^^^^^^^^^^
99
100- opal/hmi: Add documentation for opal_handle_hmi2 call
101- opal/hmi: Generate hmi event for recovered HDEC parity error.
102- opal/hmi: check thread 0 tfmr to validate latched tfmr errors.
103
104  Due to P9 errata, HDEC parity and TB residue errors are latched for
105  non-zero threads 1-3 even if they are cleared. But these are not
106  latched on thread 0. Hence, use xscom SCOMC/SCOMD to read thread 0 tfmr
107  value and ignore them on non-zero threads if they are not present on
108  thread 0.
109- opal/hmi: Print additional debug information in rendezvous.
110- opal/hmi: Fix handling of TFMR parity/corrupt error.
111
  While testing TFMR parity/corrupt error it has been observed that HMIs are
  delivered twice for this error:
114
115    - First time HMI is delivered with HMER[4,5]=1 and TFMR[60]=1.
116    - Second time HMI is delivered with HMER[4,5]=1 and TFMR[60]=0 with valid TB.
117
118  On second HMI we end up throwing "HMI: TB invalid without core error
119  reported" even though TB is in a valid state.
120- opal/hmi: Stop flooding HMI event for TOD errors.
121
  Fix the issue where every thread on the chip sends an HMI event to the host
  for TOD errors. TOD errors are reported to all the cores/threads on the chip.
  Any one thread can fix the error and send the event. The rest of the threads
  don't need to send an HMI event unnecessarily.
126- opal/hmi: Fix soft lockups during TOD errors
127
  There are some TOD errors which do not affect the working of TOD and TB;
  both stay in a valid state. Hence we don't need a rendez-vous for TOD
  errors that do not affect the working of TB.

  TOD errors that affect TOD/TB will report a global error on TFMR[44]
  along with bit 51, and they will go down the rendez-vous path as expected.

  But TOD errors that do not affect TB set only TFMR bit 51.
  TFMR bit 51 is cleared when any single thread clears the TOD error.
  Once cleared, bit 51 is reflected to all the cores on that chip. Any
  thread that reads the TFMR register after the error is cleared will see
  TFMR bit 51 reset. Hence the threads that see TFMR[51]=1 fall through
  to the rendez-vous path, and the threads that see TFMR[51]=0 return
  doing nothing. This ends up in soft lockups in the host kernel.

  This patch fixes the issue by not considering the TOD interrupt (TFMR[51])
  as a core-global error, hence avoiding the rendez-vous path completely.
  Instead, threads that see TFMR[51]=1 will now take a different path that
  just does the TOD error recovery.
147- opal/hmi: Do not send HMI event if no errors are found.
148
  For TOD errors, all the cores in the chip get HMIs. Any one thread from any
  core can fix the issue, and TFMR will have the error conditions cleared. The
  rest of the threads need not take any action if the TOD errors are already
  cleared. Hence thread 0 of every core should get a fresh copy of TFMR before
  going down the recovery path. Initialize recover = -1, so that if no errors
  are found that thread need not send an HMI event to Linux. This stops every
  thread flooding the host with HMI events even when no errors are found.
156- opal/hmi: Initialize the hmi event with old value of HMER.
157
  Do this before we check for TFAC errors. Otherwise the event at the host
  console shows no error reported in the HMER register.

  Without this patch the console event shows HMER with all zeros: ::
162
163    [  216.753417] Severe Hypervisor Maintenance interrupt [Recovered]
164    [  216.753498]  Error detail: Timer facility experienced an error
165    [  216.753509]  HMER: 0000000000000000
166    [  216.753518]  TFMR: 3c12000870e04000
167
168  After this patch it shows old HMER values on host console: ::
169
170    [ 2237.652533] Severe Hypervisor Maintenance interrupt [Recovered]
171    [ 2237.652651]  Error detail: Timer facility experienced an error
172    [ 2237.652766]  HMER: 0840000000000000
173    [ 2237.652837]  TFMR: 3c12000870e04000
174- opal/hmi: Rework HMI handling of TFAC errors
175
  This patch reworks the HMI handling for TFAC errors by introducing
  4 rendez-vous points to improve the thread synchronization while handling
  timebase errors that require all threads to clear dirty data from the
  TB/HDEC registers before clearing the errors.
180- opal/hmi: Don't bother passing HMER to pre-recovery cleanup
181
182  The test for TFAC error is now redundant so we remove it and
183  remove the HMER argument.
184- opal/hmi: Move timer related error handling to a separate function
185
186  Currently no functional change. This is a first step to completely
187  rewriting how these things are handled.
188- opal/hmi: Add a new opal_handle_hmi2 that returns direct info to Linux
189
190  It returns a 64-bit flags mask currently set to provide info
191  about which timer facilities were lost, and whether an event
192  was generated.
193- opal/hmi: Remove races in clearing HMER
194
  Writing to HMER acts as an "AND". The current code writes back the
  value we originally read with the bits we handled cleared. This is
  racy: if a new bit gets set in HW after the original read, we'll end
  up clearing it without handling it.
199
200  Instead, use an all 1's mask with only the bit handled cleared.
201- opal/hmi: Don't re-read HMER multiple times
202
203  We want to make sure all reporting and actions are based
204  upon the same snapshot of HMER in case bits get added
205  by HW while we are in OPAL.
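
The HMER race described in "Remove races in clearing HMER" above can be
modelled in a few lines. This is a toy simulation, not skiboot code: the
``Hmer`` class and the bit values are made up for illustration, and the only
behaviour taken from these notes is that a write to HMER is ANDed with the
current register contents.

```python
class Hmer:
    """Toy model of a register whose writes act as an AND."""

    def __init__(self, value):
        self.value = value

    def read(self):
        return self.value

    def write(self, mask):
        # Writing ANDs the written value into the register.
        self.value &= mask

    def hw_set(self, bit):
        # Hardware raising a new error bit behind our back.
        self.value |= bit


ALL_ONES = (1 << 64) - 1
BIT_A, BIT_B = 0x1, 0x2  # arbitrary illustrative bits

# Racy: write back the stale snapshot with the handled bit cleared.
hmer = Hmer(BIT_A)
snapshot = hmer.read()
hmer.hw_set(BIT_B)               # arrives after the read
hmer.write(snapshot & ~BIT_A)
racy_result = hmer.read()        # BIT_B was lost without being handled

# Safe: all-ones mask with only the handled bit cleared.
hmer = Hmer(BIT_A)
snapshot = hmer.read()
hmer.hw_set(BIT_B)
hmer.write(ALL_ONES & ~BIT_A)
safe_result = hmer.read()        # BIT_B is still pending

print(racy_result, safe_result)  # 0 2
```

In both cases the handler still acts on a single snapshot of the register;
only the value written back differs.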
206
libflash and ffspart
^^^^^^^^^^^^^^^^^^^^

Many improvements to the `ffspart` utility and `libflash` have come
in this release, making `ffspart` suitable for building PNOR images that
are bit-identical to those produced by the existing tooling used by
`op-build`. The plan is to switch `op-build` to use this infrastructure
in the not too distant future.
215
216- libflash/blocklevel: Make read/write be ECC agnostic for callers
217
218  The blocklevel abstraction allows for regions of the backing store to be
219  marked as ECC protected so that blocklevel can decode/encode the ECC
220  bytes into the buffer automatically without the caller having to be ECC
221  aware.
222
  Unfortunately this abstraction is far from perfect: it is only useful
  if reads and writes are performed at the start of the ECC region or in
  some circumstances at an ECC aligned position, which requires the
  caller to be aware of the ECC regions.
227
228  The problem that has arisen is that the blocklevel abstraction is
229  initialised somewhere but when it is later called the caller is unaware
230  if ECC exists in the region it wants to arbitrarily read and write to.
231  This should not have been a problem since blocklevel knows. Currently
232  misaligned reads will fail ECC checks and misaligned writes will
233  overwrite ECC bytes and the backing store will become corrupted.
234
  This patch adds the smarts to blocklevel_read() and blocklevel_write() to
  cope with the problem. Note that ECC can always be bypassed by calling
  blocklevel_raw_() functions.

  All this work means that the gard tool can safely call
  blocklevel_read() and blocklevel_write() and as long as the blocklevel
  knows of the presence of ECC then it will deal with all cases.

  This commit also removes code in the gard tool which compensated for
  inadequacies no longer present in blocklevel.
245- libflash/blocklevel: Return region start from ecc_protected()
246
247  Currently all ecc_protected() does is say if a region is ECC protected
248  or not. Knowing a region is ECC protected is one thing but there isn't
249  much that can be done afterwards if this is the only known fact. A lot
250  more can be done if the caller is told where the ECC region begins.
251
  Knowing where the ECC region starts allows the caller to align its
  reads and writes. This allows for more flexibility calling read and write
  without knowing exactly how the backing store is organised.
255- libflash/ecc: Add helpers to align a position within an ecc buffer
256
257  As part of ongoing work to make ECC invisible to higher levels up the
258  stack this function converts a 'position' which should be ECC agnostic
259  to the equivalent position within an ECC region starting at a specified
260  location.
261- libflash/ecc: Add functions to deal with unaligned ECC memcpy
262- external/ffspart: Improve error output
263- libffs: Fix bad checks for partition overlap
264
265  Not all TOCs are written at zero
- libflash/libffs: Allow caller to specify header partition

  An FFS TOC is comprised of two parts. The first is a small header which has
  a magic and very minimal information about the TOC which will be common to
  all partitions, things like the number of partitions, block sizes and the
  like. Following this small header are a series of entries. Importantly there
  is always an entry which encompasses the TOC itself; this is usually
  called the 'part' partition.

  Currently libffs always assumes that the 'part' partition is at zero.
  While there is always a TOC at zero, there doesn't actually have to be.
  PNORs may have multiple TOCs within them, therefore libffs needs to be
  flexible enough to allow callers to specify TOCs not at zero.

  The 'part' partition is otherwise a regular partition which may have
  flags associated with it. libffs should allow the user to set the flags
  for the 'part' partition.

  This patch achieves both by allowing the caller to specify the 'part'
  partition. The caller can choose not to, and libffs will provide a
  sensible default.
287- libflash/libffs: Refcount ffs entries
288
  Currently consumers can add a new ffs entry to multiple headers. This
  is fine, but freeing any of the headers will cause the entry to be freed,
  which causes double free problems.

  Even if only one header is used, the consumer of the library still has a
  reference to the entry, which they may well reuse at some other point.

  libffs will now refcount entries and only free when there are no more
  references.

  This patch also removes the pointless return value of ffs_hdr_free().
300- libflash/libffs: Switch to storing header entries in an array
301
  Since libffs no longer needs to sort the entries as they get added
303  it makes little sense to have the complexity of a linked list when an
304  array will suffice.
305- libflash/libffs: Remove backup partition from TOC generation code
306
307  It turns out this code was messy and not all that reliable. Doing it at
308  the library level adds complexity to the library and restrictions to the
309  caller.
310
  A simpler approach can be achieved by just instantiating multiple
  ffs_header structures pointing to different parts of the same file.
313- libflash/libffs: Remove the 'sides' from the FFS TOC generation code
314
315  It turns out this code was messy and not all that reliable. Doing it at
316  the library level adds complexity to the library and restrictions to the
317  caller.
318
  A simpler approach can be achieved by just instantiating multiple
  ffs_header structures pointing to different parts of the same file.
321- libflash/libffs: Always add entries to the end of the TOC
322
323  It turns out that sorted order isn't the best idea. This removes
324  flexibility from the caller. If the user wants their partitions in
325  sorted order, they should insert them in sorted order.
326- external/ffspart: Remove side, order and backup options
327
  These options are currently flaky in libflash/libffs so there isn't
  much point in being able to use them in ffspart.
330
331  Future reworks planned for libflash/libffs will render these options
332  redundant anyway.
333- libflash/libffs: ffs_close() should use ffs_hdr_free()
- libflash/libffs: Add setter for a partition's actual size
335- pflash: Use ffs_entry_user_to_string() to standardise flag strings
336- libffs: Standardise ffs partition flags
337
  It seems we've developed a character representation for ffs partition
  flags. Currently only pflash really prints them so it hasn't been a
  problem but now ffspart wants to read them in from user input.

  It is important that what libffs reads and what pflash prints remain
  consistent, so we should move the code into libffs to avoid problems.
- external/ffspart: Allow # comments in input file
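
The ECC position helpers mentioned in the libflash/ecc items above come down
to a little modular arithmetic. The sketch below is an illustration only: it
assumes that every 8 data bytes are followed by 1 ECC byte (so the backing
store is laid out in 9-byte blocks), and both function names are hypothetical
rather than the libflash API.

```python
DATA_BYTES = 8                  # assumption: 8 data bytes per ECC block
ECC_BLOCK = DATA_BYTES + 1      # assumption: followed by 1 ECC byte


def ecc_align_down(region_start, pos):
    """Round a raw position down to the start of its ECC block.

    ``region_start`` is where the ECC protected region begins, which is
    what the caller learns once ecc_protected() reports the region start.
    """
    return pos - ((pos - region_start) % ECC_BLOCK)


def data_to_raw(region_start, data_offset):
    """Map an ECC-agnostic data offset to its raw position in the region."""
    return region_start + data_offset + data_offset // DATA_BYTES


print(hex(ecc_align_down(0x100, 0x10D)))  # 0x109
print(hex(data_to_raw(0x100, 8)))         # 0x109
```

A caller that works in data offsets never has to know about the interleaved
ECC bytes; only the region start is needed to do the conversion.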
345
p9dsu Platform changes
----------------------
348
349The p9dsu platform from SuperMicro (also known as 'Boston') has received
350a number of updates, and the patches once carried by SuperMicro are now
351upstream.
352
353- p9dsu: detect p9dsu variant even when hostboot doesn't tell us
354
355  The SuperMicro BMC can tell us what riser type we have, which dictates
356  the PCI slot tables. Usually, in an environment that a customer would
357  experience, Hostboot will do the query with an SMC specific patch
358  (not upstream as there's no platform specific code in hostboot)
359  and skiboot knows what variant it is based on the compatible string.
360
  However, if you're using upstream hostboot, you only get the bare
  'p9dsu' compatible type. We can work around this by asking the BMC
  ourselves and setting the slot table appropriately. We do this
  synchronously in platform init so that we don't start probing
  PCI before we set up the slot table.
366- p9dsu: add slot power limit.
367- p9dsu: add pci slot table for Boston LC 1U/2U and Boston LA/ESS.
368- p9dsu HACK: fix system-vpd eeprom
369- p9dsu: change esel command from AMI to IBM 0x3a.
370
ZZ Platform Changes
-------------------
373
374- hdata/i2c: Fix up pci hotplug labels
375
376  These labels are used on the devices used to do PCIe slot power control
377  for implementing PCIe hotplug. I'm not sure how they ended up as
378  "eeprom-pgood" and "eeprom-controller" since that doesn't make any sense.
379- hdata/i2c: Ignore multi-port I2C devices
380
381  Recent FSP firmware builds add support for multi-port I2C devices such
382  as the GPIO expanders used for the presence detect of OpenCAPI devices
383  and the PCIe hotplug controllers used to power cycle PCIe slots on ZZ.
384
385  The OpenCAPI driver inside of skiboot currently uses a platform-specific
386  method to talk to the relevant I2C device rather than relying on HDAT
387  since not all platforms correctly report the I2C devices (hello Zaius).
  Additionally the nature of multi-port devices requires that we have a
  device specific handler so that we generate the correct DT bindings.
  Currently we don't, and there is no immediate need for this support, so
  just ignore the multi-port devices for now.
392- hdata/i2c: Replace `i2c_` prefix with `dev_`
393
394  The current naming scheme makes it easy to conflate "i2cm_port" and
395  "i2c_port." The latter is used to describe multi-port I2C devices such
396  as GPIO expanders and multi-channel PCIe hotplug controllers. Rename
397  i2c_port to dev_port to make the two a bit more distinct.
398
399  Also rename i2c_addr to dev_addr for consistency.
400- hdata/i2c: Ignore CFAM I2C master
401
  Recent FSP firmware builds put in information about the CFAM I2C master
  in addition to the host I2C masters accessible via XSCOM. Odds are this
  information should not be there since there's no handshaking between the
  FSP/BMC and the host over who controls that I2C master, but it is, so
  we need to deal with it.

  This patch adds filtering to the HDAT parser so it ignores the CFAM I2C
  master. Without this it will create a bogus i2cm@<addr> which might cause
  issues.
411- ZZ: hw/imc: Add support to load imc catalog lid file
412
413  Add support to load the imc catalog from a lid file packaged
414  as part of the system firmware. Lid number allocated
415  is 0x80f00103.lid.
416
417
Bugs Fixed
----------
- core: Fix iteration condition to skip garded cpu
- uart: fix uart_opal_flush to take console lock over uart_con_flush

  This bug meant that OPAL_CONSOLE_FLUSH didn't take the appropriate locks.
  Luckily, this call is currently only used in the crash path.
424- xive: fix missing unlock in error path
425- OPAL_PCI_SET_POWER_STATE: fix locking in error paths
426
427  Otherwise we could exit OPAL holding locks, potentially leading
428  to all sorts of problems later on.
- hw/slw: Don't assert on an unknown chip

  For some reason skiboot populates nodes in /cpus/ for the cores on
  chips that are deconfigured. As a result Linux includes the threads
  of those cores in its set of possible CPUs in the system and attempts
  to set the SPR values that should be used when waking a thread from
  a deep sleep state.

  However, in the case where we have a deconfigured chip we don't create
  an xscom node for that chip and as a result we don't have a proc_chip
  structure for that chip either. In turn, this results in an assertion
  failure when calling opal_slw_set_reg() since it expects the chip
  structure to exist. Fix this up and print an error instead.
442- opal/hmi: Generate one event per core for processor recovery.
443
  Processor recovery is a per-core error. All threads on that core receive
  the HMI, but they don't all need to generate an HMI event for the same
  error.

  Let only thread 0 generate the event.
- sensors: Don't add DTS sensors when OCC inband sensors are available

  There are two sets of core temperature sensors today. One is the DTS scom
  based core temperature sensors and the second is the sensors
  provided by OCC. DTS is the highest temperature among the different
  temperature zones in the core while OCC core temperature sensors are
  the average temperature of the core. DTS sensors are read directly by
  the host by SCOMing the DTS sensors while OCC sensors are read and
  updated by OCC to main memory.

  Reading DTS sensors by SCOMing is a heavier and slower operation
  compared to reading OCC sensors, which is as good as reading memory.
  So don't add DTS sensors when OCC sensors are available.
461- core/fast-reboot: Increase timeout for dctl sreset to 1sec
462
  Direct control xscom can take more time to complete. We seem to
  wait too little on Boston, failing fast-reboot for no good reason.
465
466  Increase timeout to 1 sec as a reasonable value for sreset to be delivered
467  and core to start executing instructions.
468- occ: sensors-groups: Add DT properties to mark HWMON sensor groups
469
470  Fix the sensor type to match HWMON sensor types. Add compatible flag
471  to indicate the environmental sensor groups so that operations on
472  these groups can be handled by HWMON linux interface.
473- core: Correctly load initramfs in stb container
474
475  Skiboot does not calculate the actual size and start location of the
476  initramfs if it is wrapped by an STB container (for example if loading
477  an initramfs from the ROOTFS partition).
478
479  Check if the initramfs is in an STB container and determine the size and
480  location correctly in the same manner as the kernel. Since
481  load_initramfs() is called after load_kernel() move the call to
482  trustedboot_exit_boot_services() into load_and_boot_kernel() so it is
483  called after both of these.
484- hdat/i2c.c: quieten "v2 found, parsing as v1"
485- hw/imc: Check for pause_microcode_at_boot() return status
486
  pause_microcode_at_boot() loops through the ucode control blocks of
  all chips and pauses the ucode if it is in the running state.
  But it does not fail if any of the chips' ucode is not initialised.

  Add code to return a failure if the ucode is not initialised in any
  of the chips. Since pause_microcode_at_boot() is called just before
  attaching the IMC device nodes in imc_init(), add code to check the
  function's return value.
495
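The hw/imc fix above follows a common pattern: a helper that used to succeed
silently now reports failure, and its caller checks the result before going
further. A minimal sketch of that control flow follows; the chip
representation, state names and return strings are invented for illustration,
and only the two function names come from these notes.

```python
RUNNING, PAUSED, UNINITIALISED = "running", "paused", "uninitialised"


def pause_microcode_at_boot(chips):
    """Pause running ucode on every chip.

    Returns False if any chip's ucode is not initialised (sketch only;
    the real function works on per-chip ucode control blocks).
    """
    for chip in chips:
        if chip["ucode"] == UNINITIALISED:
            return False
        if chip["ucode"] == RUNNING:
            chip["ucode"] = PAUSED
    return True


def imc_init(chips):
    # Check the helper's result before attaching any IMC device nodes.
    if not pause_microcode_at_boot(chips):
        return "imc disabled"
    return "imc nodes attached"


good = [{"ucode": RUNNING}, {"ucode": PAUSED}]
bad = [{"ucode": RUNNING}, {"ucode": UNINITIALISED}]
print(imc_init(good))  # imc nodes attached
print(imc_init(bad))   # imc disabled
```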
496
497Slot location code fixes:
498
- npu2: Use ibm,loc-code rather than ibm,slot-label
500
501  The ibm,slot-label property is to name the slot that appears under a
502  PCIe bridge. In the past we (ab)used the slot tables to attach names
503  to GPU devices and their corresponding NVLinks which resulted in npu2.c
504  using slot-label as a location code rather than as a way to name slots.
505
506  Fix this up since it's confusing.
507- hdata/slots: Apply slot label to the parent slot
508
509  Slot names only really make sense when applied to an actual slot rather
510  than a device. On witherspoon the GPU devices have a name associated with
511  the device rather than the slot for the GPUs. Add a hack that moves the
512  slot label to the parent slot rather than on the device itself.
513- pci-dt-slot: Big ol' cleanup
514
  The underlying data that we get from HDAT can only really describe a
  PCIe system. As such we can simplify the devicetree slot lookup code
  by only caring about the important cases, namely, root ports and switch
  downstream ports.

  This also fixes a bug where a root port didn't get a slot label applied,
  which resulted in devices under that port not having ibm,loc-code set.
  This in turn left the EEH core unable to report the location of
  EEHed devices under that port.
524
opal-prd
^^^^^^^^
- opal-prd: Insert powernv_flash module

  Explicitly load the powernv_flash module on BMC based systems so that we are
  sure that the flash device is created before starting the opal-prd daemon.

  Note that I have replaced the pnor_available() check with is_fsp_system(),
  as we want to load the module on BMC systems only. Also, pnor_init has
  enough logic to detect the flash device, hence pnor_available() becomes a
  redundant check.
535
NPU2/NVLINK2
^^^^^^^^^^^^
538- npu2/hw-procedures: fence bricks on GPU reset
539
540  The NPU workbook defines a way of fencing a brick and
541  getting the brick out of fence state. We do have an implementation
542  of bringing the brick out of fenced/quiesced state. We do
543  the latter in our procedures, but to support run time reset
544  we need to do the former.
545
  The fencing ensures that access to memory behind the links
  will not lead to HMIs, but instead SUEs will be populated
  in cache (in the case of speculation). The expectation is then
  that prior to and after reset, the operating system components
  will flush the cache for the region of memory behind the GPU.

  This patch does the following:

  1. Implements a npu2_dev_fence_brick() function to set/clear
     fence state
  2. Clears FIR bits prior to clearing the fence status
  3. Clears the fence status
  4. Takes the powerbus out of CQ fence much later now,
     in credits_check(), which is the last hardware procedure
     called after link training.
561- hw/npu2.c: Remove static configuration of NPU2 register
562
563  The NPU_SM_CONFIG0 register currently needs to be configured in Skiboot to
564  select NVLink mode, however Hostboot should configure other bits in this
565  register.
566
567  For some reason Skiboot was explicitly clearing bit-6
568  (CONFIG_DISABLE_VG_NOT_SYS). It is unclear why this bit was getting cleared
569  as recent Hostboot versions explicitly set it to the correct value based on
570  the specific system configuration. Therefore Skiboot should not alter it.
571
572  Bit-58 (CONFIG_NVLINK_MODE) selects if NVLink mode should be enabled or
573  not. Hostboot does not configure this bit so Skiboot should continue to
574  configure it.
575- npu2: Improve log output of GPU-to-link mapping
576
577  Debugging issues related to unconnected NVLinks can be a little less
578  irritating if we use the NPU2DEV{DBG,INF}() macros instead of prlog().
579
580  In short, change this: ::
581
582      NPU2: comparing GPU 'GPU2' and NPU2 'GPU1'
583      NPU2: comparing GPU 'GPU3' and NPU2 'GPU1'
584      NPU2: comparing GPU 'GPU4' and NPU2 'GPU1'
585      NPU2: comparing GPU 'GPU5' and NPU2 'GPU1'
586            :
587      npu2_dev_bind_pci_dev: No PCI device for NPU2 device 0006:00:01.0 to bind to. If you expect a GPU to be there, this is a problem.
588
589  to this: ::
590
591      NPU6:0:1.0 Comparing GPU 'GPU2' and NPU2 'GPU1'
592      NPU6:0:1.0 Comparing GPU 'GPU3' and NPU2 'GPU1'
593      NPU6:0:1.0 Comparing GPU 'GPU4' and NPU2 'GPU1'
594      NPU6:0:1.0 Comparing GPU 'GPU5' and NPU2 'GPU1'
595            :
596      NPU6:0:1.0 No PCI device found for slot 'GPU1'
597- npu2: Move NPU2_XTS_BDF_MAP_VALID assignment to context init
598
599  A bad GPU or other condition may leave us with a subset of links that
600  never get initialized. If an ATSD is sent to one of those bricks, it
601  will never complete, leaving us waiting forever for a response: ::
602
603    watchdog: BUG: soft lockup - CPU#23 stuck for 23s! [acos:2050]
604    ...
605    Modules linked in: nvidia_uvm(O) nvidia(O)
606    CPU: 23 PID: 2050 Comm: acos Tainted: G        W  O    4.14.0 #2
607    task: c0000000285cfc00 task.stack: c000001fea860000
608    NIP:  c0000000000abdf0 LR: c0000000000acc48 CTR: c0000000000ace60
609    REGS: c000001fea863550 TRAP: 0901   Tainted: G        W  O     (4.14.0)
610    MSR:  9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 28004484  XER: 20040000
611    CFAR: c0000000000abdf4 SOFTE: 1
612    GPR00: c0000000000acc48 c000001fea8637d0 c0000000011f7c00 c000001fea863820
613    GPR04: 0000000002000000 0004100026000000 c0000000012778c8 c00000000127a560
614    GPR08: 0000000000000001 0000000000000080 c000201cc7cb7750 ffffffffffffffff
615    GPR12: 0000000000008000 c000000003167e80
616    NIP [c0000000000abdf0] mmio_invalidate_wait+0x90/0xc0
617    LR [c0000000000acc48] mmio_invalidate.isra.11+0x158/0x370
618
619
620  ATSDs are only sent to bricks which have a valid entry in the XTS_BDF
621  table. So to prevent the hang, don't set NPU2_XTS_BDF_MAP_VALID unless
622  we make it all the way to creating a context for the BDF.
623
Secure and Trusted Boot
^^^^^^^^^^^^^^^^^^^^^^^
- hdata/tpmrel: detect tpm not present by looking up the stinfo->status

  Skiboot detects if a tpm is present by checking if a secureboot_tpm_info
  entry exists. However, if a tpm is not present, hostboot also creates a
  secureboot_tpm_info entry. In this case, hostboot creates an empty
  entry, but sets the field tpm_status to TPM_NOT_PRESENT.

  This patch detects that a tpm is not present by looking up stinfo->status.
634
635  This fixes the "TPMREL: TPM node not found for chip_id=0 (HB bug)"
636  issue, reproduced when skiboot is running on a system that has no tpm.
637
PCI
^^^
- phb4: Restore bus numbers after CRS

  Currently we restore PCIe bus numbers right after the link is
  up. Unfortunately at this point we haven't done CRS so config space
  may not be accessible.

  This moves the bus number restore till after CRS has happened.
647- romulus: Add a barebones slot table
648- phb4: Quieten and improve "Timeout waiting for electrical link"
649
  This happens normally if a slot doesn't have a working HW presence
  detect and relies instead on inband presence detect.

  The message we display is scary and not very useful unless you
  are debugging, so quieten it and change it to something more
  meaningful.
656- pcie-slot: Don't fail powering on an already on switch
657
  If the power state is already the required value, return
  OPAL_SUCCESS rather than OPAL_PARAMETER to avoid spurious
  errors during boot.
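
The pcie-slot change above amounts to treating a request for the current
power state as a no-op instead of an error. A minimal sketch, using a
hypothetical ``Slot`` model rather than skiboot's slot structures; the return
codes reuse the OPAL names from the item above, with numeric values assumed
for illustration:

```python
OPAL_SUCCESS = 0      # assumed numeric values, for illustration only
OPAL_PARAMETER = -1

POWER_OFF, POWER_ON = 0, 1


class Slot:
    """Hypothetical stand-in for a PCIe slot with a power state."""

    def __init__(self, power_state):
        self.power_state = power_state
        self.transitions = 0

    def set_power_state(self, wanted):
        if wanted not in (POWER_OFF, POWER_ON):
            return OPAL_PARAMETER      # genuinely bad argument
        if wanted == self.power_state:
            return OPAL_SUCCESS        # already there: not an error
        self.power_state = wanted
        self.transitions += 1
        return OPAL_SUCCESS


switch = Slot(POWER_ON)
print(switch.set_power_state(POWER_ON), switch.transitions)  # 0 0
```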
661
CAPI/OpenCAPI
^^^^^^^^^^^^^
- capi: Keep the current mmio windows in the mbt cache table.

  When the phb is used as a CAPI interface, the current mmio windows list
  is cleaned before adding the capi and the prefetchable memory (M64)
  windows, which implies that the non-prefetchable BAR is no longer
  configured.
  This patch sets only the mbt bar to pass the capi mmio window and
  keeps the other mmio values (M32 and M64) as defined.
672- npu2-opencapi: Fix 'link internal error' FIR, take 2
673
674  When setting up an opencapi link, we set the transport muxes first,
675  then set the PHY training config register, which includes disabling
676  nvlink mode for the bricks. That's the order of the init sequence, as
677  found in the NPU workbook.
678
679  In reality, doing so works, but it raises 2 FIR bits in the PowerBus
680  OLL FIR Register for the 2 links when we configure the transport
681  muxes. Presumably because nvlink is not disabled yet and we are
682  configuring the transport muxes for opencapi.
683
684  bit 60:
685    link0 internal error
686  bit 61:
687    link1 internal error
688
689  Overall the current setup ends up being correct and everything works,
690  but we raise 2 FIR bits.
691
692  So tweak the order of operations to disable nvlink before configuring
693  the transport muxes. Incidentally, this is what the scripts from the
694  opencapi enablement team were doing all along.
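
  For anyone checking the FIR by hand, the two bits can be decoded as
  sketched below. This assumes the register uses the usual IBM numbering,
  where bit 0 is the most significant bit of the 64-bit value. The macro
  and helper names are made up for the example:

  .. code-block:: c

      #include <stdbool.h>
      #include <stdint.h>

      /* IBM (big-endian) bit numbering: bit 0 is the MSB. */
      #define PPC_BIT(bit)  (0x8000000000000000ULL >> (bit))

      /* Hypothetical names for the two bits described above. */
      #define OLL_FIR_LINK0_INTERNAL_ERR  PPC_BIT(60)
      #define OLL_FIR_LINK1_INTERNAL_ERR  PPC_BIT(61)

      static bool link0_internal_error(uint64_t fir)
      {
          return (fir & OLL_FIR_LINK0_INTERNAL_ERR) != 0;
      }

      static bool link1_internal_error(uint64_t fir)
      {
          return (fir & OLL_FIR_LINK1_INTERNAL_ERR) != 0;
      }

  Under that numbering, a FIR value of 0x8 has only the link0 bit set
  and 0x4 has only the link1 bit set.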
695- npu2-opencapi: Fix 'link internal error' FIR, take 1
696
  When we set up a link, we always enable ODL0 and ODL1 at the same time
  in the PHY training config register, even though we are setting up
  only one OTL/ODL, so it raises a "link internal error" FIR bit in the
  PowerBus OLL FIR Register for the second link. The error is harmless,
  as we'll eventually set up the second link, but there's no reason to
  raise that FIR bit.
703
704  The fix is simply to only enable the ODL we are using for the link.
705- phb4: Do not set the PBCQ Tunnel BAR register when enabling capi mode.
706
707  The cxl driver will set the capi value, like other drivers already do.
708- phb4: set TVT1 for tunneled operations in capi mode
709
710  The ASN indication is used for tunneled operations (as_notify and
711  atomics). Tunneled operation messages can be sent in PCI mode as
712  well as CAPI mode.
713
714  The address field of as_notify messages is hijacked to encode the
715  LPID/PID/TID of the target thread, so those messages should not go
716  through address translation. Therefore bit 59 is part of the ASN
717  indication.
718
719  This patch sets TVT#1 in bypass mode when capi mode is enabled,
720  to prevent as_notify messages from being dropped.
721
722Debugging/Testing improvements
723------------------------------
724- core/stack: backtrace unwind basic OPAL call details
725
726  Put OPAL callers' r1 into the stack back chain, and then use that to
727  unwind back to the OPAL entry frame (as opposed to boot entry, which
728  has a 0 back chain).
729
730  From there, dump the OPAL call token and the caller's r1. A backtrace
731  looks like this: ::
732
733      CPU 0000 Backtrace:
734       S: 0000000031c03ba0 R: 000000003001a548   ._abort+0x4c
735       S: 0000000031c03c20 R: 000000003001baac   .opal_run_pollers+0x3c
736       S: 0000000031c03ca0 R: 000000003001bcbc   .opal_poll_events+0xc4
737       S: 0000000031c03d20 R: 00000000300051dc   opal_entry+0x12c
738       --- OPAL call entry token: 0xa caller R1: 0xc0000000006d3b90 ---
739
740  This is pretty basic for the moment, but it does give you the bottom
741  of the Linux stack. It will allow some interesting improvements in
742  future.
743
744  First, with the eframe, all the call's parameters can be printed out
745  as well.  The ___backtrace / ___print_backtrace API needs to be
746  reworked in order to support this, but it's otherwise very simple
747  (see opal_trace_entry()).
748
  Second, it will allow Linux's stack to be passed back to Linux via
  a debugging opal call. This will allow Linux's BUG() or xmon to
  also print the Linux back trace in case of an NMI, MCE or watchdog
  lockup that hits in OPAL.
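
  The unwinding idea can be sketched in plain C. The frame model below is
  deliberately simplified and hypothetical (an address, a back chain and a
  token field); it is not the real ABI stack frame nor skiboot's backtrace
  code. Frames are looked up by address in a table that stands in for the
  firmware stack:

  .. code-block:: c

      #include <stddef.h>
      #include <stdint.h>

      /* Hypothetical model of a firmware stack frame. */
      struct frame_sketch {
          uint64_t addr;        /* address of this frame (its r1 value) */
          uint64_t back_chain;  /* r1 of the previous frame, 0 at boot entry */
          uint64_t token;       /* OPAL call token, entry frame only */
      };

      /* Find a firmware frame by address; NULL means "not in firmware". */
      static const struct frame_sketch *
      lookup_frame(const struct frame_sketch *frames, size_t n, uint64_t addr)
      {
          for (size_t i = 0; i < n; i++)
              if (frames[i].addr == addr)
                  return &frames[i];
          return NULL;
      }

      /*
       * Walk the back chain starting at r1. When the chain leaves the
       * firmware stack, the last firmware frame is the OPAL entry frame:
       * report its token and the caller's r1, and return 1. A back chain
       * of 0 means we came from boot entry: return 0.
       */
      static int unwind_to_opal_entry(const struct frame_sketch *frames,
                                      size_t n, uint64_t r1,
                                      uint64_t *token, uint64_t *caller_r1)
      {
          const struct frame_sketch *f = lookup_frame(frames, n, r1);

          /* Bound the walk by n so a corrupt chain cannot loop forever. */
          for (size_t depth = 0; f && depth < n; depth++) {
              const struct frame_sketch *prev;

              if (f->back_chain == 0)
                  return 0;
              prev = lookup_frame(frames, n, f->back_chain);
              if (!prev) {
                  *token = f->token;
                  *caller_r1 = f->back_chain;
                  return 1;
              }
              f = prev;
          }
          return 0;
      }

  In this model, the caller's r1 is simply the first back chain value that
  does not belong to a firmware frame, which is why storing it in the
  entry frame's back chain is enough to find the bottom of the OS stack.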
753- asm/head: implement quiescing without stack or clobbering regs
754
  Quiescing currently is implemented in C in opal_entry before the
756  opal call handler is called. This works well enough for simple
757  cases like fast reset when one CPU wants all others out of the way.
758
759  Linux would like to use it to prevent an sreset IPI from
760  interrupting firmware, which could lead to deadlocks when crash
761  dumping or entering the debugger. Linux interrupts do not recover
762  well when returning back to general OPAL code, due to r13 not being
763  restored. OPAL also can't be re-entered, which may happen e.g.,
764  from the debugger.
765
  So move the quiesce hold/reject to entry code, before the stack or
767  r1 or r13 registers are switched. OPAL can be interrupted and
768  returned to or re-entered during this period.
769
770  This does not completely solve all such problems. OPAL will be
771  interrupted with sreset if the quiesce times out, and it can be
772  interrupted by MCEs as well. These still have the issues above.
773- core/opal: Allow poller re-entry if OPAL was re-entered
774
775  If an NMI interrupts the middle of running pollers and the OS
776  invokes pollers again (e.g., for console output), the poller
777  re-entrancy check will prevent it from running and spam the
778  console.
779
  That check was designed to catch a poller calling opal_run_pollers;
  OPAL re-entrancy is something different and is detected elsewhere.
782  Avoid the poller recursion check if OPAL has been re-entered. This
783  is a best-effort attempt to cope with errors.
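
  A rough sketch of the resulting check, using hypothetical per-CPU
  flags. The field and function names are illustrative and do not mirror
  skiboot's own per-CPU structure:

  .. code-block:: c

      #include <stdbool.h>

      /* Hypothetical per-CPU state for the sketch. */
      struct cpu_sketch {
          bool in_poller;      /* set while this CPU runs the pollers */
          bool opal_reentered; /* set by entry code when OPAL was re-entered */
          int  polls_run;      /* counts completed poller passes */
          int  warnings;       /* counts "poller recursion" complaints */
      };

      static void run_pollers_sketch(struct cpu_sketch *cpu)
      {
          bool was_in_poller = cpu->in_poller;

          /*
           * A poller calling back into the pollers is a bug, so complain
           * and bail out. Skip the check when OPAL was re-entered (for
           * example after an NMI): that case is detected elsewhere.
           */
          if (was_in_poller && !cpu->opal_reentered) {
              cpu->warnings++;
              return;
          }

          cpu->in_poller = true;
          cpu->polls_run++;          /* stands in for calling each poller */
          cpu->in_poller = was_in_poller;
      }

  The flag is restored to its previous value on exit so that, in this
  model, an interrupted poller pass still sees itself as running if it
  is ever resumed.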
784- core/opal: Emergency stack for re-entry
785
786  This detects OPAL being re-entered by the OS, and switches to an
787  emergency stack if it was. This protects the firmware's main stack
788  from re-entrancy and allows the OS to use NMI facilities for crash
789  / debug functionality.
790
791  Further nested re-entry will destroy the previous emergency stack
792  and prevent returning, but those should be rare cases.
793
794  This stack is sized at 16kB, which doubles the size of CPU stacks,
795  so as not to introduce a regression in primary stack size. The 16kB
796  stack originally had a 4kB machine check stack at the top, which was
797  removed by 80eee1946 ("opal: Remove machine check interrupt patching
798  in OPAL."). So it is possible the size could be tightened again, but
799  that would require further analysis.
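
  The stack selection can be pictured as follows. This is only a model in
  C with made-up names; the real switch happens in the entry code, and the
  layout assumed here (emergency stack placed directly above the main
  stack) is an assumption for illustration:

  .. code-block:: c

      #include <stdbool.h>
      #include <stdint.h>

      #define STACK_SIZE_SKETCH  0x4000ULL  /* 16kB, as described above */

      /* Hypothetical per-CPU stack area: main stack, then emergency stack. */
      struct cpu_stacks_sketch {
          uint64_t base;      /* lowest address of the per-CPU stack area */
          bool     in_opal;   /* true while an OPAL call is in progress */
      };

      /* Return the initial stack pointer to use for this entry. */
      static uint64_t pick_entry_stack(const struct cpu_stacks_sketch *cpu)
      {
          uint64_t main_top = cpu->base + STACK_SIZE_SKETCH;
          uint64_t emergency_top = main_top + STACK_SIZE_SKETCH;

          /* Re-entry: leave the interrupted call's main stack untouched. */
          return cpu->in_opal ? emergency_top : main_top;
      }

  Because there is a single emergency stack per CPU in this model, a
  second nested re-entry would start from the same emergency_top and
  overwrite the first one, which matches the caveat above.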
800
- hdat_to_dt: hash_prop the same on all platforms

  Fixes this unit test on ppc64le hosts.
803- mambo: Add persistent memory disk support
804
  This adds support for mapping disk images using persistent
  memory. Disks can be added by setting this environment variable: ::

      PMEM_DISK="/mydisks/disk1.img,/mydisks/disk2.img"
809
810  These will show up in Linux as /dev/pmem0 and /dev/pmem1.
811
812  This uses a new feature in mambo "mysim memory mmap .." which is only
813  available since mambo commit 0131f0fc08 (from 24/4/2018).
814
815  This also needs the of_pmem.c driver in Linux which is only available
816  since v4.17. It works with powernv_defconfig + CONFIG_OF_PMEM.
817- external/mambo: Add di command to decode instructions
818
  By default you get 16 instructions but you can specify the number you
  want, e.g. ::
821
822      systemsim % di 0x100 4
823      0x0000000000000100: Enc:0xA64BB17D : mtspr   HSPRG1,r13
824      0x0000000000000104: Enc:0xA64AB07D : mfspr   r13,HSPRG0
825      0x0000000000000108: Enc:0xF0092DF9 : std     r9,0x9F0(r13)
826      0x000000000000010C: Enc:0xA6E2207D : mfspr   r9,PPR
827
828  Using di since it's what xmon uses.
829- mambo/mambo_utils.tcl: Inject an MCE at a specified address
830
831  Currently we don't support injecting an MCE on a specific address.
832  This is useful for testing functionality like memcpy_mcsafe()
833  (see https://patchwork.ozlabs.org/cover/893339/)
834
835  The core of the functionality is a routine called
836  inject_mce_ue_on_addr, which takes an addr argument and injects
837  an MCE (load/store with UE) when the specified address is accessed
838  by code. This functionality can easily be enhanced to cover
839  instruction UE's as well.
840
841  A sample use case to create an MCE on stack access would be ::
842
843    set addr [mysim display gpr 1]
844    inject_mce_ue_on_addr $addr
845
  This would cause an MCE on any r1 or r1-based access.
847- external/mambo: improve helper for machine checks
848
849  Improve workarounds for stop injection, because mambo often will
850  trigger on 0x104/204 when injecting sreset/mces.
851
852  This also adds a workaround to skip injecting on reservations to
853  avoid infinite loops when doing inject_mce_step.
854- travis: Enable ppc64le builds
855
856  At least on the IBM Travis Enterprise instance, we can now do
857  ppc64le builds!
858
859  We can only build a subset of our matrix due to availability of
860  ppc64le distros. The Dockerfiles need some tweaking to only
861  attempt to install (x86_64 only) Mambo binaries, as well as the
862  build scripts.
863- external: Add "lpc" tool
864
865  This is a little front-end to the lpc debugfs files to access
866  the LPC bus from userspace on the host.
867- core/test/run-trace: fix on ppc64el
868
869
870