1.. _skiboot-6.0:
2
3skiboot-6.0
4===========
5
6skiboot v6.0 was released on Friday May 11th 2018. It is the first
7release of skiboot 6.0, which is the new stable release of skiboot
8following the 5.11 release, first released April 6th 2018.
9
Skiboot 6.0 is the basis for op-build v2.0 and is *required* for
POWER9 systems.
12
13skiboot v6.0 contains all bug fixes as of :ref:`skiboot-5.11`,
14:ref:`skiboot-5.10.5`, and :ref:`skiboot-5.4.9` (the currently maintained
15stable releases). We do *not* expect any further stable releases in the
165.10.x series, nor in the 5.11.x series.
17
For details on how the skiboot stable releases work, see :ref:`stable-rules`.
19
20Over skiboot-5.11, we have the following changes:
21
22
23New Features
24------------
25
26Since 6.0-rc1:
27
28- Update default stop-state-disable mask to cut only stop11
29
30  Stability improvements in microcode for stop4/stop5 are
31  available in upstream hcode images. Stop4 and stop5 can
32  be safely enabled by default.
33
34  Use ~0xE0000000 to cut all but stop0,1,2 in case there
35  are any issues with stop4/5.
36
37  example: ::
38
39    nvram -p ibm,skiboot --update-config opal-stop-state-disable-mask=0x1FFFFFFF
40
  **Note**: DD2.1 chips that have a frequency <1867MHz possibly *need* to
  run an hcode image *different* from the default in op-build (set
  `BR2_HCODE_LATEST_VERSION=y` in your config).
44- ibm,firmware-versions: add hcode to device tree
45
46  op-build commit 736a08b996e292a449c4996edb264011dfe56a40
47  added hcode to the VERSION partition, let's parse it out
48  and let the user know.
49- ipmi: Add BMC firmware version to device tree
50
  The BMC Get Device ID command gives BMC firmware version details. Let's add
  this to the device tree. User space tools will use this information to
  display BMC version details.
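
The mask arithmetic behind ``opal-stop-state-disable-mask`` can be sketched as
follows. This is an illustration only: it assumes that stop level *n* maps to
bit *n* counted from the most significant bit of a 32-bit mask, which is
inferred from the statement that ~0xE0000000 cuts all but stop0,1,2. The helper
names ``stop_bit`` and ``disable_mask_keeping`` are hypothetical and not part
of skiboot.

```python
MASK_BITS = 32  # assumed width of opal-stop-state-disable-mask


def stop_bit(level):
    """Bit for a stop level, assuming level n is bit n counted from the MSB."""
    return 1 << (MASK_BITS - 1 - level)


def disable_mask_keeping(levels):
    """Hypothetical helper: disable every stop level except those listed."""
    keep = 0
    for level in levels:
        keep |= stop_bit(level)
    return ~keep & 0xFFFFFFFF


# Keeping stop0, stop1 and stop2 reproduces the ~0xE0000000 value quoted above.
print(hex(disable_mask_keeping([0, 1, 2])))
```

The printed value is the one to pass to ``nvram -p ibm,skiboot --update-config``
when falling back to stop0,1,2 only.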
54
55Since 5.11:
56
57- Disable stop states from OPAL
58
59  On ZZ, stop4,5,11 are enabled for PowerVM, even though doing
60  so may cause problems with OPAL due to bugs in hcode.
61
62  For other platforms, this isn't so much of an issue as
63  we can just control stop states by the MRW. However the
64  rebuild-the-world approach to changing values there is a bit
65  annoying if you just want to rule out a specific stop state
66  from being problematic.
67
68  Provide an nvram option to override what's disabled in OPAL.
69
70  The OPAL mask is currently ~0xE0000000 (i.e. all but stop 0,1,2)
71
72  You can set an NVRAM override with: ::
73
      nvram -p ibm,skiboot --update-config opal-stop-state-disable-mask=0xFFFFFFFF

  This nvram override will disable *all* stop states.
77- interrupts: Create an "interrupts" property in the OPAL node
78
  Deprecate the old "opal-interrupts" property. It is still there, but the
  new property follows the standard and allows us to specify whether an
  interrupt is level or edge sensitive.
82
83  Similarly create "interrupt-names" whose content is identical to
84  "opal-interrupts-names".
85- SBE: Add timer support on POWER9
86
  SBE on P9 provides a one-shot programmable timer facility. We can use this
  to implement OPAL timers and hence limit the reliance on the Linux
  heartbeat (similar to the HW timer facility provided by SLW on P8).
90- Add SBE driver support
91
  SBE (Self Boot Engine) on P9 has two different jobs:

  - Boot the chip up to the point the core is functional
  - Provide various services like timer, scom, stash MPIPL, etc., at runtime

  We will use SBE for various purposes like timer, MPIPL, etc.
97
98- opal:hmi: Add missing processor recovery reason string.
99
  With this patch we now see the reason string printed for the CORE_WOF[43] bit. ::
101
102    [  477.352234986,7] HMI: [Loc: U78D3.001.WZS004A-P1-C48]: P:8 C:22 T:3: Processor recovery occurred.
103    [  477.352240742,7] HMI: Core WOF = 0x0000000000100000 recovered error:
104    [  477.352242181,7] HMI: PC - Thread hang recovery
105- Add DIMM actual speed to device tree
106
  Recent HDAT provides the actual DIMM speed. Let's add this to the device tree.
108- Fix DIMM size property
109
  Today we parse the VPD blob to get DIMM size information. This is limited
  to FSP based systems. HDAT provides the DIMM size value. Let's use that to
  populate the device tree, so that we can get size information on BMC based
  systems as well.
114
115- PCI: Set slot power limit when supported
116
117  The PCIe slot capability can be implemented in a root or switch
118  downstream port to set the maximum power a card is allowed to draw
119  from the system. This patch adds support for setting the power limit
120  when the platform has defined one.
121- hdata/spira: parse vpd to add part-number and serial-number to xscom@ node
122
123  Expected by FWTS and associates our processor with the part/serial
124  number, which is obviously a good thing for one's own sanity.
125
126
127Improved HMI Handling
128^^^^^^^^^^^^^^^^^^^^^
129
130- opal/hmi: Add documentation for opal_handle_hmi2 call
131- opal/hmi: Generate hmi event for recovered HDEC parity error.
132- opal/hmi: check thread 0 tfmr to validate latched tfmr errors.
133
134  Due to P9 errata, HDEC parity and TB residue errors are latched for
135  non-zero threads 1-3 even if they are cleared. But these are not
136  latched on thread 0. Hence, use xscom SCOMC/SCOMD to read thread 0 tfmr
137  value and ignore them on non-zero threads if they are not present on
138  thread 0.
139- opal/hmi: Print additional debug information in rendezvous.
140- opal/hmi: Fix handling of TFMR parity/corrupt error.
141
  While testing the TFMR parity/corrupt error, it has been observed that HMIs
  are delivered twice for this error:
144
145    - First time HMI is delivered with HMER[4,5]=1 and TFMR[60]=1.
146    - Second time HMI is delivered with HMER[4,5]=1 and TFMR[60]=0 with valid TB.
147
148  On second HMI we end up throwing "HMI: TB invalid without core error
149  reported" even though TB is in a valid state.
150- opal/hmi: Stop flooding HMI event for TOD errors.
151
  Fix the issue where every thread on the chip sends an HMI event to the host
  for TOD errors. TOD errors are reported to all the cores/threads on the chip.
  Any one thread can fix the error and send the event. The rest of the threads
  don't need to send an HMI event.
156- opal/hmi: Fix soft lockups during TOD errors
157
  There are some TOD errors which do not affect the working of TOD and TB. They
  stay in a valid state. Hence we don't need a rendez-vous for TOD errors that
  do not affect TB working.

  TOD errors that affect TOD/TB will report a global error on TFMR[44]
  along with bit 51, and they will go down the rendez-vous path as expected.

  But TOD errors that do not affect the TB register set only TFMR bit 51.
  TFMR bit 51 is cleared when any single thread clears the TOD error.
  Once cleared, bit 51 is reflected to all the cores on that chip. Any
  thread that reads the TFMR register after the error is cleared will see
  TFMR bit 51 reset. Hence the threads that see TFMR[51]=1 fall through
  the rendez-vous path and the threads that see TFMR[51]=0 return doing
  nothing. This ends up in soft lockups in the host kernel.

  This patch fixes this issue by not considering the TOD interrupt (TFMR[51])
  as a core-global error and hence avoiding the rendez-vous path completely.
  Instead, threads that see TFMR[51]=1 will now take a different path that
  just does the TOD error recovery.
177- opal/hmi: Do not send HMI event if no errors are found.
178
  For TOD errors, all the cores in the chip get HMIs. Any one thread from any
  core can fix the issue and TFMR will have error conditions cleared. The rest
  of the threads need not take any action if TOD errors are already cleared.
  Hence thread 0 of every core should get a fresh copy of TFMR before going
  ahead with the recovery path. Initialize recover = -1, so that if no errors
  are found that thread need not send an HMI event to Linux. This helps to stop
  flooding the host with HMI events from every thread even when no errors are
  found.
186- opal/hmi: Initialize the hmi event with old value of HMER.
187
  Do this before we check for TFAC errors. Otherwise the event at the host
  console shows no error reported in the HMER register.

  Without this patch the console event shows HMER with all zeros: ::
192
193    [  216.753417] Severe Hypervisor Maintenance interrupt [Recovered]
194    [  216.753498]  Error detail: Timer facility experienced an error
195    [  216.753509]  HMER: 0000000000000000
196    [  216.753518]  TFMR: 3c12000870e04000
197
198  After this patch it shows old HMER values on host console: ::
199
200    [ 2237.652533] Severe Hypervisor Maintenance interrupt [Recovered]
201    [ 2237.652651]  Error detail: Timer facility experienced an error
202    [ 2237.652766]  HMER: 0840000000000000
203    [ 2237.652837]  TFMR: 3c12000870e04000
204- opal/hmi: Rework HMI handling of TFAC errors
205
  This patch reworks the HMI handling for TFAC errors by introducing
  4 rendez-vous points to improve the thread synchronization while handling
  timebase errors that require all threads to clear dirty data from the
  TB/HDEC registers before clearing the errors.
210- opal/hmi: Don't bother passing HMER to pre-recovery cleanup
211
212  The test for TFAC error is now redundant so we remove it and
213  remove the HMER argument.
214- opal/hmi: Move timer related error handling to a separate function
215
216  Currently no functional change. This is a first step to completely
217  rewriting how these things are handled.
218- opal/hmi: Add a new opal_handle_hmi2 that returns direct info to Linux
219
220  It returns a 64-bit flags mask currently set to provide info
221  about which timer facilities were lost, and whether an event
222  was generated.
223- opal/hmi: Remove races in clearing HMER
224
  Writing to HMER acts as an "AND". The current code writes back the
  value we originally read with the bits we handled cleared. This is
  racy: if a new bit gets set in HW after the original read, we'll end
  up clearing it without handling it.
229
230  Instead, use an all 1's mask with only the bit handled cleared.
231- opal/hmi: Don't re-read HMER multiple times
232
233  We want to make sure all reporting and actions are based
234  upon the same snapshot of HMER in case bits get added
235  by HW while we are in OPAL.
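
The HMER clearing race described above can be illustrated with a small
simulation. This is a sketch and not skiboot code: it models the register as a
Python integer whose write operation ANDs the written value into the current
contents, as the note describes. The class name ``AndOnWriteRegister`` and the
bit positions ``HANDLED`` and ``LATE`` are made up for the example.

```python
ALL_ONES = 0xFFFFFFFFFFFFFFFF  # the register is modelled as 64 bits wide


class AndOnWriteRegister:
    """Toy register: a write ANDs the value into the current contents."""

    def __init__(self, value=0):
        self.value = value

    def read(self):
        return self.value

    def write(self, value):
        self.value &= value

    def hw_set(self, bits):
        # Models hardware raising new error bits at any time.
        self.value |= bits


HANDLED = 1 << 5  # bit we handled (arbitrary example position)
LATE = 1 << 9     # bit raised by hardware after our initial read

# Racy: write back the snapshot with the handled bit cleared.
hmer = AndOnWriteRegister(HANDLED)
snapshot = hmer.read()
hmer.hw_set(LATE)                # arrives after the snapshot was taken
hmer.write(snapshot & ~HANDLED)  # LATE is not in the snapshot, so it is lost
racy_result = hmer.read()

# Fixed: write all ones with only the handled bit cleared.
hmer = AndOnWriteRegister(HANDLED)
snapshot = hmer.read()
hmer.hw_set(LATE)
hmer.write(ALL_ONES & ~HANDLED)  # only HANDLED is cleared, LATE survives
fixed_result = hmer.read()

print(hex(racy_result), hex(fixed_result))
```

In the racy variant the late bit disappears without ever being handled; with
the all-ones mask it is still pending for the next pass.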
236
237libflash and ffspart
238^^^^^^^^^^^^^^^^^^^^
239
Many improvements to the `ffspart` utility and `libflash` have come
in this release, making `ffspart` suitable for building PNOR images that
are bit-identical to those produced by the existing tooling used by
`op-build`. The plan is to switch `op-build` to use this infrastructure
in the not too distant future.
245
246- libflash/blocklevel: Make read/write be ECC agnostic for callers
247
248  The blocklevel abstraction allows for regions of the backing store to be
249  marked as ECC protected so that blocklevel can decode/encode the ECC
250  bytes into the buffer automatically without the caller having to be ECC
251  aware.
252
  Unfortunately this abstraction is far from perfect: it is only useful
  if reads and writes are performed at the start of the ECC region or in
  some circumstances at an ECC aligned position, which requires the
  caller to be aware of the ECC regions.
257
258  The problem that has arisen is that the blocklevel abstraction is
259  initialised somewhere but when it is later called the caller is unaware
260  if ECC exists in the region it wants to arbitrarily read and write to.
261  This should not have been a problem since blocklevel knows. Currently
262  misaligned reads will fail ECC checks and misaligned writes will
263  overwrite ECC bytes and the backing store will become corrupted.
264
  This patch adds the smarts to blocklevel_read() and blocklevel_write() to
  cope with the problem. Note that ECC can always be bypassed by calling
  blocklevel_raw_() functions.

  All this work means that the gard tool can safely call
  blocklevel_read() and blocklevel_write() and as long as the blocklevel
  knows of the presence of ECC then it will deal with all cases.

  This commit also removes code in the gard tool which compensated for
  inadequacies no longer present in blocklevel.
275- libflash/blocklevel: Return region start from ecc_protected()
276
277  Currently all ecc_protected() does is say if a region is ECC protected
278  or not. Knowing a region is ECC protected is one thing but there isn't
279  much that can be done afterwards if this is the only known fact. A lot
280  more can be done if the caller is told where the ECC region begins.
281
  Knowing where the ECC region starts allows the caller to align its
  reads and writes. This allows for more flexibility calling read and write
  without knowing exactly how the backing store is organised.
285- libflash/ecc: Add helpers to align a position within an ecc buffer
286
287  As part of ongoing work to make ECC invisible to higher levels up the
288  stack this function converts a 'position' which should be ECC agnostic
289  to the equivalent position within an ECC region starting at a specified
290  location.
291- libflash/ecc: Add functions to deal with unaligned ECC memcpy
292- external/ffspart: Improve error output
293- libffs: Fix bad checks for partition overlap
294
295  Not all TOCs are written at zero
- libflash/libffs: Allow caller to specify header partition

  An FFS TOC is comprised of two parts. A small header which has a magic
  and very minimal information about the TOC which will be common to all
  partitions, things like number of partitions, block sizes and the like.
  Following this small header are a series of entries. Importantly there
  is always an entry which encompasses the TOC itself; this is usually
  called the 'part' partition.

  Currently libffs always assumes that the 'part' partition is at zero.
  While there is always a TOC at zero there doesn't actually have to be.
  PNORs may have multiple TOCs within them, therefore libffs needs to be
  flexible enough to allow callers to specify TOCs not at zero.

  The 'part' partition is otherwise a regular partition which may have
  flags associated with it. libffs should allow the user to set the flags
  for the 'part' partition.

  This patch achieves both by allowing the caller to specify the 'part'
  partition. The caller can choose not to, and libffs will then provide a
  sensible default.
317- libflash/libffs: Refcount ffs entries
318
  Currently consumers can add a new ffs entry to multiple headers. This
  is fine, but freeing any of the headers will cause the entry to be freed,
  which causes double free problems.

  Even if only one header is used, the consumer of the library still has a
  reference to the entry, which they may well reuse at some other point.
325
326  libffs will now refcount entries and only free when there are no more
327  references.
328
329  This patch also removes the pointless return value of ffs_hdr_free()
330- libflash/libffs: Switch to storing header entries in an array
331
332  Since the libffs no longer needs to sort the entries as they get added
333  it makes little sense to have the complexity of a linked list when an
334  array will suffice.
335- libflash/libffs: Remove backup partition from TOC generation code
336
337  It turns out this code was messy and not all that reliable. Doing it at
338  the library level adds complexity to the library and restrictions to the
339  caller.
340
  A simpler approach can be achieved by just instantiating multiple
  ffs_header structures pointing to different parts of the same file.
343- libflash/libffs: Remove the 'sides' from the FFS TOC generation code
344
345  It turns out this code was messy and not all that reliable. Doing it at
346  the library level adds complexity to the library and restrictions to the
347  caller.
348
  A simpler approach can be achieved by just instantiating multiple
  ffs_header structures pointing to different parts of the same file.
351- libflash/libffs: Always add entries to the end of the TOC
352
353  It turns out that sorted order isn't the best idea. This removes
354  flexibility from the caller. If the user wants their partitions in
355  sorted order, they should insert them in sorted order.
356- external/ffspart: Remove side, order and backup options
357
358  These options are currently flakey in libflash/libffs so there isn't
359  much point to being able to use them in ffspart.
360
361  Future reworks planned for libflash/libffs will render these options
362  redundant anyway.
363- libflash/libffs: ffs_close() should use ffs_hdr_free()
- libflash/libffs: Add setter for a partition's actual size
365- pflash: Use ffs_entry_user_to_string() to standardise flag strings
366- libffs: Standardise ffs partition flags
367
  It seems we've developed a character representation for ffs partition
  flags. Currently only pflash really prints them so it hasn't been a
  problem but now ffspart wants to read them in from user input.

  It is important that what libffs reads and what pflash prints remain
  consistent, so we should move the code into libffs to avoid problems.
- external/ffspart: Allow # comments in input file
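
As an illustration of the ECC position helpers mentioned in the list above, the
sketch below assumes that the ECC layout stores one ECC byte after every eight
data bytes. That layout, and the function names ``raw_offset`` and
``align_down_to_ecc_word``, are assumptions made for this example and are not
taken from these notes or from the libflash API.

```python
DATA_BYTES_PER_ECC = 8                   # assumed: 8 data bytes per ECC byte
ECC_WORD_BYTES = DATA_BYTES_PER_ECC + 1  # data plus its ECC byte on flash


def raw_offset(data_offset):
    """Offset inside an ECC region holding the given ECC-agnostic data offset."""
    return data_offset + data_offset // DATA_BYTES_PER_ECC


def align_down_to_ecc_word(region_start, raw_pos):
    """Round a raw position down to the start of its 9-byte ECC word."""
    return raw_pos - ((raw_pos - region_start) % ECC_WORD_BYTES)


region_start = 0x1000
pos = region_start + raw_offset(20)  # data byte 20 lives in the third ECC word
print(hex(pos), hex(align_down_to_ecc_word(region_start, pos)))
```

A caller that only knows the region start can use the second helper to widen a
read so that it begins on a boundary where the ECC bytes can be checked.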
375
376p9dsu Platform changes
377----------------------
378
379The p9dsu platform from SuperMicro (also known as 'Boston') has received
380a number of updates, and the patches once carried by SuperMicro are now
381upstream.
382
383Since 6.0-rc1:
384
385- p9dsu: timeout for variant detection, default to 2uess
386
387
388Since 5.11:
389
390- p9dsu: detect p9dsu variant even when hostboot doesn't tell us
391
392  The SuperMicro BMC can tell us what riser type we have, which dictates
393  the PCI slot tables. Usually, in an environment that a customer would
394  experience, Hostboot will do the query with an SMC specific patch
395  (not upstream as there's no platform specific code in hostboot)
396  and skiboot knows what variant it is based on the compatible string.
397
  However, if you're using upstream hostboot, you only get the bare
  'p9dsu' compatible type. We can work around this by asking the BMC
  ourselves and setting the slot table appropriately. We do this
  synchronously in platform init so that we don't start probing
  PCI before we set up the slot table.
403- p9dsu: add slot power limit.
404- p9dsu: add pci slot table for Boston LC 1U/2U and Boston LA/ESS.
405- p9dsu HACK: fix system-vpd eeprom
406- p9dsu: change esel command from AMI to IBM 0x3a.
407
408ZZ Platform Changes
409-------------------
410
411- hdata/i2c: Fix up pci hotplug labels
412
413  These labels are used on the devices used to do PCIe slot power control
414  for implementing PCIe hotplug. I'm not sure how they ended up as
415  "eeprom-pgood" and "eeprom-controller" since that doesn't make any sense.
416- hdata/i2c: Ignore multi-port I2C devices
417
418  Recent FSP firmware builds add support for multi-port I2C devices such
419  as the GPIO expanders used for the presence detect of OpenCAPI devices
420  and the PCIe hotplug controllers used to power cycle PCIe slots on ZZ.
421
  The OpenCAPI driver inside of skiboot currently uses a platform-specific
  method to talk to the relevant I2C device rather than relying on HDAT
  since not all platforms correctly report the I2C devices (hello Zaius).
  Additionally the nature of multi-port devices requires that we have a
  device-specific handler so that we generate the correct DT bindings.
  Currently we don't and there is no immediate need for this support so
  just ignore the multi-port devices for now.
429- hdata/i2c: Replace `i2c_` prefix with `dev_`
430
431  The current naming scheme makes it easy to conflate "i2cm_port" and
432  "i2c_port." The latter is used to describe multi-port I2C devices such
433  as GPIO expanders and multi-channel PCIe hotplug controllers. Rename
434  i2c_port to dev_port to make the two a bit more distinct.
435
436  Also rename i2c_addr to dev_addr for consistency.
437- hdata/i2c: Ignore CFAM I2C master
438
  Recent FSP firmware builds put in information about the CFAM I2C master
  in addition to the host I2C masters accessible via XSCOM. Odds are this
  information should not be there since there's no handshaking between the
  FSP/BMC and the host over who controls that I2C master, but it is so
  we need to deal with it.

  This patch adds filtering to the HDAT parser so it ignores the CFAM I2C
  master. Without this it will create a bogus i2cm@<addr> which might cause
  issues.
448- ZZ: hw/imc: Add support to load imc catalog lid file
449
450  Add support to load the imc catalog from a lid file packaged
451  as part of the system firmware. Lid number allocated
452  is 0x80f00103.lid.
453
454
455Bugs Fixed
456----------
457
458Since 6.0-rc2:
459
460- core/opal: Fix recursion check in opal_run_pollers()
461
  An earlier commit introduced a counter variable poller_recursion to
  limit the number of error messages shown when opal_pollers
  are run recursively. However the check for the counter value was
  placed in a way that the poller recursion was only detected the first 16
  times and then allowed afterwards.

  This patch fixes this by moving the check for the counter value inside
  the conditional branch with some re-factoring so that opal_poller
  recursion is not erroneously allowed after poller recursion is detected
  the first 16 times.
472- phb4: Print WOF registers on fence detect
473
474  Without the WOF registers it's hard to figure out what went wrong first,
475  so print those when we print the FIRs when a fence is detected.
- p9dsu: detect variant in init only if probe fails to find it.

  Currently the slot table init happens twice, in both the probe and init
  functions, due to the variant detection logic being called with an
  incorrect condition check.
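
The intent of the opal_run_pollers() fix above can be sketched as follows:
recursion is always refused, and the counter only limits how many warnings are
logged. This is a simplified model rather than the skiboot implementation; the
class ``Pollers`` and its attribute names are hypothetical, and only the limit
of 16 comes from the note.

```python
MAX_RECURSION_WARNINGS = 16  # limit taken from the release note


class Pollers:
    """Toy poller runner that refuses re-entry and rate limits its warnings."""

    def __init__(self):
        self.in_poller = False
        self.warnings = 0
        self.log = []
        self.runs = 0

    def run(self, poller):
        if self.in_poller:
            # Recursion is always refused; only the logging is rate limited.
            if self.warnings < MAX_RECURSION_WARNINGS:
                self.warnings += 1
                self.log.append("poller recursion detected")
            return
        self.in_poller = True
        try:
            self.runs += 1
            poller(self)
        finally:
            self.in_poller = False


def recursive_poller(pollers):
    # A misbehaving poller that tries to run the pollers again.
    pollers.run(recursive_poller)


p = Pollers()
for _ in range(20):
    p.run(recursive_poller)
print(p.runs, len(p.log))
```

Had the early return been tied to the warning counter, the 17th recursive call
would have been allowed through, which is the behaviour the patch removes.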
481
482Since 6.0-rc1:
483
484- core/direct-controls: improve p9_stop_thread error handling
485
  p9_stop_thread should fail the operation if it finds the thread was
  already quiesced. This implies something else is doing direct controls
488  on the thread (e.g., pdbg) or there is some exceptional condition we
489  don't know how to deal with. Proceeding here would cause things to
490  trample on each other, for example the hard lockup watchdog trying to
491  send a sreset to the core while it is stopped for debugging with pdbg
492  will end in tears.
493
494  If p9_stop_thread times out waiting for the thread to quiesce, do
495  not hit it with a core_start direct control, because we don't know
496  what state things are in and doing more things at this point is worse
497  than doing nothing. There is no good recipe described in the workbook
498  to de-assert the core_stop control if it fails to quiesce the thread.
  After timing out here, the thread may eventually quiesce and get
  stuck, but that's simpler to debug than undefined behaviour.
501
502- core/direct-controls: fix p9_cont_thread for stopped/inactive threads
503
504  Firstly, p9_cont_thread should check that the thread actually was
505  quiesced before it tries to resume it. Anything could happen if we
506  try this from an arbitrary thread state.
507
  Then when resuming a quiesced thread that is inactive or stopped (in
  a stop idle state), we must not send a core_start direct control;
  clear_maint must be used in these cases.
511- hmi: Clear unknown debug trigger
512
  On some systems, we see hangs like this when Linux starts: ::
514
515      [ 170.027252763,5] OCC: All Chip Rdy after 0 ms
516      [ 170.062930145,5] INIT: Starting kernel at 0x20011000, fdt at 0x30ae0530 366247 bytes)
517      [ 171.238270428,5] OPAL: Switch to little-endian OS
518
519  If you look at the in memory skiboot console (or do `nvram -p
520  ibm,skiboot --update-config log-level-driver=7`) we see the console get
521  spammed with: ::
522
523      [ 5209.109790675,7] HMI: Received HMI interrupt: HMER = 0x0000400000000000
524      [ 5209.109792716,7] HMI: Received HMI interrupt: HMER = 0x0000400000000000
525      [ 5209.109794695,7] HMI: Received HMI interrupt: HMER = 0x0000400000000000
526      [ 5209.109796689,7] HMI: Received HMI interrupt: HMER = 0x0000400000000000
527
528  We're taking the debug trigger (bit 17) early on, before the
529  hmi_debug_trigger function in the kernel is set up.
530
531  This clears the HMI in Skiboot and reports to the kernel instead of
532  bringing down the machine.
533
534- core/hmi: assign flags=0 in case nothing set by handle_hmi_exception
535
536  Theoretically we could have returned junk to the OS in this parameter.
537
538- SLW: Fix mambo boot to use stop states
539
540  After commit 35c66b8ce5a2 ("SLW: Move MAMBO simulator checks to
541  slw_init"), mambo boot no longer calls add_cpu_idle_state_properties()
542  and as such we never enable stop states.
543
544  After adding the call back, we get more testing coverage as well
545  as faster mambo SMT boots.
546
547- phb4: Hardware init updates
548
549  CFG Write Request Timeout was incorrectly set to informational and not
550  fatal for both non-CAPI and CAPI, so set it to fatal.  This was a
551  mistake in the specification.  Correcting this fixes a niche bug in
552  escalation (which is necessary on pre-DD2.2) that can cause a checkstop
553  due to a NCU timeout.
554
555  In addition, set the values in the timeout control registers to match.
556  This fixes an extremely rare and unreproducible bug, though the current
557  timings don't make sense since they're higher than the NCU timeout (16)
558  which will checkstop the machine anyway.
559
560- SLW: quieten 'Configuring self-restore' for DARN,NCU_SPEC_BAR and HRMOR
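
The decision rules from the two core/direct-controls fixes above can be
summarised in a small sketch. It only encodes the rules as stated in these
notes; the function names, the boolean inputs and the returned strings are
hypothetical, and the real code derives this state from hardware status
registers.

```python
def stop_thread_result(already_quiesced, quiesced_before_timeout):
    """Outcome of a stop request under the rules described above."""
    if already_quiesced:
        # Something else (e.g. a debugger) is using direct controls.
        return "fail"
    if not quiesced_before_timeout:
        # Timed out: report failure and do not follow up with core_start.
        return "fail-without-core_start"
    return "stopped"


def cont_thread_control(quiesced, inactive_or_stopped):
    """Which direct control a resume should use, or None to refuse."""
    if not quiesced:
        return None           # never resume from an arbitrary thread state
    if inactive_or_stopped:
        return "clear_maint"  # core_start must not be sent in this case
    return "core_start"


print(cont_thread_control(quiesced=True, inactive_or_stopped=True))
```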
561
562Since 5.11:
563
564- core: Fix iteration condition to skip garded cpu
- uart: fix uart_opal_flush to take console lock over uart_con_flush

  This bug meant that OPAL_CONSOLE_FLUSH didn't take the appropriate locks.
  Luckily, this call is currently only used in the crash path.
568- xive: fix missing unlock in error path
569- OPAL_PCI_SET_POWER_STATE: fix locking in error paths
570
571  Otherwise we could exit OPAL holding locks, potentially leading
572  to all sorts of problems later on.
- hw/slw: Don't assert on an unknown chip

  For some reason skiboot populates nodes in /cpus/ for the cores on
  chips that are deconfigured. As a result Linux includes the threads
  of those cores in its set of possible CPUs in the system and attempts
  to set the SPR values that should be used when waking a thread from
  a deep sleep state.

  However, in the case where we have a deconfigured chip we don't create
  an xscom node for that chip and as a result we don't have a proc_chip
  structure for that chip either. In turn, this results in an assertion
  failure when calling opal_slw_set_reg() since it expects the chip
  structure to exist. Fix this up and print an error instead.
586- opal/hmi: Generate one event per core for processor recovery.
587
  Processor recovery is a per-core error. All threads on that core receive
  an HMI. Not all threads need to generate an HMI event for the same error.

  Let only thread 0 generate the event.
- sensors: Don't add DTS sensors when OCC inband sensors are available

  There are two sets of core temperature sensors today. One is DTS scom
  based core temperature sensors and the second group is the sensors
  provided by OCC. DTS is the highest temperature among the different
  temperature zones in the core while OCC core temperature sensors are
  the average temperature of the core. DTS sensors are read directly by
  the host by SCOMing the DTS sensors while OCC sensors are read and
  updated by OCC to main memory.

  Reading DTS sensors by SCOMing is a heavier and slower operation
  compared to reading OCC sensors, which is as good as reading memory.
  So don't add DTS sensors when OCC sensors are available.
605- core/fast-reboot: Increase timeout for dctl sreset to 1sec
606
  Direct control xscom can take more time to complete. We seem to
  wait too little on Boston, failing fast-reboot for no good reason.
609
610  Increase timeout to 1 sec as a reasonable value for sreset to be delivered
611  and core to start executing instructions.
612- occ: sensors-groups: Add DT properties to mark HWMON sensor groups
613
614  Fix the sensor type to match HWMON sensor types. Add compatible flag
615  to indicate the environmental sensor groups so that operations on
616  these groups can be handled by HWMON linux interface.
617- core: Correctly load initramfs in stb container
618
619  Skiboot does not calculate the actual size and start location of the
620  initramfs if it is wrapped by an STB container (for example if loading
621  an initramfs from the ROOTFS partition).
622
623  Check if the initramfs is in an STB container and determine the size and
624  location correctly in the same manner as the kernel. Since
625  load_initramfs() is called after load_kernel() move the call to
626  trustedboot_exit_boot_services() into load_and_boot_kernel() so it is
627  called after both of these.
628- hdat/i2c.c: quieten "v2 found, parsing as v1"
629- hw/imc: Check for pause_microcode_at_boot() return status
630
  pause_microcode_at_boot() loops through all the chips' ucode
  control blocks and pauses the ucode if it is in the running state.
  But it does not fail if any of the chips' ucode is not initialised.

  Add code to return a failure if ucode is not initialized on any
  of the chips. Since pause_microcode_at_boot() is called just before
  attaching the IMC device nodes in imc_init(), add code to check for
  the function return.
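
The return-status check described in the hw/imc fix above follows a common
pattern, sketched below. This is not the skiboot code: the dictionary layout,
the state names and the functions ``pause_all_ucode`` and ``imc_init_sketch``
are hypothetical, and whether the real function stops at the first
uninitialised chip is not stated in these notes.

```python
def pause_all_ucode(chips):
    """Pause running ucode on each chip; report failure if any is uninitialised."""
    ok = True
    for chip in chips:
        state = chip["ucode_state"]
        if state == "uninitialised":
            ok = False                      # remember the failure for the caller
        elif state == "running":
            chip["ucode_state"] = "paused"
    return ok


def imc_init_sketch(chips):
    """Only attach IMC nodes when every chip's ucode could be paused."""
    if not pause_all_ucode(chips):
        return "imc nodes not attached"
    return "imc nodes attached"


good = [{"ucode_state": "running"}, {"ucode_state": "paused"}]
bad = [{"ucode_state": "running"}, {"ucode_state": "uninitialised"}]
print(imc_init_sketch(good), "/", imc_init_sketch(bad))
```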
639
640
641Slot location code fixes:
642
- npu2: Use ibm,loc-code rather than ibm,slot-label
644
645  The ibm,slot-label property is to name the slot that appears under a
646  PCIe bridge. In the past we (ab)used the slot tables to attach names
647  to GPU devices and their corresponding NVLinks which resulted in npu2.c
648  using slot-label as a location code rather than as a way to name slots.
649
650  Fix this up since it's confusing.
651- hdata/slots: Apply slot label to the parent slot
652
653  Slot names only really make sense when applied to an actual slot rather
654  than a device. On witherspoon the GPU devices have a name associated with
655  the device rather than the slot for the GPUs. Add a hack that moves the
656  slot label to the parent slot rather than on the device itself.
657- pci-dt-slot: Big ol' cleanup
658
  The underlying data that we get from HDAT can only really describe a
  PCIe system. As such we can simplify the devicetree slot lookup code
  by only caring about the important cases, namely, root ports and switch
  downstream ports.

  This also fixes a bug where a root port didn't get a slot label applied,
  which results in devices under that port not having ibm,loc-code set.
  This results in the EEH core being unable to report the location of
  EEHed devices under that port.
668
669opal-prd
670^^^^^^^^
671- opal-prd: Insert powernv_flash module
672
  Explicitly load the powernv_flash module on BMC based systems so that we
  are sure that the flash device is created before starting the opal-prd
  daemon.

  Note that I have replaced the pnor_available() check with is_fsp_system(),
  as we want to load the module on BMC systems only. Also, pnor_init has
  enough logic to detect the flash device, hence pnor_available() becomes a
  redundant check.
679
680NPU2/NVLINK2
681^^^^^^^^^^^^
682- npu2/hw-procedures: fence bricks on GPU reset
683
684  The NPU workbook defines a way of fencing a brick and
685  getting the brick out of fence state. We do have an implementation
686  of bringing the brick out of fenced/quiesced state. We do
687  the latter in our procedures, but to support run time reset
688  we need to do the former.
689
690  The fencing ensures that access to memory behind the links
  will not lead to HMIs, but instead SUEs will be populated
692  in cache (in the case of speculation). The expectation is then
693  that prior to and after reset, the operating system components
694  will flush the cache for the region of memory behind the GPU.
695
696  This patch does the following:
697
698  1. Implements a npu2_dev_fence_brick() function to set/clear
699     fence state
700  2. Clear FIR bits prior to clearing the fence status
  3. Clears the fence status
702  4. We take the powerbus out of CQ fence much later now,
703     in credits_check() which is the last hardware procedure
704     called after link training.
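
  The ordering in steps 1 to 3 can be sketched as below. This is a toy
  model under stated assumptions: ``example_brick`` and the two helper
  functions are hypothetical, and setting the fence before the reset is
  an assumption about where the real npu2_dev_fence_brick() call sits.

  .. code-block:: c

      #include <stdbool.h>
      #include <stdint.h>

      /* Hypothetical model of one brick: a fence flag and a FIR value. */
      struct example_brick {
          bool fenced;
          uint64_t fir;
      };

      /* Set or clear the fence state of a brick. */
      void example_fence_brick(struct example_brick *b, bool set)
      {
          b->fenced = set;
      }

      void example_reset_brick(struct example_brick *b)
      {
          example_fence_brick(b, true);   /* fence before the reset */
          b->fir |= 0x1;                  /* stand-in for an error latched during reset */
          b->fir = 0;                     /* clear FIR bits first ... */
          example_fence_brick(b, false);  /* ... then clear the fence status */
      }

  After example_reset_brick() returns, the model brick is unfenced with
  no FIR bits left set.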
705- hw/npu2.c: Remove static configuration of NPU2 register
706
707  The NPU_SM_CONFIG0 register currently needs to be configured in Skiboot to
708  select NVLink mode, however Hostboot should configure other bits in this
709  register.
710
711  For some reason Skiboot was explicitly clearing bit-6
712  (CONFIG_DISABLE_VG_NOT_SYS). It is unclear why this bit was getting cleared
713  as recent Hostboot versions explicitly set it to the correct value based on
714  the specific system configuration. Therefore Skiboot should not alter it.
715
716  Bit-58 (CONFIG_NVLINK_MODE) selects if NVLink mode should be enabled or
717  not. Hostboot does not configure this bit so Skiboot should continue to
718  configure it.
719- npu2: Improve log output of GPU-to-link mapping
720
721  Debugging issues related to unconnected NVLinks can be a little less
722  irritating if we use the NPU2DEV{DBG,INF}() macros instead of prlog().
723
724  In short, change this: ::
725
726      NPU2: comparing GPU 'GPU2' and NPU2 'GPU1'
727      NPU2: comparing GPU 'GPU3' and NPU2 'GPU1'
728      NPU2: comparing GPU 'GPU4' and NPU2 'GPU1'
729      NPU2: comparing GPU 'GPU5' and NPU2 'GPU1'
730            :
731      npu2_dev_bind_pci_dev: No PCI device for NPU2 device 0006:00:01.0 to bind to. If you expect a GPU to be there, this is a problem.
732
733  to this: ::
734
735      NPU6:0:1.0 Comparing GPU 'GPU2' and NPU2 'GPU1'
736      NPU6:0:1.0 Comparing GPU 'GPU3' and NPU2 'GPU1'
737      NPU6:0:1.0 Comparing GPU 'GPU4' and NPU2 'GPU1'
738      NPU6:0:1.0 Comparing GPU 'GPU5' and NPU2 'GPU1'
739            :
740      NPU6:0:1.0 No PCI device found for slot 'GPU1'
741- npu2: Move NPU2_XTS_BDF_MAP_VALID assignment to context init
742
743  A bad GPU or other condition may leave us with a subset of links that
744  never get initialized. If an ATSD is sent to one of those bricks, it
745  will never complete, leaving us waiting forever for a response: ::
746
747    watchdog: BUG: soft lockup - CPU#23 stuck for 23s! [acos:2050]
748    ...
749    Modules linked in: nvidia_uvm(O) nvidia(O)
750    CPU: 23 PID: 2050 Comm: acos Tainted: G        W  O    4.14.0 #2
751    task: c0000000285cfc00 task.stack: c000001fea860000
752    NIP:  c0000000000abdf0 LR: c0000000000acc48 CTR: c0000000000ace60
753    REGS: c000001fea863550 TRAP: 0901   Tainted: G        W  O     (4.14.0)
754    MSR:  9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 28004484  XER: 20040000
755    CFAR: c0000000000abdf4 SOFTE: 1
756    GPR00: c0000000000acc48 c000001fea8637d0 c0000000011f7c00 c000001fea863820
757    GPR04: 0000000002000000 0004100026000000 c0000000012778c8 c00000000127a560
758    GPR08: 0000000000000001 0000000000000080 c000201cc7cb7750 ffffffffffffffff
759    GPR12: 0000000000008000 c000000003167e80
760    NIP [c0000000000abdf0] mmio_invalidate_wait+0x90/0xc0
761    LR [c0000000000acc48] mmio_invalidate.isra.11+0x158/0x370
762
763
764  ATSDs are only sent to bricks which have a valid entry in the XTS_BDF
765  table. So to prevent the hang, don't set NPU2_XTS_BDF_MAP_VALID unless
766  we make it all the way to creating a context for the BDF.
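
  A minimal sketch of the idea, assuming a hypothetical table entry
  layout (``EXAMPLE_BDF_MAP_VALID`` and the ``example_*`` functions are
  illustrative and do not reflect the real register format):

  .. code-block:: c

      #include <stdbool.h>
      #include <stdint.h>

      #define EXAMPLE_BDF_MAP_VALID (1ull << 63) /* hypothetical bit position */

      /* Stand-in for one XTS_BDF table entry. */
      struct example_bdf_entry {
          uint64_t value;
      };

      /* Binding a device records the BDF but leaves the entry invalid. */
      void example_bind_device(struct example_bdf_entry *e, uint16_t bdf)
      {
          e->value = bdf;
      }

      /* The valid bit is only set once a context is created. */
      void example_init_context(struct example_bdf_entry *e)
      {
          e->value |= EXAMPLE_BDF_MAP_VALID;
      }

      /* ATSDs are only sent to bricks with a valid entry. */
      bool example_gets_atsd(const struct example_bdf_entry *e)
      {
          return (e->value & EXAMPLE_BDF_MAP_VALID) != 0;
      }

  In this model a brick whose link never initialises never reaches
  example_init_context(), so no ATSD is ever sent to it.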
767
768Secure and Trusted Boot
769^^^^^^^^^^^^^^^^^^^^^^^
770- hdata/tpmrel: detect tpm not present by looking up the stinfo->status
771
772  Skiboot detects if tpm is present by checking if a secureboot_tpm_info
773  entry exists. However, if a tpm is not present, hostboot also creates a
  secureboot_tpm_info entry. In this case, hostboot creates an empty
  entry, but sets the field tpm_status to TPM_NOT_PRESENT.

  This patch detects if tpm is not present by looking up the stinfo->status.
778
779  This fixes the "TPMREL: TPM node not found for chip_id=0 (HB bug)"
780  issue, reproduced when skiboot is running on a system that has no tpm.
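
  A minimal sketch of the check, assuming a simplified entry layout.
  The struct, the constant values and ``example_tpm_present()`` are
  hypothetical; only the tpm_status field name and TPM_NOT_PRESENT come
  from the description above.

  .. code-block:: c

      #include <stdbool.h>
      #include <stddef.h>
      #include <stdint.h>

      /* Hypothetical status values; the real HDAT encoding may differ. */
      #define EXAMPLE_TPM_PRESENT      1
      #define EXAMPLE_TPM_NOT_PRESENT  3

      /* Simplified stand-in for a secureboot_tpm_info entry. */
      struct example_tpm_info {
          uint32_t chip_id;
          uint8_t tpm_status;
      };

      /* An entry existing is not enough: its status must be checked too. */
      bool example_tpm_present(const struct example_tpm_info *info)
      {
          if (!info)
              return false;
          return info->tpm_status != EXAMPLE_TPM_NOT_PRESENT;
      }

  With this check an empty entry marked as not present is skipped
  instead of triggering a search for a TPM node that does not exist.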
781
782PCI
783^^^
784- phb4: Restore bus numbers after CRS
785
  Currently we restore PCIe bus numbers right after the link is
  up. Unfortunately, at this point we haven't done CRS so config space
  may not be accessible.
789
790  This moves the bus number restore till after CRS has happened.
791- romulus: Add a barebones slot table
792- phb4: Quieten and improve "Timeout waiting for electrical link"
793
  This happens normally if a slot doesn't have a working HW presence
  detect and relies instead on inband presence detect.

  The message we display is scary and not very useful unless you
  are debugging, so quieten it and change it to something more
  meaningful.
800- pcie-slot: Don't fail powering on an already on switch
801
802  If the power state is already the required value, return
  OPAL_SUCCESS rather than OPAL_PARAMETER to avoid spurious
804  errors during boot.
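
  A minimal sketch of the new behaviour, assuming a simplified slot
  model. ``example_slot``, ``example_set_power_state()`` and the
  ``EXAMPLE_`` return codes are hypothetical stand-ins for the real
  slot structure and OPAL return codes:

  .. code-block:: c

      #include <stdint.h>

      /* Stand-ins for the OPAL return codes named above. */
      #define EXAMPLE_OPAL_SUCCESS    0
      #define EXAMPLE_OPAL_PARAMETER  (-1)

      /* Hypothetical slot model: current state plus a change counter. */
      struct example_slot {
          uint8_t power_state;
          int power_changes;
      };

      int64_t example_set_power_state(struct example_slot *slot, uint8_t val)
      {
          /* Already in the requested state: nothing to do, not an error. */
          if (slot->power_state == val)
              return EXAMPLE_OPAL_SUCCESS;

          slot->power_state = val;
          slot->power_changes++;
          return EXAMPLE_OPAL_SUCCESS;
      }

  Requesting power on for a switch that is already on now succeeds
  without touching the slot.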
805
806CAPI/OpenCAPI
807^^^^^^^^^^^^^
808- capi: Keep the current mmio windows in the mbt cache table.
809
  When the phb is used as a CAPI interface, the current mmio windows list
  is cleaned before adding the capi and the prefetchable memory (M64)
  windows, which implies that the non-prefetchable BAR is no longer
  configured.

  This patch sets only the mbt bar needed to pass the capi mmio window and
  keeps the other mmio values (M32 and M64) as defined.
816- npu2-opencapi: Fix 'link internal error' FIR, take 2
817
818  When setting up an opencapi link, we set the transport muxes first,
819  then set the PHY training config register, which includes disabling
820  nvlink mode for the bricks. That's the order of the init sequence, as
821  found in the NPU workbook.
822
823  In reality, doing so works, but it raises 2 FIR bits in the PowerBus
824  OLL FIR Register for the 2 links when we configure the transport
825  muxes. Presumably because nvlink is not disabled yet and we are
826  configuring the transport muxes for opencapi.
827
828  bit 60:
829    link0 internal error
830  bit 61:
831    link1 internal error
832
833  Overall the current setup ends up being correct and everything works,
834  but we raise 2 FIR bits.
835
836  So tweak the order of operations to disable nvlink before configuring
837  the transport muxes. Incidentally, this is what the scripts from the
838  opencapi enablement team were doing all along.
839- npu2-opencapi: Fix 'link internal error' FIR, take 1
840
  When we set up a link, we always enable ODL0 and ODL1 at the same time
  in the PHY training config register, even though we are setting up
  only one OTL/ODL, so it raises a "link internal error" FIR bit in the
  PowerBus OLL FIR Register for the second link. The error is harmless,
  as we'll eventually set up the second link, but there's no reason to
  raise that FIR bit.
847
848  The fix is simply to only enable the ODL we are using for the link.
849- phb4: Do not set the PBCQ Tunnel BAR register when enabling capi mode.
850
851  The cxl driver will set the capi value, like other drivers already do.
852- phb4: set TVT1 for tunneled operations in capi mode
853
854  The ASN indication is used for tunneled operations (as_notify and
855  atomics). Tunneled operation messages can be sent in PCI mode as
856  well as CAPI mode.
857
858  The address field of as_notify messages is hijacked to encode the
859  LPID/PID/TID of the target thread, so those messages should not go
860  through address translation. Therefore bit 59 is part of the ASN
861  indication.
862
863  This patch sets TVT#1 in bypass mode when capi mode is enabled,
864  to prevent as_notify messages from being dropped.
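
  A rough sketch of why bypass matters here. It rests on two
  assumptions that are not stated above: that bit 59 is counted from
  the least significant bit of the PCI address, and that it selects
  which of two TVT entries handles the access. All names are
  hypothetical.

  .. code-block:: c

      #include <stdbool.h>
      #include <stdint.h>

      /* Hypothetical model of a TVT entry: translate, or pass through. */
      struct example_tve {
          bool bypass;
      };

      /* Assumption: address bit 59 selects entry 0 or entry 1. */
      unsigned int example_tve_index(uint64_t pci_addr)
      {
          return (unsigned int)((pci_addr >> 59) & 1);
      }

      /* An address is translated unless its entry is in bypass mode. */
      bool example_is_translated(const struct example_tve tvt[2],
                                 uint64_t pci_addr)
      {
          return !tvt[example_tve_index(pci_addr)].bypass;
      }

  In this model, putting entry 1 in bypass means an as_notify message
  carrying bit 59 keeps its encoded LPID/PID/TID untouched.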
865
866Debugging/Testing improvements
867------------------------------
868
869Since 6.0-rc1:
870
871- mambo: Enable XER CA32 and OV32 bits on P9
872
873  POWER9 adds 32 bit carry and overflow bits to the XER, but we need to
874  set the relevant CTRL1 bit to enable them.
875- Makefile: Fix building natively on ppc64le
876
877  When on ppc64le and CROSS is not set by the environment, make assumes
878  ppc64 and sets a default CROSS. Check for ppc64le as well, so that
879  'make' works out of the box on ppc64le.
880- Experimental support for building with Clang
881- Improvements to testing and Travis CI
882
883Since 5.11:
884
885- core/stack: backtrace unwind basic OPAL call details
886
887  Put OPAL callers' r1 into the stack back chain, and then use that to
888  unwind back to the OPAL entry frame (as opposed to boot entry, which
889  has a 0 back chain).
890
891  From there, dump the OPAL call token and the caller's r1. A backtrace
892  looks like this: ::
893
894      CPU 0000 Backtrace:
895       S: 0000000031c03ba0 R: 000000003001a548   ._abort+0x4c
896       S: 0000000031c03c20 R: 000000003001baac   .opal_run_pollers+0x3c
897       S: 0000000031c03ca0 R: 000000003001bcbc   .opal_poll_events+0xc4
898       S: 0000000031c03d20 R: 00000000300051dc   opal_entry+0x12c
899       --- OPAL call entry token: 0xa caller R1: 0xc0000000006d3b90 ---
900
901  This is pretty basic for the moment, but it does give you the bottom
902  of the Linux stack. It will allow some interesting improvements in
903  future.
904
905  First, with the eframe, all the call's parameters can be printed out
906  as well.  The ___backtrace / ___print_backtrace API needs to be
907  reworked in order to support this, but it's otherwise very simple
908  (see opal_trace_entry()).
909
910  Second, it will allow Linux's stack to be passed back to Linux via
911  a debugging opal call. This will allow Linux's BUG() or xmon to
912  also print the Linux back trace in case of a NMI or MCE or watchdog
913  lockup that hits in OPAL.
914- asm/head: implement quiescing without stack or clobbering regs
915
  Quiescing currently is implemented in C in opal_entry before the
917  opal call handler is called. This works well enough for simple
918  cases like fast reset when one CPU wants all others out of the way.
919
920  Linux would like to use it to prevent an sreset IPI from
921  interrupting firmware, which could lead to deadlocks when crash
922  dumping or entering the debugger. Linux interrupts do not recover
923  well when returning back to general OPAL code, due to r13 not being
924  restored. OPAL also can't be re-entered, which may happen e.g.,
925  from the debugger.
926
  So move the quiesce hold/reject to entry code, before the stack or
928  r1 or r13 registers are switched. OPAL can be interrupted and
929  returned to or re-entered during this period.
930
931  This does not completely solve all such problems. OPAL will be
932  interrupted with sreset if the quiesce times out, and it can be
933  interrupted by MCEs as well. These still have the issues above.
934- core/opal: Allow poller re-entry if OPAL was re-entered
935
936  If an NMI interrupts the middle of running pollers and the OS
937  invokes pollers again (e.g., for console output), the poller
938  re-entrancy check will prevent it from running and spam the
939  console.
940
  That check was designed to catch a poller calling opal_run_pollers;
  OPAL re-entrancy is something different and is detected elsewhere.
943  Avoid the poller recursion check if OPAL has been re-entered. This
944  is a best-effort attempt to cope with errors.
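
  A minimal sketch of the relaxed check, assuming hypothetical per-CPU
  flags (``example_cpu`` and ``example_run_pollers()`` are illustrative
  and not the real skiboot structures):

  .. code-block:: c

      #include <stdbool.h>

      /* Hypothetical per-CPU state. */
      struct example_cpu {
          bool in_poller;         /* pollers are currently running */
          bool opal_reentered;    /* OPAL re-entry was detected on entry */
          int poller_runs;
          int recursion_warnings;
      };

      void example_run_pollers(struct example_cpu *cpu)
      {
          bool was_in_poller = cpu->in_poller;

          /* Only reject genuine recursion, not re-entry after an NMI. */
          if (was_in_poller && !cpu->opal_reentered) {
              cpu->recursion_warnings++;
              return;
          }

          cpu->in_poller = true;
          cpu->poller_runs++;             /* stand-in for running each poller */
          cpu->in_poller = was_in_poller;
      }

  A poller that calls back into the poller loop is still caught, while
  an OS that re-enters OPAL after an NMI gets its pollers run.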
945- core/opal: Emergency stack for re-entry
946
947  This detects OPAL being re-entered by the OS, and switches to an
948  emergency stack if it was. This protects the firmware's main stack
949  from re-entrancy and allows the OS to use NMI facilities for crash
950  / debug functionality.
951
952  Further nested re-entry will destroy the previous emergency stack
953  and prevent returning, but those should be rare cases.
954
955  This stack is sized at 16kB, which doubles the size of CPU stacks,
956  so as not to introduce a regression in primary stack size. The 16kB
957  stack originally had a 4kB machine check stack at the top, which was
958  removed by 80eee1946 ("opal: Remove machine check interrupt patching
959  in OPAL."). So it is possible the size could be tightened again, but
960  that would require further analysis.
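
  A minimal sketch of the stack selection, assuming a hypothetical
  per-CPU layout. The real switch happens in the assembly entry path;
  ``example_cpu`` and ``example_pick_stack()`` are illustrative only.

  .. code-block:: c

      #include <stdbool.h>
      #include <stdint.h>

      #define EXAMPLE_STACK_SIZE 0x4000 /* 16kB, as stated above */

      /* Hypothetical per-CPU area holding both stacks. */
      struct example_cpu {
          bool in_opal_call;
          uint8_t main_stack[EXAMPLE_STACK_SIZE];
          uint8_t emergency_stack[EXAMPLE_STACK_SIZE];
      };

      /* Return the initial stack pointer to use (stacks grow down). */
      uint8_t *example_pick_stack(struct example_cpu *cpu)
      {
          /* Re-entered by the OS: leave the main stack untouched. */
          if (cpu->in_opal_call)
              return cpu->emergency_stack + EXAMPLE_STACK_SIZE;

          cpu->in_opal_call = true;
          return cpu->main_stack + EXAMPLE_STACK_SIZE;
      }

  A second level of re-entry gets the same emergency stack top again,
  which is why further nesting destroys the previous emergency frames.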
961
- hdat_to_dt: hash_prop the same on all platforms

  Fixes this unit test on ppc64le hosts.
964- mambo: Add persistent memory disk support
965
  This adds support for mapping disk images using persistent
  memory. Disks can be added by setting this ENV variable: ::

    PMEM_DISK="/mydisks/disk1.img,/mydisks/disk2.img"
970
971  These will show up in Linux as /dev/pmem0 and /dev/pmem1.
972
973  This uses a new feature in mambo "mysim memory mmap .." which is only
974  available since mambo commit 0131f0fc08 (from 24/4/2018).
975
976  This also needs the of_pmem.c driver in Linux which is only available
977  since v4.17. It works with powernv_defconfig + CONFIG_OF_PMEM.
978- external/mambo: Add di command to decode instructions
979
  By default you get 16 instructions but you can specify the number you
  want, e.g. ::
982
983      systemsim % di 0x100 4
984      0x0000000000000100: Enc:0xA64BB17D : mtspr   HSPRG1,r13
985      0x0000000000000104: Enc:0xA64AB07D : mfspr   r13,HSPRG0
986      0x0000000000000108: Enc:0xF0092DF9 : std     r9,0x9F0(r13)
987      0x000000000000010C: Enc:0xA6E2207D : mfspr   r9,PPR
988
989  Using di since it's what xmon uses.
990- mambo/mambo_utils.tcl: Inject an MCE at a specified address
991
  Currently we don't support injecting an MCE on a specific address.
  Doing so is useful for testing functionality like memcpy_mcsafe()
  (see https://patchwork.ozlabs.org/cover/893339/)
995
996  The core of the functionality is a routine called
997  inject_mce_ue_on_addr, which takes an addr argument and injects
998  an MCE (load/store with UE) when the specified address is accessed
999  by code. This functionality can easily be enhanced to cover
1000  instruction UE's as well.
1001
1002  A sample use case to create an MCE on stack access would be ::
1003
1004    set addr [mysim display gpr 1]
1005    inject_mce_ue_on_addr $addr
1006
  This would cause an MCE on any r1 or r1-based access.
1008- external/mambo: improve helper for machine checks
1009
1010  Improve workarounds for stop injection, because mambo often will
1011  trigger on 0x104/204 when injecting sreset/mces.
1012
1013  This also adds a workaround to skip injecting on reservations to
1014  avoid infinite loops when doing inject_mce_step.
1015- travis: Enable ppc64le builds
1016
1017  At least on the IBM Travis Enterprise instance, we can now do
1018  ppc64le builds!
1019
1020  We can only build a subset of our matrix due to availability of
1021  ppc64le distros. The Dockerfiles need some tweaking to only
1022  attempt to install (x86_64 only) Mambo binaries, as well as the
1023  build scripts.
1024- external: Add "lpc" tool
1025
1026  This is a little front-end to the lpc debugfs files to access
1027  the LPC bus from userspace on the host.
1028- core/test/run-trace: fix on ppc64el
1029