1.. _skiboot-5.7-rc1:
2
3skiboot-5.7-rc1
4===============
5
6skiboot v5.7-rc1 was released on Monday July 3rd 2017. It is the first
7release candidate of skiboot 5.7, which will become the new stable release
8of skiboot following the 5.6 release, first released 24th May 2017.
9
10skiboot v5.7-rc1 contains all bug fixes as of :ref:`skiboot-5.4.6`
11and :ref:`skiboot-5.1.19` (the currently maintained stable releases). We
12do not currently expect to do any 5.6.x stable releases.
13
14For how the skiboot stable releases work, see :ref:`stable-rules` for details.
15
The current plan is to cut the final 5.7 by July 12th, with skiboot 5.7
being for all POWER8 and POWER9 platforms in op-build v1.18 (due July 12th).
This is a short cycle, as this release is mainly targeted towards POWER9
bringup efforts.
20
This is the second release using the new regular six-week release cycle,
similar to op-build, but slightly offset to allow for a short stabilisation
period. Expected release dates and contents are tracked using GitHub milestones
and issues: https://github.com/open-power/skiboot/milestones
25
26Over skiboot-5.6, we have the following changes:
27
28New Features
29------------
30
31New features in this release for POWER9 systems:
32
33- In Memory Counters (IMC) (See :ref:`imc` for details)
34- phb4: Activate shared PCI slot on witherspoon (see :ref:`Shared Slot <shared-slot-5.7-rc1-rn>`)
35- phb4 capi (i.e. CAPI2): Enable capi mode for PHB4 (see :ref:`CAPI on PHB4 <capi2-5.7-rc1-rn>`)
36
37New feature for IBM FSP based systems:
38
39- fsp/tpo: Provide support for disabling TPO alarm
40
  This patch adds support for disabling a preconfigured
  Timed-Power-On (TPO) alarm on FSP based systems. Without this patch, once a
  TPO alarm is configured from the kernel it will be triggered even if it's
  subsequently disabled.

  With this patch a TPO alarm can be disabled by passing
  y_m_d==hr_min==0 to fsp_opal_tpo_write(). A branch is added to the
  function to handle this case by sending an FSP_CMD_TPO_DISABLE message to
  the FSP instead of the usual FSP_CMD_TPO_WRITE message. The kernel is
  expected to call opal_tpo_write() with y_m_d==hr_min==0 to request that
  OPAL disable the TPO alarm.
52
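The disable branch described above can be sketched as follows. This is a
simplified illustration rather than the skiboot implementation: the command
values, the ``send_fsp_cmd()`` stub, the ``last_cmd`` variable and the
``tpo_write_sketch()`` signature are all hypothetical. Only the
``y_m_d == hr_min == 0`` convention and the two message names come from the
entry above.

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>

   /* Hypothetical command values; the real FSP message codes differ. */
   #define FSP_CMD_TPO_WRITE   0x1u
   #define FSP_CMD_TPO_DISABLE 0x2u

   /* Records the last command "sent", standing in for the FSP mailbox. */
   static uint32_t last_cmd;

   /* Hypothetical stand-in for queueing a message to the FSP. */
   static int send_fsp_cmd(uint32_t cmd, uint32_t y_m_d, uint32_t hr_min)
   {
       (void)y_m_d;
       (void)hr_min;
       last_cmd = cmd;
       return 0;
   }

   /* Simplified TPO write path: an all-zero date and time means
    * "disable the alarm" rather than "set an alarm for time zero". */
   static int tpo_write_sketch(uint32_t y_m_d, uint32_t hr_min)
   {
       if (y_m_d == 0 && hr_min == 0)
           return send_fsp_cmd(FSP_CMD_TPO_DISABLE, 0, 0);
       return send_fsp_cmd(FSP_CMD_TPO_WRITE, y_m_d, hr_min);
   }
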
53POWER9
54------
55
56Development on POWER9 systems continues in earnest.
57
This release includes the first support for POWER9 DD2 chips. Future releases
will likely contain more bug fixes, although this release has already booted
on real hardware.
60
61- hdata: Reserve Trace Areas
62
  When hostboot is configured to set up in-memory tracing, it will reserve
64  some memory for use by the hardware tracing facility. We need to mark
65  these areas as off limits to the operating system and firmware.
66- hdata: Make out-of-range idata print at PR_DEBUG
67
68  Some fields just aren't populated on some systems.
69
70- hdata: Ignore unnamed memory reservations.
71
72  Hostboot should name any and all memory reservations that it provides.
73  Currently some hostboots export a broken reservation covering the first
74  256MB of memory and this causes the system to crash at boot due to an
75  invalid free because this overlaps with the static "ibm,os-reserve"
76  region (which covers the first 768MB of memory).
77
78  According to the hostboot team unnamed reservations are invalid and can
79  be ignored.
80
81- hdata: Check the Host I2C devices array version
82
  Currently this is not populated on FSP machines, which causes some
84  obnoxious errors to appear in the boot log. We also only want to
85  parse version 1 of this structure since future versions will completely
86  change the array item format.
87
88- Ensure P9 DD1 workarounds apply only to Nimbus
89
  The workarounds for P9 DD1 are only needed for Nimbus. P9 Cumulus will
  also be DD1 but does not need these workarounds.
92
93  This patch ensures the P9 DD1 workarounds only apply to Nimbus. It
94  also renames some things to make clear what's what.
95
96- cpu: Cleanup AMR and IAMR when re-initializing CPUs
97
  There's a bug in current Linux kernels leaving crap in those registers
  across kexec and not sanitizing them on boot. This breaks kexec under
  some circumstances (such as booting a hash kernel from a radix one
  on P9 DD2.0).

  The long-term fix is in Linux, but this workaround is a reasonable
104  way of "sanitizing" those SPRs when Linux calls opal_reinit_cpus()
105  and shouldn't have adverse effects.
106
107  We could also use that same mechanism to cleanup other things as
108  well such as restoring some other SPRs to their default value in
109  the future.
110
- Set POWER9 RPR SPR to 0x00000103070F1F3F, the same value as on P8.
112
113  Without this, thread priorities inside a core don't work.
114
115- cpu: Support setting HID[RADIX] and set it by default on P9
116
  This adds new opal_reinit_cpus() flags to set up radix or hash
  mode in HID[8] on POWER9.
119
120  By default HID[8] will be set. On P9 DD1.0, Linux will change
121  it as needed. On P9 DD2.0 hash works in radix mode (radix is
122  really "dual" mode) so KVM won't break and existing kernels
123  will work.
124
125  Newer kernels built for hash will call this to clear the HID bit
126  and thus get the full size of the TLB as an optimization.
127
128- Add "cleanup_global_tlb" for P9 and later
129
  Uses broadcast TLBIEs to clean up the TLB on all cores and on
  the nest MMU.
132
133- xive: DD2.0 updates
134
  Add support for StoreEOI, fix the StoreEOI MMIO offset in the ESB page,
  and other cleanups.
137
138- Update default TSCR value for P9 as recommended by HW folk.
139
140- xive: Fix initialisation of xive_cpu_state struct
141
142  When using XIVE emulation with DEBUG=1, we run into crashes in log_add()
143  due to the xive_cpu_state->log_pos being uninitialised (and thus, with
144  DEBUG enabled, initialised to the poison value of 0x99999999).
145
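As a rough illustration of the ``cpu: Support setting HID[RADIX]`` entry
above, the sketch below applies MMU-mode flags to a HID0 value. It assumes
IBM (MSB-0) bit numbering, so HID[8] is ``0x8000000000000000 >> 8``. The flag
values are our reading of the ``OPAL_REINIT_CPUS_MMU_HASH`` and
``OPAL_REINIT_CPUS_MMU_RADIX`` definitions and should be checked against the
OPAL API header; the helper function is hypothetical and performs no actual
SPR access.

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>

   /* IBM bit numbering: bit 0 is the most significant bit. */
   #define PPC_BIT(b)        (0x8000000000000000ULL >> (b))
   #define HID0_RADIX        PPC_BIT(8)

   /* Assumed flag values; verify against the OPAL API header. */
   #define REINIT_MMU_HASH   (1ULL << 2)
   #define REINIT_MMU_RADIX  (1ULL << 3)

   /* Hypothetical helper: compute the new HID0 value for the requested
    * MMU mode. skiboot sets the radix bit by default at boot; a hash
    * request clears it, and with neither flag the value is unchanged. */
   static uint64_t hid0_apply_mmu_flags(uint64_t hid0, uint64_t flags)
   {
       if (flags & REINIT_MMU_HASH)
           return hid0 & ~HID0_RADIX;
       if (flags & REINIT_MMU_RADIX)
           return hid0 | HID0_RADIX;
       return hid0;
   }

A hash-only kernel would request the hash mode to clear the bit and get the
full TLB size, as the entry describes.
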
146OCC/Power Management
147^^^^^^^^^^^^^^^^^^^^
148
149With this release, it's possible to boot POWER9 systems with the OCC
150enabled and change CPU frequencies. Doing so does require other firmware
151components to also support this (otherwise the frequency will not be set).
152
153- occ: Skip setting cores to nominal frequency in P9
154
  In P9, once OCC is up, it is supposed to set up the cores to nominal
  frequency, so skip this step in OPAL.
157- occ: Fix Pstate ordering for P9
158
  In P9 the pstate values are positive. They are a continuous set of
  unsigned integers [0 to +N] where Pmax is 0 and Pmin is N. The
  linear ordering of pstates for P9 has changed compared to P8:
  P8 has negative pstate values advertised as [0 to -N] where Pmax
  is 0 and Pmin is -N. This patch adds helper routines to abstract
  pstate comparison with Pmax and adds pstate limit sanity checks.
  This patch also fixes pstate arithmetic by using labs().
166- p8-i2c: occ: Add support for OCC to use I2C engines
167
  This patch adds support for sharing the I2C engines between the host and
  the OCC. The OCC uses the I2C engines to read DIMM temperatures and to
  communicate with the GPUs. The OCC Flag register is used for locking
  between the host and the OCC: the host requests the bus by setting a bit
  in the OCC Flag register, and the OCC sends an interrupt to indicate the
  change in ownership.
173
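The P8/P9 ordering difference in the ``occ: Fix Pstate ordering for P9``
entry can be hidden behind small comparison helpers. The sketch below assumes
only what that entry states (Pmax is 0 on both generations; P8 Pstates run
from 0 to -N and P9 Pstates from 0 to +N); the helper names and the
``proc_gen`` enum are hypothetical and not skiboot's.

.. code-block:: c

   #include <assert.h>
   #include <stdbool.h>
   #include <stdlib.h>

   enum proc_gen { PROC_P8, PROC_P9 };

   /* True if pstate a is a higher frequency than pstate b.
    * P8: Pmax = 0, Pmin = -N, so the larger value is faster.
    * P9: Pmax = 0, Pmin = +N, so the smaller value is faster. */
   static bool pstate_faster(enum proc_gen gen, int a, int b)
   {
       return gen == PROC_P8 ? a > b : a < b;
   }

   /* True if p lies between pmin and pmax in frequency terms. */
   static bool pstate_in_range(enum proc_gen gen, int p, int pmin, int pmax)
   {
       return !pstate_faster(gen, p, pmax) && !pstate_faster(gen, pmin, p);
   }

   /* Number of pstates from pmax to pmin inclusive; labs() makes this
    * independent of the sign convention. */
   static long pstate_count(int pmax, int pmin)
   {
       return labs((long)pmax - (long)pmin) + 1;
   }
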
174opal-prd/PRD
175^^^^^^^^^^^^
176
177- opal-prd: Handle SBE passthrough message passing
178
  This patch adds support for sending SBE passthrough commands to HBRT.

- SBE: Add passthrough command support

  The SBE sends passthrough commands. We have to capture this interrupt and
  send an event to HBRT via opal-prd (the user space daemon).
184- opal-prd: hook up reset_pm_complex
185
  This change provides the facility to invoke HBRT's reset_pm_complex, in
  the same manner as was previously done with process_occ_reset.

  We add a control command for ``opal-prd pm-complex reset``, which is just
  an alias for occ_reset at this stage.
191
192- prd: Implement firmware side of opaque PRD channel
193
194  This change introduces the firmware side of the opaque HBRT <--> OPAL
195  message channel. We define a base message format to be shared with HBRT
196  (in include/prd-fw-msg.h), and allow firmware requests and responses to
197  be sent over this channel.
198
199  We don't currently have any notifications defined, so have nothing to do
200  for firmware_notify() at this stage.
201
202- opal-prd: Add firmware_request & firmware_notify implementations
203
204  This change adds the implementation of firmware_request() and
205  firmware_notify(). To do this, we need to add a message queue, so that
206  we can properly handle out-of-order messages coming from firmware.
207
208- opal-prd: Add support for variable-sized messages
209
  With the introduction of the opaque firmware channel, we want to
  support variable-sized messages. Rather than expecting to read an
  entire 'struct opal_prd_msg' in one read() call, we can split this
  over multiple reads, potentially expanding our message buffer.
214
215- opal-prd: Sync hostboot interfaces with HBRT
216
217  This change adds new callbacks defined for p9, and the base thunks for
218  the added calls.
219
220- opal-prd: interpret log level prefixes from HBRT
221
222  Interpret the (optional) \*_MRK log prefixes on HBRT messages, and set
223  the syslog log priority to suit.
224
225- opal-prd: Add occ reset to usage text
226- opal-prd: allow different chips for occ control actions
227
  The ``occ reset`` and ``occ error`` actions can both take a chip ID
  argument, but we were previously just using zero. This change updates the
  control message format to pass the chip ID from the control process to
  the opal-prd daemon.
232
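The variable-sized message support in the opal-prd entries above amounts to
reading a fixed header first, then the rest of the message, growing the
buffer when needed. A minimal sketch, assuming a hypothetical 4-byte header
that carries the total message size in host byte order (the real
``struct opal_prd_msg`` layout is not reproduced here):

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>
   #include <stdlib.h>
   #include <string.h>
   #include <unistd.h>

   /* Hypothetical header: we only assume it carries the total message
    * size (header included), here in host byte order. */
   struct msg_hdr {
       uint8_t  type;
       uint8_t  pad;
       uint16_t size;
   };

   /* Read exactly len bytes, looping over short reads. */
   static int read_full(int fd, void *buf, size_t len)
   {
       uint8_t *p = buf;

       while (len) {
           ssize_t rc = read(fd, p, len);
           if (rc <= 0)
               return -1;      /* error or unexpected EOF */
           p += rc;
           len -= (size_t)rc;
       }
       return 0;
   }

   /* Read one message into *bufp, growing the buffer if the header says
    * the message is larger than what we currently have. */
   static ssize_t read_msg(int fd, uint8_t **bufp, size_t *buf_size)
   {
       struct msg_hdr hdr;

       if (read_full(fd, &hdr, sizeof(hdr)) || hdr.size < sizeof(hdr))
           return -1;

       if (hdr.size > *buf_size) {
           uint8_t *nbuf = realloc(*bufp, hdr.size);
           if (!nbuf)
               return -1;
           *bufp = nbuf;
           *buf_size = hdr.size;
       }

       memcpy(*bufp, &hdr, sizeof(hdr));
       if (read_full(fd, *bufp + sizeof(hdr), hdr.size - sizeof(hdr)))
           return -1;
       return hdr.size;
   }

Looping in ``read_full()`` matters because a single ``read()`` call is not
guaranteed to return the whole message.
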
233
234PCI/PHB4
235^^^^^^^^
236
237- phb4: Fix number of index bits in IODA tables
238
239  On PHB4 the number of index bits in the IODA table address register
  was bumped to 10 bits to accommodate 1024 MSIs and 1024 TVEs (DD2).
241
242  However our macro only defined the field to be 9 bits, thus causing
243  "interesting" behaviours on some systems.
244
245- phb4: Harden init with bad PHBs
246
  Currently if we read all 1s from the EEH or IRQ capabilities, we end
  up train wrecking on some other random code (e.g. an assert() in xive).
249
250  This hardens the PHB4 code to look for these bad reads and more
251  gracefully fails the init for that PHB alone.  This allows the rest of
252  the system to boot and ignore those bad PHBs.
253
254- phb4 capi (i.e. CAPI2): Handle HMI events
255
  Find the CAPP on the chip associated with the HMI event for PHB4.
  The recovery mode (re-initialization of the CAPP, resume of functional
  operations) is only available with P9 DD2; support for it will be
  provided in a future patch.
260
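To see why a 9-bit field definition misbehaves once 1024 MSIs or TVEs exist,
consider what masking does to a 10-bit index. The field position and macro
names below are hypothetical; only the 9-bit and 10-bit widths come from the
``phb4: Fix number of index bits in IODA tables`` entry above.

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>

   /* Hypothetical placement: index in the low bits of the address
    * register. Only the two widths come from the release note. */
   #define IODA_INDEX_MASK_9BIT  0x1ffULL   /* max index 511 */
   #define IODA_INDEX_MASK_10BIT 0x3ffULL   /* max index 1023 */

   static uint64_t ioda_addr(uint64_t mask, unsigned int index)
   {
       return (uint64_t)index & mask;
   }

With the narrower mask, index 600 silently aliases entry 88, so an update
meant for one table entry lands on another.
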
261.. _capi2-5.7-rc1-rn:
262
263- phb4 capi (i.e. CAPI2): Enable capi mode for PHB4
264
  Enable the Coherent Accelerator Processor Interface (CAPI). The PHB is used
  as a CAPI interface.

  Single-port CAPI adapters can be connected to either PEC0 or PEC2, but
  dual-port adapters can only be connected to PEC2:

  * CAPP0 attached to PHB0 (PEC0 - single port)
  * CAPP1 attached to PHB3 (PEC2 - single or dual port)
272
273- hw/phb4: Rework phb4_get_presence_state()
274
  There are two issues in the current implementation: it should return an
  error code visible to Linux (one with the ``OPAL_*`` prefix), and the code
  isn't very obvious.

  This returns OPAL_HARDWARE when the PHB is broken. Otherwise, OPAL_SUCCESS
  is always returned. Meanwhile, it refactors the code to make it
  obvious: OPAL_PCI_SLOT_PRESENT is returned when the presence signal (active
  low) or the PCIe link is active. Otherwise, OPAL_PCI_SLOT_EMPTY is returned.
282
283- phb4: Error injection for config space
284
285  Implement CFG (config space) error injection.
286
287  This works the same as PHB3.  MMIO and DMA error injection require a
288  rewrite, so they're unsupported for now.
289
290  While it's not feature complete, this at least provides an easy way to
291  inject an error that will trigger EEH.
292
293- phb4: Error clear implementation
294- phb4: Mask link down errors during reset
295
296  During a hot reset the PCI link will drop, so we need to mask link down
297  events to prevent unnecessary errors.
298- phb4: Implement root port initialization
299
300  phb4_root_port_init() was a NOP before, so fix that.
301- phb4: Complete reset implementation
302
303  This implements complete reset (creset) functionality for POWER9 DD1.
304
305  Only partially tested and contends with some DD1 errata, but it's a start.
306
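The reworked ``phb4_get_presence_state()`` logic described above can be
summarised as follows. This is a sketch only: the boolean inputs are
hypothetical simplifications of the real register reads, and the numeric
values given to the OPAL constants are assumptions (the authoritative
definitions are in skiboot's OPAL API header).

.. code-block:: c

   #include <assert.h>
   #include <stdbool.h>
   #include <stdint.h>

   /* Assumed values; verify against the OPAL API header. */
   #define OPAL_SUCCESS           0
   #define OPAL_HARDWARE         (-6)
   #define OPAL_PCI_SLOT_EMPTY    0
   #define OPAL_PCI_SLOT_PRESENT  1

   /* presence_n is the raw presence signal, which is active low. */
   static int64_t presence_state_sketch(bool phb_broken, bool presence_n,
                                        bool link_active, uint8_t *state)
   {
       if (phb_broken)
           return OPAL_HARDWARE;   /* state left untouched */

       if (!presence_n || link_active)
           *state = OPAL_PCI_SLOT_PRESENT;
       else
           *state = OPAL_PCI_SLOT_EMPTY;
       return OPAL_SUCCESS;
   }
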
307.. _shared-slot-5.7-rc1-rn:
308
309- phb4: Activate shared PCI slot on witherspoon
310
311  Witherspoon systems come with a 'shared' PCI slot: physically, it
312  looks like a x16 slot, but it's actually two x8 slots connected to two
313  PHBs of two different chips. Taking advantage of it requires some
314  logic on the PCI adapter. Only the Mellanox CX5 adapter is known to
315  support it at the time of this writing.
316
317  This patch enables support for the shared slot on witherspoon if a x16
318  adapter is detected. Each x8 slot has a presence bit, so both bits
319  need to be set for the activation to take place. Slot sharing is
320  activated through a gpio.
321
  Note that there's no easy way to be sure that the card is indeed a
  shared-slot compatible PCI adapter and not a normal x16 card. Plugging
  a normal x16 adapter into the shared slot should be avoided on
  witherspoon, as the link won't train on the second slot, resulting in
  a timeout and a longer boot time. Only the first slot is usable and
  the x16 adapter will end up using only half the lanes.

  If the PCI card plugged into the physical slot is only x8 (or less),
  then the presence bit of the second slot is not set, so this patch
  does nothing. The x8 (or less) adapter should work as in any other
  physical slot.
333
334- phb4: Block D-state power management on direct slots
335
  Current revisions of PHB4 don't properly handle the resulting
  L1 link transition.
338
339- phb4: Call pci config filters
340
341- phb4: Mask out write-1-to-clear registers in RC cfg
342
343  The root complex config space only supports 4-byte accesses. Thus, when
344  the client requests a smaller size write, we do a read-modify-write to
345  the register.
346
  However, some registers have bits defined as "write 1 to clear".

  If we do an RMW cycle on such a register and such bits are 1 in the
  part that the client doesn't intend to modify, we will accidentally
  write back those 1s and clear the corresponding bits.
352
353  This avoids it by masking out those magic bits from the "old" value
354  read from the register.
355
356- phb4: Properly mask out link down errors during reset
357- phb3/4: Silence a useless warning
358
  PHBs don't have base location codes on non-FSP systems, and that's
  normal.
361
362- phb4: Workaround bug in spec 053
363
  Wait for DLP PGRESET to clear *after* lifting the PCIe core reset.
365
366- phb4: DD2.0 updates
367
  Support StoreEOI, the full complement of PEs (a TVT twice as large)
  and other updates.

  Also renumber init steps to match spec 063.
372
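The read-modify-write fix in the ``phb4: Mask out write-1-to-clear registers
in RC cfg`` entry can be illustrated with a helper that merges a sub-word
write into a 32-bit register value after masking write-1-to-clear bits out of
the old value. The helper and the W1C mask used with it are hypothetical; in
the real code the mask depends on which register is being written.

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>

   /* Merge a 1-, 2- or 4-byte write of `val` at byte offset `off` into
    * the 32-bit register value `old`. Bits set in `w1c_mask` are
    * write-1-to-clear: they are dropped from the old value so that only
    * bits the caller explicitly writes as 1 can clear anything. */
   static uint32_t rc_cfg_rmw(uint32_t old, uint32_t w1c_mask,
                              unsigned int off, unsigned int size, uint32_t val)
   {
       uint32_t mask;

       assert((size == 1 || size == 2 || size == 4) && off + size <= 4);
       mask = (size == 4) ? 0xffffffffu
                          : ((1u << (size * 8)) - 1) << (off * 8);

       old &= ~w1c_mask;                  /* never write back stale 1s */
       return (old & ~mask) | ((val << (off * 8)) & mask);
   }

For example, with a W1C status bit set in the upper half of the old value, a
2-byte write to the lower half no longer writes that bit back (which would
have cleared it), while an explicit write of 1 to the upper half still does.
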
373
374NPU2
375^^^^
376
377Note that currently NPU2 support is limited to POWER9 DD1 hardware.
378
379- platforms/astbmc/witherspoon.c: Add NPU2 slot mappings
380
  For NVLink2 to function, PCIe devices need to be associated with the right
382  NVLinks. This association is supposed to be passed down to Skiboot via HDAT but
383  those fields are still not correctly filled out. To work around this we add slot
384  tables for the NVLinks similar to what we have for P8+.
385
386- hw/npu2.c: Fix device aperture calculation
387
388  The POWER9 NPU2 implements an address compression scheme to compress 56-bit P9
  physical addresses to 47-bit GPU addresses. System software needs to know both
  addresses; unfortunately the calculation of the compressed address was
391  incorrect. Fix it here.
392
393- hw/npu2.c: Change MCD BAR allocation order
394
395  MCD BARs need to be correctly aligned to the size of the region. As GPU
  memory is allocated from the top of memory down, we should start allocating
397  from the highest GPU memory address to the lowest to ensure correct
398  alignment.
399
400- NPU2: Add flag to nvlink config space indicating DL reset state
401
402  Device drivers need to be able to determine if the DL is out of reset or
403  not so they can safely probe to see if links have already been trained.
404  This patch adds a flag to the vendor specific config space indicating if
405  the DL is out of reset.
406
407- hw/npu2.c: Hardcode MSR_SF when setting up npu XTS contexts
408
409  We don't support anything other than 64-bit mode for address translations so we
410  can safely hardcode it.
411
412- hw/npu2-hw-procedures.c: Add nvram option to override zcal calculations
413
414  In some rare cases the zcal state machine may fail and flag an error. According
  to hardware designers it is sometimes OK to ignore this failure and use nominal
  values for the calculations. For this case we add an nvram variable
  (nv_zcal_override) which will cause skiboot to ignore the failure and use the
  nominal value specified in nvram.
419- npu2: Fix npu2_{read,write}_4b()
420
421  When writing or reading 4-byte values, we need to use the upper half of
422  the 64-bit SCOM register.
423
424  Fix npu2_{read,write}_4b() and their callers to use uint32_t, and
425  appropriately shift the value being written or returned.
426
427
428- hw/npu2.c: Fix opal_npu_map_lpar to search for existing BDF
429- hw/npu2-hw-procedures.c: Fix running of zcal procedure
430
  The zcal procedure should only be run once per obus (i.e. once per group of 3
  links). Clean up the code and fix the potential buffer overflow due to a typo.
  Also update the zcal settings to their proper values.
434- hw/npu2.c: Add memory coherence directory programming
435
436  The memory coherence directory (MCD) needs to know which system memory addresses
437  belong to the GPU. This amounts to setting a BAR and a size in the MCD to cover
438  the addresses assigned to each of the GPUs. To ease assignment we assume GPUs
439  are assigned memory in a contiguous block per chip.
440
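The ``npu2_{read,write}_4b()`` fix boils down to placing the 32-bit value in
the upper half of the 64-bit SCOM data. A sketch of just the shifting, with
the SCOM access replaced by a plain variable; ``scom_reg``, ``write_4b()`` and
``read_4b()`` are hypothetical stand-ins, and this sketch simply zeroes the
lower half on writes.

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>

   /* Stand-in for the 64-bit SCOM register; real code uses xscom calls. */
   static uint64_t scom_reg;

   static void write_4b(uint32_t val)
   {
       /* The 4-byte value lives in the upper 32 bits of the register. */
       scom_reg = (uint64_t)val << 32;
   }

   static uint32_t read_4b(void)
   {
       return (uint32_t)(scom_reg >> 32);
   }
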
441
442pflash/libflash
443---------------
444
445- libflash/libffs: Zero checksum words
446
  When writing ffs entries to flash, libffs didn't zero the checksum words
  before calculating the checksum across the entire structure. This caused
  an inaccurate checksum, as the calculation could include the non-zero
  contents of the checksum words themselves.
451
452- libffs: Fix ffs_lookup_part() return value
453
  It would return success when the part wasn't found.
455- libflash/libffs: Correctly update the actual size of the partition
456
457  libffs has been updating FFS partition information in the wrong place
458  which leads to incomplete erases and corruption.
459- libflash: Initialise entries list earlier
460
461  In the bail-out path we call ffs_close() to tear down the partially
462  initialised ffs_handle. ffs_close() expects the entries list to be
463  initialised so we need to do that earlier to prevent a null pointer
464  dereference.
465
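The checksum fix can be illustrated as follows, assuming an XOR-of-32-bit-words
checksum whose stored value makes the whole structure XOR to zero. The
``struct entry`` layout and helper names are hypothetical simplifications, not
the real FFS on-flash format.

.. code-block:: c

   #include <assert.h>
   #include <stddef.h>
   #include <stdint.h>
   #include <string.h>

   /* Hypothetical, simplified entry with the checksum in the last word. */
   struct entry {
       uint32_t base;
       uint32_t size;
       uint32_t flags;
       uint32_t checksum;
   };

   /* XOR of all 32-bit words in the buffer. */
   static uint32_t xor_checksum(const void *data, size_t size)
   {
       const unsigned char *p = data;
       uint32_t csum = 0;

       for (size_t i = 0; i + 4 <= size; i += 4) {
           uint32_t w;
           memcpy(&w, p + i, sizeof(w));
           csum ^= w;
       }
       return csum;
   }

   /* Fixed behaviour: zero the checksum word first, so a stale value
    * never feeds into the new checksum. */
   static void entry_update_checksum(struct entry *e)
   {
       e->checksum = 0;
       e->checksum = xor_checksum(e, sizeof(*e));
   }

   /* A valid entry XORs to zero over its full size, checksum included. */
   static int entry_checksum_ok(const struct entry *e)
   {
       return xor_checksum(e, sizeof(*e)) == 0;
   }

If the stale checksum word is included in the calculation, the stored value
is off by exactly that stale value, so the entry then fails verification.
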
466mbox-flash
467----------
468
469mbox-flash is the emerging standard way of talking to host PNOR flash
470on POWER9 systems.
471
472- libflash/mbox-flash: Implement MARK_WRITE_ERASED mbox call
473
474  Version two of the mbox-flash protocol defines a new command:
475  MARK_WRITE_ERASED.
476
477  This command provides a simple way to mark a region of flash as all 0xff
478  without the need to go and write all 0xff. This is an optimisation as
479  there is no need for an erase before a write, it is the responsibility of
480  the BMC to deal with the flash correctly, however in v1 it was ambiguous
481  what a client should do if the flash should be erased but not actually
482  written to. This allows of a optimal path to resolve this problem.
483
484- libflash/mbox-flash: Update to V2 of the protocol
485
486  Updated version 2 of the protocol can be found at:
487  https://github.com/openbmc/mboxbridge/blob/master/Documentation/mbox_protocol.md
488
489  This commit changes mbox-flash such that it will preferentially talk
490  version 2 to any capable daemon but still remain capable of talking to
491  v1 daemons.
492
493  Version two changes some of the command definitions for increased
494  consistency and usability.
495  Version two includes more attention bits - these are now dealt with at a
496  simple level.
522
523- hw/lpc-mbox: Use message registers for interrupts
524
525  Currently the BMC raises the interrupt using the BMC control register.
526  It does so on all accesses to the 16 'data' registers meaning that when
527  the BMC only wants to set the ATTN (on which we have interrupts enabled)
528  bit we will also get a control register based interrupt.
529
  The solution here is to mask that interrupt permanently and enable
  interrupts on the protocol-defined 'response' data byte.
532
533General fixes
534-------------
535
536- Reduce log level on non-error log messages
537
538  90% of what we print isn't useful to a normal user. This
539  dramatically reduces the amount of messages printed by
540  OPAL in normal circumstances.
541
542- init: Silence messages and call ourselves "OPAL"
543- psi: Switch to ESB mode later
544
  There's an erratum: if we switch to ESB mode before setting up
  the various ESB mode related registers, a pending interrupt
  can go wrong.
548
549- lpc: Enable "new" SerIRQ mode
550- hw/ipmi/ipmi-sel: missing newline in prlog warning
551
552- p8-i2c OCC lock: fix locking in p9_i2c_bus_owner_change
553- Convert important polling loops to spin at lowest SMT priority
554
  The pattern of calling cpu_relax() inside a polling loop does
  not suit the powerpc SMT priority instructions. The preferred approach is
  to set a low priority, spin until the break condition is reached,
  then restore the priority.
559
560- Improve cpu_idle when PM is disabled
561
562  Split cpu_idle() into cpu_idle_delay() and cpu_idle_job() rather than
563  requesting the idle type as a function argument. Have those functions
  provide a default polling (non-PM) implementation which spins at the
  lowest SMT priority.
566
567- core/fdt: Always add a reserve map
568
569  Currently we skip adding the reserved ranges block to the generated
570  FDT blob if we are excluding the root node. This can result in a DTB
571  that dtc will barf on because the reserved memory ranges overlap with
572  the start of the dt_struct block. As an example: ::
573
574    $ fdtdump broken.dtb -d
575    /dts-v1/;
576    // magic:               0xd00dfeed
577    // totalsize:           0x7f3 (2035)
578    // off_dt_struct:       0x30  <----\
579    // off_dt_strings:      0x7b8       | this is bad!
580    // off_mem_rsvmap:      0x30  <----/
581    // version:             17
582    // last_comp_version:   16
583    // boot_cpuid_phys:     0x0
584    // size_dt_strings:     0x3b
585    // size_dt_struct:      0x788
586
587    /memreserve/ 0x100000000 0x300000004;
588    /memreserve/ 0x3300000001 0x169626d2c;
589    /memreserve/ 0x706369652d736c6f 0x7473000000000003;
590            *continues*
591
592  With this patch: ::
593
594    $ fdtdump working.dtb -d
595    /dts-v1/;
596    // magic:               0xd00dfeed
597    // totalsize:           0x803 (2051)
598    // off_dt_struct:       0x40
599    // off_dt_strings:      0x7c8
600    // off_mem_rsvmap:      0x30
601    // version:             17
602    // last_comp_version:   16
603    // boot_cpuid_phys:     0x0
604    // size_dt_strings:     0x3b
605    // size_dt_struct:      0x788
606
607    // 0040: tag: 0x00000001 (FDT_BEGIN_NODE)
608    / {
609    // 0048: tag: 0x00000003 (FDT_PROP)
610    // 07fb: string: phandle
611    // 0054: value
612        phandle = <0x00000001>;
613            *continues*
614
625
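The broken blob above can be caught with a header sanity check: assuming the
conventional block order (header, memory reservation map, structure block,
strings block), the reservation map must leave room for at least its 16-byte
terminating entry before the structure block starts. A minimal sketch using
the standard flattened devicetree header layout; the function name is
hypothetical and this is not how skiboot or dtc actually validate blobs.

.. code-block:: c

   #include <assert.h>
   #include <stdbool.h>
   #include <stdint.h>

   /* Read a big-endian 32-bit value from a byte buffer. */
   static uint32_t be32_at(const uint8_t *p)
   {
       return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
              ((uint32_t)p[2] << 8) | (uint32_t)p[3];
   }

   /* Byte offsets of header fields (ten big-endian 32-bit words). */
   enum {
       FDT_OFF_MAGIC = 0,
       FDT_OFF_DT_STRUCT = 8,
       FDT_OFF_MEM_RSVMAP = 16,
   };

   #define FDT_MAGIC 0xd00dfeedu
   #define RSVMAP_ENTRY_SIZE 16   /* two 64-bit values: address, size */

   /* Minimal check: the reserve map must sit before the struct block and
    * leave room for at least the terminating (0, 0) entry. */
   static bool fdt_rsvmap_ok(const uint8_t *hdr)
   {
       uint32_t off_struct = be32_at(hdr + FDT_OFF_DT_STRUCT);
       uint32_t off_rsvmap = be32_at(hdr + FDT_OFF_MEM_RSVMAP);

       if (be32_at(hdr + FDT_OFF_MAGIC) != FDT_MAGIC)
           return false;
       return off_rsvmap < off_struct &&
              off_struct - off_rsvmap >= RSVMAP_ENTRY_SIZE;
   }

With the header values from the two dumps above, 0x30/0x30 fails the check
and 0x30/0x40 passes.
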
626PCI
627---
628- pci: Wait 20ms before checking presence detect on PCIe
629
  The PHB presence logic has a debounce timer that can take
  a while to settle.
632
633- phb3+iov: Fixup support for config space filters
634
  The filter should be called before the HW access, and its
  return value controls whether to perform the access or not.
- core/pci: Use PCI slot's power facility in pci_enable_bridge()

  The current implementation makes incorrect assumptions: that there is
  always a PCI slot associated with the root port and PCIe switch
  downstream port, and that all of them are capable of changing their
  power state through the PCICAP_EXP_SLOTCTL register. Firstly, there
  might not be a PCI slot associated with the root port or PCIe
  switch downstream port. Secondly, the power isn't always controlled
  by the standard config register (PCICAP_EXP_SLOTCTL): on Tuleta,
  I2C slave devices are used to control the power states.

  In order to use the PCI slot's methods to manage the power
  states, this change:

  * Introduces PCI_SLOT_FLAG_ENFORCE, indicating that the requested
    operation must be applied.
  * Splits pci_enable_bridge() into 3 functions: pci_bridge_power_on()
    to power it on; pci_enable_bridge() as a placeholder; and
    pci_bridge_wait_link() to wait for the downstream link to come up.
  * In pci_bridge_power_on(), uses the PCI slot's specific power management
    methods if there is a PCI slot associated with the PCIe
    switch downstream port or root port.
659- platforms/astbmc/slots.c: Allow comparison of bus numbers when matching slots
660
  When matching devices on multiple downstream PLX buses we need to compare more
  than just the device-id of the PCIe BDFN, so increase the mask to do so.
663
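The ordering fix in the ``phb3+iov`` entry can be sketched as a config write
path that consults an optional filter first and skips the hardware access when
the filter reports that it handled the request. All names, the fake config
space and the return-value convention are hypothetical.

.. code-block:: c

   #include <assert.h>
   #include <stdbool.h>
   #include <stddef.h>
   #include <stdint.h>

   /* Hypothetical filter type: returns true if it fully handled the
    * access (it may also adjust *data before the hardware sees it). */
   typedef bool (*cfg_filter_fn)(uint32_t offset, uint32_t *data, bool is_write);

   static uint32_t fake_cfg_space[64];  /* stand-in for real config space */
   static unsigned int hw_accesses;     /* counts real hardware accesses */

   static void cfg_write32(cfg_filter_fn filter, uint32_t offset, uint32_t data)
   {
       assert(offset % 4 == 0 && offset / 4 < 64);

       /* Consult the filter before touching hardware, and let its
        * return value decide whether the access happens at all. */
       if (filter && filter(offset, &data, true))
           return;

       fake_cfg_space[offset / 4] = data;
       hw_accesses++;
   }

   /* Example filter: swallow writes to one (hypothetical) register. */
   static bool drop_writes_to_0x10(uint32_t offset, uint32_t *data, bool is_write)
   {
       (void)data;
       return is_write && offset == 0x10;
   }
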
664Tests and simulators
665--------------------
666
667- boot-tests: add OpenBMC support
668- boot_test.sh: Add SMC BMC support
669
670  Your BMC needs a special debug image flashed to use this, the exact
671  image and methods aren't something I can publish here, but if you work
672  for IBM or SMC you can find out from the right sources.
673
  A few things need to be set up to be able to flash an SMC BMC.
675
676  For a start, the SSH daemon will only accept connections after a special
677  incantation (which I also can't share), but you should put that in the
678  ~/.skiboot_boot_tests file along with some other default login information
679  we don't publicise too broadly (because Security Through Obscurity is
680  *obviously* a good idea....)
681
  We also can't just directly "ssh /bin/true"; we need an expect script.
  And we can't scp, but we can use anonymous rsync!
684
685  You also need a pflash binary to copy over.
686- hdata_to_dt: Add PVR overrides to the usage text
687- mambo: Add a reservation for the initramfs
688
689  On most systems the initramfs is loaded inside the part of memory
690  reserved for the OS [0x0-0x30000000] and skiboot will never touch it.
691  On mambo it's loaded at 0x80000000 and if you're unlucky skiboot can
692  allocate over the top of it and corrupt the initramfs blob.
693
  A possible downside is that the kernel cannot reuse the initramfs
  memory since it's marked as reserved, although the kernel might free it
  anyway.
697- mambo: Update P9 PVR to reflect Scale out 24 core chips
698
  The P9 PVR bits 48:51 don't indicate a revision but instead different
  configurations. From BookIV we have:

  ===== ===================
  Value Configuration
  ===== ===================
      0 Scale out 12 cores
      1 Scale out 24 cores
      2 Scale up 12 cores
      3 Scale up 24 cores
  ===== ===================

  Skiboot will mostly use the "Scale out 24 core" configuration
  (i.e. SMT4, not SMT8), so reflect this in mambo.
713- core: Move enable_mambo_console() into chip initialisation
714
715  Rather than having a wart in main_cpu_entry() that initialises the mambo
716  console, we can move it into init_chips() which is where we discover that we're
717  on mambo.
718
719- mambo: Create multiple chips when we have multiple CPUs
720
721  Currently when we boot mambo with multiple CPUs, we create multiple CPU nodes in
722  the device tree, and each claims to be on a separate chip.
723
724  However we don't create multiple xscom nodes, which means skiboot only knows
725  about a single chip, and all CPUs end up on it. At the moment mambo is not able
726  to create multiple xscom controllers. We can create fake ones, just by faking
727  the device tree up, but that seems uglier than this solution.
728
729  So create a mambo-chip for each CPU other than 0, to tell skiboot we want a
730  separate chip created. This then enables Linux to see multiple chips: ::
731
732      smp: Brought up 2 nodes, 2 CPUs
733      numa: Node 0 CPUs: 0
734      numa: Node 1 CPUs: 1
735
736- chip: Add support for discovering chips on mambo
737
738  Currently the only way for skiboot to discover chips is by looking for xscom
739  nodes. But on mambo it's currently not possible to create multiple xscom nodes,
740  which means we can only simulate a single chip system.
741
742  However it seems we can fairly cleanly add support for a special mambo chip
743  node, and use that to instantiate multiple chips.
744
745  Add a check in init_chip() that we're not clobbering an already initialised
746  chip, now that we have two places that initialise chips.
747- mambo: Make xscom claim to be DD 2.0
748
  In the mambo tcl we set the CPU version to DD 2.0, because mambo is not
  bug-compatible with DD 1.
751
752  But in xscom_read_cfam_chipid() we have a hard coded value, to work
753  around the lack of the f000f register, which claims to be P9 DD 1.0.
754
755  This doesn't seem to cause crashes or anything, but at boot we do see: ::
756
757      [    0.003893084,5] XSCOM: chip 0x0 at 0x1a0000000000 [P9N DD1.0]
758
759  So fix it to claim that the xscom is also DD 2.0 to match the CPU.
760
761- mambo: Match whole string when looking up symbols with linsym/skisym
762
  linsym/skisym use a regex to match the symbol name, and accept a
764  partial match against the entry in the symbol map, which can lead to
765  somewhat confusing results, eg: ::
766
767      systemsim % linsym early_setup
768      0xc000000000027890
769      systemsim % linsym early_setup$
770      0xc000000000aa8054
771      systemsim % linsym early_setup_secondary
772      0xc000000000027890
773
  I don't think that's the behaviour we want, so append a $ to the name so
  that the symbol has to match against the whole entry, e.g.: ::
776
777      systemsim % linsym early_setup
778      0xc000000000aa8054
779
- Disable nap on P8 Mambo, as the public release has bugs
781- mambo: Allow loading multiple CPIOs
782
783  Currently we have support for loading a single CPIO and telling Linux to
784  use it as the initrd. But the Linux code actually supports having
785  multiple CPIOs contiguously in memory, between initrd-start and end, and
786  will unpack them all in order. That is a really nice feature as it means
787  you can have a base CPIO with your root filesystem, and then tack on
788  others as you need for various tests etc.
789
  So expand the logic to handle SKIBOOT_INITRD, and treat it as a
  comma-separated list of CPIOs to load. I chose comma as it's fairly rare in
792  filenames, but we could make it space, colon, whatever. Or we could add
793  a new environment variable entirely. The code also supports trimming
794  whitespace from the values, so you can have "cpio1, cpio2".
795- hdata/test: Add memory reservations to hdata_to_dt
796
797  Currently memory reservations are parsed, but since they are not
798  processed until mem_region_init() they don't appear in the output
  device tree blob. Several bugs have been found with memory reservations,
  so we want them to be part of the test output.

  Add them and clean up several usages of printf(), since we want only the
  dtb to appear on standard output.
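
The linsym/skisym change above is easiest to see with a plain regex search. The mambo helpers themselves are Tcl; the following is only a C sketch using POSIX regcomp()/regexec() to show the same effect. The helper names and the "ADDRESS TYPE NAME" symbol-map line format are assumptions for illustration.

```c
#include <regex.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Return true if `pattern` (POSIX extended regex) matches anywhere in
 * `line`. Like an unanchored search, this accepts partial matches, so
 * "early_setup" also matches a line ending in "early_setup_secondary".
 */
static bool regex_matches(const char *pattern, const char *line)
{
    regex_t re;
    bool hit;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return false;
    hit = regexec(&re, line, 0, NULL, 0) == 0;
    regfree(&re);
    return hit;
}

/*
 * Hypothetical lookup mirroring the fix: append '$' to the name so the
 * match must end where the symbol-map entry ends.
 */
static bool symbol_matches(const char *name, const char *map_line)
{
    char pattern[256];

    snprintf(pattern, sizeof(pattern), "%s$", name);
    return regex_matches(pattern, map_line);
}
```

With the anchored pattern, "early_setup" no longer matches the early_setup_secondary entry but still matches the early_setup entry itself.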
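
The SKIBOOT_INITRD parsing rule from the multiple-CPIO entry above (a comma-separated list, with surrounding whitespace trimmed) can be sketched in C as follows. The mambo support scripts are written in Tcl, so this is not the actual code; split_cpio_list() and trim() are hypothetical names, and loading each file contiguously between initrd-start and initrd-end is not shown.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Strip leading and trailing whitespace in place; returns the new start. */
static char *trim(char *s)
{
    char *end;

    while (isspace((unsigned char)*s))
        s++;
    end = s + strlen(s);
    while (end > s && isspace((unsigned char)end[-1]))
        *--end = '\0';
    return s;
}

/*
 * Split a comma-separated SKIBOOT_INITRD value (modified in place) into
 * trimmed file names, skipping empty entries. Returns how many were stored
 * in names[], never more than max.
 */
static size_t split_cpio_list(char *value, char *names[], size_t max)
{
    size_t n = 0;
    char *tok = strtok(value, ",");

    while (tok != NULL && n < max) {
        tok = trim(tok);
        if (*tok != '\0')
            names[n++] = tok;
        tok = strtok(NULL, ",");
    }
    return n;
}
```

Given "cpio1, cpio2", this yields the two names "cpio1" and "cpio2" in order, which matches the order in which Linux unpacks the CPIOs. Skipping empty entries (for example from a trailing comma) is a choice made for this sketch.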
804
805IBM FSP systems
806---------------
807
808- FSP/CONSOLE: Fix possible NULL dereference
809- platforms/ibm-fsp/firenze: Fix PCI slot power-off pattern
810
  When powering off the PCI slot, the corresponding bits should
  be set to 0bxx00xx00 instead of 0bxx11xx11. Otherwise, the
  specified PCI slot can't be put into the power-off state. Fortunately,
  this hasn't introduced any side effects so far.
815- FSP/CONSOLE: Workaround for unresponsive ipmi daemon
816
  We use a TCE-mapped area to write data to the console. The console header
  (fsp_serbuf_hdr) is modified by both FSP and OPAL (OPAL updates the
  next_in pointer in fsp_serbuf_hdr and FSP updates the next_out pointer).

  The kernel makes the opal_console_write() OPAL call to write data to the
  console. OPAL writes data to the TCE-mapped area and sends an MBOX command
  to the FSP. If our console becomes full and we have data to write to it,
  we keep waiting until the FSP reads data.

  In some corner cases, where the FSP is active but not responding to
  console MBOX messages (due to a buggy IPMI daemon) and there are heavy
  console writes from the kernel, our console buffer eventually becomes
  full. At this point OPAL starts returning OPAL_BUSY_EVENT to the kernel,
  and the kernel keeps retrying. This causes kernel soft lockups. In some
  extreme cases, when every CPU is trying to write to the console, users
  cannot ssh in and conclude that the system has hung.

  If we reset the FSP or restart the IPMI daemon on the FSP, the system
  recovers and everything returns to normal.

  This patch works around the above issue by returning OPAL_HARDWARE
  when the console is full. A side effect of this patch is that we may end
  up dropping the latest console data, but it is better to drop console
  data than to hang the system.
840
841- FSP: Set status field in response message for timed out message
842
  For timed-out FSP messages, we set the message status to "fsp_msg_timeout".
  But most FSP driver users (like surveillance) ignore this field.
  They always look for the FSP-returned status value in the callback function
  (second byte in word1), so we end up treating a timed-out message as a
  successful response from the FSP.
848
849  Sample output: ::
850
851    [69902.432509048,7] SURV: Sending the heartbeat command to FSP
852    [70023.226860117,4] FSP: Response from FSP timed out, word0 = d66a00d7, word1 = 0 state: 3
853    ....
854    [70023.226901445,7] SURV: Received heartbeat acknowledge from FSP
855    [70023.226903251,3] FSP: fsp_trigger_reset() entry
856
  Here the SURV code thought it got a valid response from the FSP, but
  actually we didn't receive any response from the FSP.

  This patch fixes the above issue by updating the status field in the
  response structure.
861
862- FSP: Improve timeout message
863
864- FSP/RTC: Fix possible FSP R/R issue in rtc write path
865- hw/fsp/rtc: read/write cached rtc tod on fsp hir.
866
  Currently fsp-rtc reads/writes the cached RTC TOD on an FSP
  reset. Use the latest fsp_in_rr() function to properly read the cached RTC
  value when an FSP reset is initiated by the HIR.

  Below is the kernel trace seen when setting the hardware clock while the
  HIR process starts. ::
872
873    [ 1727.775824] NMI watchdog: BUG: soft lockup - CPU#57 stuck for 23s! [hwclock:7688]
874    [ 1727.775856] Modules linked in: vmx_crypto ibmpowernv ipmi_powernv uio_pdrv_genirq ipmi_devintf powernv_op_panel uio ipmi_msghandler powernv_rng leds_powernv ip_tables x_tables autofs4 ses enclosure scsi_transport_sas crc32c_vpmsum lpfc ipr tg3 scsi_transport_fc
875    [ 1727.775883] CPU: 57 PID: 7688 Comm: hwclock Not tainted 4.10.0-14-generic #16-Ubuntu
876    [ 1727.775883] task: c000000fdfdc8400 task.stack: c000000fdfef4000
877    [ 1727.775884] NIP: c00000000090540c LR: c0000000000846f4 CTR: 000000003006dd70
878    [ 1727.775885] REGS: c000000fdfef79a0 TRAP: 0901   Not tainted  (4.10.0-14-generic)
879    [ 1727.775886] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>
880    [ 1727.775889]   CR: 28024442  XER: 20000000
881    [ 1727.775890] CFAR: c00000000008472c SOFTE: 1
882                   GPR00: 0000000030005128 c000000fdfef7c20 c00000000144c900 fffffffffffffff4
883                   GPR04: 0000000028024442 c00000000090540c 9000000000009033 0000000000000000
884                   GPR08: 0000000000000000 0000000031fc4000 c000000000084710 9000000000001003
885                   GPR12: c0000000000846e8 c00000000fba0100
886    [ 1727.775897] NIP [c00000000090540c] opal_set_rtc_time+0x4c/0xb0
887    [ 1727.775899] LR [c0000000000846f4] opal_return+0xc/0x48
888    [ 1727.775899] Call Trace:
889    [ 1727.775900] [c000000fdfef7c20] [c00000000090540c] opal_set_rtc_time+0x4c/0xb0 (unreliable)
890    [ 1727.775901] [c000000fdfef7c60] [c000000000900828] rtc_set_time+0xb8/0x1b0
891    [ 1727.775903] [c000000fdfef7ca0] [c000000000902364] rtc_dev_ioctl+0x454/0x630
892    [ 1727.775904] [c000000fdfef7d40] [c00000000035b1f4] do_vfs_ioctl+0xd4/0x8c0
893    [ 1727.775906] [c000000fdfef7de0] [c00000000035bab4] SyS_ioctl+0xd4/0xf0
894    [ 1727.775907] [c000000fdfef7e30] [c00000000000b184] system_call+0x38/0xe0
895    [ 1727.775908] Instruction dump:
896    [ 1727.775909] f821ffc1 39200000 7c832378 91210028 38a10020 39200000 38810028 f9210020
897    [ 1727.775911] 4bfffe6d e8810020 80610028 4b77f61d <60000000> 7c7f1b78 3860000a 2fbffff4
898
  This was found when executing the testcase
  https://github.com/open-power/op-test-framework/blob/master/testcases/fspresetReload.py

  With this fix, the FSP HIR torture testcase in the above test
  runs without problems.
904- occ: Set return variable to correct value
905
  When entering this section of code, rc will be zero. If fsp_mkmsg() fails,
  rc is not updated, so the code responsible for printing an error message
  is never triggered. Setting rc correctly should allow the error case to
  trigger if fsp_mkmsg() fails.
910- capp: Fix hang when CAPP microcode LID is missing on FSP machine
911
912  When the LID is absent, we fail early with an error from
913  start_preload_resource. In that case, capp_ucode_info.load_result
  isn't set properly, causing a subsequent capp_lid_download() to
915  call wait_for_resource_loaded() on something that isn't being
916  loaded, thus hanging.
917
918- FSP: Add check to detect FSP R/R inside fsp_sync_msg()
919
  OPAL sends an MBOX message to the FSP and updates the message state from
  fsp_msg_queued to fsp_msg_sent. fsp_sync_msg() queues the message and waits
  until we get a response from the FSP. During FSP R/R we move outstanding
  MBOX messages from msgq to rr_queue, including the inflight message
  (fsp_reset_cmdclass()). But we are not resetting the inflight message state.

  In an extreme corner case where we sent a message to the FSP via the
  fsp_sync_msg() path and FSP R/R happens before we get a response from the
  FSP, we end up waiting in fsp_sync_msg() until everything becomes normal.

  This patch adds an fsp_in_rr() check to fsp_sync_msg() and returns an
  error to the caller if the FSP is in R/R.
953- FSP/CONSOLE: Do not free fsp_msg in error path
954
  Do not free fsp_msg in the error path, as we reuse the same message to
  send the next output message.
956
957- platform/zz: Acknowledge OCC_LOAD mbox message in ZZ
958
  On P9 FSP boxes, the OCC image is pre-loaded. So do not handle the load
  command, and send SUCCESS to the FSP on receiving an OCC_LOAD mbox message.
961
962- FSP/RTC: Improve error log
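
The firenze PCI slot power-off fix above is about which bits get written. Reading the patterns as 8-bit values, 0bxx00xx00 clears bits 5:4 and 1:0, whereas the old 0bxx11xx11 set them. The following is a minimal C sketch assuming the x bits must be preserved; SLOT_PWR_MASK and both function names are hypothetical and do not come from the firenze platform code.

```c
#include <stdint.h>

/* Hypothetical mask for the power-state bits 5:4 and 1:0 (0b00110011). */
#define SLOT_PWR_MASK 0x33u

/* Old behaviour: forces the power bits to 0bxx11xx11. */
static uint8_t slot_power_off_buggy(uint8_t reg)
{
    return (uint8_t)(reg | SLOT_PWR_MASK);
}

/* Fixed behaviour: clears the power bits to 0bxx00xx00, keeping the x bits. */
static uint8_t slot_power_off(uint8_t reg)
{
    return (uint8_t)(reg & ~SLOT_PWR_MASK);
}
```

For example, starting from 0xff the fixed version yields 0xcc (0b11001100), while the old version leaves the power bits set.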
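
The FSP console workaround above changes what OPAL returns when the shared buffer is full. The sketch below models that buffer as a simple ring with next_in/next_out indices. The struct layout, buffer size, and console_write() are hypothetical stand-ins rather than the fsp_serbuf_hdr code, and the OPAL_* macros are local definitions of the return codes named in the entry.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local definitions of the OPAL return codes used in this sketch. */
#define OPAL_SUCCESS     0
#define OPAL_HARDWARE    (-6)
#define OPAL_BUSY_EVENT  (-12)

/* Hypothetical ring buffer modelling the TCE-mapped console area. */
struct serbuf {
    uint8_t data[64];
    size_t next_in;   /* advanced by OPAL when it writes */
    size_t next_out;  /* advanced by the FSP when it consumes */
};

/* Free space; one slot is kept empty so "full" differs from "empty". */
static size_t serbuf_space(const struct serbuf *sb)
{
    return (sb->next_out + sizeof(sb->data) - sb->next_in - 1)
           % sizeof(sb->data);
}

/*
 * Copy up to *len bytes into the ring and report how many were written.
 * If the ring is full, fail with OPAL_HARDWARE (previously the code
 * returned OPAL_BUSY_EVENT, which the kernel retried indefinitely).
 */
static int64_t console_write(struct serbuf *sb, const uint8_t *buf,
                             size_t *len)
{
    size_t space = serbuf_space(sb);
    size_t i, n;

    if (space == 0) {
        *len = 0;
        return OPAL_HARDWARE;
    }
    n = *len < space ? *len : space;
    for (i = 0; i < n; i++) {
        sb->data[sb->next_in] = buf[i];
        sb->next_in = (sb->next_in + 1) % sizeof(sb->data);
    }
    *len = n;
    return OPAL_SUCCESS;
}
```

If the FSP never advances next_out, the first write fills the ring and the next one fails at once with OPAL_HARDWARE instead of leaving the caller spinning; as the entry notes, the trade-off is that the newest console data is dropped.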
963
964astbmc systems
965--------------
966
967- platforms/astbmc: Don't validate model on palmetto
968
  The platform isn't considered compatible with palmetto unless the root
  device-tree node's "model" property is NULL or "palmetto". However, on
  palmetto the property can also be "TN71-BP012": ::
972
973       linux# cat /proc/device-tree/model
974       TN71-BP012
975
  This patch skips validation of the root device-tree node's "model"
  property on palmetto, meaning we check only the "compatible" property.
978
979
980