1.. _skiboot-6.2:
2
3skiboot-6.2
4===============
5
6skiboot v6.2 was released on Friday December 14th 2018. It is the first
7release of skiboot 6.2, which becomes the new stable release
8of skiboot following the 6.1 release, first released July 11th 2018.
9
10Skiboot 6.2 will mark the basis for op-build v2.2.
11
12skiboot v6.2 contains all bug fixes as of :ref:`skiboot-6.0.14`,
13and :ref:`skiboot-5.4.10` (the currently maintained
14stable releases).
15
For details on how the skiboot stable releases work, see :ref:`stable-rules`.
17
18This release has been a longer cycle than typical for a variety of reasons. It
19also contains a lot of cleanup work and minor bug fixes (much like skiboot 6.1
20did).
21
22Over skiboot 6.1, we have the following changes:
23
24General
25-------
26
27Since v6.2-rc2:
28
29- i2c: Fix i2c request hang during opal init if timers are not checked
30
  If an i2c request cannot go through the first time, because the bus is
  found in error and needs a reset or it's locked by the OCC for example,
  the underlying i2c implementation is using timers to manage the
  request. However during opal init, opal pollers may not be called; it
  depends on the context in which the i2c request is made. If the
  pollers are not called, the timers are not checked and we can end up
  with an i2c request which will not move forward and skiboot hangs.
38
39  Fix it by explicitly checking the timers if we are waiting for an i2c
40  request to complete and it seems to be taking a while.
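
The i2c fix above can be pictured with a small Python model. This is a sketch
only, not skiboot code: ``TimerWheel``, ``wait_for_request`` and the
``check_timers_after`` threshold are hypothetical names chosen for
illustration. The point is that a waiter which sees no progress starts
checking the timers itself instead of relying on pollers.

```python
class TimerWheel:
    """Toy stand-in for a timer list: callbacks only run when check() is called."""

    def __init__(self):
        self.pending = []  # list of (expiry_tick, callback)

    def add(self, expiry_tick, callback):
        self.pending.append((expiry_tick, callback))

    def check(self, now):
        """Run every callback whose expiry has passed (normally done by pollers)."""
        due = [t for t in self.pending if t[0] <= now]
        self.pending = [t for t in self.pending if t[0] > now]
        for _, callback in due:
            callback()


def wait_for_request(request, timers, check_timers_after=3, max_ticks=100):
    """Spin until request["done"] is set.

    Once the wait has lasted `check_timers_after` ticks, check the timers
    explicitly on every tick, since nobody else may be doing so during init.
    Returns the tick at which completion was observed, or None on timeout.
    """
    for tick in range(max_ticks):
        if request["done"]:
            return tick
        if tick >= check_timers_after:
            timers.check(tick)
    return None
```

If the timers are never checked (for example with a very large
``check_timers_after``), the same request never completes, which is the hang
described above.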
41
42Since v6.1:
43
44- cpu: Quieten OS endian switch messages
45
46  Users see these when loading an OS from Petitboot: ::
47
48     [  119.486794100,5] OPAL: Switch to big-endian OS
49     [  120.022302604,5] OPAL: Switch to little-endian OS
50
51  Which is expected and doesn't provide any information the user can act
52  on. Switch them to PR_INFO so they still appear in the log, but not on
53  the serial console.
54- Recognise signed VERSION partition
55
56  A few things need to change to support a signed VERSION partition:
57
58  - A signed VERSION partition will be 4K + SECURE_BOOT_HEADERS_SIZE (4K).
59  - The VERSION partition needs to be loaded after secure/trusted boot is
60    set up, and therefore after nvram_init().
61  - Added to the trustedboot resources array.
62
63  This also moves the ipmi_dt_add_bmc_info() call to after
64  flash_dt_add_fw_version() since it adds info to ibm,firmware-versions.
65- Run pollers in time_wait() when not booting
66
67  This only bit us hard with hiomap in one scenario.
68
  Our OPAL API has been that OPAL_POLL_EVENTS may be needed to make forward
70  progress on ongoing operations, and the internal to skiboot API has been
71  that time_wait() of a suitable time will run pollers (on at least one
72  CPU) to help ensure forward progress can be made.
73
74  In a perfect world, interrupts are used but they may: a) be disabled, or
75  b) the thing we're doing can't use interrupts because computers are
76  generally terrible.
77
78  Back in 3db397ea5892a (circa 2015), we changed skiboot so that we'd run
79  pollers only on the boot CPU, and not if we held any locks. This was to
80  reduce the chance of programming code that could deadlock, as well as to
81  ensure that we didn't just thrash all the cachelines for running pollers
82  all over a large system during boot, or hard spin on the same locks on
83  all secondary CPUs.
84
85  The problem arises if the OS we're booting makes an OPAL call early on,
86  with interrupts disabled, that requires a poller to run to make forward
87  progress. An example of this would be OPAL_WRITE_NVRAM early in Linux
88  boot (where Linux sets up the partitions it wants) - something that
89  occurs iff we've had to reformat NVRAM this boot (i.e. first boot or
90  corrupted NVRAM).
91
92  The hiomap implementation should arguably *not* rely on synchronous IPMI
  messages, but this is a future improvement (as it was for mbox before it).
94  The mbox-flash code solved this problem by spinning on check_timers().
95
96  More generically though, the approach of running the pollers when no
97  longer booting means we behave more in line with what the API is meant
98  to be, rather than have this odd case of "time_wait() for a condition
99  that could also be tripped by an interrupt works fine unless the OS is
100  up and running but hasn't set interrupts up yet".
101- ipmi: Reduce ipmi_queue_msg_sync() polling loop time to 10ms
102
103  On a plain boot, this reduces the time spent in OPAL by ~170ms on
104  p9dsu. This is due to hiomap (currently) using synchronous IPMI
105  messages.
106
107  It will also *significantly* reduce latency on runtime flash
108  operations for hiomap, as we'll spend typically 10-20ms in OPAL
109  rather than 100-200ms. It's not an ideal solution to that, but
110  it's a quick and obvious win for jitter.
111- core/device: NULL pointer dereference fix
112- core/flash: NULL pointer dereference fixes
113- core/cpu: Call memset with proper cpu_thread offset
114- libflash: Add ipmi-hiomap, and prefer it for PNOR access
115
116  ipmi-hiomap implements the PNOR access control protocol formerly known
117  as "the mbox protocol" but uses IPMI instead of the AST LPC mailbox as a
118  transport. As there is no-longer any mailbox involved in this alternate
119  implementation the old protocol name is quite misleading, and so it has
120  been renamed to "the hiomap protoocol" (Host I/O Mapping protocol). The
121  same commands and events are used though this client-side implementation
122  assumes v2 of the protocol is supported by the BMC.
123
124  The code is a heavily-reworked copy of the mbox-flash source and is
125  introduced this way to allow for the mbox implementation's eventual
126  removal.
127
128  mbox-flash should in theory be renamed to mbox-hiomap for consistency,
129  but as it is on life-support effective immediately we may as well just
130  remove it entirely when the time is right.
131- opal/hmi: Handle early HMIs on thread0 when secondaries are still in OPAL.
132
  When the primary thread receives a CORE level HMI for timer facility errors
  while secondaries are still in OPAL, thread 0 ends up in rendez-vous
  waiting for secondaries to get into hmi handling. This is because OPAL
  runs with MSR(EE=0) and hence HMIs are delayed on secondary threads until
  they are given to the Linux OS. Fix this by adding a check for secondary
  state and forcing them into hmi handling by queuing a job on secondary threads.

  I have tested this by injecting an HDEC parity error very early during Linux
  kernel boot. Recovery works fine for non-TB errors. But if TB is bad at
  this very early stage we are already doomed.
143
144  Without this patch we see: ::
145
146    [  285.046347408,7] OPAL: Start CPU 0x0843 (PIR 0x0843) -> 0x000000000000a83c
147    [  285.051160609,7] OPAL: Start CPU 0x0844 (PIR 0x0844) -> 0x000000000000a83c
148    [  285.055359021,7] HMI: Received HMI interrupt: HMER = 0x0840000000000000
149    [  285.055361439,7] HMI: [Loc: U78D3.ND1.WZS004A-P1-C48]: P:8 C:17 T:0: TFMR(2e12002870e14000) Timer Facility Error
150    [  286.232183823,3] HMI: Rendez-vous stage 1 timeout, CPU 0x844 waiting for thread 1 (sptr=0000ccc1)
151    [  287.409002056,3] HMI: Rendez-vous stage 1 timeout, CPU 0x844 waiting for thread 2 (sptr=0000ccc1)
152    [  289.073820164,3] HMI: Rendez-vous stage 1 timeout, CPU 0x844 waiting for thread 3 (sptr=0000ccc1)
153    [  290.250638683,3] HMI: Rendez-vous stage 1 timeout, CPU 0x844 waiting for thread 1 (sptr=0000ccc2)
154    [  291.427456821,3] HMI: Rendez-vous stage 1 timeout, CPU 0x844 waiting for thread 2 (sptr=0000ccc2)
155    [  293.092274807,3] HMI: Rendez-vous stage 1 timeout, CPU 0x844 waiting for thread 3 (sptr=0000ccc2)
156    [  294.269092904,3] HMI: Rendez-vous stage 1 timeout, CPU 0x844 waiting for thread 1 (sptr=0000ccc3)
157    [  295.445910944,3] HMI: Rendez-vous stage 1 timeout, CPU 0x844 waiting for thread 2 (sptr=0000ccc3)
158    [  297.110728970,3] HMI: Rendez-vous stage 1 timeout, CPU 0x844 waiting for thread 3 (sptr=0000ccc3)
159
160  After this patch: ::
161
162    [  259.401719351,7] OPAL: Start CPU 0x0841 (PIR 0x0841) -> 0x000000000000a83c
163    [  259.406259572,7] OPAL: Start CPU 0x0842 (PIR 0x0842) -> 0x000000000000a83c
164    [  259.410615534,7] OPAL: Start CPU 0x0843 (PIR 0x0843) -> 0x000000000000a83c
165    [  259.415444519,7] OPAL: Start CPU 0x0844 (PIR 0x0844) -> 0x000000000000a83c
166    [  259.419641401,7] HMI: Received HMI interrupt: HMER = 0x0840000000000000
167    [  259.419644124,7] HMI: [Loc: U78D3.ND1.WZS004A-P1-C48]: P:8 C:17 T:0: TFMR(2e12002870e04000) Timer Facility Error
168    [  259.419650678,7] HMI: Sending hmi job to thread 1
169    [  259.419652744,7] HMI: Sending hmi job to thread 2
170    [  259.419653051,7] HMI: Received HMI interrupt: HMER = 0x0840000000000000
171    [  259.419654725,7] HMI: Sending hmi job to thread 3
172    [  259.419654916,7] HMI: Received HMI interrupt: HMER = 0x0840000000000000
173    [  259.419658025,7] HMI: Received HMI interrupt: HMER = 0x0840000000000000
174    [  259.419658406,7] HMI: [Loc: U78D3.ND1.WZS004A-P1-C48]: P:8 C:17 T:2: TFMR(2e12002870e04000) Timer Facility Error
175    [  259.419663095,7] HMI: [Loc: U78D3.ND1.WZS004A-P1-C48]: P:8 C:17 T:3: TFMR(2e12002870e04000) Timer Facility Error
176    [  259.419655234,7] HMI: [Loc: U78D3.ND1.WZS004A-P1-C48]: P:8 C:17 T:1: TFMR(2e12002870e04000) Timer Facility Error
177    [  259.425109779,7] OPAL: Start CPU 0x0845 (PIR 0x0845) -> 0x000000000000a83c
178    [  259.429870681,7] OPAL: Start CPU 0x0846 (PIR 0x0846) -> 0x000000000000a83c
179    [  259.434549250,7] OPAL: Start CPU 0x0847 (PIR 0x0847) -> 0x000000000000a83c
180
181- core/cpu: Fix memory allocation for job array
182
183  fixes: 7a3f307e core/cpu: parallelise global CPU register setting jobs
184
185  This bug would result in boot-hang on some configurations due to
186  cpu_wait_job() endlessly waiting for the last bogus jobs[cpu->pir] pointer.
187- i2c: Fix multiple-enqueue of the same request on NACK
188
189  i2c_request_send() will retry the request if the error is a NAK,
190  however it forgets to clear the "ud.done" flag. It will thus
191  loop again and try to re-enqueue the same request causing internal
192  request list corruption.
193- i2c: Ensure ordering between i2c_request_send() and completion
194
  i2c_request_send() loops waiting for a flag "ud.done" set by
  the completion routine, and then looks for a result code
  also set by that same completion.

  There is no synchronization and the completion can happen on another
  processor, so we need to order the stores to ud and the reads
  from ud so that ud.done is stored last and tested first, using
  memory barriers.
203- pci: Clarify power down logic
204
205  Currently pci_scan_bus() unconditionally calls pci_slot_set_power_state()
206  when it's finished scanning a bus. This is one of those things that
207  makes you go "WHAT?" when you first see it and frankly the skiboot PCI
208  code could do with less of that.
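
The time_wait() entry in the list above boils down to a decision about when
pollers may run. A minimal sketch of that decision follows, assuming the
"no pollers while holding locks" rule from the 2015 change is kept and that
only the "boot CPU only" restriction is lifted once the OS is running.
``should_run_pollers`` and its parameters are hypothetical names, not
skiboot's API.

```python
def should_run_pollers(is_boot_cpu, booting, holds_locks):
    """Decide whether a time_wait()-style helper should run pollers.

    Assumed rules, taken from the description above:
    - never run pollers while holding locks (deadlock risk);
    - during boot, only the boot CPU runs pollers;
    - once booting has finished, any CPU may run them.
    """
    if holds_locks:
        return False
    if booting and not is_boot_cpu:
        return False
    return True
```

Under these assumptions an early OPAL call made by the OS on a secondary CPU,
with interrupts disabled, can still make forward progress because
``booting`` is false by then.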
209
210Fast Reboot
211^^^^^^^^^^^
212
213- fast-reboot: parallel memory clearing
214
215  Arbitrarily pick 16GB as the unit of parallelism, and
216  split up clearing memory into jobs and schedule them
217  node-local to the memory (or on node 0 if we can't
218  work that out because it's the memory up to SKIBOOT_BASE)
219
220  This seems to cut at least ~40% time from memory zeroing on
221  fast-reboot on a 256GB Boston system.
222
223  For many systems, scanning PCI takes about as much time as
224  zeroing all of RAM, so we may as well do them at the same time
225  and cut a few seconds off the total fast reboot time.
226- fast-reboot: verify firmware "romem" checksum
227
228  This takes a checksum of skiboot memory after boot that should be
229  unchanged during OS operation, and verifies it before allowing a
230  fast reboot.
231
  This is not read-only memory from skiboot's point of view, because
233  it includes things like the opal branch table that gets populated
234  during boot.
235
236  This helps to improve the integrity of firmware against host and
237  runtime firmware memory scribble bugs.
238
239- core/fast-reboot: print the fast reboot disable reason
240
241  Once things start to go wrong, disable_fast_reboot can be called a
242  number of times, so make the first reason sticky, and also print it
243  to the console at disable time. This helps with making sense of
244  fast reboot disables.
245- Add fast-reboot property to /ibm,opal DT node
246
  This means that if it's permanently disabled on boot, the test suite can
248  pick that up and not try a fast reboot test.
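
To make the parallel memory clearing above concrete, here is a small sketch
of splitting a memory range into jobs of at most 16GB each. The function
name and the tuple layout are assumptions for illustration; node-local
scheduling of the resulting jobs is not modelled.

```python
GiB = 1 << 30
CLEAR_CHUNK = 16 * GiB  # unit of parallelism named in the release note


def split_into_clear_jobs(start, end, chunk=CLEAR_CHUNK):
    """Return (address, length) pairs covering [start, end) in pieces of at most `chunk` bytes."""
    jobs = []
    addr = start
    while addr < end:
        length = min(chunk, end - addr)
        jobs.append((addr, length))
        addr += length
    return jobs
```

A 40GB range, for example, yields two 16GB jobs and one 8GB job, each of
which could then be queued on a CPU close to that memory.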
249
250Utilities
251---------
252
253Since v6.2-rc2:
254
255- opal-prd: hservice: Enable hservice->wakeup() in BMC
256
257  This patch enables HBRT to use HYP special wakeup register in openBMC
258  which until now was only used in FSP based machines.
259
260  This patch also adds a capability check for opal-prd so that HBRT can
261  decide if the host special wakeup register can be used.
262- ffspart: Support flashing already ECC protected images
263
264  We do this by assuming filenames with '.ecc' in them are already ECC
265  protected.
266
267  This solves a practical problem in transitioning op-build to use ffspart
268  for pnor assembly rather than three perl scripts and a lot of XML.
269
270  We also update the ffspart tests to take into account ECC requirements.
271- ffspart: Increase MAX_LINE to above PATH_MAX
272
  Otherwise we saw failures in CI with the ~221 character paths Jenkins
  likes to have.
275- libflash/file: greatly increase perf of file_erase()
276
277  Do 4096 byte chunks not 8 byte chunks. A ffspart invocation constructing
  a 64MB PNOR goes from a couple of seconds to ~0.1 seconds with this
279  patch.
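
The file_erase() speed-up above comes purely from issuing fewer, larger
writes. The sketch below is a Python illustration of that idea rather than
the libflash implementation; ``erase`` and its return value (the number of
write calls) are assumptions made so the effect is easy to observe.

```python
import io


def erase(f, offset, length, chunk_size=4096):
    """Fill [offset, offset + length) of a seekable binary file with 0xff.

    Returns the number of write calls issued, to show the cost of small chunks.
    """
    pattern = b"\xff" * chunk_size
    f.seek(offset)
    writes = 0
    remaining = length
    while remaining > 0:
        n = min(chunk_size, remaining)
        f.write(pattern[:n])
        remaining -= n
        writes += 1
    return writes


if __name__ == "__main__":
    image = io.BytesIO(bytes(64 * 1024))
    print(erase(image, 0, 64 * 1024, chunk_size=8))     # 8192 writes
    print(erase(image, 0, 64 * 1024, chunk_size=4096))  # 16 writes
```

The erased contents are identical in both cases; only the number of write
calls changes.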
280
Since v6.2-rc1:

- libflash: Don't merge ECC-protected ranges
283
284  Libflash currently merges contiguous ECC-protected ranges, but doesn't
285  check that the ECC bytes at the end of the first and start of the second
286  range actually match sanely. More importantly, if blocklevel_read() is
287  called with a position at the start of a partition that is contained
288  somewhere within a region that has been merged it will update the
289  position assuming ECC wasn't being accounted for. This results in the
290  position being somewhere well after the actual start of the partition
291  which is incorrect.
292
293  For now, remove the code merging ranges. This means more ranges must be
294  held and checked however it prevents incorrectly reading ECC-correct
295  regions like below: ::
296
297    [  174.334119453,7] FLASH: CAPP partition has ECC
298    [  174.437349574,3] ECC: uncorrectable error: ffffffffffffffff ff
299    [  174.437426306,3] FLASH: failed to read the first 0x1000 from CAPP partition, rc 14
300    [  174.439919343,3] CAPP: Error loading ucode lid. index=201d1
301
302- libflash: Restore blocklevel tests
303
304  This fell out in f58be46 "libflash/test: Rewrite Makefile.check to
305  improve scalability". Add it back in as test-blocklevel.
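
The position problem with merged ECC-protected ranges described above can be
illustrated with a toy model of the arithmetic. It assumes a layout of one
ECC byte after every 8 data bytes, and ``adjusted_pos`` is a hypothetical
helper that applies the ECC adjustment relative to the start of whichever
range contains the position. It is not the libflash code.

```python
BYTES_PER_ECC = 8  # assumed layout: one ECC byte follows every 8 data bytes


def ecc_buffer_size(data_len):
    """Raw flash bytes needed to hold data_len bytes of ECC-protected data."""
    aligned = -(-data_len // BYTES_PER_ECC) * BYTES_PER_ECC  # round up to 8
    return aligned + aligned // BYTES_PER_ECC


def adjusted_pos(range_start, pos):
    """Treat (pos - range_start) as a data offset and convert it to a raw offset."""
    return range_start + ecc_buffer_size(pos - range_start)


# Hypothetical layout: partition A starts at 0x1000, partition B at 0x9000.
SEPARATE = adjusted_pos(0x9000, 0x9000)  # B has its own range: stays at 0x9000
MERGED = adjusted_pos(0x1000, 0x9000)    # A and B merged: lands at 0xa000
```

With separate ranges the start of partition B is left where it is. With a
merged range the same position is pushed 0x1000 bytes past the real start,
which matches the kind of misread shown in the log above.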
306
307Since v6.1:
308
309- pflash: Add --skip option for reading
310
311  Add a --skip=N option to pflash to skip N number of bytes when reading.
312  This would allow users to print the VERSION partition without the STB
313  header by specifying the --skip=4096 argument, and it's a more generic
314  solution rather than making pflash depend on secure/trusted boot code.
315- xscom-utils: Rework getsram
316
317  Allow specifying a file on the command line to read OCC SRAM data into.
318  If no file is specified then we print it to stdout as text. This is a
319  bit inconsistent, but it retains compatibility with the existing tool.
320- xscom-utils/getsram: Make it work on P9
321
322  The XSCOM base address of the OCC control registers changed slightly
323  between P8 and P9. Fix this up and add a bit of PVR checking so we look
324  in the right place.
325- opal-prd: Fix opal-prd crash
326
  Presently the callback function from HBRT uses r11 to point to the target
  function pointer, while r12 is garbage. This works fine when we compile
  with the "-no-pie" option (as we don't use r12 to calculate the TOC).

  As per ABIv2 : "r12 : Function entry address at global entry point"

  With the "-pie" compilation option, we have to set r12 to point to the
  global function entry point so that we can calculate the TOC properly.
335
336  Crash log without this patch: ::
337
      opal-prd[2864]: unhandled signal 11 at 0000000000029320 nip 0000000102012830 lr 0000000102016890 code 1
339
340
341Development and Debugging
342-------------------------
343
Since v6.2-rc1:

- Warn on long OPAL calls
346
347  Measure entry/exit time for OPAL calls and warn appropriately if the
348  calls take too long (>100ms gets us a DEBUG log, > 1000ms gets us a
349  warning).
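
As a sketch of the thresholds described above, the following Python model
times a call and maps its duration to a log level. The function names and
the injectable ``clock`` parameter are assumptions for illustration; only
the 100ms and 1000ms limits come from the text.

```python
import time

DEBUG_THRESHOLD_MS = 100   # longer than this: DEBUG log
WARN_THRESHOLD_MS = 1000   # longer than this: warning


def classify_duration(duration_ms):
    """Return "warning", "debug" or None for a call that took duration_ms."""
    if duration_ms > WARN_THRESHOLD_MS:
        return "warning"
    if duration_ms > DEBUG_THRESHOLD_MS:
        return "debug"
    return None


def timed_call(func, *args, clock=time.monotonic):
    """Run func(*args), measuring entry/exit time with `clock` (in seconds)."""
    start = clock()
    result = func(*args)
    elapsed_ms = (clock() - start) * 1000.0
    return result, classify_duration(elapsed_ms)
```

Passing a fake ``clock`` makes the behaviour easy to check without actually
waiting for a slow call.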
350
351Since v6.1:
352
353- core/lock: Use try_lock_caller() in lock_caller() to capture owner
354
355  Otherwise we can get reports of core/lock.c owning the lock, which is
356  not helpful when tracking down ownership issues.
357- core/flash: Emit a warning if Skiboot version doesn't match
358
359  This means you'll get a warning that you've modified skiboot separately
360  to the rest of the PNOR image, which can be useful in determining what
361  firmware is actually running on a machine.
362- gcov: link in ctors* as newer GCC doesn't group them all
363
364  It seems that newer toolchains get us multiple ctors sections to link in
365  rather than just one. If we discard them (as we were doing), then we
366  don't have a working gcov build (and we get the "doesn't look sane"
367  warning on boot).
368- core/flash: Log return code when ffs_init() fails
369
370  Knowing the return code is at least better than not knowing the return
371  code.
372- gcov: Fix building with GCC8
373- travis/ci: rework Dockerfiles to produce build artifacts
374
375  ubuntu-latest was also missing clang, as ubuntu-latest is closer to
376  ubuntu 18.04 than 16.04
377- cpu: add cpu_queue_job_on_node()
378
379  Add a job scheduling API which will run the job on the requested
380  chip_id (or return failure).
381- opal-ci: Build old dtc version for fedora 28
382
383  There are patches that will go into dtc to fix the issues we hit, but
384  for the moment let's just build and use a slightly older version.
385- mem_region: Merge similar allocations when dumping
386
387  Currently we print one line for each allocation done at runtime when
388  dumping the memory allocations. We do a few thousand allocations at
389  boot so this can result in a huge amount of text being printed which
  is a) slow to print, and b) can result in the log buffer overflowing
391  which destroys otherwise useful information.
392
393  This patch adds a de-duplication to this memory allocation dump by
394  merging "similar" allocations (same location, same size) into one.
395
396  Unfortunately, the algorithm used to do the de-duplication is quadratic,
397  but considering we only dump the allocations in the event of a fatal
398  error I think this is acceptable. I also did some benchmarking and found
399  that on a ZZ it takes ~3ms to do a dump with 12k allocations. On a Zaius
  it's slightly longer at about 10ms for 10k allocs. However, the
401  difference there was due to the output being written to the UART.
402
403  This patch also bumps the log level to PR_NOTICE. PR_INFO messages are
404  suppressed at the default log level, which probably isn't something you
405  want considering we only dump the allocations when we run out of skiboot
406  heap space.
407- core/lock: fix timeout warning causing a deadlock false positive
408
409  If a lock waiter exceeds the warning timeout, it prints a message
410  while still registered as requesting the lock. Printing the message
411  can take locks, so if one is held when the owner of the original
412  lock tries to print a message, it will get a false positive deadlock
413  detection, which brings down the system.
414
415  This can easily be hit when there is a lot of HMI activity from a
416  KVM guest, where the timebase was not returned to host timebase
417  before calling the HMI handler.
418- hw/p8-i2c: Print the set error bits
419
420  This is purely to save me from having to look it up every time someone
421  gets an I2C error.
422- init: Fix starting stripped kernel
423
424  Currently if we try to run a raw/stripped binary kernel (ie. without
425  the elf header) we crash with: ::
426
427      [    0.008757768,5] INIT: Waiting for kernel...
428      [    0.008762937,5] INIT: platform wait for kernel load failed
429      [    0.008768171,5] INIT: Assuming kernel at 0x20000000
430      [    0.008779241,3] INIT: ELF header not found. Assuming raw binary.
431      [    0.017047348,5] INIT: Starting kernel at 0x0, fdt at 0x3044b230 14339 bytes
432      [    0.017054251,0] FATAL: Kernel is zeros, can't execute!
433      [    0.017059054,0] Assert fail: core/init.c:590:0
434      [    0.017065371,0] Aborting!
435
436  This is because we haven't set kernel_entry correctly in this path.
437  This fixes it.
438- cpu: Better output when waiting for a very long job
439
440  Instead of printing at the end if the job took more than 1s,
441  print in the loop every 30s along with a backtrace. This will
442  give us some output if the job is deadlocked.
443- lock: Fix interactions between lock dependency checker and stack checker
444
445  The lock dependency checker does a few nasty things that can cause
446  re-entrancy deadlocks in conjunction with the stack checker or
447  in fact other debug tests.
448
449  A lot of it revolves around taking a new lock (dl_lock) as part
450  of the locking process.
451
452  This tries to fix it by making sure we do not hit the stack
453  checker while holding dl_lock.
454
455  We achieve that in part by directly using the low-level __try_lock
456  and manually unlocking on the dl_lock, and making some functions
457  "nomcount".
458
459  In addition, we mark the dl_lock as being in the console path to
460  avoid deadlocks with the UART driver.
461
462  We move the enabling of the deadlock checker to a separate config
  option from DEBUG_LOCKS as well, in case we choose to disable it
464  by default later on.
465- xscom-utils/adu_scoms.py: run 2to3 over it
466- clang: -Wno-error=ignored-attributes
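
The de-duplication described in the mem_region entry above can be sketched
as follows. This is an illustrative Python model, not the skiboot code: the
``(location, size)`` tuples and the example location strings are
hypothetical. It is deliberately quadratic and allocates no lookup table,
mirroring the trade-off discussed in that entry.

```python
def merge_similar(allocations):
    """Merge allocations with the same (location, size) into one entry.

    `allocations` is a list of (location, size) tuples. Returns a list of
    (location, size, count) tuples in first-seen order.
    """
    merged = []
    seen = [False] * len(allocations)
    for i, alloc in enumerate(allocations):
        if seen[i]:
            continue  # already folded into an earlier entry
        count = 1
        for j in range(i + 1, len(allocations)):
            if not seen[j] and allocations[j] == alloc:
                seen[j] = True
                count += 1
        merged.append((alloc[0], alloc[1], count))
    return merged
```

In ordinary Python a ``collections.Counter`` would do this in linear time;
the nested loop is kept here only to match the algorithm as described.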
467
468CI, testing, and utilities
469--------------------------
470
Since v6.2-rc2:
472
473- opal-ci: Drop fedora27, add fedora29
474- ci: Bump Qemu version
475
476  This moves the qemu version to qemu-powernv-for-skiboot-7 which is based
477  on upstream's 3.1.0, and supports a Power9 machine.
478
479  It also includes a fix for the skiboot XSCOM errors: ::
480
481     XSCOM: read error gcid=0x0 pcb_addr=0x1020013 stat=0x0
482
483  There is no modelling of the xscom behaviour but the reads/writes
484  now succeed which is enough for skiboot to not error out.
485- test: Update qemu arguments to use bmc simulator
486
  The qemu skiboot platform as of 8340a9642bba ("plat/qemu: use the common
488  OpenPOWER routines to initialize") uses the common aspeed BMC setup
489  routines. This means a BT interface is always set up, and if the
490  corresponding Qemu model is not present the timeout is 30 seconds.
491
492  It looks like this every time an IPMI message is sent: ::
493
494     BT: seq 0x9e netfn 0x06 cmd 0x31: Maximum queue length exceeded
495     BT: seq 0x9d netfn 0x06 cmd 0x31: Removed from queue
496     BT: seq 0x9f netfn 0x06 cmd 0x31: Maximum queue length exceeded
497     BT: seq 0x9e netfn 0x06 cmd 0x31: Removed from queue
498     BT: seq 0xa0 netfn 0x06 cmd 0x31: Maximum queue length exceeded
499     BT: seq 0x9f netfn 0x06 cmd 0x31: Removed from queue
500
501  Avoid this by adding the bmc simulator model to the Qemu powernv
502  machine.
503- ci: Add opal-utils to Debian unstable
504
  This puts a 'pflash' in the user's PATH, allowing more test coverage of
506  ffspart.
507- ci: Drop P8 mambo from Debian unstable
508
509  Debian Unstable has removed OpenSSL 1.0.0 from the repository so mambo
510  no longer runs: ::
511
512      /opt/ibm/systemsim-p8/bin/systemsim-pegasus: error while loading shared
513      libraries: libcrypto.so.1.0.0: cannot open shared object file: No such
514      file or directory
515
516  By removing it from the container these tests will be automatically
517  skipped.
518
519  Tracked in https://github.com/open-power/op-build/issues/2519
520- ci: Add dtc dependencies for rawhide
521
522  Both F28 and Rawhide build their own dtc version. Rawhide was missing
523  the required build deps.
524- ci: Update Debian unstable packages
525
526  This syncs Debian unstable with Ubuntu 18.04 in order to get the clang
  package. It also adds qemu to the Debian install, which makes sense as
  Debian also has 2.12.
529- ci: Use Ubuntu latest config for Debian unstable
530
531  Debian unstable has the same GCOV issue with 8.2 as Ubuntu latest so it
532  makes sense to share configurations there.
533- ci: Disable GCOV builds in ubuntu-latest
534
535  They are known to be broken with GCC 8.2:
536  https://github.com/open-power/skiboot/issues/206
537- ci: Update gcov comment in Fedora 28
538- plat/qemu: fix platform initialization when the BT device is not present
539
540  A QEMU PowerNV machine does not necessarily have a BT device. It needs
  to be defined on the command line with: ::
542
543      -device ipmi-bmc-sim,id=bmc0 -device isa-ipmi-bt,bmc=bmc0,irq=10
544
545  When the QEMU platform is initialized by skiboot, we need to check
546  that such a device is present and if not, skip the AST initialization.
547
Since v6.2-rc1:
549
550- travis: Coverity fixed their SSL cert
551- opal-ci: Use ubuntu:rolling for Ubuntu latest image
552- ffspart: Add test for eraseblock size
553- ffspart: Add toc test
554- hdata/test: workaround dtc bugs
555
556  In dtc v1.4.5 to at least v1.4.7 there have been a few bugs introduced
557  that change the layout of what's produced in the dts. In order to be
558  immune from them, we should use the (provided) dtdiff utility, but we
559  also need to run the dts we're diffing against through a dtb cycle in
  order to ensure we get the same format as what the hdat_to_dt to dts
  conversion will produce.
562
563  This fixes a bunch of unit test failures on the version of dtc shipped
564  with recent Linux distros such as Fedora 29.
565
566
567Mambo Platform
568^^^^^^^^^^^^^^
569
570- mambo: Merge PMEM_DISK and PMEM_VOLATILE code
571
572  PMEM_VOLATILE and PMEM_DISK can't be used together and are basically
573  copies of the same code.
574
  This merges the two and allows them to be used together. The same API is kept.
576- hw/chiptod: test QUIRK_NO_CHIPTOD in opal_resync_timebase
577
578  This allows some test coverage of deep stop states in Linux with
579  Mambo.
580- core/mem_region: mambo reserve kernel payload areas
581
582  Mambo image payloads get overwritten by the OS and by
583  fast reboot memory clearing because they have no region
584  defined. Add them, which allows fast reboot to work.
585
586Qemu platform
587^^^^^^^^^^^^^
588
Since v6.2-rc2:

- plat/qemu: use the common OpenPOWER routines to initialize
591
  Back in 2016, we did not have broad support for the PowerNV devices
593  under QEMU and we were using our own custom ones. This has changed and
594  we can now use all the common init routines of the OpenPOWER
595  platforms.
596
597Since v6.1:
598
599- nx: Don't abort on missing NX when using a QEMU machine
600
601  These don't have an NX node (and probably never will) as they
602  don't provide any coprocessor. However, the DARN instruction
603  works so this abort is unnecessary.
604
605POWER8 Platforms
606----------------
607- SBE-p8: Do all sbe timer update with xscom lock held
608
609  Without this, on some P8 platforms, we could (falsely) think the SBE timer
610  had stalled getting the dreaded "timer stuck" message.
611
612  The code was doing the mftb() to set the start of the timeout period while
613  *not* holding the lock, so the 1ms timeout started sometime when somebody
614  else had the xscom lock.
615
616  The simple solution is to just do the whole routine holding the xscom lock,
617  so do it that way.
618
619Vesnin Platform
620^^^^^^^^^^^^^^^
621- platforms/astbmc/vesnin: Send list of PCI devices to BMC through IPMI
622
  Implements sending a list of installed PCI devices through the IPMI protocol.
624  Each PCI device description is sent as a standalone IPMI message.
625  A list of devices can be gathered from separate messages using the
626  session identifier. The session Id is an incremental counter that is
627  updated at the start of synchronization session.
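
The session identifier scheme described above can be modelled in a few lines
of Python. Everything here is hypothetical: the message is shown as a plain
dictionary, and ``PciInventorySender`` and ``gather`` are illustrative names,
not the real IPMI message layout or skiboot functions.

```python
import itertools


class PciInventorySender:
    """Emits one message per PCI device, all tagged with the current session id."""

    def __init__(self):
        self._sessions = itertools.count(1)  # incremental session counter

    def sync(self, devices):
        """Start a new synchronization session and return its messages."""
        session_id = next(self._sessions)
        return [{"session": session_id, "device": dev} for dev in devices]


def gather(messages):
    """Rebuild the per-session device lists from standalone messages."""
    sessions = {}
    for msg in messages:
        sessions.setdefault(msg["session"], []).append(msg["device"])
    return sessions
```

Because every message carries the session id, a receiver can rebuild each
list even if messages from two sessions arrive interleaved.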
628
629
630POWER9 Platforms
631----------------
632
633- STOP API: API conditionally supports 255 SCOM restore entries for each quad.
634- hdata/i2c: Skip unknown device type
635
636  Do not add unknown I2C devices to device tree.
637- hdata/i2c: Add whitelisting for Host I2C devices
638
639  Many of the devices that we get information about through HDAT are for
640  use by firmware rather than the host operating system. This patch adds
641  a boolean flag to hdat_i2c_info structure that indicates whether devices
642  with a given purpose should be reserved for use inside of OPAL (or some
643  other firmware component, such as the OCC).
644- hdata/iohub: Fix Cumulus Hub ID number
645- opal/hmi: Wakeup the cpu before reading core_fir
646
647  When stop state 5 is enabled, reading the core_fir during an HMI can
648  result in a xscom read error with xscom_read() returning an
649  OPAL_XSCOM_PARTIAL_GOOD error code and core_fir value of all FFs. At
650  present this return error code is not handled in decode_core_fir()
651  hence the invalid core_fir value is sent to the kernel where it
652  interprets it as a FATAL hmi causing a system check-stop.
653
  This can be prevented by forcing the core to wake up before
  reading the core_fir. Hence this patch wraps the call to
656  read_core_fir() within calls to dctl_set_special_wakeup() and
657  dctl_clear_special_wakeup().
658- xive: Disable block tracker
659
660  Due to some HW errata, the block tracking facility (performance optimisation
661  for large systems) should be disabled on Nimbus chips. Disable it unconditionally
662  for now.
663- opal/hmi: Ignore debug trigger inject core FIR.
664
665  Core FIR[60] is a side effect of the work around for the CI Vector Load
666  issue in DD2.1. Usually this gets delivered as HMI with HMER[17] where
667  Linux already ignores it. But it looks like in some cases we may happen
668  to see CORE_FIR[60] while we are already in Malfunction Alert HMI
669  (HMER[0]) due to other reasons e.g. CAPI recovery or NPU xstop. If that
  happens then just ignore it instead of crashing the kernel as not recoverable.
- hdata: Make sure reserved node name starts with "ibm,"

  HDAT does not provide a consistent label format for reserved memory labels.
  Some start with "ibm," while others start with a component name.
675- hdata: Fix dtc warnings
676
677  Fix dtc warnings related to mcbist node. ::
678
679    Warning (reg_format): "reg" property in /xscom@623fc00000000/mcbist@1 has invalid length (4 bytes) (#address-cells == 1, #size-cells == 1)
680    Warning (reg_format): "reg" property in /xscom@623fc00000000/mcbist@2 has invalid length (4 bytes) (#address-cells == 1, #size-cells == 1)
681    Warning (reg_format): "reg" property in /xscom@603fc00000000/mcbist@1 has invalid length (4 bytes) (#address-cells == 1, #size-cells == 1)
682    Warning (reg_format): "reg" property in /xscom@603fc00000000/mcbist@2 has invalid length (4 bytes) (#address-cells == 1, #size-cells == 1)
683
684  Ideally we should add proper xscom range here... but we are not getting that
  information in HDAT today. Let's fix the warning until we get proper data in HDAT.
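
The reserved node naming rule from the hdata entry above amounts to a simple
normalisation step. The sketch below assumes the fix is to prepend the
prefix when it is missing; ``reserved_node_name`` and the example labels
are hypothetical.

```python
PREFIX = "ibm,"


def reserved_node_name(label):
    """Return a node name that always starts with "ibm,"."""
    if label.startswith(PREFIX):
        return label
    return PREFIX + label
```

A label that already carries the prefix is returned unchanged, so applying
the function twice gives the same result as applying it once.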
686
687PHB4
688^^^^
689
690- phb4: Generate checkstop on AIB ECC corr/uncorr for DD2.0 parts
691
692  On DD2.0 parts, PCIe ECC protection is not warranted in the response
693  data path. Thus, for these parts, we need to flag any ECC errors
694  detected from the adjacent AIB RX Data path so the part can be
695  replaced.
696
697  This patch configures the FIRs so that we escalate these AIB ECC
698  errors to a checkstop so the parts can be replaced.
699- phb4: Reset pfir and nfir if new errors reported during ETU reset
700
  During fast-reboot new PEC errors can be latched even after ETU-Reset
  is asserted. This leaves the values of the variables nfir_cache and
  pfir_cache out of sync with the registers.

  During step-2 of CRESET nfir_cache and pfir_cache values are used to
  bring the PHB out of reset state. However if these variables are out
  of date, as noted above, the nfir/pfir registers are never reset
  completely and the ETU still remains frozen.
709
710  Hence this patch updates step-2 of phb4_creset to re-read the values of
711  nfir/pfir registers to check if any new errors were reported after
712  ETU-reset was asserted, report these new errors and reset the
713  nfir/pfir registers. This should bring the ETU out of reset
714  successfully.
715- phb4: Disable nodal scoped DMA accesses when PB pump mode is enabled
716
717  By default when a PCIe device issues a read request via the PHB it is first
718  issued with nodal scope. When accessing GPU memory the NPU does not know at the
719  time of response if the requested memory page is off node or not. Therefore
720  every read of GPU memory by a PHB is retried with larger scope which introduces
721  bandwidth and latency issues.
722
723  On smaller boxes which have pump mode enabled nodal and group scoped reads are
724  treated the same and both types of request are broadcast to one chip. Therefore
725  we can avoid the retry by disabling nodal scope on the PHB for these boxes. On
726  larger boxes nodal (single chip) and group (multiple chip) scoped reads are
727  treated differently. Therefore we avoid disabling nodal scope on large boxes
728  which have pump mode disabled to avoid all PHB requests being broadcast to
729  multiple chips.
730- phb4/capp: Only reset FIR bits that cause capp machine check
731
732  During CAPP recovery do_capp_recovery_scoms() will reset the CAPP Fir
733  register just after CAPP recovery is completed. This has an
734  unintentional side effect of preventing PRD from analyzing and
735  reporting this error. If PRD tries to read the CAPP FIR after opal has
736  already reset it, then it logs a critical error complaining "No active
737  error bits found".
738
739  To prevent this from happening we update do_capp_recovery_scoms() to
740  only reset fir bits that cause CAPP machine check (local xstop). This
741  is done by reading the CAPP Fir Action0/1 & Mask registers and
742  generating a mask which is then written on CAPP_FIR_CLEAR register.
743
744- phb4: Check for RX errors after link training
745
746  Some PHB4 PHYs can get stuck in a bad state where they are constantly
747  retraining the link. This happens transparently to skiboot and Linux
  but will cause PCIe to be slow. Resetting the PHB4 clears the
749  problem.
750
751  We can detect this case by looking at the RX errors count where we
752  check for link stability. This patch does this by modifying the link
753  optimal code to check for RX errors. If errors are occurring we
754  retrain the link irrespective of the chip rev or card.
755
756  Normally when this problem occurs, the RX error count is maxed out at
757  255. When there is no problem, the count is 0. We chose 8 as the max
758  rx errors value to give us some margin for a few errors. There is also
759  a knob that can be used to set the error threshold for when we should
  retrain the link, i.e. ::
761
762      nvram -p ibm,skiboot --update-config phb-rx-err-max=8
763
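  The retrain decision described above amounts to a threshold test. A minimal
  Python sketch follows, assuming the link is retrained once the count exceeds
  the threshold (the exact comparison used by skiboot is not stated here) and
  using hypothetical function and parameter names:

  .. code-block:: python

     DEFAULT_RX_ERR_MAX = 8  # default threshold quoted above

     def should_retrain(rx_err_count, rx_err_max=DEFAULT_RX_ERR_MAX):
         """Return True when the RX error count suggests an unstable link."""
         # Assumption: strictly more errors than the threshold triggers a retrain.
         return rx_err_count > rx_err_max

     print(should_retrain(0))     # healthy link
     print(should_retrain(255))   # saturated error counter

  With the default of 8, a healthy link (count 0) is left alone and a stuck
  PHY (count 255) is retrained, while a handful of errors is tolerated.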
764- hw/phb4: Add a helper to dump the PELT-V
765
766  The "Partitionable Endpoint Lookup Table (Vector)" is used by the PHB
767  when processing EEH events. The PELT-V defines which PEs should be
768  additionally frozen in the event of an error being flagged on a
769  given PE. Knowing the state of the PELT-V is sometimes useful for
770  debugging PHB issues so this patch adds a helper to dump it.
771
772- hw/phb4: Print the PEs in the EEH dump in hex
773
  Linux always displays the PE number in hexadecimal while skiboot
775  displays the PEST index (PE number) in decimal. This makes correlating
776  errors between Skiboot and Linux more annoying than it should be so
777  this patch makes Skiboot print the PEST number in hex.
778
779- phb4: Reallocate PEC2 DMA-Read engines to improve GPU-Direct bandwidth
780
781  We reallocate additional 16/8 DMA-Read engines allocated to stack0/1
782  on PEC2 respectively. This is needed to improve bandwidth available to
783  the Mellanox CX5 adapter when trying to read GPU memory (GPU-Direct).
784
  If the kernel cxl driver indicates a request to allocate the maximum possible
  DMA read engines when calling enable_capi_mode() and the card is attached
  to the PEC2/stack0 slot then we assume it's a Mellanox CX5 adapter. We then
  allocate an additional 16/8 DMA read engines to stack0 and stack1
789  respectively on PEC2. This is done by populating the
790  XPEC_PCI_PRDSTKOVR and XPEC_NEST_READ_STACK_OVERRIDE as suggested by
791  the h/w team.
792- phb4: Enable PHB MMIO-0/1 Bars only when mmio window exists
793
  Presently phb4_probe_stack() will always enable PHB MMIO0/1 windows
  even if they don't exist in phy_map. Hence we do some minor shuffling
  in phb4_probe_stack() so that MMIO-0/1 Bars are only enabled if
  their corresponding MMIO window exists in the phy_map. If the phy_map
  for an mmio window is '0' we set the corresponding BAR register to
  '0'.
800- hw/phb4: Use local_alloc for phb4 structures
801
802  Struct phb4 is fairly heavyweight at 283664 bytes. On systems with
  6x PHBs per socket this results in using 3.2MB of heap space for the PHB
804  structures alone. This is a fairly large chunk of our 12MB heap and
805  on systems with particularly large PCIe topologies, or additional
806  PHBs we can fail to boot because we cannot allocate space for the
807  FDT blob.
808
809  This patch switches to using local_alloc() for the PHB structures
810  so they don't consume too large a portion of our 12MB heap space.
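
  The arithmetic behind those figures can be checked directly. The sketch
  below assumes a two-socket system and binary megabytes, neither of which is
  stated above:

  .. code-block:: python

     PHB4_STRUCT_BYTES = 283664      # size of struct phb4 quoted above
     PHBS_PER_SOCKET = 6
     SOCKETS = 2                     # assumption: two-socket system
     HEAP_BYTES = 12 * 1024 * 1024   # 12MB heap, read as binary megabytes

     total_bytes = PHB4_STRUCT_BYTES * PHBS_PER_SOCKET * SOCKETS
     total_mb = total_bytes / (1024 * 1024)
     heap_share = total_bytes / HEAP_BYTES

     print(f"{total_bytes} bytes = {total_mb:.1f}MB, {heap_share:.0%} of the heap")

  Under these assumptions the PHB structures take 3403968 bytes, about 3.2MB
  or roughly 27% of the heap.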
811- phb4: Fix typo in disable lane eq code
812
813  In this commit ::
814
815      commit 737c0ba3d72b8aab05a765a9fc111a48faac0f75
816      Author: Michael Neuling <mikey@neuling.org>
817      Date:   Thu Feb 22 10:52:18 2018 +1100
818      phb4: Disable lane eq when retrying some nvidia GEN3 devices
819
820  We made a typo and set PH2 twice. This fixes it.
821
  It worked previously because if only phase 2 (PH2) is set, it skips both
  phase 2 and phase 3 (PH3).
- phb4: Don't probe a PHB if it's garded

  Presently phb4_probe_stack() causes an exception while trying to probe
  a PHB if it's garded. This causes skiboot to go into a reboot loop with
  the following exception log: ::
829
830     ***********************************************
831     Fatal MCE at 000000003006ecd4   .probe_phb4+0x570
832     CFAR : 00000000300b98a0
833     <snip>
834     Aborting!
835    CPU 0018 Backtrace:
836     S: 0000000031cc37e0 R: 000000003001a51c   ._abort+0x4c
837     S: 0000000031cc3860 R: 0000000030028170   .exception_entry+0x180
838     S: 0000000031cc3a40 R: 0000000000001f10 *
839     S: 0000000031cc3c20 R: 000000003006ecb0   .probe_phb4+0x54c
840     S: 0000000031cc3e30 R: 0000000030014ca4   .main_cpu_entry+0x5b0
841     S: 0000000031cc3f00 R: 0000000030002700   boot_entry+0x1b8
842
  This happens because phb4_probe_stack() ignores all xscom read/write
  errors while enabling the PHB Bars and then tries to perform an mmio to
  read the PHB Version registers, which causes the fatal MCE.
846
847  We fix this by ignoring the PHB probe if the first xscom_write() to
848  populate the PHB Bar register fails, which indicates that there is
849  something wrong with the PHB.
850- phb4: Workaround PHB errata with CFG write UR/CA errors
851
852  If the PHB encounters a UR or CA status on a CFG write, it will
853  incorrectly freeze the wrong PE. Instead of using the PE# specified
854  in the CONFIG_ADDRESS register, it will use the PE# of whatever
855  MMIO occurred last.
856
  Work around this by disabling freeze on such errors.
858- phb4: Handle allocation errors in phb4_eeh_dump_regs()
859
860  If the zalloc fails (and it can be a rather large allocation),
  we will overwrite memory at 0 instead of failing.
862- phb4: Don't try to access non-existent PEST entries
863
864  In a POWER9 chip, some PHB4s have 256 PEs, some have 512.
865
866  Currently, the diagnostics code retrieves 512 unconditionally,
867  which is wrong and causes us to incorrectly report bogus values
868  for the "high" PEs on the small PHBs.
869
  Use the actual number of implemented PEs instead.
871
872CAPI2
873^^^^^
874
875- phb4/capp: Use link width to allocate STQ engines to CAPP
876
  Update phb4_init_capp_regs() to allocate STQ Engines to CAPP/PEC2
  based on link width instead of always assuming it to be x8.
879
880  Also re-factor the function slightly to evaluate the link-width only
881  once and cache it so that it can also be used to allocate DMA read
882  engines.
883- phb4/capp: Update DMA read engines set in APC_FSM_READ_MASK based on link-width
884
885  Commit 47c09cdfe7a3("phb4/capp: Calculate STQ/DMA read engines based
  on link-width for PEC") updated the CAPP init sequence by calculating
887  the needed STQ/DMA-read engines based on link width and populating it
888  in XPEC_NEST_CAPP_CNTL register. This however needs to be synchronized
889  with the value set in CAPP APC FSM Read Machine Mask Register.
890
  Hence this patch updates phb4_init_capp_regs() to calculate the link
892  width of the stack on PEC2 and populate the same values as previously
893  populated in PEC CAPP_CNTL register.
894- capp: Fix the capp recovery timeout comparison
895
896  The current capp recovery timeout control loop in
897  do_capp_recovery_scoms() uses a wrong comparison for return value of
  tb_compare(). This may cause do_capp_recovery_scoms() to report a
  timeout earlier than the stipulated 168ms.
900
901  The patch fixes this by updating the loop timeout control branch in
902  do_capp_recovery_scoms() to use the correct enum tb_cmpval.
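
  Why the comparison value matters can be seen in a small Python model of a
  deadline loop. The names below are a model written for this note, not the
  skiboot definitions, and the clock is an injected callable so the sketch can
  run without hardware:

  .. code-block:: python

     from enum import IntEnum

     class TbCmp(IntEnum):
         """Model of a three-way timebase comparison result."""
         A_BEFORE_B = -1
         A_EQUAL_B = 0
         A_AFTER_B = 1

     def tb_compare(a, b):
         if a == b:
             return TbCmp.A_EQUAL_B
         return TbCmp.A_BEFORE_B if a < b else TbCmp.A_AFTER_B

     def wait_for_recovery(now_ticks, is_done, timeout_ticks):
         """Poll is_done() until it succeeds or the deadline passes.

         now_ticks is a callable returning the current tick count.
         """
         deadline = now_ticks() + timeout_ticks
         while not is_done():
             # Time out only once 'now' is strictly after the deadline.
             if tb_compare(now_ticks(), deadline) == TbCmp.A_AFTER_B:
                 return False
         return True

  Testing against any other comparison value, such as "before", would end the
  loop on the first iteration and report a timeout far too early.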
903- phb4: Disable 32-bit MSI in capi mode
904
905  If a capi device does a DMA write targeting an address lower than 4GB,
906  it does so through a 32-bit operation, per the PCI spec. In capi mode,
907  the first TVE entry is configured in bypass mode, so the address is
908  valid. But with any (bad) luck, the address could be 0xFFFFxxxx, thus
909  looking like a 32-bit MSI.
910
911  We currently enable both 32-bit and 64-bit MSIs, so the PHB will
912  interpret the DMA write as a MSI, which very likely results in an EEH
913  (MSI with a bad payload size).
914
915  We can fix it by disabling 32-bit MSI when switching the PHB to capi
916  mode. Capi devices are 64-bit.
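
  The address clash can be illustrated with a simplified Python model. It
  assumes, as the text above implies, that any write below 4GB whose upper 16
  bits are 0xFFFF looks like a 32-bit MSI. The real PHB decode involves more
  than this range check, and the function names are hypothetical:

  .. code-block:: python

     FOUR_GB = 1 << 32

     def looks_like_32bit_msi(dma_addr):
         """True if a DMA write address falls in the 0xFFFFxxxx range."""
         return dma_addr < FOUR_GB and (dma_addr >> 16) == 0xFFFF

     def phb_treats_write_as_msi(dma_addr, msi32_enabled):
         # With 32-bit MSIs disabled, the write is handled as ordinary DMA.
         return msi32_enabled and looks_like_32bit_msi(dma_addr)

  In this model a write to 0xFFFF1234 is misread as an MSI only while 32-bit
  MSIs are enabled, which is the case the patch removes.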
917
918NVLINK2
919^^^^^^^
920
Since v6.2-rc2:

- Add purging CPU L2 and L3 caches into NPU hreset.
923
924  If a GPU is passed through to a guest and the guest unexpectedly terminates,
925  there can be cache lines in CPUs that belong to the GPU. So purge the caches
926  as part of the reset sequence. L1 is write through, so doesn't need to be purged.
927
928  The sequence to purge the L2 and L3 caches from the hw team:
929
  L2 purge:

  1. Initiate the purge: ::

      putspy pu.ex EXP.L2.L2MISC.L2CERRS.PRD_PURGE_CMD_TYPE L2CAC_FLUSH -all
      putspy pu.ex EXP.L2.L2MISC.L2CERRS.PRD_PURGE_CMD_TRIGGER ON -all

  2. Check that this is off in all caches to know the purge has completed: ::

      getspy pu.ex EXP.L2.L2MISC.L2CERRS.PRD_PURGE_CMD_REG_BUSY -all

  3. Turn the purge trigger off again: ::

      putspy pu.ex EXP.L2.L2MISC.L2CERRS.PRD_PURGE_CMD_TRIGGER OFF -all
943
  L3 purge:

  1. Start the purge: ::
946
947      putspy pu.ex EXP.L3.L3_MISC.L3CERRS.L3_PRD_PURGE_TTYPE FULL_PURGE -all
948      putspy pu.ex EXP.L3.L3_MISC.L3CERRS.L3_PRD_PURGE_REQ ON -all
949
950  2. Ensure that the purge has completed by checking the status bit: ::
951
952      getspy pu.ex EXP.L3.L3_MISC.L3CERRS.L3_PRD_PURGE_REQ -all
953
954     You should see it say OFF if it's done: ::
955
956       p9n.ex k0:n0:s0:p00:c0
957       EXP.L3.L3_MISC.L3CERRS.L3_PRD_PURGE_REQ
958       OFF
959
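  The L2 sequence above follows a trigger, poll, clear pattern. The Python
  sketch below models it against a made-up register object. ``FakeL2``, its
  methods, and the poll limit are all hypothetical and exist only so the
  control flow can be run:

  .. code-block:: python

     class FakeL2:
         """Hypothetical register model: busy for a few polls after a trigger."""
         def __init__(self, busy_polls=3):
             self.trigger = False
             self.busy_polls = busy_polls
             self.remaining = 0

         def set_trigger(self, on):
             self.trigger = on
             if on:
                 self.remaining = self.busy_polls

         def busy(self):
             if self.remaining > 0:
                 self.remaining -= 1
                 return True
             return False

     def purge_l2(cache, max_polls=100):
         cache.set_trigger(True)            # step 1: initiate the purge
         for _ in range(max_polls):         # step 2: wait for busy to drop
             if not cache.busy():
                 break
         else:
             cache.set_trigger(False)
             return False                   # gave up: purge did not complete
         cache.set_trigger(False)           # step 3: turn the trigger off
         return True

  Bounding the poll loop is a design choice of this sketch, so that a cache
  which never reports completion cannot hang the reset sequence.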
960- npu2: Return sensible PCI error when not frozen
961
962  The current kernel calls OPAL_PCI_EEH_FREEZE_STATUS with an uninitialized
963  @pci_error_type parameter and then analyzes it even if the OPAL call
  returned OPAL_SUCCESS. This results in unexpected EEH events and NPU
965  freezes.
966
967  This initializes @pci_error_type and @severity to known safe values.
968
969- npu2: Advertise correct TCE page size
970
971  The P9 NPU workbook says that only 4K/64K/16M/256M page size are supported
972  and in fact npu2_map_pe_dma_window() supports just these but in absence of
973  the "ibm,supported-tce-sizes" property Linux assumes the default P9 PHB4
974  page sizes - 4K/64K/2M/1G - so when Linux tries 2M/1G TCEs, we get lots of
975  "Unexpected TCE size" from npu2_tce_kill().
976
  This advertises the supported TCE page sizes so Linux can handle them
  correctly, i.e. fall back to 4K/64K TCEs.
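
  The mismatch between the two size lists is easy to see in a short Python
  sketch. Expressing each size as a page shift is one common encoding. Whether
  the property uses exactly this encoding is an assumption here, and the
  variable names are illustrative:

  .. code-block:: python

     KB, MB, GB = 1 << 10, 1 << 20, 1 << 30

     NPU2_TCE_SIZES = [4 * KB, 64 * KB, 16 * MB, 256 * MB]   # NPU workbook list
     PHB4_DEFAULT_SIZES = [4 * KB, 64 * KB, 2 * MB, 1 * GB]  # Linux's default guess

     def page_shift(size):
         """Return log2(size) for a power-of-two page size."""
         assert size & (size - 1) == 0, "page size must be a power of two"
         return size.bit_length() - 1

     npu2_shifts = [page_shift(s) for s in NPU2_TCE_SIZES]
     usable = sorted(set(NPU2_TCE_SIZES) & set(PHB4_DEFAULT_SIZES))

     print(npu2_shifts)
     print(usable)

  The NPU2 sizes correspond to page shifts 12, 16, 24 and 28, and only 4K and
  64K appear in both lists, which matches the fallback described above.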
979
980Since v6.1:
981
982- npu2: Add support for relaxed-ordering mode
983
984  Some device drivers support out of order access to GPU memory. This does
985  not affect the CPU view of memory but it does affect the GPU view of
986  memory. It should only be enabled if the GPU driver has requested it.
987
988  Add OPAL APIs allowing the driver to query relaxed ordering state or
989  request it to be set for a device. Current hardware only allows relaxed
990  ordering to be enabled per PCIe root port. So the code here doesn't
991  enable relaxed ordering until it has been explicitly requested for every
992  device on the port.
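
  The per-port rule can be summarised in a few lines of Python. This is a
  sketch of the policy only. The function name and the device identifiers are
  hypothetical, and treating an empty port as disabled is an assumption:

  .. code-block:: python

     def port_relaxed_ordering(devices_on_port, requested):
         """Enable relaxed ordering only if every device on the port asked for it."""
         return bool(devices_on_port) and all(d in requested for d in devices_on_port)

  For a port carrying ``gpu0`` and ``gpu1``, a request from ``gpu0`` alone
  leaves relaxed ordering off. It is turned on once both have requested it.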
993- Add the other 7 ATSD registers to the device tree.
994- npu2/hw-procedures: Don't open code NPU2_NTL_MISC_CFG2_BRICK_ENABLE
995
996  Name this bit properly. There's a lot more cleanup like this to be done,
997  but I'm catching this one now as part of some related changes.
998- npu2/hw-procedures: Enable parity and credit overflow checks
999
1000  Enable these error checking features by setting the appropriate bits in
1001  our one-off initialization of each "NTL Misc Config 2" register.
1002
1003  The exception is NDL RX parity checking, which should be disabled during
1004  the link training procedures.
1005- npu2: Use correct kill type for TCE invalidation
1006
1007  kill_type is enum of OPAL_PCI_TCE_KILL_PAGES, OPAL_PCI_TCE_KILL_PE,
1008  OPAL_PCI_TCE_KILL_ALL and phb4_tce_kill() gets it right but
1009  npu2_tce_kill() uses OPAL_PCI_TCE_KILL which is an OPAL API token.
1010
1011  This fixes an obvious mistype.
1012
1013OpenCAPI
1014^^^^^^^^
1015
1016Since v6.2-rc1:
1017
1018- npu2-opencapi: Log extra information on link training failure
1019- npu2-opencapi: Detect if link trained in degraded mode
1020
1021Since v6.1:
1022
1023- Support OpenCAPI on Witherspoon platform
1024- npu2-opencapi: Enable presence detection on ZZ
1025
1026  Presence detection for opencapi adapters was broken for ZZ planars v3
1027  and below. All ZZ systems currently used in the lab have had their
1028  planar upgraded, so we can now remove the override we had to force
  presence and activate presence detection, which should improve boot
1030  time.
1031
1032  Considering the state of opal support on ZZ, this is really only for
1033  lab usage on BML. The opencapi enablement team has okay'd the
1034  change. In the unlikely case somebody tries opencapi on an old ZZ, the
1035  presence detection through i2c will show that no adapter is present
1036  and skiboot won't try to access or train the link.
1037- npu2-opencapi: Don't send commands to NPU when link is down
1038
1039  Even if an opencapi link is down, we currently always try to issue a
1040  config read operation when probing for PCI devices, because of the
1041  default scan map used for an opencapi PHB. The config operation fails,
1042  as expected, but it can also raise a FIR bit and trigger an HMI.
1043
1044  For opencapi, there's no root device like for a "normal" PCI PHB, so
1045  there's no reason to do the config operation. To fix it, we keep the
1046  scan map blank by default, and only add a device once the link is
1047  trained.
1048- opal/hmi: Catch NPU2 HMIs for opencapi
1049
1050  HMIs for NPU2 are filtered with the 'compatible' string of the PHB, so
1051  add opencapi to the mix.
1052- occ: Wait if OCC GPU presence status not immediately available
1053
1054  It takes a few seconds for the OCC to set everything up in order to read
1055  GPU presence. At present, we try to kick off OCC initialisation as early as
1056  possible to maximise the time it has to read GPU presence.
1057
1058  Unfortunately sometimes that's not enough, so add a loop in
1059  occ_get_gpu_presence() so that on the first time we try to get GPU presence
1060  we keep trying for up to 2 seconds. Experimentally this seems to be
1061  adequate.
1062- hw/npu2-hw-procedures: Enable RX auto recal on OpenCAPI links
1063
1064  The RX_RC_ENABLE_AUTO_RECAL flag is required on OpenCAPI but not NVLink.
1065
1066  Traditionally, Hostboot sets this value according to the machine type.
1067  However, now that Witherspoon supports both NVLink and OpenCAPI, it can't
1068  tell whether or not a link is OpenCAPI.
1069
1070  So instead, set it in skiboot, where it will only be triggered after we've
1071  done device detection and found an OpenCAPI device.
1072- hw/npu2-opencapi: Fix setting of supported OpenCAPI templates
1073
1074  In opal_npu_tl_set(), we made a typo that means the OPAL_NPU_TL_SET call
1075  may not clear the enable bits for templates that were previously enabled
1076  but are now disabled.
1077
1078  Fix the typo so we clear NPU2_OTL_CONFIG1_TX_TEMP2_EN as well as
1079  TEMP{1,3}_EN.
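
  The effect of such a typo can be reproduced with a small Python model. The
  bit positions and function names below are hypothetical, chosen only for
  illustration, and the "buggy" variant models the symptom described above
  rather than the exact original code:

  .. code-block:: python

     # Hypothetical bit positions, for illustration only.
     TX_TEMP1_EN = 1 << 0
     TX_TEMP2_EN = 1 << 1
     TX_TEMP3_EN = 1 << 2
     ALL_TEMP_EN = TX_TEMP1_EN | TX_TEMP2_EN | TX_TEMP3_EN

     def set_templates(reg, enable_mask):
         """Clear every template enable bit, then set only the requested ones."""
         reg &= ~ALL_TEMP_EN
         return reg | (enable_mask & ALL_TEMP_EN)

     def set_templates_buggy(reg, enable_mask):
         # Models the symptom: TEMP2 is missing from the clear mask,
         # so a previously enabled template 2 stays enabled.
         reg &= ~(TX_TEMP1_EN | TX_TEMP3_EN)
         return reg | (enable_mask & ALL_TEMP_EN)

  Starting from templates 1 and 2 enabled and requesting only template 1, the
  fixed version leaves just template 1 set, while the buggy one keeps 2 as well.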
1080
1081Barreleye G2 and Zaius platforms
1082^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1083
1084- zaius: Add a slot table
1085- zaius: Add slots for the Barreleye G2 HDD rack
1086
1087  The Barreleye G2 is distinct from the Zaius in that it features a 24
1088  Bay NVMe/SATA HDD rack. To provide meaningful slot names for each NVMe
1089  device we need to define a slot table for the NVMe capable HDD bays.
1090
1091  Unfortunately this isn't straightforward because the PCIe path to the
  NVMe devices isn't fixed. The PCIe topology is something like: ::

      P9 -> HBA card -> 9797 switch -> 20x NVMe HDD slots
1094
1095  The 9797 switch is partitioned into two (or four) virtual switches which
1096  allow multiple HBA cards to be used (e.g. one per socket). As a result
1097  the exact BDFN of the ports will vary depending on how the system is
1098  configured.
1099
1100  That said, the virtual switch configuration of the 9797 does not change
1101  the device and function numbers of the switch downports. This means that
1102  we can define a single slot table that maps switch ports to the NVMe bay
1103  names.
1104
1105  Unfortunately we still need to guess which bus to use this table on, so
1106  we assume that any switch downport we find with the PEX9797 VDID is part
1107  of the 9797 that supports the HDD rack.
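
  Keying the table on device and function number, while ignoring the bus, can
  be sketched in Python as follows. The bay names and table contents are
  hypothetical. The field split uses the standard PCI layout of 8 bits of bus,
  5 bits of device and 3 bits of function:

  .. code-block:: python

     # Hypothetical table: (device, function) of a switch downport -> bay name.
     NVME_BAYS = {
         (0, 0): "NVMe bay A",
         (1, 0): "NVMe bay B",
     }

     def split_bdfn(bdfn):
         """Split a 16-bit PCI bus/device/function number into its fields."""
         return (bdfn >> 8) & 0xFF, (bdfn >> 3) & 0x1F, bdfn & 0x7

     def bay_name(bdfn):
         _bus, dev, fn = split_bdfn(bdfn)   # the bus number is deliberately ignored
         return NVME_BAYS.get((dev, fn))

  Two downports with the same device and function but different bus numbers,
  such as 0x0308 and 0x1108, resolve to the same bay name in this model.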
1108
1109FSP based platforms (firenze and ZZ)
1110^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1111
Since v6.2-rc1:

- platform/firenze: Fix branch-to-null crash
1114
1115  When the bus alloc and free methods were removed we missed a case in the
  Firenze platform slot code that relied on the bus-specific method to
  set the bus pointer in the request structure. This results in a
1118  branch-to-null during boot and a crash. This patch fixes it by
1119  initialising it manually here.
1120
1121
1122Since v6.1:
1123
1124- phb4/capp: Update the expected Eye-catcher for CAPP ucode lid
1125
  Currently on a FSP based P9 system load_capp_ucode() expects the CAPP ucode
  lid header to have an eye-catcher magic of 'CAPPPSLL'. However skiboot
  currently supports CAPP ucode only lids that have an eye-catcher magic
1129  of 'CAPPLIDH'. This prevents skiboot from loading the ucode with this
1130  error message: ::
1131
1132    CAPP: ucode header invalid
1133
1134  We fix this issue by updating load_capp_ucode() to use the eye-catcher
1135  value of 'CAPPLIDH' instead of 'CAPPPSLL'.
1136
1137- FSP: Improve Reset/Reload log message
1138
  The message below is confusing, so let's make it clearer.

  FSP sends "R/R complete notification" whenever there is a dump. We use `flag`
  to identify whether it is an R/R completion or just a new dump notification. ::
1143
1144    [  483.406351956,6] FSP: SP says Reset/Reload complete
1145    [  483.406354278,5] DUMP: FipS dump available. ID = 0x1a00001f [size: 6367640 bytes]
1146    [  483.406355968,7]   A Reset/Reload was NOT done
1147
1148Witherspoon platform
1149^^^^^^^^^^^^^^^^^^^^
1150
1151- platforms/astbmc/witherspoon: Implement OpenCAPI support
1152
1153  OpenCAPI on Witherspoon is slightly more involved than on Zaius and ZZ, due
1154  to the OpenCAPI links using the SXM2 connectors that are used for NVLink
1155  GPUs.
1156
1157  This patch adds the regular OpenCAPI platform information, and also a
1158  Witherspoon-specific presence detection callback that uses the previously
1159  added OCC GPU presence detection to figure out the device types plugged
1160  into each SXM2 socket.
1161
1162  The SXM2 connectors are capable of carrying 2 OpenCAPI links, and future
1163  OpenCAPI devices are expected to make use of this. However, we don't yet
1164  support ganged links and the various implications that has for handling
1165  things like device reset, so for now, we only enable 1 brick per device.
1166
1167Contributors
1168------------
1169
1170The v6.2 release of skiboot contains 240 changesets from 28 developers, working for 2 employers.
1171A total of 9146 lines were added, and 2610 removed (delta 6536).
1172
1173Developers with the most changesets
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

=========================== == =======
1176Developer                   #  %
1177=========================== == =======
1178Stewart Smith               58 (24.2%)
1179Andrew Jeffery              30 (12.5%)
1180Oliver O'Halloran           27 (11.2%)
1181Joel Stanley                17 (7.1%)
1182Vaibhav Jain                14 (5.8%)
1183Benjamin Herrenschmidt      12 (5.0%)
1184Frederic Barrat             11 (4.6%)
1185Nicholas Piggin             11 (4.6%)
1186Andrew Donnellan            10 (4.2%)
1187Vasant Hegde                 9 (3.8%)
1188Reza Arbab                   8 (3.3%)
1189Samuel Mendoza-Jonas         5 (2.1%)
1190Alexey Kardashevskiy         4 (1.7%)
1191Michael Neuling              4 (1.7%)
1192Prem Shanker Jha             3 (1.2%)
1193Cédric Le Goater             2 (0.8%)
1194Rashmica Gupta               2 (0.8%)
1195Mahesh J Salgaonkar          2 (0.8%)
1196Alistair Popple              2 (0.8%)
1197Shilpasri G Bhat             1 (0.4%)
1198Adriana Kobylak              1 (0.4%)
1199Madhavan Srinivasan          1 (0.4%)
1200Artem Senichev               1 (0.4%)
1201Russell Currey               1 (0.4%)
1202Vaidyanathan Srinivasan      1 (0.4%)
1203Cyril Bur                    1 (0.4%)
1204Jeremy Kerr                  1 (0.4%)
1205Michael Ellerman             1 (0.4%)
1206=========================== == =======
1207
1208
1209Developers with the most changed lines
1210^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1211
1212========================= ==== =======
1213Developer                    # %
1214========================= ==== =======
1215Andrew Jeffery            2861 (29.3%)
1216Stewart Smith             1891 (19.4%)
1217Prem Shanker Jha          1046 (10.7%)
1218Andrew Donnellan           799 (8.2%)
1219Oliver O'Halloran          649 (6.6%)
1220Reza Arbab                 441 (4.5%)
1221Nicholas Piggin            412 (4.2%)
1222Vaibhav Jain               278 (2.8%)
1223Cédric Le Goater           250 (2.6%)
1224Frederic Barrat            168 (1.7%)
1225Rashmica Gupta             161 (1.6%)
1226Joel Stanley               152 (1.6%)
1227Benjamin Herrenschmidt     138 (1.4%)
1228Artem Senichev             101 (1.0%)
1229Samuel Mendoza-Jonas        83 (0.9%)
1230Michael Neuling             82 (0.8%)
1231Michael Ellerman            61 (0.6%)
1232Mahesh J Salgaonkar         50 (0.5%)
1233Vasant Hegde                44 (0.5%)
1234Alexey Kardashevskiy        32 (0.3%)
1235Adriana Kobylak             29 (0.3%)
1236Alistair Popple             18 (0.2%)
1237Shilpasri G Bhat             4 (0.0%)
1238Madhavan Srinivasan          3 (0.0%)
1239Cyril Bur                    3 (0.0%)
1240Jeremy Kerr                  3 (0.0%)
1241Russell Currey               2 (0.0%)
1242Vaidyanathan Srinivasan      2 (0.0%)
1243========================= ==== =======
1244
1245
1246Developers with the most lines removed
1247^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1248
1249========================= ==== =======
1250Developer                    # %
1251========================= ==== =======
1252Cédric Le Goater           205 (7.9%)
1253Samuel Mendoza-Jonas         8 (0.3%)
1254Shilpasri G Bhat             1 (0.0%)
1255========================= ==== =======
1256
1257Developers with the most signoffs
1258^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1259
1260========================= ==== =======
1261Developer                    # %
1262========================= ==== =======
1263Stewart Smith              182 (95.3%)
1264Alistair Popple              3 (1.6%)
1265Akshay Adiga                 2 (1.0%)
1266Christophe Lombard           1 (0.5%)
1267Ryan Grimm                   1 (0.5%)
1268Michael Neuling              1 (0.5%)
1269Mahesh J Salgaonkar          1 (0.5%)
1270Total                      191
1271========================= ==== =======
1272
1273Developers with the most reviews
1274^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1275
1276================================ ==== =======
1277Developer                           # %
1278================================ ==== =======
1279Andrew Donnellan                   15 (19.7%)
1280Frederic Barrat                    11 (14.5%)
1281Oliver O'Halloran                   9 (11.8%)
1282Alistair Popple                     8 (10.5%)
1283Vasant Hegde                        5 (6.6%)
1284Samuel Mendoza-Jonas                4 (5.3%)
1285Christophe Lombard                  3 (3.9%)
1286Gregory S. Still                    3 (3.9%)
1287Mahesh J Salgaonkar                 2 (2.6%)
1288RANGANATHPRASAD G. BRAHMASAMUDRA    2 (2.6%)
1289Jennifer A. Stofer                  2 (2.6%)
1290AMIT J. TENDOLKAR                   2 (2.6%)
1291Christian R. Geddes                 2 (2.6%)
1292Cédric Le Goater                    1 (1.3%)
1293Shilpasri G Bhat                    1 (1.3%)
1294Daniel M. Crowell                   1 (1.3%)
1295Alexey Kardashevskiy                1 (1.3%)
1296Joel Stanley                        1 (1.3%)
1297Vaibhav Jain                        1 (1.3%)
1298Nicholas Piggin                     1 (1.3%)
1299Andrew Jeffery                      1 (1.3%)
1300Total                              76
1301================================ ==== =======
1302
1303Developers with the most test credits
1304^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1305
1306========================= ==== =======
1307Developer                    # %
1308========================= ==== =======
1309Jenkins Server               3 (12.0%)
1310Cronus HW CI                 3 (12.0%)
1311Hostboot CI                  3 (12.0%)
1312Jenkins OP Build CI          3 (12.0%)
1313FSP CI Jenkins               3 (12.0%)
1314Jenkins OP HW                3 (12.0%)
1315Vasant Hegde                 2 (8.0%)
1316Andrew Donnellan             1 (4.0%)
1317Oliver O'Halloran            1 (4.0%)
1318Andrew Jeffery               1 (4.0%)
1319HWSV CI                      1 (4.0%)
1320Artem Senichev               1 (4.0%)
1321Total                       25
1322========================= ==== =======
1323
1324Developers who gave the most tested-by credits
1325^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1326
1327========================= ==== =======
1328Developer                    # %
1329========================= ==== =======
1330Prem Shanker Jha            19 (76.0%)
1331Frederic Barrat              2 (8.0%)
1332Andrew Jeffery               1 (4.0%)
1333Vaibhav Jain                 1 (4.0%)
1334Stewart Smith                1 (4.0%)
1335Benjamin Herrenschmidt       1 (4.0%)
1336========================= ==== =======
1337
1338Developers with the most report credits
1339^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1340
1341========================= ==== =======
1342Developer                    # %
1343========================= ==== =======
1344Vasant Hegde                 2 (25.0%)
1345Frederic Barrat              1 (12.5%)
1346Dawn Sylvia                  1 (12.5%)
1347Meng Li                      1 (12.5%)
1348Tyler Seredynski             1 (12.5%)
1349Pridhiviraj Paidipeddi       1 (12.5%)
1350Stephanie Swanson            1 (12.5%)
1351========================= ==== =======
1352
1353Developers who gave the most report credits
1354^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1355
1356========================= ==== =======
1357Developer                    # %
1358========================= ==== =======
1359Stewart Smith                2 (25.0%)
1360Vaidyanathan Srinivasan      2 (25.0%)
1361Vasant Hegde                 1 (12.5%)
1362Vaibhav Jain                 1 (12.5%)
1363Andrew Donnellan             1 (12.5%)
1364Michael Neuling              1 (12.5%)
1365========================= ==== =======
1366
1367Employers with the most hackers
1368^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1369
1370========================= ==== =======
Employer                     # %
1372========================= ==== =======
1373IBM                         27 (96.4%)
1374YADRO                        1 (3.6%)
1375========================= ==== =======
1376