.. _skiboot-5.7-rc1:

skiboot-5.7-rc1
===============

skiboot v5.7-rc1 was released on Monday July 3rd 2017. It is the first
release candidate of skiboot 5.7, which will become the new stable release
of skiboot following the 5.6 release, first released 24th May 2017.

skiboot v5.7-rc1 contains all bug fixes as of :ref:`skiboot-5.4.6`
and :ref:`skiboot-5.1.19` (the currently maintained stable releases). We
do not currently expect to do any 5.6.x stable releases.

For how the skiboot stable releases work, see :ref:`stable-rules`.

The current plan is to cut the final 5.7 by July 12th, with skiboot 5.7
being for all POWER8 and POWER9 platforms in op-build v1.18 (Due July 12th).
This is a short cycle as this release is mainly targeted towards POWER9
bringup efforts.

This is the second release using the new regular six week release cycle,
similar to op-build, but slightly offset to allow for a short stabilisation
period. Expected release dates and contents are tracked using GitHub milestones
and issues: https://github.com/open-power/skiboot/milestones

Over skiboot-5.6, we have the following changes:

New Features
------------

New features in this release for POWER9 systems:

- In Memory Counters (IMC) (See :ref:`imc` for details)
- phb4: Activate shared PCI slot on witherspoon (see :ref:`Shared Slot <shared-slot-5.7-rc1-rn>`)
- phb4 capi (i.e. CAPI2): Enable capi mode for PHB4 (see :ref:`CAPI on PHB4 <capi2-5.7-rc1-rn>`)

New feature for IBM FSP based systems:

- fsp/tpo: Provide support for disabling TPO alarm

  This patch adds support for disabling a preconfigured
  Timed-Power-On (TPO) alarm on FSP based systems. Presently once a TPO alarm
  is configured from the kernel it will be triggered even if it's
  subsequently disabled.

  With this patch a TPO alarm can be disabled by passing
  y_m_d==hr_min==0 to fsp_opal_tpo_write().
  A branch is added to the function to handle this case by sending an
  FSP_CMD_TPO_DISABLE message to the FSP instead of the usual
  FSP_CMD_TPO_WRITE message. The kernel is expected to call
  opal_tpo_write() with y_m_d==hr_min==0 to request OPAL to disable the
  TPO alarm.

POWER9
------

Development on POWER9 systems continues in earnest.

This release includes the first support for POWER9 DD2 chips. Future releases
will likely contain more bug fixes, but this release has booted on real hardware.

- hdata: Reserve Trace Areas

  When hostboot is configured to setup in memory tracing it will reserve
  some memory for use by the hardware tracing facility. We need to mark
  these areas as off limits to the operating system and firmware.

- hdata: Make out-of-range idata print at PR_DEBUG

  Some fields just aren't populated on some systems.

- hdata: Ignore unnamed memory reservations.

  Hostboot should name any and all memory reservations that it provides.
  Currently some hostboots export a broken reservation covering the first
  256MB of memory and this causes the system to crash at boot due to an
  invalid free because this overlaps with the static "ibm,os-reserve"
  region (which covers the first 768MB of memory).

  According to the hostboot team unnamed reservations are invalid and can
  be ignored.

- hdata: Check the Host I2C devices array version

  Currently this is not populated on FSP machines which causes some
  obnoxious errors to appear in the boot log. We also only want to
  parse version 1 of this structure since future versions will completely
  change the array item format.

- Ensure P9 DD1 workarounds apply only to Nimbus

  The workarounds for P9 DD1 are only needed for Nimbus. P9 Cumulus will
  be DD1 but doesn't need these same workarounds.

  This patch ensures the P9 DD1 workarounds only apply to Nimbus. It
  also renames some things to make clear what's what.

- cpu: Cleanup AMR and IAMR when re-initializing CPUs

  There's a bug in current Linux kernels leaving crap in those registers
  across kexec and not sanitizing them on boot. This breaks kexec under
  some circumstances (such as booting a hash kernel from a radix one
  on P9 DD2.0).

  The long term fix is in Linux, but this workaround is a reasonable
  way of "sanitizing" those SPRs when Linux calls opal_reinit_cpus()
  and shouldn't have adverse effects.

  We could also use that same mechanism to cleanup other things as
  well such as restoring some other SPRs to their default value in
  the future.

- Set POWER9 RPR SPR to 0x00000103070F1F3F. Same value as P8.

  Without this, thread priorities inside a core don't work.

- cpu: Support setting HID[RADIX] and set it by default on P9

  This adds new opal_reinit_cpus() flags to setup radix or hash
  mode in HID[8] on POWER9.

  By default HID[8] will be set. On P9 DD1.0, Linux will change
  it as needed. On P9 DD2.0 hash works in radix mode (radix is
  really "dual" mode) so KVM won't break and existing kernels
  will work.

  Newer kernels built for hash will call this to clear the HID bit
  and thus get the full size of the TLB as an optimization.

- Add "cleanup_global_tlb" for P9 and later

  Uses broadcast TLBIEs to cleanup the TLB on all cores and on
  the nest MMU.

- xive: DD2.0 updates

  Add support for StoreEOI, fix StoreEOI MMIO offset in ESB page,
  and other cleanups.

- Update default TSCR value for P9 as recommended by HW folk.

- xive: Fix initialisation of xive_cpu_state struct

  When using XIVE emulation with DEBUG=1, we run into crashes in log_add()
  due to the xive_cpu_state->log_pos being uninitialised (and thus, with
  DEBUG enabled, initialised to the poison value of 0x99999999).
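
As an illustration of the xive_cpu_state fix above, the following is a minimal
Python sketch (not skiboot code) of why an index left in poisoned memory breaks
logging. The structure layout, LOG_SIZE, debug_alloc() and this log_add() are
hypothetical stand-ins; only the 0x99 poison pattern is taken from the note above.

```python
import struct

POISON_BYTE = 0x99                 # fill pattern described above for DEBUG builds
LOG_SIZE = 16                      # hypothetical log buffer size
STATE_FMT = ">I%ds" % LOG_SIZE     # hypothetical layout: log_pos, then log bytes
STATE_SIZE = struct.calcsize(STATE_FMT)


def debug_alloc(size):
    """Return memory pre-filled with the poison byte, like a debug allocator."""
    return bytearray([POISON_BYTE]) * size


def init_state(state):
    """The fix: zero the whole structure before first use."""
    state[:] = bytes(len(state))


def read_log_pos(state):
    """Read the 32-bit big-endian log_pos field at offset 0."""
    return struct.unpack_from(">I", state, 0)[0]


def log_add(state, value):
    """Store one byte at log_pos and advance it, wrapping at LOG_SIZE.

    In C the bad index would be an out-of-bounds write; here it is
    reported as an IndexError instead.
    """
    pos = read_log_pos(state)
    if pos >= LOG_SIZE:
        raise IndexError("log_pos 0x%x is outside the log buffer" % pos)
    state[4 + pos] = value
    struct.pack_into(">I", state, 0, (pos + 1) % LOG_SIZE)
```

Calling log_add() on a freshly allocated (poisoned) state fails because log_pos
reads back as 0x99999999; after init_state() the same call succeeds.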

OCC/Power Management
^^^^^^^^^^^^^^^^^^^^

With this release, it's possible to boot POWER9 systems with the OCC
enabled and change CPU frequencies. Doing so does require other firmware
components to also support this (otherwise the frequency will not be set).

- occ: Skip setting cores to nominal frequency in P9

  In P9, once OCC is up, it is supposed to setup the cores to nominal
  frequency. So skip this step in OPAL.

- occ: Fix Pstate ordering for P9

  In P9 the pstate values are positive. They are a continuous set of
  unsigned integers [0 to +N] where Pmax is 0 and Pmin is N. The
  linear ordering of pstates for P9 has changed compared to P8.
  P8 has negative pstate values advertised as [0 to -N] where Pmax
  is 0 and Pmin is -N. This patch adds helper routines to abstract
  pstate comparison with pmax and adds sanity pstate limit checks.
  This patch also fixes pstate arithmetic by using labs().

- p8-i2c: occ: Add support for OCC to use I2C engines

  This patch adds support to share the I2C engines between host and OCC.
  OCC uses I2C engines to read DIMM temperatures and to communicate with
  the GPU. The OCC Flag register is used for locking between host and OCC.
  The host requests the bus by setting a bit in the OCC Flag register. OCC
  sends an interrupt to indicate the change in ownership.

opal-prd/PRD
^^^^^^^^^^^^

- opal-prd: Handle SBE passthrough message passing

  This patch adds support to send SBE pass through command to HBRT.

- SBE: Add passthrough command support

  SBE sends passthrough command. We have to capture this interrupt and
  send event to HBRT via opal-prd (user space daemon).

- opal-prd: hook up reset_pm_complex

  This change provides the facility to invoke HBRT's reset_pm_complex, in
  the same manner as was done with process_occ_reset previously.

  We add a control command for `opal-prd pm-complex reset`, which is just
  an alias for occ_reset at this stage.

- prd: Implement firmware side of opaque PRD channel

  This change introduces the firmware side of the opaque HBRT <--> OPAL
  message channel. We define a base message format to be shared with HBRT
  (in include/prd-fw-msg.h), and allow firmware requests and responses to
  be sent over this channel.

  We don't currently have any notifications defined, so have nothing to do
  for firmware_notify() at this stage.

- opal-prd: Add firmware_request & firmware_notify implementations

  This change adds the implementation of firmware_request() and
  firmware_notify(). To do this, we need to add a message queue, so that
  we can properly handle out-of-order messages coming from firmware.

- opal-prd: Add support for variable-sized messages

  With the introduction of the opaque firmware channel, we want to
  support variable-sized messages. Rather than expecting to read an
  entire 'struct opal_prd_msg' in one read() call, we can split this
  over multiple reads, potentially expanding our message buffer.

- opal-prd: Sync hostboot interfaces with HBRT

  This change adds new callbacks defined for p9, and the base thunks for
  the added calls.

- opal-prd: interpret log level prefixes from HBRT

  Interpret the (optional) \*_MRK log prefixes on HBRT messages, and set
  the syslog log priority to suit.

- opal-prd: Add occ reset to usage text
- opal-prd: allow different chips for occ control actions

  The `occ reset` and `occ error` actions can both take a chip id
  argument, but we're currently just using zero. This change changes the
  control message format to pass the chip ID from the control process to
  the opal-prd daemon.
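
The variable-sized message handling described above can be sketched roughly as
follows. This is a minimal Python illustration, not the opal-prd implementation:
the header layout (one type byte, one pad byte, a big-endian 16-bit total size)
and the helper names read_exact() and read_msg() are assumptions made for the
example, and the real device may impose its own read semantics.

```python
import os
import struct

HDR_FMT = ">BxH"                      # assumed header: type, pad, total size
HDR_SIZE = struct.calcsize(HDR_FMT)   # 4 bytes


def read_exact(fd, count):
    """Keep calling read() until exactly 'count' bytes have arrived."""
    buf = bytearray()
    while len(buf) < count:
        chunk = os.read(fd, count - len(buf))
        if not chunk:
            raise EOFError("message truncated after %d bytes" % len(buf))
        buf += chunk                  # the buffer grows as data comes in
    return bytes(buf)


def read_msg(fd):
    """Read the fixed header first, then however much body it announces."""
    msg_type, size = struct.unpack(HDR_FMT, read_exact(fd, HDR_SIZE))
    if size < HDR_SIZE:
        raise ValueError("header claims a size smaller than the header")
    body = read_exact(fd, size - HDR_SIZE)
    return msg_type, body
```

Reading the header separately means the receiver never needs to know the
largest possible message in advance; it sizes the second read from the header.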


PCI/PHB4
^^^^^^^^

- phb4: Fix number of index bits in IODA tables

  On PHB4 the number of index bits in the IODA table address register
  was bumped to 10 bits to accommodate 1024 MSIs and 1024 TVEs (DD2).

  However our macro only defined the field to be 9 bits, thus causing
  "interesting" behaviours on some systems.

- phb4: Harden init with bad PHBs

  Currently if we read all 1's from the EEH or IRQ capabilities, we end
  up train wrecking on some other random code (eg. an assert() in xive).

  This hardens the PHB4 code to look for these bad reads and more
  gracefully fails the init for that PHB alone. This allows the rest of
  the system to boot and ignore those bad PHBs.

- phb4 capi (i.e. CAPI2): Handle HMI events

  Find the CAPP on the chip associated with the HMI event for PHB4.
  The recovery mode (re-initialization of the capp, resume of functional
  operations) is only available with P9 DD2. A new patch will be provided
  to support this feature.

.. _capi2-5.7-rc1-rn:

- phb4 capi (i.e. CAPI2): Enable capi mode for PHB4

  Enable the Coherently attached processor interface. The PHB is used as
  a CAPI interface.
  CAPI Adapters can be connected to either PEC0 or PEC2. Single port
  CAPI adapter can be connected to either PEC0 or PEC2, but Dual-Port
  Adapter can be only connected to PEC2.

  * CAPP0 attached to PHB0(PEC0 - single port)
  * CAPP1 attached to PHB3(PEC2 - single or dual port)

- hw/phb4: Rework phb4_get_presence_state()

  There are two issues in the current implementation: it should return an
  errcode visible to Linux, which has the prefix OPAL_*, and the code isn't
  very obvious.

  This returns OPAL_HARDWARE when the PHB is broken. Otherwise, OPAL_SUCCESS
  is always returned.
  It also refactors the code to make it
  obvious: OPAL_PCI_SLOT_PRESENT is returned when the presence signal (low active)
  or PCIe link is active. Otherwise, OPAL_PCI_SLOT_EMPTY is returned.

- phb4: Error injection for config space

  Implement CFG (config space) error injection.

  This works the same as PHB3. MMIO and DMA error injection require a
  rewrite, so they're unsupported for now.

  While it's not feature complete, this at least provides an easy way to
  inject an error that will trigger EEH.

- phb4: Error clear implementation
- phb4: Mask link down errors during reset

  During a hot reset the PCI link will drop, so we need to mask link down
  events to prevent unnecessary errors.

- phb4: Implement root port initialization

  phb4_root_port_init() was a NOP before, so fix that.

- phb4: Complete reset implementation

  This implements complete reset (creset) functionality for POWER9 DD1.

  Only partially tested and contends with some DD1 errata, but it's a start.

.. _shared-slot-5.7-rc1-rn:

- phb4: Activate shared PCI slot on witherspoon

  Witherspoon systems come with a 'shared' PCI slot: physically, it
  looks like a x16 slot, but it's actually two x8 slots connected to two
  PHBs of two different chips. Taking advantage of it requires some
  logic on the PCI adapter. Only the Mellanox CX5 adapter is known to
  support it at the time of this writing.

  This patch enables support for the shared slot on witherspoon if a x16
  adapter is detected. Each x8 slot has a presence bit, so both bits
  need to be set for the activation to take place. Slot sharing is
  activated through a gpio.

  Note that there's no easy way to be sure that the card is indeed a
  shared-slot compatible PCI adapter and not a normal x16 card.
  Plugging
  a normal x16 adapter on the shared slot should be avoided on
  witherspoon, as the link won't train on the second slot, resulting in
  a timeout and a longer boot time. Only the first slot is usable and
  the x16 adapter will end up using only half the lanes.

  If the PCI card plugged on the physical slot is only x8 (or less),
  then the presence bit of the second slot is not set, so this patch
  does nothing. The x8 (or less) adapter should work like on any other
  physical slot.

- phb4: Block D-state power management on direct slots

  As current revisions of PHB4 don't properly handle the resulting
  L1 link transition.

- phb4: Call pci config filters

- phb4: Mask out write-1-to-clear registers in RC cfg

  The root complex config space only supports 4-byte accesses. Thus, when
  the client requests a smaller size write, we do a read-modify-write to
  the register.

  However, some registers have bits defined as "write 1 to clear".

  If we do a RMW cycle on such a register and such bits are 1 in the
  part that the client doesn't intend to modify, we will accidentally
  write back those 1's and clear the corresponding bit.

  This avoids it by masking out those magic bits from the "old" value
  read from the register.

- phb4: Properly mask out link down errors during reset
- phb3/4: Silence a useless warning

  PHBs don't have base location codes on non-FSP systems and it's
  normal.

- phb4: Workaround bug in spec 053

  Wait for DLP PGRESET to clear *after* lifting the PCIe core reset.

- phb4: DD2.0 updates

  Support StoreEOI, full complements of PEs (twice as big TVT)
  and other updates.

  Also renumber init steps to match spec 063.


NPU2
^^^^

Note that currently NPU2 support is limited to POWER9 DD1 hardware.

- platforms/astbmc/witherspoon.c: Add NPU2 slot mappings

  For NVLink2 to function PCIe devices need to be associated with the right
  NVLinks. This association is supposed to be passed down to Skiboot via HDAT but
  those fields are still not correctly filled out. To work around this we add slot
  tables for the NVLinks similar to what we have for P8+.

- hw/npu2.c: Fix device aperture calculation

  The POWER9 NPU2 implements an address compression scheme to compress 56-bit P9
  physical addresses to 47-bit GPU addresses. System software needs to know both
  addresses; unfortunately the calculation of the compressed address was
  incorrect. Fix it here.

- hw/npu2.c: Change MCD BAR allocation order

  MCD BARs need to be correctly aligned to the size of the region. As GPU
  memory is allocated from the top of memory down we should start allocating
  from the highest GPU memory address to the lowest to ensure correct
  alignment.

- NPU2: Add flag to nvlink config space indicating DL reset state

  Device drivers need to be able to determine if the DL is out of reset or
  not so they can safely probe to see if links have already been trained.
  This patch adds a flag to the vendor specific config space indicating if
  the DL is out of reset.

- hw/npu2.c: Hardcode MSR_SF when setting up npu XTS contexts

  We don't support anything other than 64-bit mode for address translations so we
  can safely hardcode it.

- hw/npu2-hw-procedures.c: Add nvram option to override zcal calculations

  In some rare cases the zcal state machine may fail and flag an error. According
  to hardware designers it is sometimes ok to ignore this failure and use nominal
  values for the calculations. In this case we add a nvram variable
  (nv_zcal_override) which will cause skiboot to ignore the failure and use the
  nominal value specified in nvram.

- npu2: Fix npu2_{read,write}_4b()

  When writing or reading 4-byte values, we need to use the upper half of
  the 64-bit SCOM register.

  Fix npu2_{read,write}_4b() and their callers to use uint32_t, and
  appropriately shift the value being written or returned.

- hw/npu2.c: Fix opal_npu_map_lpar to search for existing BDF
- hw/npu2-hw-procedures.c: Fix running of zcal procedure

  The zcal procedure should only be run once per obus (ie. once per group of 3
  links). Clean up the code and fix the potential buffer overflow due to a typo.
  Also updates the zcal settings to their proper values.

- hw/npu2.c: Add memory coherence directory programming

  The memory coherence directory (MCD) needs to know which system memory addresses
  belong to the GPU. This amounts to setting a BAR and a size in the MCD to cover
  the addresses assigned to each of the GPUs. To ease assignment we assume GPUs
  are assigned memory in a contiguous block per chip.


pflash/libflash
---------------

- libflash/libffs: Zero checksum words

  On writing ffs entries to flash libffs doesn't zero checksum words
  before calculating the checksum across the entire structure. This causes
  an inaccurate calculation of the checksum as it may calculate a checksum
  on non-zero checksum bytes.

- libffs: Fix ffs_lookup_part() return value

  It would return success when the part wasn't found.

- libflash/libffs: Correctly update the actual size of the partition

  libffs has been updating FFS partition information in the wrong place
  which leads to incomplete erases and corruption.

- libflash: Initialise entries list earlier

  In the bail-out path we call ffs_close() to tear down the partially
  initialised ffs_handle. ffs_close() expects the entries list to be
  initialised so we need to do that earlier to prevent a null pointer
  dereference.
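
To show why the checksum word has to be zeroed first (the "Zero checksum words"
item above), here is a small Python sketch. It assumes, purely for illustration,
a checksum that XORs all big-endian 32-bit words of the structure, with the
checksum stored in one of those words; the helpers xor32(), seal() and verify()
and the four-word entry layout are hypothetical and are not the libffs API.

```python
import struct


def xor32(data):
    """XOR together every big-endian 32-bit word in 'data'."""
    csum = 0
    for (word,) in struct.iter_unpack(">I", data):
        csum ^= word
    return csum


def seal(entry, csum_off):
    """Zero the checksum word, then compute and store the checksum."""
    struct.pack_into(">I", entry, csum_off, 0)
    struct.pack_into(">I", entry, csum_off, xor32(entry))


def verify(entry):
    """A sealed entry XORs to zero when the stored checksum is included."""
    return xor32(entry) == 0
```

If a stale value is left in the checksum word while the checksum is computed,
that stale value is folded into the result and verify() no longer returns True
for the entry that was just written.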

mbox-flash
----------

mbox-flash is the emerging standard way of talking to host PNOR flash
on POWER9 systems.

- libflash/mbox-flash: Implement MARK_WRITE_ERASED mbox call

  Version two of the mbox-flash protocol defines a new command:
  MARK_WRITE_ERASED.

  This command provides a simple way to mark a region of flash as all 0xff
  without the need to go and write all 0xff. This is an optimisation, as
  there is no need for an erase before a write: it is the responsibility of
  the BMC to deal with the flash correctly. However, in v1 it was ambiguous
  what a client should do if the flash should be erased but not actually
  written to. This allows for an optimal path to resolve this problem.

- libflash/mbox-flash: Update to V2 of the protocol

  Updated version 2 of the protocol can be found at:
  https://github.com/openbmc/mboxbridge/blob/master/Documentation/mbox_protocol.md

  This commit changes mbox-flash such that it will preferentially talk
  version 2 to any capable daemon but still remain capable of talking to
  v1 daemons.

  Version two changes some of the command definitions for increased
  consistency and usability.
  Version two includes more attention bits - these are now dealt with at a
  simple level.

- hw/lpc-mbox: Use message registers for interrupts

  Currently the BMC raises the interrupt using the BMC control register.
  It does so on all accesses to the 16 'data' registers meaning that when
  the BMC only wants to set the ATTN (on which we have interrupts enabled)
  bit we will also get a control register based interrupt.

  The solution here is to mask that interrupt permanently and enable
  interrupts on the protocol defined 'response' data byte.

General fixes
-------------

- Reduce log level on non-error log messages

  90% of what we print isn't useful to a normal user. This
  dramatically reduces the amount of messages printed by
  OPAL in normal circumstances.

- init: Silence messages and call ourselves "OPAL"
- psi: Switch to ESB mode later

  There's an erratum: if we switch to ESB mode before setting up
  the various ESB mode related registers, pending interrupts
  can go wrong.

- lpc: Enable "new" SerIRQ mode
- hw/ipmi/ipmi-sel: missing newline in prlog warning

- p8-i2c OCC lock: fix locking in p9_i2c_bus_owner_change
- Convert important polling loops to spin at lowest SMT priority

  The pattern of calling cpu_relax() inside a polling loop does
  not suit the powerpc SMT priority instructions.
  It is preferable to
  set a low priority, spin until the break condition is reached,
  then restore the priority.

- Improve cpu_idle when PM is disabled

  Split cpu_idle() into cpu_idle_delay() and cpu_idle_job() rather than
  requesting the idle type as a function argument. Have those functions
  provide a default polling (non-PM) implementation which spins at the
  lowest SMT priority.

- core/fdt: Always add a reserve map

  Currently we skip adding the reserved ranges block to the generated
  FDT blob if we are excluding the root node. This can result in a DTB
  that dtc will barf on because the reserved memory ranges overlap with
  the start of the dt_struct block. As an example: ::

    $ fdtdump broken.dtb -d
    /dts-v1/;
    // magic:               0xd00dfeed
    // totalsize:           0x7f3 (2035)
    // off_dt_struct:       0x30 <----\
    // off_dt_strings:      0x7b8     | this is bad!
    // off_mem_rsvmap:      0x30 <----/
    // version:             17
    // last_comp_version:   16
    // boot_cpuid_phys:     0x0
    // size_dt_strings:     0x3b
    // size_dt_struct:      0x788

    /memreserve/ 0x100000000 0x300000004;
    /memreserve/ 0x3300000001 0x169626d2c;
    /memreserve/ 0x706369652d736c6f 0x7473000000000003;
    *continues*

  With this patch: ::

    $ fdtdump working.dtb -d
    /dts-v1/;
    // magic:               0xd00dfeed
    // totalsize:           0x803 (2051)
    // off_dt_struct:       0x40
    // off_dt_strings:      0x7c8
    // off_mem_rsvmap:      0x30
    // version:             17
    // last_comp_version:   16
    // boot_cpuid_phys:     0x0
    // size_dt_strings:     0x3b
    // size_dt_struct:      0x788

    // 0040: tag: 0x00000001 (FDT_BEGIN_NODE)
    / {
    // 0048: tag: 0x00000003 (FDT_PROP)
    // 07fb: string: phandle
    // 0054: value
        phandle = <0x00000001>;
    *continues*


PCI
---

- pci: Wait 20ms before checking presence detect on PCIe

  As the PHB presence logic has a debounce timer that can take
  a while to settle.

- phb3+iov: Fixup support for config space filters

  The filter should be called before the HW access and its
  return value controls whether to perform the access or not.

- core/pci: Use PCI slot's power facility in pci_enable_bridge()

  The current implementation has incorrect assumptions: there is
  always a PCI slot associated with root port and PCIe switch
  downstream port and all of them are capable of changing their
  power state by register PCICAP_EXP_SLOTCTL. Firstly, there
  might not be a PCI slot associated with the root port or PCIe
  switch downstream port. Secondly, the power isn't controlled
  by standard config register (PCICAP_EXP_SLOTCTL). There are
  I2C slave devices used to control the power states on Tuleta.

  In order to use the PCI slot's methods to manage the power
  states, this does:

  * Introduce PCI_SLOT_FLAG_ENFORCE, which indicates the requested
    operation is enforced to be applied.
  * pci_enable_bridge() is split into 3 functions: pci_bridge_power_on()
    to power it on; pci_enable_bridge() as a place holder and
    pci_bridge_wait_link() to wait for the downstream link to come up.
  * In pci_bridge_power_on(), the PCI slot's specific power management
    methods are used if there is a PCI slot associated with the PCIe
    switch downstream port or root port.

- platforms/astbmc/slots.c: Allow comparison of bus numbers when matching slots

  When matching devices on multiple downstream PLX busses we need to compare more
  than just the device-id of the PCIe BDFN, so increase the mask to do so.

Tests and simulators
--------------------

- boot-tests: add OpenBMC support
- boot_test.sh: Add SMC BMC support

  Your BMC needs a special debug image flashed to use this, the exact
  image and methods aren't something I can publish here, but if you work
  for IBM or SMC you can find out from the right sources.

  A few things needed to move around to be able to flash to an SMC BMC.

  For a start, the SSH daemon will only accept connections after a special
  incantation (which I also can't share), but you should put that in the
  ~/.skiboot_boot_tests file along with some other default login information
  we don't publicise too broadly (because Security Through Obscurity is
  *obviously* a good idea....)

  We also can't just directly "ssh /bin/true", we need an expect script,
  and we can't scp, but we can anonymous rsync!

  You also need a pflash binary to copy over.

- hdata_to_dt: Add PVR overrides to the usage text
- mambo: Add a reservation for the initramfs

  On most systems the initramfs is loaded inside the part of memory
  reserved for the OS [0x0-0x30000000] and skiboot will never touch it.
  On mambo it's loaded at 0x80000000 and if you're unlucky skiboot can
  allocate over the top of it and corrupt the initramfs blob.

  There might be the downside that the kernel cannot re-use the initramfs
  memory since it's marked as reserved, but the kernel might also free it
  anyway.

- mambo: Update P9 PVR to reflect Scale out 24 core chips

  The P9 PVR bits 48:51 don't indicate a revision but instead different
  configurations.
  From BookIV we have:

  ==== ===================
  Bits Configuration
  ==== ===================
  0    Scale out 12 cores
  1    Scale out 24 cores
  2    Scale up 12 cores
  3    Scale up 24 cores
  ==== ===================

  Skiboot will mostly use the "Scale out 24 core" configuration
  (ie. SMT4 not SMT8) so reflect this in mambo.

- core: Move enable_mambo_console() into chip initialisation

  Rather than having a wart in main_cpu_entry() that initialises the mambo
  console, we can move it into init_chips() which is where we discover that we're
  on mambo.

- mambo: Create multiple chips when we have multiple CPUs

  Currently when we boot mambo with multiple CPUs, we create multiple CPU nodes in
  the device tree, and each claims to be on a separate chip.

  However we don't create multiple xscom nodes, which means skiboot only knows
  about a single chip, and all CPUs end up on it. At the moment mambo is not able
  to create multiple xscom controllers. We can create fake ones, just by faking
  the device tree up, but that seems uglier than this solution.

  So create a mambo-chip for each CPU other than 0, to tell skiboot we want a
  separate chip created. This then enables Linux to see multiple chips: ::

    smp: Brought up 2 nodes, 2 CPUs
    numa: Node 0 CPUs: 0
    numa: Node 1 CPUs: 1

- chip: Add support for discovering chips on mambo

  Currently the only way for skiboot to discover chips is by looking for xscom
  nodes. But on mambo it's currently not possible to create multiple xscom nodes,
  which means we can only simulate a single chip system.

  However it seems we can fairly cleanly add support for a special mambo chip
  node, and use that to instantiate multiple chips.

  Add a check in init_chip() that we're not clobbering an already initialised
  chip, now that we have two places that initialise chips.

- mambo: Make xscom claim to be DD 2.0

  In the mambo tcl we set the CPU version to DD 2.0, because mambo is not
  bug compatible with DD 1.

  But in xscom_read_cfam_chipid() we have a hard coded value, to work
  around the lack of the f000f register, which claims to be P9 DD 1.0.

  This doesn't seem to cause crashes or anything, but at boot we do see: ::

    [    0.003893084,5] XSCOM: chip 0x0 at 0x1a0000000000 [P9N DD1.0]

  So fix it to claim that the xscom is also DD 2.0 to match the CPU.

- mambo: Match whole string when looking up symbols with linsym/skisym

  linsym/skisym use a regex to match the symbol name, and accept a
  partial match against the entry in the symbol map, which can lead to
  somewhat confusing results, eg: ::

    systemsim % linsym early_setup
    0xc000000000027890
    systemsim % linsym early_setup$
    0xc000000000aa8054
    systemsim % linsym early_setup_secondary
    0xc000000000027890

  I don't think that's the behaviour we want, so append a $ to the name so
  that the symbol has to match against the whole entry, eg: ::

    systemsim % linsym early_setup
    0xc000000000aa8054

- Disable nap on P8 Mambo, public release has bugs
- mambo: Allow loading multiple CPIOs

  Currently we have support for loading a single CPIO and telling Linux to
  use it as the initrd. But the Linux code actually supports having
  multiple CPIOs contiguously in memory, between initrd-start and end, and
  will unpack them all in order. That is a really nice feature as it means
  you can have a base CPIO with your root filesystem, and then tack on
  others as you need for various tests etc.

  So expand the logic to handle SKIBOOT_INITRD, and treat it as a comma
  separated list of CPIOs to load. I chose comma as it's fairly rare in
  filenames, but we could make it space, colon, whatever. Or we could add
  a new environment variable entirely.
  The code also supports trimming whitespace from the values, so you
  can have "cpio1, cpio2".

- hdata/test: Add memory reservations to hdata_to_dt

  Currently memory reservations are parsed, but since they are not
  processed until mem_region_init() they don't appear in the output
  device tree blob. Several bugs have been found with memory reservations
  so we want them to be part of the test output.

  Add them and clean up several usages of printf() since we want only the
  dtb to appear in standard out.

IBM FSP systems
---------------

- FSP/CONSOLE: Fix possible NULL dereference
- platforms/ibm-fsp/firenze: Fix PCI slot power-off pattern

  When powering off the PCI slot, the corresponding bits should
  be set to 0bxx00xx00 instead of 0bxx11xx11. Otherwise, the
  specified PCI slot can't be put into power-off state. Fortunately,
  it hasn't introduced any side effects so far.

- FSP/CONSOLE: Workaround for unresponsive ipmi daemon

  We use a TCE mapped area to write data to the console. The console header
  (fsp_serbuf_hdr) is modified by both FSP and OPAL (OPAL updates the
  next_in pointer in fsp_serbuf_hdr and FSP updates the next_out pointer).

  The kernel makes the opal_console_write() OPAL call to write data to the
  console. OPAL writes data to the TCE mapped area and sends an MBOX command
  to FSP. If our console becomes full and we have data to write to the
  console, we keep waiting until FSP reads data.

  In some corner cases, where FSP is active but not responding to
  console MBOX messages (due to buggy IPMI) and there are heavy console
  writes from the kernel, our console buffer eventually becomes full.
  At this point OPAL starts sending OPAL_BUSY_EVENT to the kernel, which
  keeps retrying. This creates kernel soft lockups. In some extreme cases,
  when every CPU is trying to write to the console, the user will not be
  able to ssh in and will think the system has hung.

  If we reset FSP or restart the IPMI daemon on FSP, the system recovers and
  everything returns to normal.

  This patch adds a workaround for the above issue by returning OPAL_HARDWARE
  when the console is full. A side effect of this patch is that we may end up
  dropping the latest console data, but it is better to drop console data than
  to hang the system.

- FSP: Set status field in response message for timed out message

  For timed out FSP messages, we set the message status to "fsp_msg_timeout".
  But most FSP driver users (like surveillance) ignore this field.
  They always look for the FSP returned status value in the callback function
  (second byte in word1). So we end up treating a timed out message as a
  success response from FSP.

  Sample output: ::

    [69902.432509048,7] SURV: Sending the heartbeat command to FSP
    [70023.226860117,4] FSP: Response from FSP timed out, word0 = d66a00d7, word1 = 0 state: 3
    ....
    [70023.226901445,7] SURV: Received heartbeat acknowledge from FSP
    [70023.226903251,3] FSP: fsp_trigger_reset() entry

  Here the SURV code thought it got a valid response from FSP, but actually
  we didn't receive a response from FSP.

  This patch fixes the above issue by updating the status field in the
  response structure.

- FSP: Improve timeout message

- FSP/RTC: Fix possible FSP R/R issue in rtc write path
- hw/fsp/rtc: read/write cached rtc tod on fsp hir.

  Currently fsp-rtc reads/writes the cached RTC TOD on an fsp
  reset. Use the latest fsp_in_rr() function to properly read the cached rtc
  value when an fsp reset is initiated by the hir.

  Below is the kernel trace seen when setting the hw clock as the hir
  process starts: ::

    [ 1727.775824] NMI watchdog: BUG: soft lockup - CPU#57 stuck for 23s! [hwclock:7688]
    [ 1727.775856] Modules linked in: vmx_crypto ibmpowernv ipmi_powernv uio_pdrv_genirq ipmi_devintf powernv_op_panel uio ipmi_msghandler powernv_rng leds_powernv ip_tables x_tables autofs4 ses enclosure scsi_transport_sas crc32c_vpmsum lpfc ipr tg3 scsi_transport_fc
    [ 1727.775883] CPU: 57 PID: 7688 Comm: hwclock Not tainted 4.10.0-14-generic #16-Ubuntu
    [ 1727.775883] task: c000000fdfdc8400 task.stack: c000000fdfef4000
    [ 1727.775884] NIP: c00000000090540c LR: c0000000000846f4 CTR: 000000003006dd70
    [ 1727.775885] REGS: c000000fdfef79a0 TRAP: 0901 Not tainted (4.10.0-14-generic)
    [ 1727.775886] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>
    [ 1727.775889] CR: 28024442 XER: 20000000
    [ 1727.775890] CFAR: c00000000008472c SOFTE: 1
    GPR00: 0000000030005128 c000000fdfef7c20 c00000000144c900 fffffffffffffff4
    GPR04: 0000000028024442 c00000000090540c 9000000000009033 0000000000000000
    GPR08: 0000000000000000 0000000031fc4000 c000000000084710 9000000000001003
    GPR12: c0000000000846e8 c00000000fba0100
    [ 1727.775897] NIP [c00000000090540c] opal_set_rtc_time+0x4c/0xb0
    [ 1727.775899] LR [c0000000000846f4] opal_return+0xc/0x48
    [ 1727.775899] Call Trace:
    [ 1727.775900] [c000000fdfef7c20] [c00000000090540c] opal_set_rtc_time+0x4c/0xb0 (unreliable)
    [ 1727.775901] [c000000fdfef7c60] [c000000000900828] rtc_set_time+0xb8/0x1b0
    [ 1727.775903] [c000000fdfef7ca0] [c000000000902364] rtc_dev_ioctl+0x454/0x630
    [ 1727.775904] [c000000fdfef7d40] [c00000000035b1f4] do_vfs_ioctl+0xd4/0x8c0
    [ 1727.775906] [c000000fdfef7de0] [c00000000035bab4] SyS_ioctl+0xd4/0xf0
    [ 1727.775907] [c000000fdfef7e30] [c00000000000b184] system_call+0x38/0xe0
    [ 1727.775908] Instruction dump:
    [ 1727.775909] f821ffc1 39200000 7c832378 91210028 38a10020 39200000 38810028 f9210020
    [ 1727.775911] 4bfffe6d e8810020 80610028 4b77f61d <60000000> 7c7f1b78 3860000a 2fbffff4

  This was found when executing the testcase
  https://github.com/open-power/op-test-framework/blob/master/testcases/fspresetReload.py

  With this fix, the fsp hir torture testcase in the above test
  runs without problems.

- occ: Set return variable to correct value

  When entering this section of code rc will be zero. If fsp_mkmsg() fails,
  the code responsible for printing an error message won't be triggered,
  because rc is still zero. Resetting rc should allow the error case to
  trigger if fsp_mkmsg() fails.

- capp: Fix hang when CAPP microcode LID is missing on FSP machine

  When the LID is absent, we fail early with an error from
  start_preload_resource. In that case, capp_ucode_info.load_result
  isn't set properly, causing a subsequent capp_lid_download() to
  call wait_for_resource_loaded() on something that isn't being
  loaded, thus hanging.

- FSP: Add check to detect FSP R/R inside fsp_sync_msg()

  OPAL sends an MBOX message to FSP and updates the message state from
  fsp_msg_queued -> fsp_msg_sent. fsp_sync_msg() queues a message and waits
  until we get a response from FSP. During FSP R/R we move outstanding MBOX
  messages from msgq to rr_queue, including the inflight message
  (fsp_reset_cmdclass()). But we are not resetting the inflight message state.

  In the extreme corner case where we sent a message to FSP via the
  fsp_sync_msg() path and FSP R/R happens before getting a response from FSP,
  we will end up waiting in fsp_sync_msg() until everything becomes normal.

  This patch adds an fsp_in_rr() check to fsp_sync_msg() and returns an error
  to the caller if FSP is in R/R.

- FSP/CONSOLE: Do not free fsp_msg in error path

  The same msg is reused to send the next output message.

- platform/zz: Acknowledge OCC_LOAD mbox message in ZZ

  On P9 FSP boxes, the OCC image is pre-loaded. So do not handle the load
  command, and send SUCCESS to FSP on receiving an OCC_LOAD mbox message.

- FSP/RTC: Improve error log

astbmc systems
--------------

- platforms/astbmc: Don't validate model on palmetto

  The platform isn't treated as compatible with palmetto unless the root
  device-tree node's "model" property is NULL or "palmetto". However, we
  could have "TN71-BP012" for the property on palmetto. ::

    linux# cat /proc/device-tree/model
    TN71-BP012

  This skips the validation of the root device-tree node's "model" property
  on palmetto, meaning we check the "compatible" property only.
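
The palmetto probe change above can be illustrated with a small Python model.
This is only a sketch of the behaviour described in the entry, not the skiboot
C code: the root node is modelled as a plain dict, the function names
``probe_old`` and ``probe_new`` are hypothetical, and the compatible string
used here is an assumption chosen for illustration.

```python
# Illustrative model only: the real check lives in skiboot's C platform code.
# The compatible string below is an assumption made for this sketch.
PALMETTO_COMPATIBLE = "tyan,palmetto"


def probe_old(root):
    """Behaviour before the change, as described: 'model' must be
    absent (NULL) or exactly 'palmetto', and 'compatible' must match."""
    model = root.get("model")
    if model is not None and model != "palmetto":
        return False
    return PALMETTO_COMPATIBLE in root.get("compatible", [])


def probe_new(root):
    """Behaviour after the change, as described: only 'compatible'
    is checked, 'model' is ignored."""
    return PALMETTO_COMPATIBLE in root.get("compatible", [])


# A palmetto whose model property reads TN71-BP012, as in the entry above.
root = {
    "model": "TN71-BP012",
    "compatible": ["ibm,powernv", PALMETTO_COMPATIBLE],
}

print("old probe:", probe_old(root))
print("new probe:", probe_new(root))
```

With the model set to TN71-BP012, the old-style check rejects the platform
while the new-style check accepts it, which is the case the entry describes.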