.. _skiboot-6.4:

skiboot-6.4
===========

skiboot v6.4 was released on Tuesday July 16th 2019. It is the first
release of skiboot 6.4, which becomes the new stable release
of skiboot following the 6.3 release, first released May 3rd 2019.

Skiboot 6.4 will mark the basis for op-build v2.4.

skiboot v6.4 contains all bug fixes as of :ref:`skiboot-6.0.20`,
and :ref:`skiboot-6.3.2` (the currently maintained stable releases).

For details on how the skiboot stable releases work, see :ref:`stable-rules`.

Over skiboot 6.3, we have the following changes:

.. _skiboot-6.4-new-features:

New features
------------

Since skiboot v6.4-rc1:

- npu2-opencapi: Add opencapi support on ZZ

  This patch adds opencapi support on ZZ. It hard-codes the required
  device tree entries for the NPU and links. The alternative was to use
  HDAT, but it somehow proved too painful to do.

  The new device tree entries activate the npu2 init code on ZZ. On
  systems with no opencapi adapters, it should go unnoticed, as presence
  detection will skip link training.

Since skiboot v6.3:

- platforms/nicole: Add new platform

  This is a new platform from YADRO: a storage controller for the
  TATLIN server. It is based on the IBM Romulus reference design (POWER9).

- platform/zz: Add new platform type

  We have new platform type under ZZ. Let's add them. With this fix

- nvram: Flag dangerous NVRAM options

  Most nvram options used by skiboot are just for debug or testing for
  regressions. They should never be used long term.

  We've hit a number of issues in testing and the field where nvram
  options have been set "temporarily" but haven't been properly cleared
  after, resulting in crashes or real bugs being masked.

  This patch marks most nvram options used by skiboot as dangerous and
  prints a chicken to remind users of the problem.

- hw/phb3: Add verbose EEH output

  Add support for the pci-eeh-verbose NVRAM flag on PHB3. We've had this
  on PHB4 since forever and it has proven very useful when debugging EEH
  issues. When testing changes to the Linux kernel's EEH implementation
  it's fairly common for the kernel to crash before printing the EEH log
  so it's helpful to have it in the OPAL log where it can be dumped from
  XMON.

  Note that unlike PHB4 we do not enable verbose mode by default. The
  nvram option must be used to explicitly enable it.

- Experimental support for building without FSP code

  Now, with CONFIG_FSP=0/1 we have:

  - 1.6M/1.4M skiboot.lid
  - 323K/375K skiboot.lid.xz

- doc: travis-ci deploy docs!

  Documentation is now automatically deployed if you configure Travis CI
  appropriately (we have done this for the open-power branch of skiboot).

- Big OPAL API Documentation improvement

  A lot more OPAL API calls are now (at least somewhat) documented.

- opal/hmi: Report NPU2 checkstop reason

  The NPU2 is currently not passing any information to Linux to explain
  the cause of an HMI. NPU2 has three Fault Isolation Registers and over
  30 of those FIR bits are configured to raise an HMI by default. We
  won't be able to fit all possible state in the 32-bit xstop_reason
  field of the HMI event, but we can still try to encode up to 4 HMI
  reasons.

- opal-msg: Enhance opal-get-msg API

  Linux uses the :ref:`OPAL_GET_MSG` API to get OPAL messages. This interface
  supports up to 8 params (64 bytes). We have a requirement to send bigger
  data to Linux. This patch enhances OPAL to send bigger data to Linux.

  - Linux will use the "opal-msg-size" device tree property to allocate memory
    for OPAL messages (the previous patch increased "opal-msg-size" to 64K).
  - Replaced the `reserved` field in "struct opal_msg" with `size`, so that the
    Linux side opal_get_msg user can detect the actual data size.
  - If buffer size < actual message size, then opal_get_msg will copy partial
    data and return OPAL_PARTIAL to Linux.
  - Add new variable "extended" to the "opal_msg_entry" structure to keep track
    of messages that have more than 64 bytes of data. We will allocate separate
    memory for these messages and once the kernel consumes a message we will
    release that memory.

- core/opal: Increase opal-msg-size size

  Kernel will use the `opal-msg-size` property to allocate memory for opal_msg.
  We want to send bigger data from OPAL to kernel. Hence increase
  opal-msg-size to 64K.

- hw/npu2-opencapi: Add initial support for allocating OpenCAPI LPC memory

  Lowest Point of Coherency (LPC) memory allows the host to access memory on
  an OpenCAPI device.

  Define 2 OPAL calls, :ref:`OPAL_NPU_MEM_ALLOC` and :ref:`OPAL_NPU_MEM_RELEASE`, for
  assigning and clearing the memory BAR. (We try to avoid using the term
  "LPC" to avoid confusion with Low Pin Count.)

  At present, we use a fixed location in the address space, which means we
  are restricted to a single range of 4TB, on a single OpenCAPI device per
  chip. In future, we'll use some chip ID extension magic to give us more
  space, and some sort of allocator to assign ranges to more than one device.

- core/fast-reboot: Add im-feeling-lucky option

  Fast reboot gets disabled for a number of reasons e.g. the availability
  of nvlink. However this doesn't actually affect the ability to perform fast
  reboot if no nvlink device is actually present.

  Add a nvram option for fast-reset where if it's set to
  "im-feeling-lucky" then perform the fast-reboot irrespective of if it's
  previously been disabled.

- platforms/astbmc: Check for SBE validation step

  On some POWER8 astbmc systems an update to the SBE requires pausing at
  runtime to ensure integrity of the SBE.
  If this is required the BMC will
  set a chassis boot option IPMI flag using the OEM parameter 0x62. If
  Skiboot sees this flag is set it waits until the SBE update is complete
  and the flag is cleared.

  Unfortunately the mystery operation that validates the SBE also leaves
  it in a bad state and unable to be used for timer operations. To
  work around this the flag is checked as soon as possible (i.e. when IPMI
  and the console are set up), and once complete the system is rebooted.

- Add P9 DIO interrupt support

  On P9, GPIO ports 0, 1 and 2 can raise GPIO interrupts, and the DIO
  interrupt is used to handle them.

  Add support for the DIO interrupts:

  1. Add dio_interrupt_register(chip, port, callback) to register the
     interrupt.
  2. Add dio_interrupt_deregister(chip, port, callback) to deregister it.
  3. When an interrupt occurs on the port, the callback is invoked and the
     interrupt status is cleared.


Removed features
----------------

Since skiboot v6.3:

- pci/iov: Remove skiboot VF tracking

  This feature was added a few years ago in response to a request to make
  the MaxPayloadSize (MPS) field of a Virtual Function match the MPS of the
  Physical Function that hosts it.

  The SR-IOV specification states that the MPS field of the VF is "ResvP".
  This indicates the VF will use whatever MPS is configured on the PF and
  that the field should be treated as a reserved field in the config space
  of the VF. In other words, an SR-IOV spec compliant VF should always return
  zero in the MPS field. Adding hacks in OPAL to make it non-zero is...
  misguided at best.

  Additionally, there is a bug in the way pci_device structures are handled
  by VFs that results in a crash on fast-reboot that occurs if VFs are
  enabled and then disabled prior to rebooting. This patch fixes the bug by
  removing the code entirely.
  This patch has no impact on SR-IOV support on
  the host operating system.

- Remove POWER7 and POWER7+ support

  It's been a good long while since either OPAL POWER7 user touched a
  machine, and even longer since they'd have been okay using an old
  version rather than tracking master.

  There's also been no testing of OPAL on POWER7 systems for an awfully
  long time, so it's pretty safe to assume that it's very much bitrotted.

  It also saves a whole 14kb of xz compressed payload space.

- Remove remnants of :ref:`OPAL_PCI_GET_PHB_DIAG_DATA`

  Never present in a public OPAL release, and only kernels prior to 3.11
  would ever attempt to call it.

- Remove unused :ref:`OPAL_GET_XIVE_SOURCE`

  While this call was technically implemented by skiboot, no code has ever
  called it, and it was only ever implemented for the p7ioc-phb back-end
  (i.e. POWER7). Since this call was unused in Linux, and POWER7 with OPAL
  was only ever available internally, it should be safe to remove the call.

- Remove unused :ref:`OPAL_PCI_GET_XIVE_REISSUE` and :ref:`OPAL_PCI_SET_XIVE_REISSUE`

  These seem to be remnants of one of the OPAL incarnations prior to
  OPALv3. These calls have never been implemented in skiboot, and never
  used by an upstream kernel (nor a PowerKVM kernel).

  It's rather safe to just document them as never existing.

- Remove never implemented :ref:`OPAL_PCI_SET_PHB_TABLE_MEMORY` and document why

  Not ever used by upstream Linux or the PowerKVM tree. Never implemented in
  skiboot (not even in the ancient internal-only tree).

  So, it's incredibly safe to remove.

- Remove unused :ref:`OPAL_PCI_EEH_FREEZE_STATUS2`

  This call was introduced all the way back at the end of 2012, before
  OPAL was public. The #define for the OPAL call was introduced to the
  Linux kernel in June 2013, and the call was never used in any kernel
  tree ever (as far as we can find).

  Thus, it's quite safe to remove this completely unused and completely
  untested OPAL call.

- Document the long removed :ref:`OPAL_REGISTER_OPAL_EXCEPTION_HANDLER` call

  I'm pretty sure this was removed in one of our first ever service packs.

  Fixes: https://github.com/open-power/skiboot/issues/98

- Remove last remnants of :ref:`OPAL_PCI_SET_PHB_TCE_MEMORY` and :ref:`OPAL_PCI_SET_HUB_TCE_MEMORY`

  Since we have not supported p5ioc systems since skiboot 5.2, it's pretty
  safe to just wholesale remove these OPAL calls now.

- Remove remnants of :ref:`OPAL_PCI_SET_PHB_TCE_MEMORY`

  There's no reason we need remnants hanging around that aren't used, so
  remove them and save a handful of bytes at runtime.

  Simultaneously, document the OPAL call removal.


Secure and Trusted Boot
-----------------------

Since skiboot v6.3:

- trustedboot: Change PCR and event_type for the skiboot events

  The existing skiboot events are being logged as EV_ACTION, however, the
  TCG PC Client spec says that EV_ACTION events should have one of the
  pre-defined strings in the event field recorded in the event log. For
  instance:

  - "Calling Ready to Boot",
  - "Entering ROM Based Setup",
  - "User Password Entered", and
  - "Start Option ROM Scan".

  None of the EV_ACTION pre-defined strings are applicable to the existing
  skiboot events. Based on recent discussions with other POWER teams, this
  patch proposes a convention on what PCR and event types should be used
  for skiboot events. This also changes the skiboot source code to follow
  the convention.

  The TCG PC Client spec defines several event types, other than
  EV_ACTION. However, many of them are specific to UEFI events and some
  others are related to platform or CRTM events, which is more applicable
  to hostboot events.

  Currently, most of the hostboot events are extended to PCR[0,1] and
  logged as either EV_PLATFORM_CONFIG_FLAGS, EV_S_CRTM_CONTENTS or
  EV_POST_CODE. The "Node Id" and "PAYLOAD" events, though, are extended
  to PCR[4,5,6] and logged as EV_COMPACT_HASH.

  For the lack of an event type that fits the specific purpose,
  EV_COMPACT_HASH seems to be the most adequate one due to its
  flexibility. According to the TCG PC Client spec:

  - May be used for any PCR except 0, 1, 2 and 3.
  - The event field may be informative or may be hashed to generate the
    digest field, depending on the component recording the event.

  Additionally, the PCR[4,5] seem to be the most adequate PCRs. They would
  be used for skiboot and some skiroot events. According to the TCG PC
  Client spec, PCR[4] is intended to represent the entity that manages the
  transition between the pre-OS and OS-present state of the platform.
  PCR[4], along with PCR[5], identifies the initial OS loader.

  In summary, for skiboot events:

  - Events that represent data should be extended to PCR 4.
  - Events that represent config should be extended to PCR 5.
  - For the lack of an event type that fits the specific purpose,
    both data and config events should be logged as EV_COMPACT_HASH.

Sensors
-------

Since skiboot v6.3:

- occ-sensors: Check if OCC is reset while reading inband sensors

  OCC may not be able to mark the sensor buffer as invalid while going
  down RESET. If OCC never comes back we will continue to read the stale
  sensor data. So verify if OCC is reset while reading the sensor values
  and propagate the appropriate error.

IPMI
----

Since skiboot v6.3:

- ipmi: ensure forward progress on ipmi_queue_msg_sync()

  BT responses are handled using a timer doing the polling. To hope to
  get an answer to an IPMI synchronous message, the timer needs to run.

  We can't just check all timers though as there may be a timer that
  wants a lock that's held by a code path calling ipmi_queue_msg_sync(),
  and if we did enforce that as a requirement, it's a pretty subtle
  API that is asking to be broken.

  So, if we just run a poll function to crank anything that the IPMI
  backend needs, then we should be fine.

  This issue shows up very quickly under QEMU when loading the first
  flash resource with the IPMI HIOMAP backend.

NPU2
----

Since skiboot v6.4-rc1:

- witherspoon: Add nvlink peers in finalise_dt()

  This information is consumed by Linux so it needs to be in the DT. Move
  it to finalise_dt().

Since skiboot v6.3:

- npu2: Increase timeout for L2/L3 cache purging

  On NVLink2 bridge reset, we purge all L2/L3 caches in the system.
  This is an asynchronous operation with a 2ms timeout. There are
  reports that this is not enough and "PURGE L3 on core xxx timed out"
  messages appear (for reference: on the test setup this takes
  280us..780us).

  This defines the timeout as a macro and changes it from 2ms to 20ms.

  This adds a tracepoint to tell how long it took to purge all the caches.

- npu2: Purge cache when resetting a GPU

  After putting all of a GPU's links in reset, do a cache purge in case we
  have CPU cache lines belonging to the now-inaccessible GPU memory.

- npu2-opencapi: Mask 2 XSL errors

  Commit f8dfd699f584 ("hw/npu2: Setup an error interrupt on some
  opencapi FIRs") converted some FIR bits default action from system
  checkstop to raising an error interrupt. For 2 XSL error events that
  can be triggered by a misbehaving AFU, the error interrupt is raised
  twice, once for each link (the XSL logic in the NPU is shared between
  2 links). So a badly behaving AFU could impact another, unsuspecting
  opencapi adapter.

  It doesn't look good and it turns out we can do better. We can mask
  those 2 XSL errors. The error will also be picked up by the OTL logic,
  which is per link. So we'll still get an error interrupt, but only on
  the relevant link, and the other opencapi adapter can stay functional.

- npu2: Clear fence state for a brick being reset

  Resetting a GPU before resetting an NVLink leads to occasional HMIs
  which fence some bricks and prevent the "reset_ntl" procedure from
  succeeding at the "reset_ntl_release" step - the host system requires
  reboot; there may be other cases like this as well.

  This adds clearing of the fence bit in NPU.MISC.FENCE_STATE for
  the NVLink which we are about to reset.

- npu2: Fix clearing the FIR bits

  FIR registers are SCOM-only so they cannot be accessed with the indirect
  write, and yet we use SCOM-based addresses for these; fix this.

- npu2: Reset NVLinks when resetting a GPU

  Resetting a V100 GPU brings its NVLinks down and if an NPU tries using
  those, an HMI occurs. We were lucky not to observe this as the bare metal
  does not normally reset a GPU and when passed through, GPUs are usually
  before NPUs in the QEMU command line or Libvirt XML and because of that
  NPUs are naturally reset first. However, a simple change of the device
  order brings HMIs.

  This defines a bus control filter for a PCI slot with a GPU with NVLinks
  so when the host system issues secondary bus reset to the slot, it resets
  associated NVLinks.

- npu2: Reset PID wildcard and refcounter when mapped to LPID

  Since 105d80f85b "npu2: Use unfiltered mode in XTS tables" we do not
  register every PID in the XTS table so the table has one entry per LPID.
  Then we added a reference counter to keep track of the entry use when
  switching GPU between the host and guest systems (the "Fixes:" tag below).

  The POWERNV platform setup creates such entries and references them
  at boot time when initializing IOMMUs and only removes them when
  a GPU is passed through to a guest. This creates a problem as POWERNV
  boots via kexec and no dereferencing happens; the XTS table state remains
  undefined. So when the host kernel boots, skiboot thinks there are valid
  XTS entries and does not update the XTS table which breaks ATS.

  This adds the reference counter and the XTS entry reset when a GPU is
  assigned to LPID and we cannot rely on the kernel to clean that up.

PHB4
----

Since skiboot v6.3:

- hw/phb4: Make phb4_training_trace() more general

  phb4_training_trace() is used to monitor the Link Training Status
  State Machine (LTSSM) of the PHB's data link layer. Currently it is only
  used to observe the LTSSM while bringing up the link, but sometimes it's
  useful to see what's occurring in other situations (e.g. link disable, or
  secondary bus reset). This patch renames it to phb4_link_trace() and
  allows the caller to specify the target LTSSM state and a flexible
  timeout to help in these situations.

- hw/phb4: Make pci-tracing print at PR_NOTICE

  When pci-tracing is enabled we print each trace status message and the
  final trace status at PR_ERROR. The final status messages are similar to
  those printed when we fail to train in the non-pci-tracing path and this
  has resulted in spurious op-test failures.

  This patch reduces the log-level of the tracing message to PR_NOTICE so
  they're not accidentally interpreted as actual error messages. PR_NOTICE
  messages are still printed to the console during boot.

- hw/phb4: Use read/write_reg in assert_perst

  While the PHB is fenced we can't use the MMIO interface to access PHB
  registers.
  While processing a complete reset we inject a PHB fence to
  isolate the PHB from the rest of the system because the PHB won't
  respond to MMIOs from the rest of the system while being reset.

  We assert PERST after the fence has been erected which requires us to
  use the XSCOM indirect interface to access the PHB registers rather than
  the MMIO interface. Previously we did that when asserting PERST in the
  CRESET path. However, in b8b4c79d4419 ("hw/phb4: Factor out PERST
  control") this was re-written to use the raw in_be64() accessor. This
  means that CRESET would not be asserted in the reset path. On some
  Mellanox cards this would prevent them from re-loading their firmware
  when the system was fast-reset.

  This patch fixes the problem by replacing the raw {in|out}_be64()
  accessors with the phb4_{read|write}_reg() functions.

- hw/phb4: Assert Link Disable bit after ETU init

  The cursed RAID card in ozrom1 has a bug where it ignores PERST being
  asserted. The PCIe Base spec is a little vague about what happens
  while PERST is asserted, but it does clearly specify that when
  PERST is de-asserted the Link Training and Status State Machine
  (LTSSM) of a device should return to the initial state (Detect)
  defined in the spec and the link training process should restart.

  This bug was worked around in 9078f8268922 ("phb4: Delay training till
  after PERST is deasserted") by setting the link disable bit at the
  start of the FRESET process and clearing it after PERST was
  de-asserted. Although this fixed the bug, the patch offered no
  explanation of why the fix worked.

  In b8b4c79d4419 ("hw/phb4: Factor out PERST control") the link disable
  workaround was moved into phb4_assert_perst().
  This is always called
  in the CRESET case, but a following patch resulted in
  assert_perst() not being called if phb4_freset() was entered following a
  CRESET since p->skip_perst was set in the CRESET handler. This is bad
  since a side-effect of the CRESET is that the Link Disable bit is
  cleared.

  This, combined with the RAID card ignoring PERST, results in the PCIe
  link being trained by the PHB while we're waiting out the 100ms
  ETU reset time. If we hack skiboot to print a DLP trace after returning
  from phb4_init_hw() we get: ::

    PHB#0001[0:1]: Initialization complete
    PHB#0001[0:1]: TRACE:0x0000102101000000 0ms presence GEN1:x16:polling
    PHB#0001[0:1]: TRACE:0x0000001101000000 23ms GEN1:x16:detect
    PHB#0001[0:1]: TRACE:0x0000102101000000 23ms presence GEN1:x16:polling
    PHB#0001[0:1]: TRACE:0x0000183101000000 29ms training GEN1:x16:config
    PHB#0001[0:1]: TRACE:0x00001c5881000000 30ms training GEN1:x08:recovery
    PHB#0001[0:1]: TRACE:0x00001c5883000000 30ms training GEN3:x08:recovery
    PHB#0001[0:1]: TRACE:0x0000144883000000 33ms presence GEN3:x08:L0
    PHB#0001[0:1]: TRACE:0x0000154883000000 33ms trained GEN3:x08:L0
    PHB#0001[0:1]: CRESET: wait_time = 100
    PHB#0001[0:1]: FRESET: Starts
    PHB#0001[0:1]: FRESET: Prepare for link down
    PHB#0001[0:1]: FRESET: Assert skipped
    PHB#0001[0:1]: FRESET: Deassert
    PHB#0001[0:1]: TRACE:0x0000154883000000 0ms trained GEN3:x08:L0
    PHB#0001[0:1]: TRACE: Reached target state
    PHB#0001[0:1]: LINK: Start polling
    PHB#0001[0:1]: LINK: Electrical link detected
    PHB#0001[0:1]: LINK: Link is up
    PHB#0001[0:1]: LINK: Went down waiting for stabilty
    PHB#0001[0:1]: LINK: DLP train control: 0x0000105101000000
    PHB#0001[0:1]: CRESET: Starts

  What has happened here is that the link is trained to 8x Gen3 33ms after
  we return from phb4_init_hw(), and before we've waited the 100ms
  that we normally wait after re-initialising the
  ETU. When we "deassert"
  PERST later on in the FRESET handler the link is in the L0 (normal) state.
  At this point we try to read from the Vendor/Device ID register to verify
  that the link is stable and immediately get a PHB fence due to a PCIe
  Completion Timeout. Skiboot attempts to recover by doing another CRESET,
  but this will encounter the same issue.

  This patch fixes the problem by setting the Link Disable bit (by calling
  phb4_assert_perst()) immediately after we return from phb4_init_hw().
  This prevents the link from being trained while PERST is asserted which
  seems to avoid the Completion Timeout. With the patch applied we get: ::

    PHB#0001[0:1]: Initialization complete
    PHB#0001[0:1]: TRACE:0x0000102101000000 0ms presence GEN1:x16:polling
    PHB#0001[0:1]: TRACE:0x0000001101000000 23ms GEN1:x16:detect
    PHB#0001[0:1]: TRACE:0x0000102101000000 23ms presence GEN1:x16:polling
    PHB#0001[0:1]: TRACE:0x0000909101000000 29ms presence GEN1:x16:disabled
    PHB#0001[0:1]: CRESET: wait_time = 100
    PHB#0001[0:1]: FRESET: Starts
    PHB#0001[0:1]: FRESET: Prepare for link down
    PHB#0001[0:1]: FRESET: Assert skipped
    PHB#0001[0:1]: FRESET: Deassert
    PHB#0001[0:1]: TRACE:0x0000001101000000 0ms GEN1:x16:detect
    PHB#0001[0:1]: TRACE:0x0000102101000000 0ms presence GEN1:x16:polling
    PHB#0001[0:1]: TRACE:0x0000001101000000 24ms GEN1:x16:detect
    PHB#0001[0:1]: TRACE:0x0000102101000000 36ms presence GEN1:x16:polling
    PHB#0001[0:1]: TRACE:0x0000183101000000 97ms training GEN1:x16:config
    PHB#0001[0:1]: TRACE:0x00001c5881000000 97ms training GEN1:x08:recovery
    PHB#0001[0:1]: TRACE:0x00001c5883000000 97ms training GEN3:x08:recovery
    PHB#0001[0:1]: TRACE:0x0000144883000000 99ms presence GEN3:x08:L0
    PHB#0001[0:1]: TRACE: Reached target state
    PHB#0001[0:1]: LINK: Start polling
    PHB#0001[0:1]: LINK: Electrical link detected
    PHB#0001[0:1]: LINK: Link is up
    PHB#0001[0:1]: LINK: Link is stable
    PHB#0001[0:1]: LINK: Card [9005:028c] Optimal Retry:disabled
    PHB#0001[0:1]: LINK: Speed Train:GEN3 PHB:GEN4 DEV:GEN3
    PHB#0001[0:1]: LINK: Width Train:x08 PHB:x08 DEV:x08
    PHB#0001[0:1]: LINK: RX Errors Now:0 Max:8 Lane:0x0000


Simulators
----------

Since skiboot v6.3:

- external/mambo: Bump default POWER9 to Nimbus DD2.3

- external/mambo: fix tcl startup code for mambo bogus net (repost)

  This fixes a couple of issues with external/mambo/skiboot.tcl so I can use
  the mambo bogus net.

  * newer distros (ubuntu 18.04) allow tap device to have a user specified
    name instead of just tapN so we need to pass in a name not a number.
  * need some kind of default for net_mac, and need the mconfig for it
    to be set from an env var.

- skiboot.tcl: Add option to wait for GDB server connection

  Add an environment variable which makes Mambo wait for a connection
  from gdb prior to starting simulation.

- mambo: Integrate addr2line into backtrace command

  Gives nice output like this: ::

    systemsim % bt
    pc: 0xC0000000002BF3D4 _savegpr0_28+0x0
    lr: 0xC00000000004E0F4 opal_call+0x10
    stack:0x000000000041FAE0 0xC00000000004F054 opal_check_token+0x20
    stack:0x000000000041FB50 0xC0000000000500CC __opal_flush_console+0x88
    stack:0x000000000041FBD0 0xC000000000050BF8 opal_flush_console+0x24
    stack:0x000000000041FC00 0xC0000000001F9510 udbg_opal_putc+0x88
    stack:0x000000000041FC40 0xC000000000020E78 udbg_write+0x7c
    stack:0x000000000041FC80 0xC0000000000B1C44 console_unlock+0x47c
    stack:0x000000000041FD80 0xC0000000000B2424 register_console+0x320
    stack:0x000000000041FE10 0xC0000000003A5328 register_early_udbg_console+0x98
    stack:0x000000000041FE80 0xC0000000003A4F14 setup_arch+0x68
    stack:0x000000000041FEF0 0xC0000000003A0880 start_kernel+0x74
    stack:0x000000000041FF90 0xC00000000000AC60 start_here_common+0x1c

- mambo: Add addr2func for symbol resolution

  If you supply a VMLINUX_MAP/SKIBOOT_MAP/USER_MAP addr2func can guess
  at your symbol name. i.e. ::

    systemsim % p pc
    0xC0000000002A68F8
    systemsim % addr2func [p pc]
    fdt_offset_ptr+0x78

- lpc-port80h: Don't write port 80h when running under Simics

  Simics doesn't model LPC port 80h. Writing to it terminates the
  simulation due to an invalid LPC memory access. This patch adds a
  check to ensure port 80h isn't accessed if we are running under
  Simics.

- device-tree: speed up fdt building on slow simulators

  Trade size for speed and avoid de-duplicating strings in the fdt.
  This costs about 2kB in fdt size, and saves about 8 million instructions
  (almost half of all instructions) booting skiboot in mambo.

- fast-reboot: skip read-only memory checksum for slow simulators

  Skip the fast reboot checksum, which costs about 4 million cycles
  booting skiboot in mambo.

- nx: remove check on the "qemu,powernv" property

  Commit 95f7b3b9698b ("nx: Don't abort on missing NX when using a QEMU
  machine") introduced a check on the property "qemu,powernv" to skip NX
  initialization when running under a QEMU machine.

  The QEMU platforms now expose a QUIRK_NO_RNG in the chip. Testing the
  "qemu,powernv" property is not necessary anymore.

- plat/qemu: add a POWER8 and POWER9 platform

  These new QEMU platforms have characteristics closer to real OpenPOWER
  systems that we use today and define a different BMC depending on the
  CPU type. New platform properties are introduced for each,
  "qemu,powernv8", "qemu,powernv9" and these should be compatible with
  existing QEMUs which only expose the "qemu,powernv" property.

- libc/string: speed up common string functions

  Use compiler builtins for the string functions, and compile the
  libc/string/ directory with -O2.

  This reduces instructions booting skiboot in mambo by 2.9 million in
  slow-sim mode, or 3.8 million in normal mode, for less than 1kB image size
  increase.

  This can result in the compiler warning more cases of string function
  problems.

- external/mambo: Add an option to exit Mambo when the system is shutdown

  Automatically exiting can be convenient for scripting. Will also exit
  due to a HW crash (e.g. unhandled exception).

VESNIN platform
---------------

Since skiboot v6.3:

- platforms/vesnin: PCI inventory via IPMI OEM

  Replace raw protocol with OEM message supported by OpenBMC's IPMI
  plugins.

  BMC-side implementation (IPMI plug-in):
  https://github.com/YADRO-KNS/phosphor-pci-inventory

Utilities
---------

Since skiboot v6.3:

- opal-gard: Account for ECC size when clearing partition

  When 'opal-gard clear all' is run, it works by erasing the GUARD then
  using blocklevel_smart_write() to write nothing to the partition. This
  second write call is needed because we rely on libflash to set the ECC
  bits appropriately when the partition contained ECCed data.

  The API for this is a little odd with the caller specifying how much
  actual data to write, and libflash writing size + size/8 bytes
  since there is one additional ECC byte for every eight bytes of data.

  We currently do not account for the extra space consumed by the ECC data
  in reset_partition() which is used to handle the 'clear all' command.
  This results in the partition following the GUARD partition being
  partially overwritten when the command is used. This patch fixes the
  problem by reducing the length we would normally write by the number
  of ECC bytes required.
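
As a rough illustration of the arithmetic behind the opal-gard fix above,
the sketch below computes how much data can be written to an ECC-protected
partition without spilling into the one that follows. The helper names
(``ecc_footprint``, ``max_data_len``) are hypothetical and are not the
libflash API; only the "size + size/8" relationship is taken from the text.

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>

   /* One ECC byte accompanies every eight bytes of data (from the text). */
   #define ECC_DATA_BYTES 8u

   /*
    * Hypothetical helper: bytes actually consumed on flash when the
    * caller asks for data_len bytes of data, i.e. size + size/8.
    */
   static uint64_t ecc_footprint(uint64_t data_len)
   {
           return data_len + data_len / ECC_DATA_BYTES;
   }

   /*
    * Hypothetical helper: the largest data length (a multiple of eight)
    * whose on-flash footprint still fits in a partition of part_size
    * bytes. Each 9-byte group on flash carries 8 bytes of data.
    */
   static uint64_t max_data_len(uint64_t part_size)
   {
           return (part_size / (ECC_DATA_BYTES + 1)) * ECC_DATA_BYTES;
   }

For an example partition of 0x5000 bytes (an arbitrary size chosen for
illustration), ``max_data_len()`` yields 18200 data bytes, which occupy
20475 bytes on flash. Passing the full 0x5000 as the data length would
instead occupy 23040 bytes and overrun the partition by 2560 bytes.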


Build and debugging
-------------------

Since skiboot v6.3:

- Disable -Waddress-of-packed-member for GCC9

  We throw a bunch of errors in errorlog code otherwise, which we should
  fix, but we don't *have* to yet.

- Fix a lot of sparse warnings

- With new GCC comes larger GCOV binaries

  So we need to change our heap size to make more room for data/bss
  without having to change where the console is or have more fun moving
  things about.

- Intentionally discard fini_array sections

  Produced in a SKIBOOT_GCOV=1 build, and never called by skiboot.

- external/trace: Add follow option to dump_trace

  When monitoring traces, an option like the tail command's '-f' (follow)
  is very useful. This option continues to append to the output as more
  data arrives. Add an '-f' option to allow dump_trace to operate
  similarly.

  Tail also provides a '-s' (sleep time) option that
  accompanies '-f'. This controls how often new input will be polled. Add
  a '-s' option that will make dump_trace sleep for N milliseconds before
  checking for new input.

- external/trace: Add support for dumping multiple buffers

  dump_trace can only dump one trace buffer at a time. It would be handy
  to be able to dump multiple buffers and to see the entries from these
  buffers displayed in correct timestamp order. Each trace buffer is
  already sorted by timestamp so use a heap to implement an efficient
  k-way merge. Use the CCAN heap to implement this sort. However the CCAN
  heap does not have a 'heap_replace' operation. We need to 'heap_pop'
  then 'heap_push' to replace the root which means rebalancing twice
  instead of once.

- external/trace: mmap trace buffers in dump_trace

  The current lseek/read approach used in dump_trace does not correctly
  handle certain aspects of the buffers.
  It does not use the start and end
  position that is part of the buffer, so it will not begin from the
  correct location. It does not move back to the beginning of the trace
  buffer file as the buffer wraps around. It also does not handle the
  overflow case of the writer overwriting where the reader is up to.

  Mmap the trace buffer file so that the existing reading functions in
  extra/trace.c can be used. These functions already handle the cases of
  wrapping and overflow. This reduces code duplication and uses functions
  that are already unit tested. However, this requires a kernel where the
  trace buffer sysfs nodes are able to be mmapped (see
  https://patchwork.ozlabs.org/patch/1056786/)

- core/trace: Export trace buffers to sysfs

  Every property in the device-tree under /ibm,opal/firmware/exports has a
  sysfs node created in /firmware/opal/exports. Add properties with the
  physical address and size for each trace buffer so they are exported.

- core/trace: Add pir number to debug_descriptor

  The names given to the trace buffers when exported to sysfs should show
  what cpu they are associated with to make it easier to understand their
  output. The debug_descriptor currently stores the address and length of
  each trace buffer and this is used for adding properties to the device
  tree. Extend debug_descriptor to include a cpu associated with each
  trace. This will be used for creating properties in the device-tree
  under /ibm,opal/firmware/exports/.

- core/trace: Change trace buffer size

  We want to be able to mmap the trace buffers to be used by the
  dump_trace tool. As mmapping is done in terms of pages, it makes sense
  that the size of the trace buffers should be page aligned. This is
  slightly complicated by the space taken up by the header at the
  beginning of the trace and the room left for an extra trace entry at the
  end of the buffer.
  Change the size of the buffer itself so that the
  entire trace buffer size will be page aligned.

- core/trace: Change buffer alignment from 4K to 64K

  We want to be able to mmap the trace buffers to be used by the
  dump_trace tool. This means that the trace buffers must be page
  aligned. Currently they are aligned to 4K. Most power systems have a
  64K page size. On systems with a 4K page size, 64K aligned will still be
  page aligned. Change the allocation of the trace buffers to be 64K
  aligned.

  It is the trace_info struct containing the trace buffer that is actually
  allocated as aligned memory. This means the trace buffer itself is not
  actually aligned, and this is the address that is currently exposed
  through sysfs. To get around this, change the address that is exposed to
  sysfs to be that of the trace_info struct. This means the lock in
  trace_info is now visible too.

- external/trace: Use correct width integer byte swapping

  The trace_repeat struct uses be16 for storing the number of repeats.
  Currently be32_to_cpu conversion is used to display this member. This
  produces an incorrect value. Use be16_to_cpu instead.

- core/trace: Put boot_tracebuf in correct location.

  A position for the boot_tracebuf is allocated in skiboot.lds.S.
  However, without a __section attribute the boot trace buffer is not
  placed in the correct location, meaning that it also will not be
  correctly aligned. Add the __section attribute to ensure it will be
  placed in its allocated position.

- core/lock: Add debug options to store backtrace of where lock was taken

  Contrary to popular belief, skiboot developers are imperfect and
  occasionally write locking bugs. When we exit skiboot, we check if we're
  still holding any locks, and if so, we print an error with a list of the
  locks currently held and the locations where they were taken.

  However, this only tells us the location where lock() was called, which may
  not be enough to work out what's going on. To give us more to go on,
  we can store backtrace data in the lock and print that out when we
  unexpectedly still hold locks.

  Because the backtrace data is rather big, we only enable this if
  DEBUG_LOCKS_BACKTRACE is defined, which in turn is switched on when
  DEBUG=1.

  (We disable DEBUG_LOCKS_BACKTRACE in some of the memory allocation tests
  because the locks used by the memory allocator take up too much room in the
  fake skiboot heap.)

- libfdt: upgrade to upstream dtc.git 243176c

  Upgrade libfdt/ to github.com/dgibson/dtc.git 243176c ("Fix bogus
  error on rebuild")

  This copies dtc/libfdt/ to skiboot/libfdt/, with the only change in
  that directory being the addition of README.skiboot and Makefile.inc.

  This adds about 14kB text, 2.5kB compressed xz. This could be reduced
  or mostly eliminated by cutting out fdt version checks and unused
  code, but tracking upstream is a bigger benefit at the moment.

  This loses commits:

  - 14ed2b842f61 ("libfdt: add basic sanity check to fdt_open_into")
  - bc7bb3d12bc1 ("sparse: fix declaration of fdt_strerror")

  As well as some prehistoric changes of a similar kind, which is the
  punishment for us not being good downstream citizens and sending
  things upstream! Syncing to upstream will make that effort simpler
  in future.

General Fixes
-------------

Since skiboot v6.4-rc1:

- libflash: Fix broken continuations

  Some of the libflash debug messages don't print a newline at the end of
  the line and assume that the next print will be contiguous with the
  last. This isn't true in skiboot since log messages are prefixed with a
  timestamp. This results in funny looking output such as::

    LIBFLASH: Verifying...
    LIBFLASH: reading page 0x01963000..0x01964000...[3.084846885,7] same !
    LIBFLASH: reading page 0x01964000..0x01965000...[3.086164489,7] same !

  Fix this by moving the "same !" debug message to a new line with the
  prefix "LIBFLASH: ..." to indicate it's a continuation of the last
  statement.

  First reported in https://github.com/open-power/skiboot/issues/51

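The libflash continuation problem can be reproduced with a toy logger. The
sketch below is an assumption-laden illustration, not skiboot's logging code:
``log_msg()``, ``report_broken()``, ``report_fixed()`` and the fixed
``FAKE_PREFIX`` string are all hypothetical stand-ins for a logger that
prefixes every call with a timestamp and log level.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Placeholder prefix; a real logger would print the current timestamp. */
#define FAKE_PREFIX "[0.000000000,7] "

/* Hypothetical logger: appends the prefix and the formatted message to
 * the NUL-terminated string in buf, never writing past cap bytes. */
static void log_msg(char *buf, size_t cap, const char *fmt, ...)
{
	size_t used;
	va_list ap;

	assert(buf != NULL && cap > 0);
	used = strlen(buf);
	if (used >= cap)
		return;
	used += (size_t)snprintf(buf + used, cap - used, "%s", FAKE_PREFIX);
	if (used >= cap)
		return;
	va_start(ap, fmt);
	vsnprintf(buf + used, cap - used, fmt, ap);
	va_end(ap);
}

/* Old pattern: the second call assumes it continues the first line,
 * but the logger inserts a prefix in the middle of it. */
static void report_broken(char *buf, size_t cap)
{
	log_msg(buf, cap, "LIBFLASH: reading page 0x%08x..0x%08x...",
		0x01963000u, 0x01964000u);
	log_msg(buf, cap, "same !\n");
}

/* Fixed pattern: every call emits a complete line, and the second line
 * is marked as a continuation with "LIBFLASH: ...". */
static void report_fixed(char *buf, size_t cap)
{
	log_msg(buf, cap, "LIBFLASH: reading page 0x%08x..0x%08x...\n",
		0x01963000u, 0x01964000u);
	log_msg(buf, cap, "LIBFLASH: ...same !\n");
}
```

Calling ``report_broken()`` on an empty buffer leaves the prefix embedded in
the middle of the line, as in the output quoted above, while
``report_fixed()`` produces two separately prefixed lines.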