.. _skiboot-6.4-rc1:

skiboot-6.4-rc1
===============

skiboot v6.4-rc1 was released on Monday July 8th 2019. It is the first
release candidate of skiboot 6.4, which will become the new stable release
of skiboot following the 6.3 release, first released May 3rd 2019.

Skiboot 6.4 will mark the basis for op-build v2.4. I expect this to be a
relatively short -rc cycle.

skiboot v6.4-rc1 contains all bug fixes as of :ref:`skiboot-6.0.20`,
and :ref:`skiboot-6.3.2` (the currently maintained
stable releases).

For how the skiboot stable releases work, see :ref:`stable-rules`.

Over skiboot 6.3, we have the following changes:

.. _skiboot-6.4-rc1-new-features:

New features
------------

- platforms/nicole: Add new platform

  The platform is a new platform from YADRO, it's a storage controller for
  TATLIN server. It's based on IBM Romulus reference design (POWER9).

- platform/zz: Add new platform type

  We have new platform type under ZZ. Let's add them. With this fix

- nvram: Flag dangerous NVRAM options

  Most nvram options used by skiboot are just for debug or testing for
  regressions. They should never be used long term.

  We've hit a number of issues in testing and the field where nvram
  options have been set "temporarily" but haven't been properly cleared
  after, resulting in crashes or real bugs being masked.

  This patch marks most nvram options used by skiboot as dangerous and
  prints a chicken to remind users of the problem.

- hw/phb3: Add verbose EEH output

  Add support for the pci-eeh-verbose NVRAM flag on PHB3. We've had this
  on PHB4 since forever and it has proven very useful when debugging EEH
  issues. When testing changes to the Linux kernel's EEH implementation
  it's fairly common for the kernel to crash before printing the EEH log
  so it's helpful to have it in the OPAL log where it can be dumped from
  XMON.

  Note that unlike PHB4 we do not enable verbose mode by default. The
  nvram option must be used to explicitly enable it.

- Experimental support for building without FSP code

  Now, with CONFIG_FSP=0/1 we have:

  - 1.6M/1.4M skiboot.lid
  - 323K/375K skiboot.lid.xz

- doc: travis-ci deploy docs!

  Documentation is now automatically deployed if you configure Travis CI
  appropriately (we have done this for the open-power branch of skiboot).

- Big OPAL API Documentation improvement

  A lot more OPAL API calls are now (at least somewhat) documented.

- opal/hmi: Report NPU2 checkstop reason

  The NPU2 is currently not passing any information to Linux to explain
  the cause of an HMI. NPU2 has three Fault Isolation Registers and over
  30 of those FIR bits are configured to raise an HMI by default. We
  won't be able to fit all possible state in the 32-bit xstop_reason
  field of the HMI event, but we can still try to encode up to 4 HMI
  reasons.

- opal-msg: Enhance opal-get-msg API

  Linux uses :ref:`OPAL_GET_MSG` API to get OPAL messages. This interface
  supports up to 8 params (64 bytes). We have a requirement to send bigger data to
  Linux. This patch enhances OPAL to send bigger data to Linux.

  - Linux will use "opal-msg-size" device tree property to allocate memory for
    OPAL messages (previous patch increased "opal-msg-size" to 64K).
  - Replaced `reserved` field in "struct opal_msg" with `size`, so that the Linux
    side opal_get_msg user can detect actual data size.
  - If buffer size < actual message size, then opal_get_msg will copy partial
    data and return OPAL_PARTIAL to Linux.
  - Add new variable "extended" to "opal_msg_entry" structure to keep track
    of messages that have more than 64 bytes of data. We will allocate separate
    memory for these messages and once kernel consumes message we will
    release that memory.
- core/opal: Increase opal-msg-size size

  Kernel will use `opal-msg-size` property to allocate memory for opal_msg.
  We want to send bigger data from OPAL to kernel. Hence increase
  opal-msg-size to 64K.

- hw/npu2-opencapi: Add initial support for allocating OpenCAPI LPC memory

  Lowest Point of Coherency (LPC) memory allows the host to access memory on
  an OpenCAPI device.

  Define 2 OPAL calls, :ref:`OPAL_NPU_MEM_ALLOC` and :ref:`OPAL_NPU_MEM_RELEASE`, for
  assigning and clearing the memory BAR. (We try to avoid using the term
  "LPC" to avoid confusion with Low Pin Count.)

  At present, we use a fixed location in the address space, which means we
  are restricted to a single range of 4TB, on a single OpenCAPI device per
  chip. In future, we'll use some chip ID extension magic to give us more
  space, and some sort of allocator to assign ranges to more than one device.

- core/fast-reboot: Add im-feeling-lucky option

  Fast reboot gets disabled for a number of reasons, e.g. the availability
  of nvlink. However, this doesn't actually affect the ability to perform fast
  reboot if no nvlink device is actually present.

  Add an nvram option for fast-reset: if it is set to
  "im-feeling-lucky", perform the fast-reboot irrespective of whether it has
  previously been disabled.

- platforms/astbmc: Check for SBE validation step

  On some POWER8 astbmc systems an update to the SBE requires pausing at
  runtime to ensure integrity of the SBE. If this is required the BMC will
  set a chassis boot option IPMI flag using the OEM parameter 0x62. If
  Skiboot sees this flag is set it waits until the SBE update is complete
  and the flag is cleared.

  Unfortunately the mystery operation that validates the SBE also leaves
  it in a bad state and unable to be used for timer operations. To
  work around this the flag is checked as soon as possible (i.e.
  when IPMI and the console are set up), and once complete the system is
  rebooted.

- Add P9 DIO interrupt support

  On P9 there are GPIO ports 0, 1 and 2 for GPIO interrupts, and the DIO
  interrupt is used to handle them.

  Add support for the DIO interrupts:

  1. Add dio_interrupt_register(chip, port, callback) to register the
     interrupt.
  2. Add dio_interrupt_deregister(chip, port, callback) to deregister it.
  3. When an interrupt on the port occurs, the callback is invoked, and the
     interrupt status is cleared.


Removed features
----------------

- pci/iov: Remove skiboot VF tracking

  This feature was added a few years ago in response to a request to make
  the MaxPayloadSize (MPS) field of a Virtual Function match the MPS of the
  Physical Function that hosts it.

  The SR-IOV specification states that the MPS field of the VF is "ResvP".
  This indicates the VF will use whatever MPS is configured on the PF and
  that the field should be treated as a reserved field in the config space
  of the VF. In other words, a SR-IOV spec compliant VF should always return
  zero in the MPS field. Adding hacks in OPAL to make it non-zero is...
  misguided at best.

  Additionally, there is a bug in the way pci_device structures are handled
  by VFs that results in a crash on fast-reboot that occurs if VFs are
  enabled and then disabled prior to rebooting. This patch fixes the bug by
  removing the code entirely. This patch has no impact on SR-IOV support on
  the host operating system.

- Remove POWER7 and POWER7+ support

  It's been a good long while since either OPAL POWER7 user touched a
  machine, and even longer since they'd have been okay using an old
  version rather than tracking master.

  There's also been no testing of OPAL on POWER7 systems for an awfully
  long time, so it's pretty safe to assume that it's very much bitrotted.

  It also saves a whole 14kb of xz compressed payload space.

- Remove remnants of :ref:`OPAL_PCI_GET_PHB_DIAG_DATA`

  Never present in a public OPAL release, and only kernels prior to 3.11
  would ever attempt to call it.

- Remove unused :ref:`OPAL_GET_XIVE_SOURCE`

  While this call was technically implemented by skiboot, no code has ever called
  it, and it was only ever implemented for the p7ioc-phb back-end (i.e. POWER7).
  Since this call was unused in Linux, and POWER7 with OPAL was only ever
  available internally, it should be safe to remove the call.

- Remove unused :ref:`OPAL_PCI_GET_XIVE_REISSUE` and :ref:`OPAL_PCI_SET_XIVE_REISSUE`

  These seem to be remnants of one of the OPAL incarnations prior to
  OPALv3. These calls have never been implemented in skiboot, and never
  used by an upstream kernel (nor a PowerKVM kernel).

  It's rather safe to just document them as never existing.

- Remove never implemented :ref:`OPAL_PCI_SET_PHB_TABLE_MEMORY` and document why

  Not ever used by upstream Linux or PowerKVM tree. Never implemented in
  skiboot (not even in ancient internal only tree).

  So, it's incredibly safe to remove.

- Remove unused :ref:`OPAL_PCI_EEH_FREEZE_STATUS2`

  This call was introduced all the way back at the end of 2012, before
  OPAL was public. The #define for the OPAL call was introduced to the
  Linux kernel in June 2013, and the call was never used in any kernel
  tree ever (as far as we can find).

  Thus, it's quite safe to remove this completely unused and completely
  untested OPAL call.

- Document the long removed :ref:`OPAL_REGISTER_OPAL_EXCEPTION_HANDLER` call

  I'm pretty sure this was removed in one of our first ever service packs.

  Fixes: https://github.com/open-power/skiboot/issues/98

- Remove last remnants of :ref:`OPAL_PCI_SET_PHB_TCE_MEMORY` and :ref:`OPAL_PCI_SET_HUB_TCE_MEMORY`

  Since we have not supported p5ioc systems since skiboot 5.2, it's pretty
  safe to just wholesale remove these OPAL calls now.

- Remove remnants of :ref:`OPAL_PCI_SET_PHB_TCE_MEMORY`

  There's no reason we need remnants hanging around that aren't used, so
  remove them and save a handful of bytes at runtime.

  Simultaneously, document the OPAL call removal.


Secure and Trusted Boot
-----------------------

- trustedboot: Change PCR and event_type for the skiboot events

  The existing skiboot events are being logged as EV_ACTION, however, the
  TCG PC Client spec says that EV_ACTION events should have one of the
  pre-defined strings in the event field recorded in the event log. For
  instance:

  - "Calling Ready to Boot",
  - "Entering ROM Based Setup",
  - "User Password Entered", and
  - "Start Option ROM Scan".

  None of the EV_ACTION pre-defined strings are applicable to the existing
  skiboot events. Based on recent discussions with other POWER teams, this
  patch proposes a convention on what PCR and event types should be used
  for skiboot events. This also changes the skiboot source code to follow
  the convention.

  The TCG PC Client spec defines several event types, other than
  EV_ACTION. However, many of them are specific to UEFI events and some
  others are related to platform or CRTM events, which is more applicable
  to hostboot events.

  Currently, most of the hostboot events are extended to PCR[0,1] and
  logged as either EV_PLATFORM_CONFIG_FLAGS, EV_S_CRTM_CONTENTS or
  EV_POST_CODE. The "Node Id" and "PAYLOAD" events, though, are extended
  to PCR[4,5,6] and logged as EV_COMPACT_HASH.

  For the lack of an event type that fits the specific purpose,
  EV_COMPACT_HASH seems to be the most adequate one due to its
  flexibility. According to the TCG PC Client spec:

  - May be used for any PCR except 0, 1, 2 and 3.
  - The event field may be informative or may be hashed to generate the
    digest field, depending on the component recording the event.

  Additionally, the PCR[4,5] seem to be the most adequate PCRs. They would
  be used for skiboot and some skiroot events. According to the TCG PC
  Client spec, PCR[4] is intended to represent the entity that manages the
  transition between the pre-OS and OS-present state of the platform.
  PCR[4], along with PCR[5], identifies the initial OS loader.

  In summary, for skiboot events:

  - Events that represent data should be extended to PCR 4.
  - Events that represent config should be extended to PCR 5.
  - For the lack of an event type that fits the specific purpose,
    both data and config events should be logged as EV_COMPACT_HASH.

Sensors
-------

- occ-sensors: Check if OCC is reset while reading inband sensors

  OCC may not be able to mark the sensor buffer as invalid while going
  down RESET. If OCC never comes back we will continue to read the stale
  sensor data. So verify if OCC is reset while reading the sensor values
  and propagate the appropriate error.

IPMI
----

- ipmi: ensure forward progress on ipmi_queue_msg_sync()

  BT responses are handled using a timer doing the polling. To hope to
  get an answer to an IPMI synchronous message, the timer needs to run.

  We can't just check all timers though as there may be a timer that
  wants a lock that's held by a code path calling ipmi_queue_msg_sync(),
  and if we did enforce that as a requirement, it's a pretty subtle
  API that is asking to be broken.

  So, if we just run a poll function to crank anything that the IPMI
  backend needs, then we should be fine.

  This issue shows up very quickly under QEMU when loading the first
  flash resource with the IPMI HIOMAP backend.

NPU2
----

- npu2: Increase timeout for L2/L3 cache purging

  On NVLink2 bridge reset, we purge all L2/L3 caches in the system.
  This is an asynchronous operation with a 2ms timeout. There are
  reports that this is not enough and "PURGE L3 on core xxx timed out"
  messages appear (for reference: on the test setup this takes
  280us..780us).

  This defines the timeout as a macro and changes this from 2ms to 20ms.

  This adds a tracepoint to tell how long it took to purge all the caches.

- npu2: Purge cache when resetting a GPU

  After putting all a GPU's links in reset, do a cache purge in case we
  have CPU cache lines belonging to the now-inaccessible GPU memory.

- npu2-opencapi: Mask 2 XSL errors

  Commit f8dfd699f584 ("hw/npu2: Setup an error interrupt on some
  opencapi FIRs") converted some FIR bits default action from system
  checkstop to raising an error interrupt. For 2 XSL error events that
  can be triggered by a misbehaving AFU, the error interrupt is raised
  twice, once for each link (the XSL logic in the NPU is shared between
  2 links). So a badly behaving AFU could impact another, unsuspecting
  opencapi adapter.

  It doesn't look good and it turns out we can do better. We can mask
  those 2 XSL errors. The error will also be picked up by the OTL logic,
  which is per link. So we'll still get an error interrupt, but only on
  the relevant link, and the other opencapi adapter can stay functional.
- npu2: Clear fence state for a brick being reset

  Resetting a GPU before resetting an NVLink leads to occasional HMIs
  which fence some bricks and prevent the "reset_ntl" procedure from
  succeeding at the "reset_ntl_release" step - the host system requires
  reboot; there may be other cases like this as well.

  This adds clearing of the fence bit in NPU.MISC.FENCE_STATE for
  the NVLink which we are about to reset.

- npu2: Fix clearing the FIR bits

  FIR registers are SCOM-only so they cannot be accessed with the indirect
  write, and yet we use SCOM-based addresses for these; fix this.

- npu2: Reset NVLinks when resetting a GPU

  Resetting a V100 GPU brings its NVLinks down and if an NPU tries using
  those, an HMI occurs. We were lucky not to observe this as the bare metal
  does not normally reset a GPU and when passed through, GPUs are usually
  before NPUs in QEMU command line or Libvirt XML and because of that NPUs
  are naturally reset first. However, a simple change of the device order
  brings HMIs.

  This defines a bus control filter for a PCI slot with a GPU with NVLinks
  so when the host system issues secondary bus reset to the slot, it resets
  associated NVLinks.

- npu2: Reset PID wildcard and refcounter when mapped to LPID

  Since 105d80f85b "npu2: Use unfiltered mode in XTS tables" we do not
  register every PID in the XTS table so the table has one entry per LPID.
  Then we added a reference counter to keep track of the entry use when
  switching GPU between the host and guest systems (the "Fixes:" tag below).

  The POWERNV platform setup creates such entries and references them
  at the boot time when initializing IOMMUs and only removes it when
  a GPU is passed through to a guest. This creates a problem as POWERNV
  boots via kexec and no dereferencing happens; the XTS table state remains
  undefined.
  So when the host kernel boots, skiboot thinks there are valid
  XTS entries and does not update the XTS table which breaks ATS.

  This adds the reference counter and the XTS entry reset when a GPU is
  assigned to LPID and we cannot rely on the kernel to clean that up.

PHB4
----

- hw/phb4: Make phb4_training_trace() more general

  phb4_training_trace() is used to monitor the Link Training Status
  State Machine (LTSSM) of the PHB's data link layer. Currently it is only
  used to observe the LTSSM while bringing up the link, but sometimes it's
  useful to see what's occurring in other situations (e.g. link disable, or
  secondary bus reset). This patch renames it to phb4_link_trace() and
  lets it take a target LTSSM state and a flexible timeout to help in these
  situations.

- hw/phb4: Make pci-tracing print at PR_NOTICE

  When pci-tracing is enabled we print each trace status message and the
  final trace status at PR_ERROR. The final status messages are similar to
  those printed when we fail to train in the non-pci-tracing path and this
  has resulted in spurious op-test failures.

  This patch reduces the log-level of the tracing message to PR_NOTICE so
  they're not accidentally interpreted as actual error messages. PR_NOTICE
  messages are still printed to the console during boot.

- hw/phb4: Use read/write_reg in assert_perst

  While the PHB is fenced we can't use the MMIO interface to access PHB
  registers. While processing a complete reset we inject a PHB fence to
  isolate the PHB from the rest of the system because the PHB won't
  respond to MMIOs from the rest of the system while being reset.

  We assert PERST after the fence has been erected which requires us to
  use the XSCOM indirect interface to access the PHB registers rather than
  the MMIO interface. Previously we did that when asserting PERST in the
  CRESET path.
  However, in b8b4c79d4419 ("hw/phb4: Factor out PERST
  control") this was re-written to use the raw in_be64() accessor. This
  means that CRESET would not be asserted in the reset path. On some
  Mellanox cards this would prevent them from re-loading their firmware
  when the system was fast-reset.

  This patch fixes the problem by replacing the raw {in|out}_be64()
  accessors with the phb4_{read|write}_reg() functions.

- hw/phb4: Assert Link Disable bit after ETU init

  The cursed RAID card in ozrom1 has a bug where it ignores PERST being
  asserted. The PCIe Base spec is a little vague about what happens
  while PERST is asserted, but it does clearly specify that when
  PERST is de-asserted the Link Training and Status State Machine
  (LTSSM) of a device should return to the initial state (Detect)
  defined in the spec and the link training process should restart.

  This bug was worked around in 9078f8268922 ("phb4: Delay training till
  after PERST is deasserted") by setting the link disable bit at the
  start of the FRESET process and clearing it after PERST was
  de-asserted. Although this fixed the bug, the patch offered no
  explanation of why the fix worked.

  In b8b4c79d4419 ("hw/phb4: Factor out PERST control") the link disable
  workaround was moved into phb4_assert_perst(). This is always called
  in the CRESET case, but a following patch resulted in
  assert_perst() not being called if phb4_freset() was entered following a
  CRESET since p->skip_perst was set in the CRESET handler. This is bad
  since a side-effect of the CRESET is that the Link Disable bit is
  cleared.

  This, combined with the RAID card ignoring PERST, results in the PCIe
  link being trained by the PHB while we're waiting out the 100ms
  ETU reset time.
  If we hack skiboot to print a DLP trace after returning
  from phb4_init_hw() we get: ::

    PHB#0001[0:1]: Initialization complete
    PHB#0001[0:1]: TRACE:0x0000102101000000 0ms presence GEN1:x16:polling
    PHB#0001[0:1]: TRACE:0x0000001101000000 23ms GEN1:x16:detect
    PHB#0001[0:1]: TRACE:0x0000102101000000 23ms presence GEN1:x16:polling
    PHB#0001[0:1]: TRACE:0x0000183101000000 29ms training GEN1:x16:config
    PHB#0001[0:1]: TRACE:0x00001c5881000000 30ms training GEN1:x08:recovery
    PHB#0001[0:1]: TRACE:0x00001c5883000000 30ms training GEN3:x08:recovery
    PHB#0001[0:1]: TRACE:0x0000144883000000 33ms presence GEN3:x08:L0
    PHB#0001[0:1]: TRACE:0x0000154883000000 33ms trained GEN3:x08:L0
    PHB#0001[0:1]: CRESET: wait_time = 100
    PHB#0001[0:1]: FRESET: Starts
    PHB#0001[0:1]: FRESET: Prepare for link down
    PHB#0001[0:1]: FRESET: Assert skipped
    PHB#0001[0:1]: FRESET: Deassert
    PHB#0001[0:1]: TRACE:0x0000154883000000 0ms trained GEN3:x08:L0
    PHB#0001[0:1]: TRACE: Reached target state
    PHB#0001[0:1]: LINK: Start polling
    PHB#0001[0:1]: LINK: Electrical link detected
    PHB#0001[0:1]: LINK: Link is up
    PHB#0001[0:1]: LINK: Went down waiting for stabilty
    PHB#0001[0:1]: LINK: DLP train control: 0x0000105101000000
    PHB#0001[0:1]: CRESET: Starts

  What has happened here is that the link is trained to 8x Gen3 33ms after
  we return from phb4_init_hw(), and before we've waited the 100ms
  that we normally wait after re-initialising the ETU. When we "deassert"
  PERST later on in the FRESET handler the link is in L0 (normal) state. At
  this point we try to read from the Vendor/Device ID register to verify
  that the link is stable and immediately get a PHB fence due to a PCIe
  Completion Timeout. Skiboot attempts to recover by doing another CRESET,
  but this will encounter the same issue.

  This patch fixes the problem by setting the Link Disable bit (by calling
  phb4_assert_perst()) immediately after we return from phb4_init_hw().
  This prevents the link from being trained while PERST is asserted which
  seems to avoid the Completion Timeout. With the patch applied we get: ::

    PHB#0001[0:1]: Initialization complete
    PHB#0001[0:1]: TRACE:0x0000102101000000 0ms presence GEN1:x16:polling
    PHB#0001[0:1]: TRACE:0x0000001101000000 23ms GEN1:x16:detect
    PHB#0001[0:1]: TRACE:0x0000102101000000 23ms presence GEN1:x16:polling
    PHB#0001[0:1]: TRACE:0x0000909101000000 29ms presence GEN1:x16:disabled
    PHB#0001[0:1]: CRESET: wait_time = 100
    PHB#0001[0:1]: FRESET: Starts
    PHB#0001[0:1]: FRESET: Prepare for link down
    PHB#0001[0:1]: FRESET: Assert skipped
    PHB#0001[0:1]: FRESET: Deassert
    PHB#0001[0:1]: TRACE:0x0000001101000000 0ms GEN1:x16:detect
    PHB#0001[0:1]: TRACE:0x0000102101000000 0ms presence GEN1:x16:polling
    PHB#0001[0:1]: TRACE:0x0000001101000000 24ms GEN1:x16:detect
    PHB#0001[0:1]: TRACE:0x0000102101000000 36ms presence GEN1:x16:polling
    PHB#0001[0:1]: TRACE:0x0000183101000000 97ms training GEN1:x16:config
    PHB#0001[0:1]: TRACE:0x00001c5881000000 97ms training GEN1:x08:recovery
    PHB#0001[0:1]: TRACE:0x00001c5883000000 97ms training GEN3:x08:recovery
    PHB#0001[0:1]: TRACE:0x0000144883000000 99ms presence GEN3:x08:L0
    PHB#0001[0:1]: TRACE: Reached target state
    PHB#0001[0:1]: LINK: Start polling
    PHB#0001[0:1]: LINK: Electrical link detected
    PHB#0001[0:1]: LINK: Link is up
    PHB#0001[0:1]: LINK: Link is stable
    PHB#0001[0:1]: LINK: Card [9005:028c] Optimal Retry:disabled
    PHB#0001[0:1]: LINK: Speed Train:GEN3 PHB:GEN4 DEV:GEN3
    PHB#0001[0:1]: LINK: Width Train:x08 PHB:x08 DEV:x08
    PHB#0001[0:1]: LINK: RX Errors Now:0 Max:8 Lane:0x0000


Simulators
----------

- external/mambo: Bump default POWER9 to Nimbus DD2.3
- external/mambo: fix
  tcl startup code for mambo bogus net (repost)

  This fixes a couple of issues with external/mambo/skiboot.tcl so I can use the
  mambo bogus net.

  * newer distros (Ubuntu 18.04) allow tap device to have a user specified
    name instead of just tapN so we need to pass in a name not a number.
  * need some kind of default for net_mac, and need the mconfig for it
    to be set from an env var.

- skiboot.tcl: Add option to wait for GDB server connection

  Add an environment variable which makes Mambo wait for a connection
  from gdb prior to starting simulation.

- mambo: Integrate addr2line into backtrace command

  Gives nice output like this: ::

    systemsim % bt
    pc: 0xC0000000002BF3D4 _savegpr0_28+0x0
    lr: 0xC00000000004E0F4 opal_call+0x10
    stack:0x000000000041FAE0 0xC00000000004F054 opal_check_token+0x20
    stack:0x000000000041FB50 0xC0000000000500CC __opal_flush_console+0x88
    stack:0x000000000041FBD0 0xC000000000050BF8 opal_flush_console+0x24
    stack:0x000000000041FC00 0xC0000000001F9510 udbg_opal_putc+0x88
    stack:0x000000000041FC40 0xC000000000020E78 udbg_write+0x7c
    stack:0x000000000041FC80 0xC0000000000B1C44 console_unlock+0x47c
    stack:0x000000000041FD80 0xC0000000000B2424 register_console+0x320
    stack:0x000000000041FE10 0xC0000000003A5328 register_early_udbg_console+0x98
    stack:0x000000000041FE80 0xC0000000003A4F14 setup_arch+0x68
    stack:0x000000000041FEF0 0xC0000000003A0880 start_kernel+0x74
    stack:0x000000000041FF90 0xC00000000000AC60 start_here_common+0x1c

- mambo: Add addr2func for symbol resolution

  If you supply a VMLINUX_MAP/SKIBOOT_MAP/USER_MAP addr2func can guess
  at your symbol name. i.e. ::

    systemsim % p pc
    0xC0000000002A68F8
    systemsim % addr2func [p pc]
    fdt_offset_ptr+0x78

- lpc-port80h: Don't write port 80h when running under Simics

  Simics doesn't model LPC port 80h.
  Writing to it terminates the
  simulation due to an invalid LPC memory access. This patch adds a
  check to ensure port 80h isn't accessed if we are running under
  Simics.

- device-tree: speed up fdt building on slow simulators

  Trade size for speed and avoid de-duplicating strings in the fdt.
  This costs about 2kB in fdt size, and saves about 8 million instructions
  (almost half of all instructions) booting skiboot in mambo.

- fast-reboot: skip read-only memory checksum for slow simulators

  Skip the fast reboot checksum, which costs about 4 million cycles
  booting skiboot in mambo.

- nx: remove check on the "qemu,powernv" property

  Commit 95f7b3b9698b ("nx: Don't abort on missing NX when using a QEMU
  machine") introduced a check on the property "qemu,powernv" to skip NX
  initialization when running under a QEMU machine.

  The QEMU platforms now expose a QUIRK_NO_RNG in the chip. Testing the
  "qemu,powernv" property is not necessary anymore.

- plat/qemu: add a POWER8 and POWER9 platform

  These new QEMU platforms have characteristics closer to real OpenPOWER
  systems that we use today and define a different BMC depending on the
  CPU type. New platform properties are introduced for each,
  "qemu,powernv8", "qemu,powernv9" and these should be compatible with
  existing QEMUs which only expose the "qemu,powernv" property.

- libc/string: speed up common string functions

  Use compiler builtins for the string functions, and compile the
  libc/string/ directory with -O2.

  This reduces instructions booting skiboot in mambo by 2.9 million in
  slow-sim mode, or 3.8 in normal mode, for less than 1kB image size
  increase.

  This can result in the compiler warning more cases of string function
  problems.

- external/mambo: Add an option to exit Mambo when the system is shutdown

  Automatically exiting can be convenient for scripting.
  Will also exit
  due to a HW crash (e.g. unhandled exception).

VESNIN platform
---------------

- platforms/vesnin: PCI inventory via IPMI OEM

  Replace raw protocol with OEM message supported by OpenBMC's IPMI
  plugins.

  BMC-side implementation (IPMI plug-in):
  https://github.com/YADRO-KNS/phosphor-pci-inventory

Utilities
---------

- opal-gard: Account for ECC size when clearing partition

  When 'opal-gard clear all' is run, it works by erasing the GUARD
  partition and then using blocklevel_smart_write() to write nothing to the
  partition. This second write call is needed because we rely on libflash
  to set the ECC bits appropriately when the partition contained ECCed data.

  The API for this is a little odd with the caller specifying how much
  actual data to write, and libflash writing size + size/8 bytes
  since there is one additional ECC byte for every eight bytes of data.

  We currently do not account for the extra space consumed by the ECC data
  in reset_partition(), which is used to handle the 'clear all' command.
  As a result, the partition following the GUARD partition is
  partially overwritten when the command is used. This patch fixes the
  problem by reducing the length we would normally write by the number
  of ECC bytes required.


Build and debugging
-------------------

- Disable -Waddress-of-packed-member for GCC9

  We throw a bunch of errors in errorlog code otherwise, which we should
  fix, but we don't *have* to yet.

- Fix a lot of sparse warnings
- With new GCC comes larger GCOV binaries

  So we need to change our heap size to make more room for data/bss
  without having to change where the console is or have more fun moving
  things about.

- Intentionally discard fini_array sections

  Produced in a SKIBOOT_GCOV=1 build, and never called by skiboot.
- external/trace: Add follow option to dump_trace

  When monitoring traces, an option like the tail command's '-f' (follow)
  is very useful. This option continues to append to the output as more
  data arrives. Add a '-f' option to allow dump_trace to operate
  similarly.

  Tail also provides a '-s' (sleep time) option that
  accompanies '-f'. This controls how often new input will be polled. Add
  a '-s' option that will make dump_trace sleep for N milliseconds before
  checking for new input.

- external/trace: Add support for dumping multiple buffers

  dump_trace can only dump one trace buffer at a time. It would be handy
  to be able to dump multiple buffers and to see the entries from these
  buffers displayed in correct timestamp order. Each trace buffer is
  already sorted by timestamp so use a heap to implement an efficient
  k-way merge. Use the CCAN heap to implement this sort. However the CCAN
  heap does not have a 'heap_replace' operation. We need to 'heap_pop'
  then 'heap_push' to replace the root which means rebalancing twice
  instead of once.

- external/trace: mmap trace buffers in dump_trace

  The current lseek/read approach used in dump_trace does not correctly
  handle certain aspects of the buffers. It does not use the start and end
  position that is part of the buffer so it will not begin from the
  correct location. It does not move back to the beginning of the trace
  buffer file as the buffer wraps around. It also does not handle the
  overflow case of the writer overwriting where the reader is up to.

  Mmap the trace buffer file so that the existing reading functions in
  extra/trace.c can be used. These functions already handle the cases of
  wrapping and overflow. This reduces code duplication and uses functions
  that are already unit tested.
  However, this requires a kernel where the
  trace buffer sysfs nodes are able to be mmaped (see
  https://patchwork.ozlabs.org/patch/1056786/).

- core/trace: Export trace buffers to sysfs

  Every property in the device-tree under /ibm,opal/firmware/exports has a
  sysfs node created in /firmware/opal/exports. Add properties with the
  physical address and size for each trace buffer so they are exported.

- core/trace: Add pir number to debug_descriptor

  The names given to the trace buffers when exported to sysfs should show
  what cpu they are associated with, to make it easier to understand their
  output. The debug_descriptor currently stores the address and length of
  each trace buffer and this is used for adding properties to the device
  tree. Extend debug_descriptor to include a cpu associated with each
  trace. This will be used for creating properties in the device-tree
  under /ibm,opal/firmware/exports/.

- core/trace: Change trace buffer size

  We want to be able to mmap the trace buffers to be used by the
  dump_trace tool. As mmaping is done in terms of pages, it makes sense
  that the size of the trace buffers should be page aligned. This is
  slightly complicated by the space taken up by the header at the
  beginning of the trace and the room left for an extra trace entry at the
  end of the buffer. Change the size of the buffer itself so that the
  entire trace buffer size will be page aligned.

- core/trace: Change buffer alignment from 4K to 64K

  We want to be able to mmap the trace buffers to be used by the
  dump_trace tool. This means that the trace buffers must be page
  aligned. Currently they are aligned to 4K. Most power systems have a
  64K page size. On systems with a 4K page size, 64K aligned will still be
  page aligned. Change the allocation of the trace buffers to be 64K
  aligned.

  It is actually the trace_info struct, which contains the trace buffer,
  that is allocated as aligned memory. This means the trace buffer itself
  is not actually aligned, and this is the address that is currently exposed
  through sysfs. To get around this, change the address that is exposed to
  sysfs to be the trace_info struct. This means the lock in trace_info is
  now visible too.

- external/trace: Use correct width integer byte swapping

  The trace_repeat struct uses be16 for storing the number of repeats.
  Currently be32_to_cpu conversion is used to display this member. This
  produces an incorrect value. Use be16_to_cpu instead.

- core/trace: Put boot_tracebuf in correct location.

  A position for the boot_tracebuf is allocated in skiboot.lds.S.
  However, without a __section attribute the boot trace buffer is not
  placed in the correct location, meaning that it also will not be
  correctly aligned. Add the __section attribute to ensure it will be
  placed in its allocated position.

- core/lock: Add debug options to store backtrace of where lock was taken

  Contrary to popular belief, skiboot developers are imperfect and
  occasionally write locking bugs. When we exit skiboot, we check if we're
  still holding any locks, and if so, we print an error with a list of the
  locks currently held and the locations where they were taken.

  However, this only tells us the location where lock() was called, which may
  not be enough to work out what's going on. To give us more to go on with,
  we can store backtrace data in the lock and print that out when we
  unexpectedly still hold locks.

  Because the backtrace data is rather big, we only enable this if
  DEBUG_LOCKS_BACKTRACE is defined, which in turn is switched on when
  DEBUG=1.

  (We disable DEBUG_LOCKS_BACKTRACE in some of the memory allocation tests
  because the locks used by the memory allocator take up too much room in the
  fake skiboot heap.)

- libfdt: upgrade to upstream dtc.git 243176c

  Upgrade libfdt/ to github.com/dgibson/dtc.git 243176c ("Fix bogus
  error on rebuild")

  This copies dtc/libfdt/ to skiboot/libfdt/, with the only change in
  that directory being the addition of README.skiboot and Makefile.inc.

  This adds about 14kB text, 2.5kB compressed xz. This could be reduced
  or mostly eliminated by cutting out fdt version checks and unused
  code, but tracking upstream is a bigger benefit at the moment.

  This loses commits:

  - 14ed2b842f61 ("libfdt: add basic sanity check to fdt_open_into")
  - bc7bb3d12bc1 ("sparse: fix declaration of fdt_strerror")

  As well as some prehistoric similar kinds of things, which is the
  punishment for us not being good downstream citizens and sending
  things upstream! Syncing to upstream will make that effort simpler
  in future.
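
For readers curious about the k-way merge mentioned in the "external/trace:
Add support for dumping multiple buffers" entry above, the following is a
minimal sketch of the idea. It is not the dump_trace code: it uses a small
hand-written binary min-heap rather than the CCAN heap, and ``struct entry``,
``struct source``, ``MAX_SOURCES`` and ``merge_sources()`` are hypothetical
names made up for illustration. Because the heap is hand-written, it can
replace the root in place with a single sift-down, which is the
'heap_replace' behaviour that the entry notes is missing from CCAN.

.. code-block:: c

   #include <assert.h>
   #include <stddef.h>
   #include <stdint.h>

   /* Hypothetical, simplified trace entry: a timestamp and a buffer id. */
   struct entry {
       uint64_t timestamp;
       int id;
   };

   /* One input buffer, already sorted by timestamp, plus a read cursor. */
   struct source {
       const struct entry *entries;
       size_t len;
       size_t pos;
   };

   #define MAX_SOURCES 64 /* assumed upper bound, for illustration only */

   /* Timestamp of the next unread entry of a non-exhausted source. */
   static uint64_t head_ts(const struct source *s)
   {
       return s->entries[s->pos].timestamp;
   }

   /* Restore the min-heap property for the subtree rooted at index i. */
   static void sift_down(struct source **heap, size_t n, size_t i)
   {
       for (;;) {
           size_t l = 2 * i + 1, r = l + 1, min = i;
           struct source *tmp;

           if (l < n && head_ts(heap[l]) < head_ts(heap[min]))
               min = l;
           if (r < n && head_ts(heap[r]) < head_ts(heap[min]))
               min = r;
           if (min == i)
               return;
           tmp = heap[i];
           heap[i] = heap[min];
           heap[min] = tmp;
           i = min;
       }
   }

   /*
    * Merge k sorted sources into out[] in timestamp order.
    * Returns the number of entries written (at most out_cap).
    */
   size_t merge_sources(struct source *srcs, size_t k,
                        struct entry *out, size_t out_cap)
   {
       struct source *heap[MAX_SOURCES];
       size_t n = 0, written = 0, i;

       assert(k <= MAX_SOURCES);

       /* Only sources that still have unread entries go into the heap. */
       for (i = 0; i < k; i++)
           if (srcs[i].pos < srcs[i].len)
               heap[n++] = &srcs[i];

       /* Build the heap bottom-up. */
       for (i = n / 2; i-- > 0;)
           sift_down(heap, n, i);

       while (n > 0 && written < out_cap) {
           struct source *s = heap[0];

           /* The root always holds the globally smallest timestamp. */
           out[written++] = s->entries[s->pos++];

           /* An exhausted source is dropped from the heap. */
           if (s->pos == s->len)
               heap[0] = heap[--n];

           /* Replace the root in place: one rebalance, not two. */
           sift_down(heap, n, 0);
       }
       return written;
   }

Each output entry costs one sift-down of depth O(log k), so merging N entries
from k buffers takes O(N log k) comparisons. A pop-then-push sequence gives
the same ordering at roughly twice the rebalancing work.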