.. _skiboot-5.4.6:

=============
skiboot-5.4.6
=============

skiboot-5.4.6 was released on Wednesday June 14th, 2017. It replaces
:ref:`skiboot-5.4.5` as the current stable release in the 5.4.x series.

Over :ref:`skiboot-5.4.5`, we have a small number of bug fixes for
FSP-based platforms:

- FSP/CONSOLE: Workaround for unresponsive ipmi daemon

  In some corner cases the FSP is active but does not respond to
  console MBOX messages (due to a buggy IPMI daemon) while the kernel is
  writing heavily to the console. Eventually the console buffer becomes
  full, and OPAL starts returning OPAL_BUSY_EVENT to the kernel. The
  kernel keeps retrying, which causes kernel soft lockups. In extreme
  cases, when every CPU is trying to write to the console, the user
  cannot log in over ssh and the system appears to be hung.

  Resetting the FSP, or restarting the IPMI daemon on the FSP, recovers
  the system and everything returns to normal.

  This patch works around the issue by returning OPAL_HARDWARE when the
  console buffer is full. As a side effect, we may end up dropping the
  latest console data, but dropping console data is better than hanging
  the system.

  An alternative approach is to drop old data from the console buffer to
  make space for new data. However, under normal conditions only the FSP
  updates the 'next_out' pointer, and touching that pointer could
  introduce other race conditions. Hence we decided to simply drop the
  new console write request.

- FSP: Set status field in response message for timed out message

  For timed-out FSP messages, we set the message status to
  "fsp_msg_timeout". However, most FSP driver users (such as
  surveillance) ignore this field. They always check the status value
  returned by the FSP in their callback function (the second byte of
  word1), so we ended up treating a timed-out message as a successful
  response from the FSP.
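
  As an illustration only, the following C sketch models why a timeout
  looked like success. Every name in it is hypothetical (it does not use
  skiboot's real ``struct fsp_msg`` or its helpers), and it assumes the
  status byte is read as ``(word1 >> 8) & 0xff``. A callback that ignores
  the message state sees a zero status unless the timeout path also
  writes a non-zero status into word1, which is what this fix does for
  the response message::

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical, simplified message states (not skiboot's enum). */
    enum sketch_msg_state { SKETCH_MSG_WRESP, SKETCH_MSG_DONE, SKETCH_MSG_TIMEOUT };

    /* Hypothetical stand-in for an FSP response message. */
    struct sketch_fsp_msg {
        uint32_t word0;
        uint32_t word1;              /* status assumed in (word1 >> 8) & 0xff */
        enum sketch_msg_state state;
    };

    /* Hypothetical non-zero status meaning "no response from FSP". */
    #define SKETCH_STATUS_TIMEOUT 0xfeu

    /* Extract the status byte the way a typical callback is assumed to. */
    static uint8_t sketch_status(const struct sketch_fsp_msg *m)
    {
        return (uint8_t)((m->word1 >> 8) & 0xff);
    }

    /*
     * Timeout path. Without the fix only 'state' is updated, so the
     * status byte stays 0. With the fix the status byte is set as well.
     */
    static void sketch_handle_timeout(struct sketch_fsp_msg *m, bool with_fix)
    {
        m->state = SKETCH_MSG_TIMEOUT;
        if (with_fix)
            m->word1 = (m->word1 & ~0xff00u) | (SKETCH_STATUS_TIMEOUT << 8);
    }

    /* A driver callback that, like most users, ignores 'state'. */
    static bool sketch_callback_sees_success(const struct sketch_fsp_msg *m)
    {
        return sketch_status(m) == 0;
    }

  With these definitions, calling ``sketch_handle_timeout(&m, false)``
  leaves the callback reporting success, while
  ``sketch_handle_timeout(&m, true)`` makes it report a failure.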

  Sample output: ::

    [69902.432509048,7] SURV: Sending the heartbeat command to FSP
    [70023.226860117,4] FSP: Response from FSP timed out, word0 = d66a00d7, word1 = 0 state: 3
    ....
    [70023.226901445,7] SURV: Received heartbeat acknowledge from FSP
    [70023.226903251,3] FSP: fsp_trigger_reset() entry

  Here the SURV code thought it had received a valid response from the
  FSP, but in fact no response was received.

- FSP: Improve timeout message

  Previously we printed only word0 and word1 in the error log. word0
  contains the sequence number and command class, so one had to
  understand the word0 format to identify the command class.

  This change explicitly prints the command class, sub command, etc.

- FSP/RTC: Remove local fsp_in_reset variable

  Now that we use fsp_in_rr() to detect FSP reset/reload, fsp_in_reset
  has become redundant, so this local variable is removed.

- FSP/RTC: Fix possible FSP R/R issue in rtc write path

  fsp_opal_rtc_write() checks the FSP status before queueing a message to
  the FSP. However, if an FSP R/R starts before we get a response to the
  queued message, we keep returning OPAL_BUSY_EVENT to the host, and in
  some extreme conditions the host may hang. Once the FSP is back, we
  repost the message, get a response from the FSP and return
  OPAL_SUCCESS to the host.

  This patch caches the new values and returns OPAL_SUCCESS if an FSP R/R
  is in progress. Once the FSP is back, we send the cached value to the
  FSP.

- hw/fsp/rtc: read/write cached rtc tod on fsp hir.

  Currently fsp-rtc reads/writes the cached RTC TOD on an FSP reset. Use
  the latest fsp_in_rr() function to properly read the cached RTC value
  when an FSP reset is initiated by the HIR.

  Below is the kernel trace seen when setting the hardware clock while
  the HIR process starts. ::

    [ 1727.775824] NMI watchdog: BUG: soft lockup - CPU#57 stuck for 23s! [hwclock:7688]
    [ 1727.775856] Modules linked in: vmx_crypto ibmpowernv ipmi_powernv uio_pdrv_genirq ipmi_devintf powernv_op_panel uio ipmi_msghandler powernv_rng leds_powernv ip_tables x_tables autofs4 ses enclosure scsi_transport_sas crc32c_vpmsum lpfc ipr tg3 scsi_transport_fc
    [ 1727.775883] CPU: 57 PID: 7688 Comm: hwclock Not tainted 4.10.0-14-generic #16-Ubuntu
    [ 1727.775883] task: c000000fdfdc8400 task.stack: c000000fdfef4000
    [ 1727.775884] NIP: c00000000090540c LR: c0000000000846f4 CTR: 000000003006dd70
    [ 1727.775885] REGS: c000000fdfef79a0 TRAP: 0901 Not tainted (4.10.0-14-generic)
    [ 1727.775886] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>
    [ 1727.775889] CR: 28024442 XER: 20000000
    [ 1727.775890] CFAR: c00000000008472c SOFTE: 1
    GPR00: 0000000030005128 c000000fdfef7c20 c00000000144c900 fffffffffffffff4
    GPR04: 0000000028024442 c00000000090540c 9000000000009033 0000000000000000
    GPR08: 0000000000000000 0000000031fc4000 c000000000084710 9000000000001003
    GPR12: c0000000000846e8 c00000000fba0100
    [ 1727.775897] NIP [c00000000090540c] opal_set_rtc_time+0x4c/0xb0
    [ 1727.775899] LR [c0000000000846f4] opal_return+0xc/0x48
    [ 1727.775899] Call Trace:
    [ 1727.775900] [c000000fdfef7c20] [c00000000090540c] opal_set_rtc_time+0x4c/0xb0 (unreliable)
    [ 1727.775901] [c000000fdfef7c60] [c000000000900828] rtc_set_time+0xb8/0x1b0
    [ 1727.775903] [c000000fdfef7ca0] [c000000000902364] rtc_dev_ioctl+0x454/0x630
    [ 1727.775904] [c000000fdfef7d40] [c00000000035b1f4] do_vfs_ioctl+0xd4/0x8c0
    [ 1727.775906] [c000000fdfef7de0] [c00000000035bab4] SyS_ioctl+0xd4/0xf0
    [ 1727.775907] [c000000fdfef7e30] [c00000000000b184] system_call+0x38/0xe0
    [ 1727.775908] Instruction dump:
    [ 1727.775909] f821ffc1 39200000 7c832378 91210028 38a10020 39200000 38810028 f9210020
    [ 1727.775911] 4bfffe6d e8810020 80610028 4b77f61d <60000000> 7c7f1b78 3860000a 2fbffff4

  This was found when running the `op-test-framework fspresetReload testcase <https://github.com/open-power/op-test-framework/blob/master/testcases/fspresetReload.py>`_.

  With this fix, the FSP HIR torture testcase in the above test runs
  fine.

- FSP/CHIPTOD: Return false in error path