---
layout: page
title: fi_rxm(7)
tagline: Libfabric Programmer's Manual
---
{% include JB/setup %}

# NAME

fi_rxm \- The RxM (RDM over MSG) Utility Provider

# OVERVIEW

The RxM provider (ofi_rxm) is a utility provider that supports FI_EP_RDM type
endpoints emulated over FI_EP_MSG type endpoint(s) of an underlying core provider.
FI_EP_RDM endpoints have a reliable datagram interface, and RxM emulates this by
hiding the connection management of the underlying FI_EP_MSG endpoints from the user.
Additionally, RxM can hide the memory registration requirement of a core provider
like verbs if the application does not support it.

# REQUIREMENTS

## Requirements for core provider

The RxM provider requires the core provider to support the following features:

 * MSG endpoints (FI_EP_MSG)

 * RMA read/write (FI_RMA) - Used for implementing rendezvous protocol for
   large messages.

 * FI_OPT_CM_DATA_SIZE of at least 24 bytes.

## Requirements for applications

Since RxM emulates RDM endpoints by hiding connection management, and connections
are established only on demand (when the app tries to send data), the first several
data transfer calls would return -FI_EAGAIN. Applications should be aware of this and
retry until the operation succeeds.

If an application has chosen manual progress for data progress, it should also
read the CQ so that connection establishment progresses. Not doing so would
result in a stall. See also the ERRORS section in fi_msg(3).

# SUPPORTED FEATURES

The RxM provider currently supports *FI_MSG*, *FI_TAGGED*, *FI_RMA* and *FI_ATOMIC* capabilities.

*Endpoint types*
: The provider supports only *FI_EP_RDM*.

*Endpoint capabilities*
: The following data transfer interfaces are supported: *FI_MSG*, *FI_TAGGED*, *FI_RMA*, *FI_ATOMIC*.

*Progress*
: The RxM provider supports both *FI_PROGRESS_MANUAL* and *FI_PROGRESS_AUTO*.
  Manual progress in general has better connection scale-up and lower CPU utilization
  since there's no separate auto-progress thread.

*Addressing Formats*
: FI_SOCKADDR, FI_SOCKADDR_IN

*Memory Region*
: FI_MR_VIRT_ADDR, FI_MR_ALLOCATED, FI_MR_PROV_KEY MR mode bits would be
  required from the app if the core provider requires them.

# LIMITATIONS

When using the RxM provider, some limitations of the underlying MSG provider could also show
up. Please refer to the corresponding MSG provider man pages to find out about those limitations.

## Unsupported features

The RxM provider does not support the following features:

 * op_flags: FI_FENCE.

 * Scalable endpoints

 * Shared contexts

 * FABRIC_DIRECT

 * FI_MR_SCALABLE

 * Authorization keys

 * Application error data buffers

 * Multicast

 * FI_SYNC_ERR

 * Reporting unknown source addr data as part of completions

 * Triggered operations

## Progress limitations

When sending large messages, an app doing an sread or waiting on the CQ file descriptor
may not get a completion when reading the CQ after being woken up from the wait.
The app has to do the sread or wait on the file descriptor again. This is needed
because RxM uses a rendezvous protocol for large message sends. The app is
woken up from waiting on the CQ fd when the rendezvous protocol request completes, but it
has to wait again for an ACK from the receiver indicating that the remote RMA read
of the large message has completed.

## FI_ATOMIC limitations

The FI_ATOMIC capability will only be listed in the fi_info if the fi_info
hints parameter specifies FI_ATOMIC. If FI_ATOMIC is requested, the message orders
FI_ORDER_RAR, FI_ORDER_RAW, FI_ORDER_WAR, FI_ORDER_WAW, FI_ORDER_SAR, and
FI_ORDER_SAW cannot be supported.
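
The retry behavior described under "Requirements for applications" and "Progress limitations" (retry on -FI_EAGAIN while reading the CQ so that progress is made) can be sketched as the loop below. This is only an illustration under stated assumptions, not RxM or libfabric code: `retry_on_eagain`, `try_op_fn`, `progress_fn`, `fake_ep`, `fake_send` and `fake_progress` are hypothetical names. In a real application the operation callback would wrap a data transfer call such as `fi_send()` and the progress callback a CQ read such as `fi_cq_read()`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* libfabric defines FI_EAGAIN as EAGAIN in <rdma/fi_errno.h>. It is
 * redefined here only so the sketch builds without libfabric installed. */
#ifndef FI_EAGAIN
#define FI_EAGAIN EAGAIN
#endif

/* Hypothetical callback types: try_op stands in for a data transfer call,
 * drive_progress for a CQ read that lets the provider make progress. */
typedef int (*try_op_fn)(void *ctx);
typedef void (*progress_fn)(void *ctx);

/* Call try_op() until it returns something other than -FI_EAGAIN or until
 * max_attempts is exhausted, driving progress after each busy attempt.
 * Returns the last value returned by try_op(). */
static int retry_on_eagain(try_op_fn try_op, progress_fn drive_progress,
                           void *ctx, int max_attempts)
{
    int ret = -FI_EAGAIN;

    assert(try_op != NULL);
    for (int i = 0; i < max_attempts && ret == -FI_EAGAIN; i++) {
        ret = try_op(ctx);
        if (ret == -FI_EAGAIN && drive_progress != NULL)
            drive_progress(ctx);
    }
    return ret;
}

/* Stand-in for an endpoint whose connection completes only after a few
 * progress calls (purely illustrative). */
struct fake_ep {
    int progress_needed; /* progress calls still required */
    int attempts;        /* number of send attempts made */
};

static int fake_send(void *ctx)
{
    struct fake_ep *ep = ctx;

    ep->attempts++;
    return ep->progress_needed > 0 ? -FI_EAGAIN : 0;
}

static void fake_progress(void *ctx)
{
    struct fake_ep *ep = ctx;

    if (ep->progress_needed > 0)
        ep->progress_needed--;
}
```

Bounding the number of attempts is a design choice of this sketch. An application may instead retry indefinitely or back off between attempts. Passing no progress callback models an application that never reads the CQ under manual progress, in which case the loop keeps getting -FI_EAGAIN.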

## Miscellaneous limitations

 * RxM protocol peers should have the same endianness; otherwise connections won't
   complete successfully. This enables better performance at run-time as byte
   order translations are avoided.

# RUNTIME PARAMETERS

The ofi_rxm provider checks for the following environment variables.

*FI_OFI_RXM_BUFFER_SIZE*
: Defines the transmit buffer size / inject size. Messages of size less than this
  would be transmitted via an eager protocol and those above would be transmitted
  via a rendezvous or SAR (Segmentation And Reassembly) protocol. Transmit data
  would be copied up to this size (default: ~16k).

*FI_OFI_RXM_COMP_PER_PROGRESS*
: Defines the maximum number of MSG provider CQ entries (default: 1) that would
  be read per progress (RxM CQ read).

*FI_OFI_RXM_SAR_LIMIT*
: Set this environment variable to control the RxM SAR (Segmentation And Reassembly)
  protocol. Messages of size greater than this (default: 128 KB) would be transmitted
  via rendezvous protocol.

*FI_OFI_RXM_USE_SRX*
: Set this to 1 to use shared receive context from MSG provider. This reduces
  overall memory usage but there may be a slight increase in latency (default: 0).

*FI_OFI_RXM_TX_SIZE*
: Defines default TX context size (default: 1024)

*FI_OFI_RXM_RX_SIZE*
: Defines default RX context size (default: 1024)

*FI_OFI_RXM_MSG_TX_SIZE*
: Defines FI_EP_MSG TX size that would be requested (default: 128).

*FI_OFI_RXM_MSG_RX_SIZE*
: Defines FI_EP_MSG RX size that would be requested (default: 128).

*FI_UNIVERSE_SIZE*
: Defines the expected number of ranks / peers an endpoint would communicate
  with (default: 256).

*FI_OFI_RXM_CM_PROGRESS_INTERVAL*
: Defines the duration of time in microseconds between calls to RxM CM progression
  functions when using manual progress.
  Higher values may provide less noise for
  calls to fi_cq read functions, but may increase connection setup time (default: 10000)

*FI_OFI_RXM_CQ_EQ_FAIRNESS*
: Defines the maximum number of message provider CQ entries that can be
  consecutively read across progress calls without checking to see if the
  CM progress interval has been reached (default: 128)

# Tuning

## Bandwidth

To optimize for bandwidth, use higher values than the defaults for
FI_OFI_RXM_TX_SIZE, FI_OFI_RXM_RX_SIZE, FI_OFI_RXM_MSG_TX_SIZE and FI_OFI_RXM_MSG_RX_SIZE,
subject to the memory limits of the system and the TX and RX sizes supported by the
MSG provider.

FI_OFI_RXM_SAR_LIMIT is another knob that can be experimented with to optimize for
bandwidth.

## Memory

To conserve memory, ensure FI_UNIVERSE_SIZE is set to no more than what is required. Similarly,
check that the FI_OFI_RXM_TX_SIZE, FI_OFI_RXM_RX_SIZE, FI_OFI_RXM_MSG_TX_SIZE and
FI_OFI_RXM_MSG_RX_SIZE environment variables are set only as high as required.

# NOTES

The data transfer API may return -FI_EAGAIN during on-demand connection setup
of the core provider FI_EP_MSG. See [`fi_msg`(3)](fi_msg.3.html) for a detailed
description of handling FI_EAGAIN.

# Troubleshooting / Known issues

If an RxM endpoint is expected to communicate with more peers than the default
value of FI_UNIVERSE_SIZE (256), CQ overruns can happen. To avoid this, set a
higher value for FI_UNIVERSE_SIZE. A CQ overrun can make a MSG endpoint unusable.

At higher rank counts, there may be connection errors due to a node running out
of memory. The workaround is to use shared receive contexts for the MSG provider
(FI_OFI_RXM_USE_SRX=1) or to reduce the eager message size (FI_OFI_RXM_BUFFER_SIZE) and
the MSG provider TX/RX queue sizes (FI_OFI_RXM_MSG_TX_SIZE / FI_OFI_RXM_MSG_RX_SIZE).
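
The workarounds above amount to setting a handful of environment variables before the provider is initialized. A minimal sketch of doing that from within an application is shown here, assuming the variables are read when libfabric is first initialized (for example by the first `fi_getinfo()` call). The helper `apply_rxm_env`, the `rxm_env` struct and the `low_memory_env` table are hypothetical names, and the values are illustrative placeholders rather than recommendations.

```c
#define _POSIX_C_SOURCE 200112L /* for setenv() */

#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* One environment variable assignment. */
struct rxm_env {
    const char *name;
    const char *value;
};

/* Illustrative settings for a memory-constrained, many-peer job. The
 * variable names come from RUNTIME PARAMETERS; the values are made up. */
static const struct rxm_env low_memory_env[] = {
    { "FI_OFI_RXM_USE_SRX",     "1"    }, /* share receive contexts */
    { "FI_OFI_RXM_BUFFER_SIZE", "8192" }, /* smaller eager message size */
    { "FI_OFI_RXM_MSG_TX_SIZE", "64"   }, /* smaller MSG TX queue */
    { "FI_OFI_RXM_MSG_RX_SIZE", "64"   }, /* smaller MSG RX queue */
    { "FI_UNIVERSE_SIZE",       "1024" }, /* expected number of peers */
};

/* Apply each assignment with setenv(). With overwrite == 0, values already
 * present in the environment are left untouched, so settings chosen by the
 * user at launch time take precedence. Returns 0 on success, -1 on failure. */
static int apply_rxm_env(const struct rxm_env *vars, size_t count,
                         int overwrite)
{
    assert(vars != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (setenv(vars[i].name, vars[i].value, overwrite) != 0)
            return -1;
    }
    return 0;
}
```

An application would call `apply_rxm_env(low_memory_env, sizeof(low_memory_env) / sizeof(low_memory_env[0]), 0)` before its first libfabric call. Exporting the same variables in the job launch script achieves the same result without code changes.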

# SEE ALSO

[`fabric`(7)](fabric.7.html),
[`fi_provider`(7)](fi_provider.7.html),
[`fi_getinfo`(3)](fi_getinfo.3.html)