---
layout: page
title: fi_rxm(7)
tagline: Libfabric Programmer's Manual
---
{% include JB/setup %}

# NAME

fi_rxm \- The RxM (RDM over MSG) Utility Provider

# OVERVIEW

The RxM provider (ofi_rxm) is a utility provider that supports FI_EP_RDM
endpoints emulated over the FI_EP_MSG endpoint(s) of an underlying core provider.
FI_EP_RDM endpoints have a reliable datagram interface, and RxM emulates this by
hiding the connection management of the underlying FI_EP_MSG endpoints from the user.
Additionally, RxM can hide the memory registration requirement of a core provider
such as verbs if the application does not support it.

# REQUIREMENTS

## Requirements for core provider

The RxM provider requires the core provider to support the following features:

  * MSG endpoints (FI_EP_MSG)

  * RMA read/write (FI_RMA) - Used for implementing the rendezvous protocol for
    large messages.

  * FI_OPT_CM_DATA_SIZE of at least 24 bytes.

## Requirements for applications

Because RxM emulates RDM endpoints by hiding connection management, and connections
are established only on demand (when the application tries to send data), the first
several data transfer calls may return -FI_EAGAIN. Applications should be aware of
this and retry until the operation succeeds.

If an application has chosen manual progress for data progress, it should also
read the CQ so that connection establishment progresses. Not doing so would
result in a stall. See also the ERRORS section in fi_msg(3).
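
The retry requirement can be sketched as a small loop. This is a minimal illustration, not RxM or libfabric code: `send_with_retry`, the callback types and the `fake_link` transport are hypothetical names. In a real application `try_send` would wrap a data transfer call such as `fi_send()`, `progress` would wrap a CQ read such as `fi_cq_read()`, and the comparison would be against `-FI_EAGAIN`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical callbacks standing in for the application's libfabric calls:
 * try_send would wrap a data transfer call such as fi_send(), and progress
 * would wrap a CQ read such as fi_cq_read(). */
typedef ssize_t (*try_send_fn)(void *state);
typedef void (*progress_fn)(void *state);

/* Retry a send until it stops returning -EAGAIN, reading the CQ between
 * attempts so that connection establishment can progress under manual
 * progress. Returns the final result of try_send, or -EAGAIN if
 * max_attempts is exhausted. */
static ssize_t send_with_retry(try_send_fn try_send, progress_fn progress,
                               void *state, int max_attempts)
{
    assert(try_send != NULL && progress != NULL && max_attempts > 0);

    for (int attempt = 0; attempt < max_attempts; attempt++) {
        ssize_t ret = try_send(state);
        if (ret != -EAGAIN)
            return ret;      /* success (0) or a hard error */
        progress(state);     /* drive the connection handshake */
    }
    return -EAGAIN;
}

/* Toy transport used only to exercise the loop: the "connection" comes up
 * after a fixed number of progress calls. */
struct fake_link {
    int progress_needed;     /* progress calls left before connected */
    int sends_attempted;     /* how many times try_send was called */
};

static ssize_t fake_try_send(void *state)
{
    struct fake_link *link = state;

    link->sends_attempted++;
    return link->progress_needed > 0 ? -EAGAIN : 0;
}

static void fake_progress(void *state)
{
    struct fake_link *link = state;

    if (link->progress_needed > 0)
        link->progress_needed--;
}
```

Bounding the number of attempts is a design choice of this sketch; an application may prefer to retry indefinitely or to apply a timeout instead.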

# SUPPORTED FEATURES

The RxM provider currently supports the *FI_MSG*, *FI_TAGGED*, *FI_RMA* and *FI_ATOMIC* capabilities.

*Endpoint types*
: The provider supports only *FI_EP_RDM*.

*Endpoint capabilities*
: The following data transfer interfaces are supported: *FI_MSG*, *FI_TAGGED*, *FI_RMA*, *FI_ATOMIC*.

*Progress*
: The RxM provider supports both *FI_PROGRESS_MANUAL* and *FI_PROGRESS_AUTO*.
  Manual progress generally scales better with the number of connections and uses
  less CPU because there is no separate auto-progress thread.

*Addressing Formats*
: FI_SOCKADDR, FI_SOCKADDR_IN

*Memory Region*
: The FI_MR_VIRT_ADDR, FI_MR_ALLOCATED and FI_MR_PROV_KEY MR mode bits are
  required from the application if the core provider requires them.

# LIMITATIONS

When using the RxM provider, some limitations of the underlying MSG provider may also
apply. Refer to the corresponding MSG provider man page for those limitations.

## Unsupported features

The RxM provider does not support the following features:

  * op_flags: FI_FENCE.

  * Scalable endpoints

  * Shared contexts

  * FABRIC_DIRECT

  * FI_MR_SCALABLE

  * Authorization keys

  * Application error data buffers

  * Multicast

  * FI_SYNC_ERR

  * Reporting unknown source addr data as part of completions

  * Triggered operations

## Progress limitations

When sending large messages, an application that calls fi_cq_sread or waits on the CQ
file descriptor may not get a completion when it reads the CQ after being woken up.
The application then has to call fi_cq_sread or wait on the file descriptor again.
This happens because RxM uses a rendezvous protocol for large message sends. The
application is woken up when the rendezvous protocol request completes, but it
has to wait again for the ACK from the receiver, which indicates that the receiver
has completed the large message transfer with a remote RMA read.
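
The wait-again behavior can be sketched as follows. This is an illustration under stated assumptions rather than provider code: `wait_for_completion`, the callback types and the `fake_cq` structure are hypothetical. In a real application `wait_cq` would block on the CQ (for example with `poll()` on the CQ file descriptor) and `read_cq` would wrap a non-blocking CQ read.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callbacks standing in for the application's libfabric calls:
 * wait_cq blocks until the CQ is signaled, and read_cq returns the number of
 * completions read, 0 if the CQ was empty, or a negative value on error. */
typedef void (*wait_cq_fn)(void *state);
typedef int (*read_cq_fn)(void *state);

/* Wait on the CQ and read it, going back to waiting whenever a wakeup
 * yields no completion. Returns the first non-zero result of read_cq,
 * or 0 if max_wakeups wakeups produced no completion. */
static int wait_for_completion(wait_cq_fn wait_cq, read_cq_fn read_cq,
                               void *state, int max_wakeups)
{
    assert(wait_cq != NULL && read_cq != NULL && max_wakeups > 0);

    for (int wakeup = 0; wakeup < max_wakeups; wakeup++) {
        wait_cq(state);
        int ret = read_cq(state);
        if (ret != 0)
            return ret;      /* completion(s) read, or an error */
        /* Woken up with nothing to read: for a large send this can mean
         * the rendezvous request completed but the ACK is still pending. */
    }
    return 0;
}

/* Toy CQ used only to exercise the loop: the first empty_wakeups wakeups
 * deliver nothing, the next one delivers a single completion. */
struct fake_cq {
    int empty_wakeups;
    int waits;
};

static void fake_wait(void *state)
{
    struct fake_cq *cq = state;

    cq->waits++;
}

static int fake_read(void *state)
{
    struct fake_cq *cq = state;

    return cq->waits > cq->empty_wakeups ? 1 : 0;
}
```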

## FI_ATOMIC limitations

The FI_ATOMIC capability is listed in the fi_info only if the fi_info
hints parameter specifies FI_ATOMIC. If FI_ATOMIC is requested, the message orders
FI_ORDER_RAR, FI_ORDER_RAW, FI_ORDER_WAR, FI_ORDER_WAW, FI_ORDER_SAR, and
FI_ORDER_SAW cannot be supported.

## Miscellaneous limitations

 * RxM protocol peers must have the same endianness; otherwise connections will not
   complete successfully. This enables better run-time performance because byte
   order translations are avoided.

# RUNTIME PARAMETERS

The ofi_rxm provider checks for the following environment variables.

*FI_OFI_RXM_BUFFER_SIZE*
: Defines the transmit buffer size / inject size. Messages smaller than this
  are transmitted via an eager protocol, and larger messages are transmitted
  via a rendezvous or SAR (Segmentation And Reassembly) protocol. Transmit data
  is copied up to this size (default: ~16k).

*FI_OFI_RXM_COMP_PER_PROGRESS*
: Defines the maximum number of MSG provider CQ entries (default: 1) that are
  read per progress (RxM CQ read).

*FI_OFI_RXM_SAR_LIMIT*
: Controls the RxM SAR (Segmentation And Reassembly) protocol. Messages larger
  than this (default: 128 KB) are transmitted via the rendezvous protocol.

*FI_OFI_RXM_USE_SRX*
: Set this to 1 to use the shared receive context from the MSG provider. This reduces
  overall memory usage, but there may be a slight increase in latency (default: 0).

*FI_OFI_RXM_TX_SIZE*
: Defines the default TX context size (default: 1024).

*FI_OFI_RXM_RX_SIZE*
: Defines the default RX context size (default: 1024).

*FI_OFI_RXM_MSG_TX_SIZE*
: Defines the FI_EP_MSG TX size that is requested (default: 128).

*FI_OFI_RXM_MSG_RX_SIZE*
: Defines the FI_EP_MSG RX size that is requested (default: 128).

*FI_UNIVERSE_SIZE*
: Defines the expected number of ranks / peers an endpoint communicates
  with (default: 256).

*FI_OFI_RXM_CM_PROGRESS_INTERVAL*
: Defines the time in microseconds between calls to the RxM CM progression
  functions when using manual progress. Higher values may reduce noise for
  calls to the fi_cq read functions, but may increase connection setup time
  (default: 10000).

*FI_OFI_RXM_CQ_EQ_FAIRNESS*
: Defines the maximum number of MSG provider CQ entries that can be
  read consecutively across progress calls without checking whether the
  CM progress interval has been reached (default: 128).
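
How FI_OFI_RXM_BUFFER_SIZE and FI_OFI_RXM_SAR_LIMIT together determine the protocol for a message can be sketched as below. This is an illustration of the description above, not the provider's implementation: the `xfer_proto` enum and `pick_protocol` are hypothetical names, and treating a message exactly equal to a threshold as belonging to the smaller class is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for the three protocols described above. */
enum xfer_proto {
    XFER_EAGER,       /* data copied into a transmit buffer */
    XFER_SAR,         /* segmentation and reassembly */
    XFER_RENDEZVOUS   /* receiver fetches the data with an RMA read */
};

/* Pick a protocol from the message size and the two thresholds.
 * Assumption: sizes equal to a threshold fall in the smaller class. */
static enum xfer_proto pick_protocol(size_t msg_size, size_t buffer_size,
                                     size_t sar_limit)
{
    assert(buffer_size > 0);

    if (msg_size <= buffer_size)
        return XFER_EAGER;
    if (msg_size <= sar_limit)
        return XFER_SAR;
    return XFER_RENDEZVOUS;
}
```

For example, assuming a 16384-byte buffer size and a 131072-byte SAR limit, a 1 KB message would be sent eagerly, a 64 KB message via SAR, and a 1 MB message via rendezvous. If the SAR limit is not larger than the buffer size, this sketch never selects SAR.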

# Tuning

## Bandwidth

To optimize for bandwidth, use higher values than the defaults for
FI_OFI_RXM_TX_SIZE, FI_OFI_RXM_RX_SIZE, FI_OFI_RXM_MSG_TX_SIZE and FI_OFI_RXM_MSG_RX_SIZE,
subject to the memory limits of the system and the TX and RX sizes supported by the
MSG provider.

FI_OFI_RXM_SAR_LIMIT is another knob that can be experimented with to optimize for
bandwidth.
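
An application can also set these variables programmatically. The sketch below is one way to do so with POSIX `setenv()`; `apply_env_defaults`, `env_setting` and the table contents are hypothetical, and the values (twice the documented defaults) are illustrative rather than recommendations. It assumes the provider reads the variables during initialization, so the call would have to happen before the first libfabric call.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdlib.h>

/* One environment variable and the value to use when it is not set. */
struct env_setting {
    const char *name;
    const char *value;
};

/* Illustrative values only: twice the documented defaults. */
static const struct env_setting bandwidth_settings[] = {
    { "FI_OFI_RXM_TX_SIZE",     "2048" },
    { "FI_OFI_RXM_RX_SIZE",     "2048" },
    { "FI_OFI_RXM_MSG_TX_SIZE", "256"  },
    { "FI_OFI_RXM_MSG_RX_SIZE", "256"  },
};

#define BANDWIDTH_SETTING_COUNT \
    (sizeof(bandwidth_settings) / sizeof(bandwidth_settings[0]))

/* Set each variable unless it is already present in the environment.
 * Returns 0 on success and -1 if setenv() fails. */
static int apply_env_defaults(const struct env_setting *settings, size_t count)
{
    assert(settings != NULL);

    for (size_t i = 0; i < count; i++) {
        /* overwrite = 0 keeps any value the user already exported */
        if (setenv(settings[i].name, settings[i].value, 0) != 0)
            return -1;
    }
    return 0;
}
```

Passing 0 as the overwrite argument means values exported in the shell still take precedence, which keeps the knobs available for experimentation.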

## Memory

To conserve memory, ensure that FI_UNIVERSE_SIZE is set to what is required. Similarly,
check that the FI_OFI_RXM_TX_SIZE, FI_OFI_RXM_RX_SIZE, FI_OFI_RXM_MSG_TX_SIZE and
FI_OFI_RXM_MSG_RX_SIZE environment variables are set only to the values required.

# NOTES

The data transfer API may return -FI_EAGAIN during on-demand connection setup
of the core provider FI_EP_MSG endpoint. See [`fi_msg`(3)](fi_msg.3.html) for a detailed
description of handling FI_EAGAIN.

# Troubleshooting / Known issues

If an RxM endpoint is expected to communicate with more peers than the default
value of FI_UNIVERSE_SIZE (256), CQ overruns can happen. To avoid this, set a
higher value for FI_UNIVERSE_SIZE. A CQ overrun can make a MSG endpoint unusable.

With a large number of ranks, there may be connection errors due to a node running out
of memory. The workaround is to use shared receive contexts for the MSG provider
(FI_OFI_RXM_USE_SRX=1) or to reduce the eager message size (FI_OFI_RXM_BUFFER_SIZE) and
the MSG provider TX/RX queue sizes (FI_OFI_RXM_MSG_TX_SIZE / FI_OFI_RXM_MSG_RX_SIZE).

# SEE ALSO

[`fabric`(7)](fabric.7.html),
[`fi_provider`(7)](fi_provider.7.html),
[`fi_getinfo`(3)](fi_getinfo.3.html)