---
layout: page
title: fi_poll(3)
tagline: Libfabric Programmer's Manual
---
{% include JB/setup %}

# NAME

fi_poll \- Polling and wait set operations

fi_poll_open / fi_close
: Open/close a polling set

fi_poll_add / fi_poll_del
: Add/remove a completion queue or counter to/from a poll set.

fi_poll
: Poll for progress and events across multiple completion queues
  and counters.

fi_wait_open / fi_close
: Open/close a wait set

fi_wait
: Waits for one or more wait objects in a set to be signaled.

fi_trywait
: Indicate when it is safe to block on wait objects using native OS calls.

fi_control
: Control wait set operation or attributes.

# SYNOPSIS

```c
#include <rdma/fi_domain.h>

int fi_poll_open(struct fid_domain *domain, struct fi_poll_attr *attr,
    struct fid_poll **pollset);

int fi_close(struct fid *pollset);

int fi_poll_add(struct fid_poll *pollset, struct fid *event_fid,
    uint64_t flags);

int fi_poll_del(struct fid_poll *pollset, struct fid *event_fid,
    uint64_t flags);

int fi_poll(struct fid_poll *pollset, void **context, int count);

int fi_wait_open(struct fid_fabric *fabric, struct fi_wait_attr *attr,
    struct fid_wait **waitset);

int fi_close(struct fid *waitset);

int fi_wait(struct fid_wait *waitset, int timeout);

int fi_trywait(struct fid_fabric *fabric, struct fid **fids, size_t count);

int fi_control(struct fid *waitset, int command, void *arg);
```

# ARGUMENTS

*fabric*
: Fabric provider

*domain*
: Resource domain

*pollset*
: Event poll set

*waitset*
: Wait object set

*attr*
: Poll or wait set attributes

*context*
: On success, an array of user context values associated with
  completion queues or counters.

*fids*
: An array of fabric descriptors, each one associated with a native
  wait object.

*count*
: Number of entries in context or fids array.

*timeout*
: Time to wait for a signal, in milliseconds.

*command*
: Command of control operation to perform on the wait set.

*arg*
: Optional control argument.

# DESCRIPTION

## fi_poll_open

fi_poll_open creates a new polling set.  A poll set enables an
optimized method for progressing asynchronous operations across
multiple completion queues and counters and checking for their completions.

A poll set is defined with the following attributes.

```c
struct fi_poll_attr {
	uint64_t             flags;     /* operation flags */
};
```

*flags*
: Flags that set the default operation of the poll set.  The use of
  this field is reserved and must be set to 0 by the caller.

## fi_close

The fi_close call releases all resources associated with a poll set.
The poll set must not be associated with any other resources prior to
being closed, otherwise the call will return -FI_EBUSY.

## fi_poll_add

Associates a completion queue or counter with a poll set.

## fi_poll_del

Removes a completion queue or counter from a poll set.

## fi_poll

Progresses all completion queues and counters associated with a poll set
and checks for events.  If events might have occurred, contexts associated
with the completion queues and/or counters are returned.  Completion
queues will return their context if they are not empty.  The context
associated with a counter will be returned if the counter's success
value or error value has changed since the last time fi_poll, fi_cntr_set,
or fi_cntr_add was called.  The number of contexts is limited to the
size of the context array, indicated by the count parameter.

Note that fi_poll only indicates that events might be available.  In some
cases, providers may consume such events internally, for example, to drive
progress.  This can result in fi_poll returning false positives.  Applications
should drive their progress based on the results of reading events from a
completion queue or reading counter values.  The fi_poll function will always
return all completion queues and counters that do have new events.

## fi_wait_open

fi_wait_open allocates a new wait set.  A wait set enables an
optimized method of waiting for events across multiple completion queues
and counters.  Where possible, a wait set uses a single underlying
wait object that is signaled when a specified condition occurs on an
associated completion queue or counter.

The properties and behavior of a wait set are defined by struct
fi_wait_attr.

```c
struct fi_wait_attr {
	enum fi_wait_obj     wait_obj;  /* requested wait object */
	uint64_t             flags;     /* operation flags */
};
```

*wait_obj*
: Wait sets are associated with specific wait object(s).  Wait objects
  allow applications to block until the wait object is signaled,
  indicating that an event is available to be read.  The following
  values may be used to specify the type of wait object associated
  with a wait set: FI_WAIT_UNSPEC, FI_WAIT_FD, FI_WAIT_MUTEX_COND,
  FI_WAIT_POLLFD, and FI_WAIT_YIELD.

- *FI_WAIT_UNSPEC*
: Specifies that the user will only wait on the wait set using
  fabric interface calls, such as fi_wait.  In this case, the
  underlying provider may select the most appropriate or highest
  performing wait object available, including custom wait mechanisms.
  Applications that select FI_WAIT_UNSPEC are not guaranteed to
  retrieve the underlying wait object.

- *FI_WAIT_FD*
: Indicates that the wait set should use a single file descriptor as
  its wait mechanism, as exposed to the application.  Internally, this
  may require the use of epoll in order to support waiting on a single
  file descriptor.  File descriptor wait objects must be usable in the
  POSIX select(2) and poll(2), and Linux epoll(7) routines (if
  available).  Providers signal an FD wait object by marking it as
  readable or with an error.

- *FI_WAIT_MUTEX_COND*
: Specifies that the wait set should use a pthread mutex and condition
  variable as a wait object.

- *FI_WAIT_POLLFD*
: This option is similar to FI_WAIT_FD, but allows the wait set to expose
  multiple file descriptors to the application as its wait mechanism.
  Using FI_WAIT_POLLFD can eliminate the need for epoll to hide multiple
  file descriptors behind a single one when waiting for events.  The file
  descriptors must be usable in the POSIX select(2) and poll(2) routines,
  and are returned in a form that can be passed directly to poll(2).  See
  the NOTES section below for details on using pollfd.

- *FI_WAIT_YIELD*
: Indicates that the wait set will wait without a wait object but instead
  yield on every wait.

*flags*
: Flags that set the default operation of the wait set.  The use of
  this field is reserved and must be set to 0 by the caller.
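
As a rough illustration of the FI_WAIT_FD contract described above (the
descriptor is marked readable, or reports an error, when signaled), the
following sketch blocks on a single descriptor using poll(2).  The helper
names `wait_fd_signaled` and `demo_with_pipe` are hypothetical and are not
part of libfabric.  The sketch assumes that the descriptor was obtained
elsewhere, for example through fi_control with FI_GETWAIT, and that fi_trywait
has already returned FI_SUCCESS as described in the fi_trywait section.  The
demo uses a pipe purely as a stand-in for a provider wait object.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a libfabric call): block on one file
 * descriptor until it is readable or reports an error.
 * Returns 1 if signaled, 0 on timeout, -1 on failure.
 */
int wait_fd_signaled(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
	int ret;

	assert(fd >= 0);

	do {
		ret = poll(&pfd, 1, timeout_ms);
	} while (ret < 0 && errno == EINTR);	/* retry if interrupted */

	if (ret < 0 || (pfd.revents & POLLNVAL))
		return -1;
	if (ret == 0)
		return 0;

	/* A signaled wait fd is either readable or flagged with an error. */
	return (pfd.revents & (POLLIN | POLLERR | POLLHUP)) ? 1 : 0;
}

/* Demo with a pipe standing in for the wait fd; returns 0 if as expected. */
int demo_with_pipe(void)
{
	int p[2], idle, ready = -1;

	if (pipe(p) != 0)
		return -1;

	idle = wait_fd_signaled(p[0], 0);		/* nothing written yet */
	if (write(p[1], "x", 1) == 1)
		ready = wait_fd_signaled(p[0], 1000);	/* now readable */

	close(p[0]);
	close(p[1]);
	return (idle == 0 && ready == 1) ? 0 : -1;
}
```

A return value of 1 only means the descriptor was signaled; the application
still needs to read the associated completion queue or counter to find out
what, if anything, is available.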

## fi_close

The fi_close call releases all resources associated with a wait set.
The wait set must not be bound to any other opened resources prior to
being closed, otherwise the call will return -FI_EBUSY.

## fi_wait

Waits on a wait set until one or more of its underlying wait objects
is signaled.

## fi_trywait

The fi_trywait call was introduced in libfabric version 1.3.  The behavior
of using native wait objects without the use of fi_trywait is provider
specific and should be considered non-deterministic.

The fi_trywait() call is used in conjunction with native operating
system calls to block on wait objects, such as file descriptors.  The
application must call fi_trywait and obtain a return value of
FI_SUCCESS prior to blocking on a native wait object.  Failure to
do so may result in the wait object not being signaled, and the
application not observing the desired events.  The following
pseudo-code demonstrates the use of fi_trywait in conjunction with
the OS select(2) call.
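
```c
/* fabric, cq, comp, and timeout are assumed to be set up by the caller. */
struct fid *fids[1] = { &cq->fid };
fd_set rfds, efds;
int fd, ret;

/* Retrieve the file descriptor behind the CQ (opened with FI_WAIT_FD). */
fi_control(&cq->fid, FI_GETWAIT, (void *) &fd);

while (1) {
	if (fi_trywait(fabric, fids, 1) == FI_SUCCESS) {
		/* select() modifies the sets, so rebuild them on each pass. */
		FD_ZERO(&rfds);
		FD_SET(fd, &rfds);
		efds = rfds;
		select(fd + 1, &rfds, NULL, &efds, &timeout);
	}

	/* Drain the CQ before calling fi_trywait again. */
	do {
		ret = fi_cq_read(cq, &comp, 1);
	} while (ret > 0);
}
```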

fi_trywait() will return FI_SUCCESS if it is safe to block on the wait object(s)
corresponding to the fabric descriptor(s), or -FI_EAGAIN if there are
events queued on the fabric descriptor or if blocking could hang the
application.

The call takes an array of fabric descriptors.  For each wait object
that will be passed to the native wait routine, the corresponding
fabric descriptor should first be passed to fi_trywait.  All fabric
descriptors passed into a single fi_trywait call must make use of the
same underlying wait object type.

The following types of fabric descriptors may be passed into fi_trywait:
event queues, completion queues, counters, and wait sets.  Applications
that wish to use native wait calls should select specific wait objects
when allocating such resources, for example, by setting the wait_obj
value in the resource's creation attributes to FI_WAIT_FD.

If the wait object to check belongs to a wait set, only the wait set
itself needs to be passed into fi_trywait.  The fabric resources
associated with the wait set do not.

On receiving a return value of -FI_EAGAIN from fi_trywait, an application
should read all queued completions and events, and call fi_trywait again
before attempting to block.  Applications can make use of a fabric
poll set to identify completion queues and counters that may require
processing.

## fi_control

The fi_control call is used to access provider or implementation specific
details of fids that support blocking calls, such as wait sets, completion
queues, counters, and event queues.  Access to the wait set or fid should be
serialized across all calls when fi_control is invoked, as it may redirect
the implementation of wait set operations. The following control commands
are usable with a wait set or fid.

*FI_GETWAIT (void \*\*)*
: This command allows the user to retrieve the low-level wait object
  associated with a wait set or fid. The format of the wait object is specified
  during wait set creation, through the wait set attributes. The fi_control
  arg parameter should be an address where a pointer to the returned wait
  object will be written. This should be an 'int *' for FI_WAIT_FD,
  'struct fi_mutex_cond' for FI_WAIT_MUTEX_COND, or 'struct fi_wait_pollfd'
  for FI_WAIT_POLLFD. Support for FI_GETWAIT is provider specific.

*FI_GETWAITOBJ (enum fi_wait_obj \*)*
: This command returns the type of wait object associated with a wait set
  or fid.

# RETURN VALUES

Returns FI_SUCCESS on success.  On error, a negative value corresponding to
fabric errno is returned.

Fabric errno values are defined in
`rdma/fi_errno.h`.

fi_poll
: On success, if events are available, returns the number of entries
  written to the context array.

# NOTES

In many situations, blocking calls may need to wait on signals sent
to a number of file descriptors.  For example, this is the case for
socket based providers, such as tcp and udp, as well as utility providers
such as multi-rail.  For simplicity, when epoll is available, it can
be used to limit the number of file descriptors that an application
must monitor.  The use of epoll may also be required in order
to support FI_WAIT_FD.

However, FI_WAIT_POLLFD supports waiting on multiple file descriptors on
systems where epoll is not available, or where the use of epoll may
negatively impact performance.
A significant difference between using POLLFD versus FD wait objects
is that with FI_WAIT_POLLFD, the file descriptors may change dynamically.
As an example, the file descriptors associated with a completion queue's
wait set may change as endpoint associations with the CQ are added and
removed.

Struct fi_wait_pollfd is used to retrieve all file descriptors for fids
using FI_WAIT_POLLFD to support blocking calls.

```c
struct fi_wait_pollfd {
    uint64_t      change_index;
    size_t        nfds;
    struct pollfd *fd;
};
```

*change_index*
: The change_index may be used to determine if there have been any changes
  to the file descriptor list.  Anytime a file descriptor is added, removed,
  or its events are updated, this field is incremented by the provider.
  Applications wishing to wait on file descriptors directly should cache
  the change_index value.  Before blocking on file descriptor events, the
  app should use fi_control() to retrieve the current change_index and
  compare that against its cached value.  If the values differ, then the
  app should update its file descriptor list prior to blocking.

*nfds*
: On input to fi_control(), this indicates the number of entries in the
  struct pollfd * array.  On output, this will be set to the number of
  entries needed to store the current number of file descriptors.  If
  the input value is smaller than the output value, fi_control() will
  return the error -FI_ETOOSMALL.  Note that setting nfds = 0 allows
  an efficient way of checking the change_index.

*fd*
: This points to an array of struct pollfd entries.  The number of entries
  is specified through the nfds field.  If the number of needed entries
  is less than or equal to the number of entries available, the struct
  pollfd array will be filled out with a list of file descriptors and
  corresponding events that can be used in the select(2) and poll(2)
  calls.

The change_index is updated only when the file descriptors associated with
the pollfd file set have changed.  Checking the change_index is an additional
step needed when working with FI_WAIT_POLLFD wait objects directly.  The use
of the fi_trywait() function is still required if accessing wait objects
directly.
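
The change_index and nfds handshake described above can be sketched as
follows.  To keep the sketch buildable without libfabric, it uses a local
mirror of struct fi_wait_pollfd (`struct wait_pollfd`), a placeholder error
code (`MOCK_ETOOSMALL`), and a callback type (`getwait_fn`) standing in for
fi_control() with FI_GETWAIT.  These names, the `pollfd_cache` type, and the
mock provider are all hypothetical.  The sketch also assumes that change_index
and nfds are written back even when the call reports that the array is too
small.

```c
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <stdlib.h>

/* Local mirror of struct fi_wait_pollfd, so the sketch builds standalone. */
struct wait_pollfd {
	uint64_t      change_index;
	size_t        nfds;
	struct pollfd *fd;
};

#define MOCK_ETOOSMALL 1	/* placeholder for FI_ETOOSMALL, value arbitrary */

/* Hypothetical stand-in for fi_control(fid, FI_GETWAIT, w). */
typedef int (*getwait_fn)(void *fid, struct wait_pollfd *w);

/* Application-side copy of the descriptor list plus its change_index. */
struct pollfd_cache {
	uint64_t      change_index;
	size_t        nfds;	/* entries in use */
	size_t        cap;	/* entries allocated */
	struct pollfd *fds;
	int           valid;
};

/* Returns 1 if the list was reloaded, 0 if unchanged, negative on error. */
int pollfd_cache_refresh(struct pollfd_cache *c, void *fid, getwait_fn getwait)
{
	struct wait_pollfd w = { .change_index = 0, .nfds = 0, .fd = NULL };
	int ret;

	assert(c && getwait);

	/* nfds = 0 query: fetch change_index and the required array size. */
	ret = getwait(fid, &w);
	if (ret && ret != -MOCK_ETOOSMALL)
		return ret;
	if (c->valid && w.change_index == c->change_index)
		return 0;

	do {
		if (w.nfds > c->cap) {
			struct pollfd *p = realloc(c->fds, w.nfds * sizeof(*p));
			if (!p)
				return -1;
			c->fds = p;
			c->cap = w.nfds;
		}
		w.nfds = c->cap;
		w.fd = c->fds;
		ret = getwait(fid, &w);	/* if too small, w.nfds holds the new need */
	} while (ret == -MOCK_ETOOSMALL);
	if (ret)
		return ret;

	c->nfds = w.nfds;
	c->change_index = w.change_index;
	c->valid = 1;
	return 1;
}

/* Mock provider used only to exercise the helper above. */
struct mock_fid {
	uint64_t change_index;
	size_t   nfds;
	int      base_fd;
};

int mock_getwait(void *fid, struct wait_pollfd *w)
{
	struct mock_fid *m = fid;
	size_t avail = w->nfds, i;

	w->change_index = m->change_index;
	w->nfds = m->nfds;
	if (avail < m->nfds)
		return -MOCK_ETOOSMALL;
	for (i = 0; i < m->nfds; i++) {
		w->fd[i].fd = m->base_fd + (int) i;
		w->fd[i].events = POLLIN;
		w->fd[i].revents = 0;
	}
	return 0;
}
```

In a real application the callback would wrap fi_control().  A return value
of 0 from the refresh helper means the cached array can be passed to poll(2)
unchanged, once fi_trywait() has reported that it is safe to block.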

# SEE ALSO

[`fi_getinfo`(3)](fi_getinfo.3.html),
[`fi_domain`(3)](fi_domain.3.html),
[`fi_cntr`(3)](fi_cntr.3.html),
[`fi_eq`(3)](fi_eq.3.html)