---
layout: page
title: fi_poll(3)
tagline: Libfabric Programmer's Manual
---
{% include JB/setup %}

# NAME

fi_poll \- Polling and wait set operations

fi_poll_open / fi_close
: Open/close a polling set

fi_poll_add / fi_poll_del
: Add/remove a completion queue or counter to/from a poll set.

fi_poll
: Poll for progress and events across multiple completion queues
  and counters.

fi_wait_open / fi_close
: Open/close a wait set

fi_wait
: Waits for one or more wait objects in a set to be signaled.

fi_trywait
: Indicate when it is safe to block on wait objects using native OS calls.

fi_control
: Control wait set operation or attributes.

# SYNOPSIS

```c
#include <rdma/fi_domain.h>

int fi_poll_open(struct fid_domain *domain, struct fi_poll_attr *attr,
    struct fid_poll **pollset);

int fi_close(struct fid *pollset);

int fi_poll_add(struct fid_poll *pollset, struct fid *event_fid,
    uint64_t flags);

int fi_poll_del(struct fid_poll *pollset, struct fid *event_fid,
    uint64_t flags);

int fi_poll(struct fid_poll *pollset, void **context, int count);

int fi_wait_open(struct fid_fabric *fabric, struct fi_wait_attr *attr,
    struct fid_wait **waitset);

int fi_close(struct fid *waitset);

int fi_wait(struct fid_wait *waitset, int timeout);

int fi_trywait(struct fid_fabric *fabric, struct fid **fids, size_t count);

int fi_control(struct fid *waitset, int command, void *arg);
```

# ARGUMENTS

*fabric*
: Fabric provider

*domain*
: Resource domain

*pollset*
: Event poll set

*waitset*
: Wait object set

*attr*
: Poll or wait set attributes

*context*
: On success, an array of user context values associated with
  completion queues or counters.

*fids*
: An array of fabric descriptors, each one associated with a native
  wait object.

*count*
: Number of entries in context or fids array.

*timeout*
: Time to wait for a signal, in milliseconds.

*command*
: Control command to perform on the wait set.

*arg*
: Optional control argument.

# DESCRIPTION

## fi_poll_open

fi_poll_open creates a new polling set. A poll set enables an
optimized method for progressing asynchronous operations across
multiple completion queues and counters and checking for their completions.

A poll set is defined with the following attributes.

```c
struct fi_poll_attr {
    uint64_t flags; /* operation flags */
};
```

*flags*
: Flags that set the default operation of the poll set. The use of
  this field is reserved and must be set to 0 by the caller.

## fi_close

The fi_close call releases all resources associated with a poll set.
The poll set must not be associated with any other resources when it is
closed; otherwise, the call returns -FI_EBUSY.

## fi_poll_add

Associates a completion queue or counter with a poll set.

## fi_poll_del

Removes a completion queue or counter from a poll set.

## fi_poll

Progresses all completion queues and counters associated with a poll set
and checks for events. If events might have occurred, contexts associated
with the completion queues and/or counters are returned. Completion
queues return their context if they are not empty. The context
associated with a counter is returned if the counter's success
value or error value has changed since the last time fi_poll, fi_cntr_set,
or fi_cntr_add was called. The number of contexts is limited to the
size of the context array, indicated by the count parameter.

Note that fi_poll only indicates that events might be available. In some
cases, providers may consume such events internally, for example to drive
progress. This can result in fi_poll returning false positives.
Applications should drive their progress based on the results of reading
events from a completion queue or reading counter values. The fi_poll
function always returns all completion queues and counters that have new
events.

## fi_wait_open

fi_wait_open allocates a new wait set. A wait set enables an
optimized method of waiting for events across multiple completion queues
and counters. Where possible, a wait set uses a single underlying
wait object that is signaled when a specified condition occurs on an
associated completion queue or counter.

The properties and behavior of a wait set are defined by struct
fi_wait_attr.

```c
struct fi_wait_attr {
    enum fi_wait_obj wait_obj; /* requested wait object */
    uint64_t flags;            /* operation flags */
};
```

*wait_obj*
: Wait sets are associated with specific wait object(s). Wait objects
  allow applications to block until the wait object is signaled,
  indicating that an event is available to be read. The following
  values may be used to specify the type of wait object associated
  with a wait set: FI_WAIT_UNSPEC, FI_WAIT_FD, FI_WAIT_MUTEX_COND,
  FI_WAIT_POLLFD, and FI_WAIT_YIELD.

- *FI_WAIT_UNSPEC*
: Specifies that the user will only wait on the wait set using
  fabric interface calls, such as fi_wait. In this case, the
  underlying provider may select the most appropriate or highest
  performing wait object available, including custom wait mechanisms.
  Applications that select FI_WAIT_UNSPEC are not guaranteed to
  retrieve the underlying wait object.

- *FI_WAIT_FD*
: Indicates that the wait set should use a single file descriptor as
  its wait mechanism, as exposed to the application. Internally, this
  may require the use of epoll in order to support waiting on a single
  file descriptor.
  File descriptor wait objects must be usable with the POSIX
  select(2) and poll(2) routines and, where available, the Linux
  epoll(7) routines. Providers signal an FD wait object by marking it
  as readable or with an error.

- *FI_WAIT_MUTEX_COND*
: Specifies that the wait set should use a pthread mutex and cond
  variable as a wait object.

- *FI_WAIT_POLLFD*
: This option is similar to FI_WAIT_FD, but allows the wait mechanism
  to use multiple file descriptors, as viewed by the application. Using
  FI_WAIT_POLLFD can eliminate the need for epoll to hide multiple file
  descriptors behind a single one when waiting for events. The file
  descriptors must be usable in the POSIX select(2) and poll(2)
  routines, and they map directly onto the struct pollfd entries used
  by poll(2). See the NOTES section below for details on using pollfd.

- *FI_WAIT_YIELD*
: Indicates that the wait set will wait without a wait object but instead
  yield on every wait.

*flags*
: Flags that set the default operation of the wait set. The use of
  this field is reserved and must be set to 0 by the caller.

## fi_close

The fi_close call releases all resources associated with a wait set.
The wait set must not be bound to any other opened resources when it is
closed; otherwise, the call returns -FI_EBUSY.

## fi_wait

Waits on a wait set until one or more of its underlying wait objects
is signaled.

## fi_trywait

The fi_trywait call was introduced in libfabric version 1.3. The behavior
of native wait objects used without fi_trywait is provider-specific and
should be considered non-deterministic.

The fi_trywait() call is used in conjunction with native operating
system calls to block on wait objects, such as file descriptors. The
application must call fi_trywait and obtain a return value of
FI_SUCCESS prior to blocking on a native wait object.
Failure to do so may result in the wait object not being signaled, and
the application not observing the desired events. The following
pseudo-code demonstrates the use of fi_trywait in conjunction with
the OS select(2) call. It assumes that fabric and cq were opened
earlier and that cq was created with an FI_WAIT_FD wait object.

```c
struct fid *fids[1] = { &cq->fid };
struct fi_cq_entry comp;   /* assumes FI_CQ_FORMAT_CONTEXT */
struct timeval timeout;
fd_set rfds, efds;
int fd, ret;
fi_control(&cq->fid, FI_GETWAIT, (void *) &fd);

while (1) {
    /* Block only when fi_trywait reports that it is safe to do so. */
    if (fi_trywait(fabric, fids, 1) == FI_SUCCESS) {
        /* select(2) may modify its arguments; reset them on each call. */
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        efds = rfds;
        timeout.tv_sec = 1;
        timeout.tv_usec = 0;
        select(fd + 1, &rfds, NULL, &efds, &timeout);
    }

    /* Drain all queued completions before calling fi_trywait again. */
    do {
        ret = fi_cq_read(cq, &comp, 1);
    } while (ret > 0);
}
```

fi_trywait() returns FI_SUCCESS if it is safe to block on the wait
object(s) corresponding to the fabric descriptor(s), or -FI_EAGAIN if
there are events queued on the fabric descriptor or if blocking could
hang the application.

The call takes an array of fabric descriptors. For each wait object
that will be passed to the native wait routine, the corresponding
fabric descriptor should first be passed to fi_trywait. All fabric
descriptors passed into a single fi_trywait call must use the same
underlying wait object type.

The following types of fabric descriptors may be passed into fi_trywait:
event queues, completion queues, counters, and wait sets. Applications
that wish to use native wait calls should select specific wait objects
when allocating such resources. For example, an application can set the
wait_obj field of the resource's creation attributes to FI_WAIT_FD.

If the wait object to check belongs to a wait set, only the wait set
itself needs to be passed into fi_trywait; the fabric resources
associated with the wait set do not.

On receiving a return value of -FI_EAGAIN from fi_trywait, an application
should read all queued completions and events and call fi_trywait again
before attempting to block. Applications can use a fabric poll set to
identify completion queues and counters that may require processing.
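
The check-then-block rule above can be modeled without libfabric. In the
minimal sketch below, which is not libfabric or provider code, a pipe
stands in for an FI_WAIT_FD wait object, and the hypothetical helpers
`mock_post`, `mock_cq_read`, and `mock_trywait` stand in for a provider
signaling a completion, fi_cq_read, and fi_trywait respectively. The
modeled assumption is that the descriptor can be cleared (re-armed) while
completions are still queued, which is why blocking without a successful
trywait can hang.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical stand-in for a CQ with an FD wait object (not libfabric
 * code): 'pending' models queued completions, the pipe models the fd. */
struct mock_cq {
    int pending;
    int pipefd[2];
};

static void mock_cq_init(struct mock_cq *cq)
{
    int rc = pipe(cq->pipefd);
    assert(rc == 0); /* sketch: treat setup failure as fatal */
    (void) rc;
    /* Non-blocking read end so mock_trywait can drain it safely. */
    int fl = fcntl(cq->pipefd[0], F_GETFL);
    fcntl(cq->pipefd[0], F_SETFL, fl | O_NONBLOCK);
    cq->pending = 0;
}

/* Queue one completion and signal the wait object. */
static void mock_post(struct mock_cq *cq)
{
    char c = 1;
    cq->pending++;
    if (write(cq->pipefd[1], &c, 1) != 1)
        perror("write");
}

/* Read at most one completion: returns 1 if one was read, else 0. */
static int mock_cq_read(struct mock_cq *cq)
{
    if (cq->pending == 0)
        return 0;
    cq->pending--;
    return 1;
}

/* Models the fi_trywait contract: clear stale signals (re-arming the
 * fd), then report -EAGAIN if anything is still queued. */
static int mock_trywait(struct mock_cq *cq)
{
    char buf[64];
    while (read(cq->pipefd[0], buf, sizeof(buf)) > 0)
        ;
    return cq->pending ? -EAGAIN : 0;
}

/* Consume up to 'target' completions, blocking in poll(2) only after
 * mock_trywait returns 0. Gives up after a 100 ms idle timeout. */
static int wait_for(struct mock_cq *cq, int target)
{
    int consumed = 0;

    while (consumed < target) {
        if (mock_trywait(cq) == 0) {
            struct pollfd pfd = { .fd = cq->pipefd[0], .events = POLLIN };
            if (poll(&pfd, 1, 100) == 0)
                break; /* timed out with nothing queued */
        }
        while (mock_cq_read(cq))
            consumed++;
    }
    return consumed;
}
```

After three `mock_post` calls, `mock_trywait` returns -EAGAIN and leaves
the pipe empty, so blocking on the descriptor at that point would only end
at the timeout; `wait_for(&cq, 3)` instead drains all three completions
without blocking. Clearing the descriptor before inspecting the queue is
the ordering that makes the sequence safe: a completion queued after the
check leaves the descriptor readable, so poll(2) wakes up.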

## fi_control

The fi_control call is used to access provider- or implementation-specific
details of fids that support blocking calls, such as wait sets, completion
queues, counters, and event queues. Access to the wait set or fid should be
serialized across all calls when fi_control is invoked, as it may redirect
the implementation of wait set operations. The following control commands
are usable with a wait set or fid.

*FI_GETWAIT (void \*\*)*
: This command allows the user to retrieve the low-level wait object
  associated with a wait set or fid. The type of wait object is specified
  at creation time through the wait_obj attribute. The fi_control arg
  parameter should be the address of a buffer into which the wait object
  is written: an 'int \*' for FI_WAIT_FD, a 'struct fi_mutex_cond \*'
  for FI_WAIT_MUTEX_COND, or a 'struct fi_wait_pollfd \*' for
  FI_WAIT_POLLFD. Support for FI_GETWAIT is provider specific.

*FI_GETWAITOBJ (enum fi_wait_obj \*)*
: This command returns the type of wait object associated with a wait set
  or fid.

# RETURN VALUES

Returns FI_SUCCESS on success. On error, a negative value corresponding to
fabric errno is returned.

Fabric errno values are defined in
`rdma/fi_errno.h`.

fi_poll
: On success, if events are available, returns the number of entries
  written to the context array.

# NOTES

In many situations, blocking calls may need to wait on signals sent
to a number of file descriptors. For example, this is the case for
socket-based providers, such as tcp and udp, as well as utility providers
such as multi-rail. For simplicity, when epoll is available, it can
be used to limit the number of file descriptors that an application
must monitor. The use of epoll may also be required in order
to support FI_WAIT_FD.
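
A minimal Linux-only sketch of the epoll technique described above:
several file descriptors are registered with one epoll instance, and the
resulting epoll descriptor is itself readable whenever any member is, so
it can be watched with poll(2) like a single FD wait object. The pipes are
an assumption standing in for a socket-based provider's per-connection
sockets, and the names `make_single_fd`, `single_fd_ready`, and `which_fd`
are hypothetical. This illustrates the mechanism only; it is not how any
particular provider implements FI_WAIT_FD.

```c
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

#define NFDS 3

/* Hypothetical: pipes stand in for the per-connection socket fds a
 * socket-based provider might have to monitor. fds[i][0] is the read end. */
static int fds[NFDS][2];

/* Register every read end with one epoll instance and return its fd
 * (or -1 on error). This single fd is what the application watches. */
static int make_single_fd(void)
{
    int epfd = epoll_create1(0);
    if (epfd < 0)
        return -1;
    for (int i = 0; i < NFDS; i++) {
        if (pipe(fds[i]) != 0)
            return -1;
        struct epoll_event ev = { .events = EPOLLIN, .data.u32 = (uint32_t) i };
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[i][0], &ev) != 0)
            return -1;
    }
    return epfd;
}

/* An epoll fd is readable whenever any registered fd is ready, so it
 * can be handed to poll(2) or select(2) like a single FD wait object. */
static int single_fd_ready(int epfd, int timeout_ms)
{
    struct pollfd pfd = { .fd = epfd, .events = POLLIN };
    return poll(&pfd, 1, timeout_ms) == 1 && (pfd.revents & POLLIN);
}

/* Ask epoll which underlying fd fired; returns its index or -1. */
static int which_fd(int epfd)
{
    struct epoll_event ev;
    if (epoll_wait(epfd, &ev, 1, 0) != 1)
        return -1;
    return (int) ev.data.u32;
}
```

With the default level-triggered registration, the epoll descriptor stays
readable until the member descriptor is drained, which matches the
"marked as readable" signaling that the FI_WAIT_FD description asks of
providers.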

However, FI_WAIT_POLLFD supports waiting on multiple file descriptors on
systems where epoll is not available, or where using epoll may hurt
performance. A significant difference between POLLFD and FD wait objects
is that, with FI_WAIT_POLLFD, the file descriptors may change dynamically.
For example, the file descriptors associated with a completion queue's
wait set may change as endpoint associations with the CQ are added and
removed.

Struct fi_wait_pollfd is used to retrieve all file descriptors for fids
that use FI_WAIT_POLLFD to support blocking calls.

```c
struct fi_wait_pollfd {
    uint64_t change_index;
    size_t nfds;
    struct pollfd *fd;
};
```

*change_index*
: The change_index may be used to determine whether there have been any
  changes to the file descriptor list. Any time a file descriptor is
  added, removed, or has its events updated, the provider increments this
  field. Applications wishing to wait on file descriptors directly should
  cache the change_index value. Before blocking on file descriptor events,
  the application should use fi_control() to retrieve the current
  change_index and compare it against its cached value. If the values
  differ, the application should update its file descriptor list prior
  to blocking.

*nfds*
: On input to fi_control(), this indicates the number of entries in the
  struct pollfd array. On output, it is set to the number of entries
  needed to store the current set of file descriptors. If the input value
  is smaller than the output value, fi_control() returns the error
  -FI_ETOOSMALL. Note that setting nfds to 0 provides an efficient way to
  check the change_index.

*fd*
: This points to an array of struct pollfd entries. The number of entries
  is specified through the nfds field.
  If the number of needed entries
  is less than or equal to the number of entries available, the struct
  pollfd array is filled in with a list of file descriptors and
  corresponding events that can be used in the select(2) and poll(2)
  calls.

The change_index is updated only when the file descriptors associated with
the pollfd file set have changed. Checking the change_index is an additional
step needed when working with FI_WAIT_POLLFD wait objects directly.
Applications that access wait objects directly must still use
fi_trywait().

# SEE ALSO

[`fi_getinfo`(3)](fi_getinfo.3.html),
[`fi_domain`(3)](fi_domain.3.html),
[`fi_cntr`(3)](fi_cntr.3.html),
[`fi_eq`(3)](fi_eq.3.html)