---
layout: page
title: fi_eq(3)
tagline: Libfabric Programmer's Manual
---
{% include JB/setup %}

# NAME

fi_eq \- Event queue operations

fi_eq_open / fi_close
: Open/close an event queue

fi_control
: Control operation of EQ

fi_eq_read / fi_eq_readerr
: Read an event from an event queue

fi_eq_write
: Writes an event to an event queue

fi_eq_sread
: A synchronous (blocking) read of an event queue

fi_eq_strerror
: Converts provider specific error information into a printable string

# SYNOPSIS

```c
#include <rdma/fi_domain.h>

int fi_eq_open(struct fid_fabric *fabric, struct fi_eq_attr *attr,
    struct fid_eq **eq, void *context);

int fi_close(struct fid *eq);

int fi_control(struct fid *eq, int command, void *arg);

ssize_t fi_eq_read(struct fid_eq *eq, uint32_t *event,
    void *buf, size_t len, uint64_t flags);

ssize_t fi_eq_readerr(struct fid_eq *eq, struct fi_eq_err_entry *buf,
    uint64_t flags);

ssize_t fi_eq_write(struct fid_eq *eq, uint32_t event,
    const void *buf, size_t len, uint64_t flags);

ssize_t fi_eq_sread(struct fid_eq *eq, uint32_t *event,
    void *buf, size_t len, int timeout, uint64_t flags);

const char * fi_eq_strerror(struct fid_eq *eq, int prov_errno,
      const void *err_data, char *buf, size_t len);
```

# ARGUMENTS

*fabric*
: Opened fabric descriptor

*eq*
: Event queue

*attr*
: Event queue attributes

*context*
: User specified context associated with the event queue.

*event*
: Reported event

*buf*
: For read calls, the data buffer to write events into.  For write
  calls, an event to insert into the event queue.  For fi_eq_strerror,
  an optional buffer that receives printable error information.

*len*
: Length of data buffer

*flags*
: Additional flags to apply to the operation

*command*
: Command of control operation to perform on EQ.

*arg*
: Optional control argument

*prov_errno*
: Provider specific error value

*err_data*
: Provider specific error data related to a completion

*timeout*
: Timeout specified in milliseconds

# DESCRIPTION

Event queues are used to report events associated with control
operations.  They are associated with memory registration, address
vectors, connection management, and fabric and domain level events.
Reported events are either associated with a requested operation or
affiliated with a call that registers for specific types of events,
such as listening for connection requests.

## fi_eq_open

fi_eq_open allocates a new event queue.

The properties and behavior of an event queue are defined by
`struct fi_eq_attr`.

```c
struct fi_eq_attr {
	size_t               size;      /* # entries for EQ */
	uint64_t             flags;     /* operation flags */
	enum fi_wait_obj     wait_obj;  /* requested wait object */
	int                  signaling_vector; /* interrupt affinity */
	struct fid_wait     *wait_set;  /* optional wait set */
};
```

*size*
: Specifies the minimum size of an event queue.

*flags*
: Flags that control the configuration of the EQ.

- *FI_WRITE*
: Indicates that the application requires support for inserting user
  events into the EQ.  If this flag is set, then the fi_eq_write
  operation must be supported by the provider.  If the FI_WRITE flag
  is not set, then the application may not invoke fi_eq_write.

- *FI_AFFINITY*
: Indicates that the signaling_vector field (see below) is valid.

*wait_obj*
: EQs may be associated with a specific wait object.  Wait objects
  allow applications to block until the wait object is signaled,
  indicating that an event is available to be read.  Users may use
  fi_control to retrieve the underlying wait object associated with an
  EQ, in order to use it in other system calls.
  The following values may be used to specify the type of wait object
  associated with an EQ:

- *FI_WAIT_NONE*
: Used to indicate that the user will not block (wait) for events on
  the EQ.  When FI_WAIT_NONE is specified, the application may not
  call fi_eq_sread.  This is the default if no wait object is specified.

- *FI_WAIT_UNSPEC*
: Specifies that the user will only wait on the EQ using fabric
  interface calls, such as fi_eq_sread.  In this case, the underlying
  provider may select the most appropriate or highest performing wait
  object available, including custom wait mechanisms.  Applications
  that select FI_WAIT_UNSPEC are not guaranteed to retrieve the
  underlying wait object.

- *FI_WAIT_SET*
: Indicates that the event queue should use a wait set object to wait
  for events.  If specified, the wait_set field must reference an
  existing wait set object.

- *FI_WAIT_FD*
: Indicates that the EQ should use a file descriptor as its wait
  mechanism.  A file descriptor wait object must be usable in select,
  poll, and epoll routines.  However, a provider may signal an FD wait
  object by marking it as readable or with an error.

- *FI_WAIT_MUTEX_COND*
: Specifies that the EQ should use a pthread mutex and cond variable
  as a wait object.

- *FI_WAIT_YIELD*
: Indicates that the EQ will wait without a wait object but instead
  yield on every wait.  This allows fi_eq_sread to be used by spinning.

*signaling_vector*
: If the FI_AFFINITY flag is set, this indicates the logical cpu number
  (0..max cpu - 1) that interrupts associated with the EQ should target.
  This field should be treated as a hint to the provider and may be
  ignored if the provider does not support interrupt affinity.

*wait_set*
: If wait_obj is FI_WAIT_SET, this field references a wait set to
  which the event queue should attach.
  When an event is inserted into
  the event queue, the corresponding wait set will be signaled if all
  necessary conditions are met.  The use of a wait_set enables an
  optimized method of waiting for events across multiple event queues.
  This field is ignored if wait_obj is not FI_WAIT_SET.

## fi_close

The fi_close call releases all resources associated with an event queue.  Any
events which remain on the EQ when it is closed are lost.

The EQ must not be bound to any other objects when it is closed; otherwise,
the call will return -FI_EBUSY.

## fi_control

The fi_control call is used to access provider or implementation
specific details of the event queue.  Access to the EQ should be
serialized across all calls when fi_control is invoked, as it may
redirect the implementation of EQ operations.  The following control
commands are usable with an EQ.

*FI_GETWAIT (void \*\*)*
: This command allows the user to retrieve the low-level wait object
  associated with the EQ.  The format of the wait object is specified
  during EQ creation, through the EQ attributes.  The fi_control arg
  parameter should be the address of the location into which the wait
  object will be written: an 'int *' for FI_WAIT_FD, or a
  'struct fi_mutex_cond *' for FI_WAIT_MUTEX_COND.

```c
struct fi_mutex_cond {
	pthread_mutex_t     *mutex;
	pthread_cond_t      *cond;
};
```

## fi_eq_read

The fi_eq_read operation performs a non-blocking read of event data
from the EQ.  The format of the event data is based on the type of
event retrieved from the EQ, with all events starting with a struct
fi_eq_entry header.  At most one event will be returned per EQ read
operation.  The number of bytes successfully read from the EQ is
returned from the read.  The FI_PEEK flag may be used to indicate that
event data should be read from the EQ without being consumed.
A subsequent read without the FI_PEEK flag would then remove the event
from the EQ.

The following types of events may be reported to an EQ, along with
information regarding the format associated with each event.

*Asynchronous Control Operations*
: Asynchronous control operations are basic requests that simply need
  to generate an event to indicate that they have completed.  These
  include the following types of events: memory registration, address
  vector resolution, and multicast joins.

  Control requests report their completion by inserting a
  `struct fi_eq_entry` into the EQ.  The format of this structure is:

```c
struct fi_eq_entry {
	fid_t            fid;        /* fid associated with request */
	void            *context;    /* operation context */
	uint64_t         data;       /* completion-specific data */
};
```

  For the completion of basic asynchronous control operations, the
  returned event will indicate the operation that has completed, and
  the fid will reference the fabric descriptor associated with
  the event.  For memory registration, this will be an FI_MR_COMPLETE
  event and the fid_mr.  Address resolution will reference an
  FI_AV_COMPLETE event and fid_av.  Multicast joins will report an
  FI_JOIN_COMPLETE and fid_mc.  The context field will be set
  to the context specified as part of the operation, if available;
  otherwise, the context will be associated with the fabric descriptor.
  The data field will be set as described in the man page for the
  corresponding object type (e.g., see [`fi_av`(3)](fi_av.3.html) for
  a description of how asynchronous address vector insertions are
  completed).

*Connection Notification*
: Connection notifications are connection management notifications
  used to set up or tear down connections between endpoints.  There are
  three connection notification events: FI_CONNREQ, FI_CONNECTED, and
  FI_SHUTDOWN.
  Connection notifications are reported using
  `struct fi_eq_cm_entry`:

```c
struct fi_eq_cm_entry {
	fid_t            fid;        /* fid associated with request */
	struct fi_info  *info;       /* endpoint information */
	uint8_t          data[];     /* app connection data */
};
```

  A connection request (FI_CONNREQ) event indicates that
  a remote endpoint wishes to establish a new connection to a listening,
  or passive, endpoint.  The fid is the passive endpoint.
  Information regarding the requested, active endpoint's
  capabilities and attributes is available from the info field.  The
  application is responsible for freeing this structure by calling
  fi_freeinfo when it is no longer needed.  The fi_info connreq field
  will reference the connection request associated with this event.
  To accept a connection, an endpoint must first be created by passing
  an fi_info structure referencing this connreq field to fi_endpoint().
  This endpoint is then passed to fi_accept() to complete the acceptance
  of the connection attempt.
  Creating the endpoint is most easily accomplished by
  passing the fi_info returned as part of the CM event into
  fi_endpoint().  If the connection is to be rejected, the connreq is
  passed to fi_reject().

  Any application data exchanged as part of the connection request is
  placed beyond the fi_eq_cm_entry structure.  The amount of data
  available is application dependent and limited to the buffer space
  provided by the application when fi_eq_read is called.  The amount
  of returned data may be calculated using the return value of
  fi_eq_read.  Note that the amount of returned data is limited by the
  underlying connection protocol, and the length of any data returned
  may include protocol padding.  As a result, the returned length may
  be larger than that specified by the connecting peer.

  If a connection request has been accepted, an FI_CONNECTED event will
  be generated on both sides of the connection.  The active side (the
  one that called fi_connect()) may receive user data as part of the
  FI_CONNECTED event.  The user data is passed to the connection
  manager on the passive side through the fi_accept call.  User data is
  not provided with an FI_CONNECTED event on the listening side of the
  connection.

  Notification that a remote peer has disconnected from an active
  endpoint is done through the FI_SHUTDOWN event.  Shutdown
  notification uses struct fi_eq_cm_entry as declared above.  The fid
  field for a shutdown notification refers to the active endpoint's
  fid_ep.

*Asynchronous Error Notification*
: Asynchronous errors are used to report problems with fabric resources.
  Reported errors may be fatal or transient, based on the error, and
  result in the resource becoming disabled.  Disabled resources will fail
  operations submitted against them until they are explicitly re-enabled
  by the application.

  Asynchronous errors may be reported for completion queues and endpoints
  of all types.  CQ errors can result when resource management has been
  disabled, and the provider has detected a queue overrun.  Endpoint
  errors may be the result of numerous actions, but are often associated
  with a failed operation.  Operations may fail because of buffer overruns,
  invalid permissions, incorrect memory access keys, network routing
  failures, network reachability issues, etc.

  Asynchronous errors are reported using struct fi_eq_err_entry, as defined
  below.  The fabric descriptor (fid) associated with the error is provided
  as part of the error data.  An error code is also available to determine
  the cause of the error.

## fi_eq_sread

The fi_eq_sread call is the blocking (or synchronous) equivalent to
fi_eq_read.
It behaves similarly to the non-blocking call, except that the call
will not return until either an event has been read from the EQ or an
error or timeout occurs.  Specifying a negative timeout means an
infinite timeout.

Threads blocking in this function will return to the caller if
they are signaled by some external source.  This is true even if
the timeout has not occurred or was specified as infinite.

It is invalid for applications to call this function if the EQ
has been configured with a wait object of FI_WAIT_NONE or FI_WAIT_SET.

## fi_eq_readerr

The read error function, fi_eq_readerr, retrieves information
regarding any asynchronous operation which has completed with an
unexpected error.  fi_eq_readerr is a non-blocking call, returning
immediately whether an error completion was found or not.

EQs are optimized to report operations which have completed
successfully.  Operations which fail are reported 'out of band'.  Such
operations are retrieved using the fi_eq_readerr function.  When an
operation that completes with an unexpected error is inserted into an
EQ, it is placed into a temporary error queue.  Attempting to read
from an EQ while an item is in the error queue fails with -FI_EAVAIL.
Applications may use this return code to determine when to
call fi_eq_readerr.

Error information is reported to the user through struct
fi_eq_err_entry.  The format of this structure is defined below.

```c
struct fi_eq_err_entry {
	fid_t            fid;        /* fid associated with error */
	void            *context;    /* operation context */
	uint64_t         data;       /* completion-specific data */
	int              err;        /* positive error code */
	int              prov_errno; /* provider error code */
	void            *err_data;   /* additional error data */
	size_t           err_data_size; /* size of err_data */
};
```

The fid will reference the fabric descriptor associated with the
event.
For memory registration, this will be the fid_mr, address
resolution will reference a fid_av, and CM events will refer to a
fid_ep.  The context field will be set to the context specified as
part of the operation.

The data field will be set as described in the man page for the
corresponding object type (e.g., see [`fi_av`(3)](fi_av.3.html) for a
description of how asynchronous address vector insertions are
completed).

The general reason for the error is provided through the err field.
Provider or operational specific error information may also be available
through the prov_errno and err_data fields.  Users may call fi_eq_strerror to
convert provider specific error information into a printable string
for debugging purposes.

On input, err_data_size indicates the size of the err_data buffer in bytes.
On output, err_data_size will be set to the number of bytes copied to the
err_data buffer.  The err_data information is typically used with
fi_eq_strerror to provide details about the type of error that occurred.

For compatibility purposes, if err_data_size is 0 on input, or the fabric
was opened with release < 1.5, err_data will be set to a data buffer
owned by the provider.  The contents of the buffer will remain valid until a
subsequent read call against the EQ.  Applications must serialize access
to the EQ when processing errors to ensure that the buffer referenced by
err_data does not change.

# EVENT FIELDS

The EQ entry data structures share many of the same fields.  The meanings
are the same or similar for all EQ structure formats, with specific details
described below.

*fid*
: This corresponds to the fabric descriptor associated with the event.  The
  type of fid depends on the event being reported.  For FI_CONNREQ this will
  be the fid of the passive endpoint.  FI_CONNECTED and FI_SHUTDOWN will
  reference the active endpoint.
  FI_MR_COMPLETE and FI_AV_COMPLETE will
  refer to the MR or AV fabric descriptor, respectively.  FI_JOIN_COMPLETE
  will point to the multicast descriptor returned as part of the join
  operation.  Applications can use the fid->context value to retrieve the
  context associated with the fabric descriptor.

*context*
: The context value is set to the context parameter specified with the
  operation that generated the event.  If no context parameter is
  associated with the operation, this field will be NULL.

*data*
: Data is an operation specific value or set of bytes.  For connection
  events, data is application data exchanged as part of the connection
  protocol.

*err*
: This err code is a positive fabric errno associated with an event.
  The err value indicates the general reason for an error, if one occurred.
  See [`fi_errno`(3)](fi_errno.3.html) for a list of possible error codes.

*prov_errno*
: On an error, prov_errno may contain a provider specific error code.  The
  use of this field and its meaning are provider specific.  It is intended
  to be used as a debugging aid.  See fi_eq_strerror for additional details
  on converting this error value into a human readable string.

*err_data*
: On an error, err_data may reference a provider specific amount of data
  associated with an error.  The use of this field and its meaning are
  provider specific.  It is intended to be used as a debugging aid.  See
  fi_eq_strerror for additional details on converting this error data into
  a human readable string.

*err_data_size*
: On input, err_data_size indicates the size of the err_data buffer in bytes.
  On output, err_data_size will be set to the number of bytes copied to the
  err_data buffer.  The err_data information is typically used with
  fi_eq_strerror to provide details about the type of error that occurred.

  For compatibility purposes, if err_data_size is 0 on input, or the fabric
  was opened with release < 1.5, err_data will be set to a data buffer
  owned by the provider.  The contents of the buffer will remain valid until a
  subsequent read call against the EQ.  Applications must serialize access
  to the EQ when processing errors to ensure that the buffer referenced by
  err_data does not change.

# NOTES

If an event queue has been overrun, it will be placed into an 'overrun'
state.  Write operations against an overrun EQ will fail with -FI_EOVERRUN.
Read operations will continue to return any valid, non-corrupted events, if
available.  After all valid events have been retrieved, any attempt to read
the EQ will result in it returning an FI_EOVERRUN error event.  Overrun
event queues are considered fatal and may not be used to report additional
events once the overrun occurs.

# RETURN VALUES

fi_eq_open
: Returns 0 on success.  On error, a negative value corresponding to
  fabric errno is returned.

fi_eq_read / fi_eq_readerr
: On success, returns the number of bytes read from the
  event queue.  On error, a negative value corresponding to fabric
  errno is returned.  If no data is available to be read from the
  event queue, -FI_EAGAIN is returned.

fi_eq_sread
: On success, returns the number of bytes read from the
  event queue.  On error, a negative value corresponding to fabric
  errno is returned.  If the timeout expires or the calling
  thread is signaled and no data is available to be read from the
  event queue, -FI_EAGAIN is returned.

fi_eq_write
: On success, returns the number of bytes written to the
  event queue.  On error, a negative value corresponding to fabric
  errno is returned.

fi_eq_strerror
: Returns a character string interpretation of the provider specific
  error returned with a completion.

Fabric errno values are defined in
`rdma/fi_errno.h`.

# SEE ALSO

[`fi_getinfo`(3)](fi_getinfo.3.html),
[`fi_endpoint`(3)](fi_endpoint.3.html),
[`fi_domain`(3)](fi_domain.3.html),
[`fi_cntr`(3)](fi_cntr.3.html),
[`fi_poll`(3)](fi_poll.3.html)