---
layout: page
title: fi_eq(3)
tagline: Libfabric Programmer's Manual
---
{% include JB/setup %}

# NAME

fi_eq \- Event queue operations

fi_eq_open / fi_close
: Open/close an event queue

fi_control
: Control operation of EQ

fi_eq_read / fi_eq_readerr
: Read an event from an event queue

fi_eq_write
: Write an event to an event queue

fi_eq_sread
: A synchronous (blocking) read of an event queue

fi_eq_strerror
: Convert provider specific error information into a printable string

# SYNOPSIS

```c
#include <rdma/fi_domain.h>

int fi_eq_open(struct fid_fabric *fabric, struct fi_eq_attr *attr,
    struct fid_eq **eq, void *context);

int fi_close(struct fid *eq);

int fi_control(struct fid *eq, int command, void *arg);

ssize_t fi_eq_read(struct fid_eq *eq, uint32_t *event,
    void *buf, size_t len, uint64_t flags);

ssize_t fi_eq_readerr(struct fid_eq *eq, struct fi_eq_err_entry *buf,
    uint64_t flags);

ssize_t fi_eq_write(struct fid_eq *eq, uint32_t event,
    const void *buf, size_t len, uint64_t flags);

ssize_t fi_eq_sread(struct fid_eq *eq, uint32_t *event,
    void *buf, size_t len, int timeout, uint64_t flags);

const char * fi_eq_strerror(struct fid_eq *eq, int prov_errno,
      const void *err_data, char *buf, size_t len);
```

# ARGUMENTS

*fabric*
: Opened fabric descriptor

*eq*
: Event queue

*attr*
: Event queue attributes

*context*
: User specified context associated with the event queue.

*event*
: Reported event

*buf*
: For read calls, the data buffer to write events into.  For write
  calls, an event to insert into the event queue.  For fi_eq_strerror,
  an optional buffer that receives printable error information.

*len*
: Length of data buffer

*flags*
: Additional flags to apply to the operation

*command*
: Command of control operation to perform on EQ.

*arg*
: Optional control argument

*prov_errno*
: Provider specific error value

*err_data*
: Provider specific error data related to a completion

*timeout*
: Timeout specified in milliseconds

# DESCRIPTION

Event queues are used to report events associated with control
operations.  They are associated with memory registration, address
vectors, connection management, and fabric and domain level events.
Reported events are either associated with a requested operation or
affiliated with a call that registers for specific types of events,
such as listening for connection requests.

## fi_eq_open

fi_eq_open allocates a new event queue.

The properties and behavior of an event queue are defined by `struct
fi_eq_attr`.

```c
struct fi_eq_attr {
	size_t               size;      /* # entries for EQ */
	uint64_t             flags;     /* operation flags */
	enum fi_wait_obj     wait_obj;  /* requested wait object */
	int                  signaling_vector; /* interrupt affinity */
	struct fid_wait     *wait_set;  /* optional wait set */
};
```

*size*
: Specifies the minimum size of an event queue.

*flags*
: Flags that control the configuration of the EQ.

- *FI_WRITE*
: Indicates that the application requires support for inserting user
  events into the EQ.  If this flag is set, then the fi_eq_write
  operation must be supported by the provider.  If the FI_WRITE flag
  is not set, then the application may not invoke fi_eq_write.

- *FI_AFFINITY*
: Indicates that the signaling_vector field (see below) is valid.

*wait_obj*
: EQs may be associated with a specific wait object.  Wait objects
  allow applications to block until the wait object is signaled,
  indicating that an event is available to be read.  Users may use
  fi_control to retrieve the underlying wait object associated with an
  EQ, in order to use it in other system calls.  The following values
  may be used to specify the type of wait object associated with an
  EQ:

- *FI_WAIT_NONE*
: Used to indicate that the user will not block (wait) for events on
  the EQ.  When FI_WAIT_NONE is specified, the application may not
  call fi_eq_sread.  This is the default if no wait object is specified.

- *FI_WAIT_UNSPEC*
: Specifies that the user will only wait on the EQ using fabric
  interface calls, such as fi_eq_sread.  In this case, the underlying
  provider may select the most appropriate or highest performing wait
  object available, including custom wait mechanisms.  Applications
  that select FI_WAIT_UNSPEC are not guaranteed to retrieve the
  underlying wait object.

- *FI_WAIT_SET*
: Indicates that the event queue should use a wait set object to wait
  for events.  If specified, the wait_set field must reference an
  existing wait set object.

- *FI_WAIT_FD*
: Indicates that the EQ should use a file descriptor as its wait
  mechanism.  A file descriptor wait object must be usable in select,
  poll, and epoll routines.  However, a provider may signal an FD wait
  object by marking it as readable or with an error.

- *FI_WAIT_MUTEX_COND*
: Specifies that the EQ should use a pthread mutex and condition
  variable as a wait object.

- *FI_WAIT_YIELD*
: Indicates that the EQ will wait without a wait object but instead
  yield on every wait.  Allows usage of fi_eq_sread through a spin.

*signaling_vector*
: If the FI_AFFINITY flag is set, this indicates the logical CPU number
  (0..max CPU - 1) that interrupts associated with the EQ should target.
  This field should be treated as a hint to the provider and may be
  ignored if the provider does not support interrupt affinity.

*wait_set*
: If wait_obj is FI_WAIT_SET, this field references a wait object to
  which the event queue should attach.  When an event is inserted into
  the event queue, the corresponding wait set will be signaled if all
  necessary conditions are met.  The use of a wait_set enables an
  optimized method of waiting for events across multiple event queues.
  This field is ignored if wait_obj is not FI_WAIT_SET.

## fi_close

The fi_close call releases all resources associated with an event queue.  Any
events which remain on the EQ when it is closed are lost.

The EQ must not be bound to any other objects prior to being closed, otherwise
the call will return -FI_EBUSY.

## fi_control

The fi_control call is used to access provider or implementation
specific details of the event queue.  Access to the EQ should be
serialized across all calls when fi_control is invoked, as it may
redirect the implementation of EQ operations.  The following control
commands are usable with an EQ.

*FI_GETWAIT (void \*\*)*
: This command allows the user to retrieve the low-level wait object
  associated with the EQ.  The format of the wait object is specified
  during EQ creation, through the EQ attributes.  The fi_control arg
  parameter should be an address where a pointer to the returned wait
  object will be written.  This should be an 'int *' for FI_WAIT_FD,
  or 'struct fi_mutex_cond' for FI_WAIT_MUTEX_COND.

```c
struct fi_mutex_cond {
	pthread_mutex_t     *mutex;
	pthread_cond_t      *cond;
};
```

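As an illustration of using an FI_WAIT_FD wait object in a system call, the
following is a minimal sketch that blocks in poll() until a descriptor is
signaled.  It assumes the descriptor was obtained through FI_GETWAIT on an EQ
opened with FI_WAIT_FD.  The helper names `wait_fd_signaled` and
`demo_with_pipe` are hypothetical, and a pipe stands in for the EQ's
descriptor so that the sketch runs without a provider.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/*
 * Wait until fd is signaled or timeout_ms elapses (negative: no timeout).
 * Returns 1 if signaled, 0 on timeout, -1 if poll() failed.
 *
 * Assumption: in an application, fd was filled in by something like
 *   fi_control(&eq->fid, FI_GETWAIT, &fd);
 */
static int wait_fd_signaled(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int ret;

	assert(fd >= 0);
	ret = poll(&pfd, 1, timeout_ms);
	if (ret < 0)
		return -1;
	if (ret == 0)
		return 0;

	/* A provider may signal by marking the fd readable or with an error. */
	return (pfd.revents & (POLLIN | POLLERR | POLLHUP)) ? 1 : 0;
}

/* Demonstration only: a pipe stands in for the EQ's file descriptor. */
static int demo_with_pipe(void)
{
	int fds[2], signaled;
	char byte = 'x';

	if (pipe(fds) != 0)
		return -1;
	if (write(fds[1], &byte, 1) != 1) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	signaled = wait_fd_signaled(fds[0], 1000);
	close(fds[0]);
	close(fds[1]);
	return signaled;
}
```

Once the descriptor is signaled, the event itself is still retrieved with
fi_eq_read; the wait object only indicates that a read is worth attempting.
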
## fi_eq_read

The fi_eq_read operation performs a non-blocking read of event data
from the EQ.  The format of the event data is based on the type of
event retrieved from the EQ, with all events starting with a struct
fi_eq_entry header.  At most one event will be returned per EQ read
operation.  The number of bytes successfully read from the EQ is
returned from the read.  The FI_PEEK flag may be used to indicate that
event data should be read from the EQ without being consumed.  A
subsequent read without the FI_PEEK flag would then remove the event
from the EQ.

The following types of events may be reported to an EQ, along with
information regarding the format associated with each event.

*Asynchronous Control Operations*
: Asynchronous control operations are basic requests that simply need
  to generate an event to indicate that they have completed.  These
  include the following types of events: memory registration, address
  vector resolution, and multicast joins.

  Control requests report their completion by inserting a `struct
  fi_eq_entry` into the EQ.  The format of this structure is:

```c
struct fi_eq_entry {
	fid_t            fid;        /* fid associated with request */
	void            *context;    /* operation context */
	uint64_t         data;       /* completion-specific data */
};
```

  For the completion of basic asynchronous control operations, the
  returned event will indicate the operation that has completed, and
  the fid will reference the fabric descriptor associated with
  the event.  For memory registration, this will be an FI_MR_COMPLETE
  event and the fid_mr.  Address resolution will reference an
  FI_AV_COMPLETE event and fid_av.  Multicast joins will report an
  FI_JOIN_COMPLETE and fid_mc.  The context field will be set
  to the context specified as part of the operation, if available,
  otherwise the context will be associated with the fabric descriptor.
  The data field will be set as described in the man page for the
  corresponding object type (e.g., see [`fi_av`(3)](fi_av.3.html) for
  a description of how asynchronous address vector insertions are
  completed).

*Connection Notification*
: Connection notifications are connection management notifications
  used to set up or tear down connections between endpoints.  There are
  three connection notification events: FI_CONNREQ, FI_CONNECTED, and
  FI_SHUTDOWN.  Connection notifications are reported using `struct
  fi_eq_cm_entry`:

```c
struct fi_eq_cm_entry {
	fid_t            fid;        /* fid associated with request */
	struct fi_info  *info;       /* endpoint information */
	uint8_t          data[];     /* app connection data */
};
```

  A connection request (FI_CONNREQ) event indicates that
  a remote endpoint wishes to establish a new connection to a listening,
  or passive, endpoint.  The fid is the passive endpoint.
  Information regarding the requested, active endpoint's
  capabilities and attributes is available from the info field.  The
  application is responsible for freeing this structure by calling
  fi_freeinfo when it is no longer needed.  The fi_info connreq field
  will reference the connection request associated with this event.
  To accept a connection, an endpoint must first be created by passing
  an fi_info structure referencing this connreq field to fi_endpoint().
  This endpoint is then passed to fi_accept() to complete the acceptance
  of the connection attempt.
  Creating the endpoint is most easily accomplished by
  passing the fi_info returned as part of the CM event into
  fi_endpoint().  If the connection is to be rejected, the connreq is
  passed to fi_reject().

  Any application data exchanged as part of the connection request is
  placed beyond the fi_eq_cm_entry structure.  The amount of data
  available is application dependent and limited to the buffer space
  provided by the application when fi_eq_read is called.  The amount
  of returned data may be calculated using the return value to
  fi_eq_read.  Note that the amount of returned data is limited by the
  underlying connection protocol, and the length of any data returned
  may include protocol padding.  As a result, the returned length may
  be larger than that specified by the connecting peer.

  If a connection request has been accepted, an FI_CONNECTED event will
  be generated on both sides of the connection.  The active side (the
  one that called fi_connect()) may receive user data as part of the
  FI_CONNECTED event.  The user data is passed to the connection
  manager on the passive side through the fi_accept call.  User data is
  not provided with an FI_CONNECTED event on the listening side of the
  connection.

  Notification that a remote peer has disconnected from an active
  endpoint is done through the FI_SHUTDOWN event.  Shutdown
  notification uses struct fi_eq_cm_entry as declared above.  The fid
  field for a shutdown notification refers to the active endpoint's
  fid_ep.

*Asynchronous Error Notification*
: Asynchronous errors are used to report problems with fabric resources.
  Reported errors may be fatal or transient, based on the error, and
  result in the resource becoming disabled.  Disabled resources will fail
  operations submitted against them until they are explicitly re-enabled
  by the application.

  Asynchronous errors may be reported for completion queues and endpoints
  of all types.  CQ errors can result when resource management has been
  disabled, and the provider has detected a queue overrun.  Endpoint
  errors may be the result of numerous actions, but are often associated
  with a failed operation.  Operations may fail because of buffer overruns,
  invalid permissions, incorrect memory access keys, network routing
  failures, network reachability issues, etc.

  Asynchronous errors are reported using struct fi_eq_err_entry, as defined
  below.  The fabric descriptor (fid) associated with the error is provided
  as part of the error data.  An error code is also available to determine
  the cause of the error.

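For connection events, the amount of application data that follows the
fi_eq_cm_entry header can be derived from the read's return value, as noted
above.  The following is a minimal sketch of that arithmetic.  The structure
`cm_entry_sketch` is a local stand-in for struct fi_eq_cm_entry (with opaque
pointers in place of fid_t and struct fi_info) so that the example compiles
without libfabric headers, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>   /* ssize_t */

/* Stand-in for struct fi_eq_cm_entry; real code uses the libfabric type. */
struct cm_entry_sketch {
	void    *fid;
	void    *info;
	uint8_t  data[];
};

/*
 * Bytes of connection data following the header, given the value returned
 * by the read call.  Returns 0 for errors and for header-only events.
 */
static size_t cm_data_len(ssize_t ret)
{
	if (ret < 0 || (size_t) ret <= sizeof(struct cm_entry_sketch))
		return 0;
	return (size_t) ret - sizeof(struct cm_entry_sketch);
}

/* Copy at most dst_len bytes of connection data out of the event buffer. */
static size_t copy_cm_data(void *dst, size_t dst_len,
			   const struct cm_entry_sketch *entry, ssize_t ret)
{
	size_t n = cm_data_len(ret);

	assert(dst != NULL && entry != NULL);
	if (n > dst_len)
		n = dst_len;
	memcpy(dst, entry->data, n);
	return n;
}
```

Because the returned length may include protocol padding, the computed value
is an upper bound on what the peer sent.  An application that needs the exact
length would have to carry it inside its own connection data.
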
## fi_eq_sread

The fi_eq_sread call is the blocking (or synchronous) equivalent to
fi_eq_read.  It behaves similarly to the non-blocking call, with the
exception that the call will not return until either an event has
been read from the EQ or an error or timeout occurs.  Specifying a
negative timeout means an infinite timeout.

Threads blocking in this function will return to the caller if
they are signaled by some external source.  This is true even if
the timeout has not occurred or was specified as infinite.

It is invalid for applications to call this function if the EQ
has been configured with a wait object of FI_WAIT_NONE or FI_WAIT_SET.

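Because a blocked thread may return early when signaled, an application that
wants to wait for a fixed total time has to track the deadline itself and
retry.  The following is a minimal sketch of that bookkeeping, assuming a
POSIX monotonic clock is available.  The helpers `now_ms` and
`remaining_timeout_ms` are hypothetical and not part of libfabric.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <time.h>

/* Current monotonic time in milliseconds. */
static int64_t now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (int64_t) ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/*
 * Timeout to pass to the next blocking read so that the total wait does
 * not exceed deadline_ms (an absolute now_ms() value).  Returns 0 once
 * the deadline has passed; the caller is expected to stop retrying then.
 *
 * Intended use (eq, event, buf, and len are assumed to exist):
 *   deadline = now_ms() + 5000;
 *   do {
 *       ret = fi_eq_sread(eq, &event, buf, len,
 *                         remaining_timeout_ms(deadline, now_ms()), 0);
 *   } while (ret == -FI_EAGAIN && now_ms() < deadline);
 */
static int remaining_timeout_ms(int64_t deadline_ms, int64_t current_ms)
{
	int64_t left = deadline_ms - current_ms;

	assert(deadline_ms >= 0);
	if (left <= 0)
		return 0;
	return left > INT_MAX ? INT_MAX : (int) left;
}
```

The loop in the comment relies on fi_eq_sread returning -FI_EAGAIN both when
the timeout expires and when the thread is signaled with no event available,
as described under RETURN VALUES.
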
## fi_eq_readerr

The read error function, fi_eq_readerr, retrieves information
regarding any asynchronous operation which has completed with an
unexpected error.  fi_eq_readerr is a non-blocking call, returning
immediately whether an error completion was found or not.

EQs are optimized to report operations which have completed
successfully.  Operations which fail are reported 'out of band'.  Such
operations are retrieved using the fi_eq_readerr function.  When an
operation that completes with an unexpected error is inserted into an
EQ, it is placed into a temporary error queue.  Attempting to read
from an EQ while an item is in the error queue results in a -FI_EAVAIL
failure.  Applications may use this return code to determine when to
call fi_eq_readerr.

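The return codes described above suggest a simple dispatch after each read.
The following is a minimal sketch of that decision.  The enum and function
names are hypothetical, and the two error macros are local stand-ins so that
the example compiles without libfabric headers; real code would take them
from `rdma/fi_errno.h`, and the numeric value used for FI_EAVAIL here is an
assumption.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>   /* ssize_t */

/* Stand-ins only; real code gets these from rdma/fi_errno.h. */
#ifndef FI_EAGAIN
#define FI_EAGAIN EAGAIN
#endif
#ifndef FI_EAVAIL
#define FI_EAVAIL 259    /* assumed value */
#endif

enum eq_read_action {
	EQ_EVENT_READ,    /* ret is the number of bytes of event data */
	EQ_RETRY_LATER,   /* no event available: poll again or block */
	EQ_READ_ERROR,    /* error entry pending: call fi_eq_readerr */
	EQ_FAILED         /* any other failure */
};

/* Map the return value of fi_eq_read or fi_eq_sread to the next step. */
static enum eq_read_action classify_eq_read(ssize_t ret)
{
	/* Fabric errno values are positive; reads return them negated. */
	assert(FI_EAGAIN > 0 && FI_EAVAIL > 0);

	if (ret >= 0)
		return EQ_EVENT_READ;
	if (ret == -FI_EAGAIN)
		return EQ_RETRY_LATER;
	if (ret == -FI_EAVAIL)
		return EQ_READ_ERROR;
	return EQ_FAILED;
}
```

Keeping the classification separate from the read call makes it easy to share
between a polling loop built on fi_eq_read and a blocking loop built on
fi_eq_sread.
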
Error information is reported to the user through struct
fi_eq_err_entry.  The format of this structure is defined below.

```c
struct fi_eq_err_entry {
	fid_t            fid;        /* fid associated with error */
	void            *context;    /* operation context */
	uint64_t         data;       /* completion-specific data */
	int              err;        /* positive error code */
	int              prov_errno; /* provider error code */
	void            *err_data;   /* additional error data */
	size_t           err_data_size; /* size of err_data */
};
```

The fid will reference the fabric descriptor associated with the
event.  For memory registration, this will be the fid_mr, address
resolution will reference a fid_av, and CM events will refer to a
fid_ep.  The context field will be set to the context specified as
part of the operation.

The data field will be set as described in the man page for the
corresponding object type (e.g., see [`fi_av`(3)](fi_av.3.html) for a
description of how asynchronous address vector insertions are
completed).

The general reason for the error is provided through the err field.
Provider or operational specific error information may also be available
through the prov_errno and err_data fields.  Users may call fi_eq_strerror to
convert provider specific error information into a printable string
for debugging purposes.

On input, err_data_size indicates the size of the err_data buffer in bytes.
On output, err_data_size will be set to the number of bytes copied to the
err_data buffer.  The err_data information is typically used with
fi_eq_strerror to provide details about the type of error that occurred.

For compatibility purposes, if err_data_size is 0 on input, or the fabric
was opened with release < 1.5, err_data will be set to a data buffer
owned by the provider.  The contents of the buffer will remain valid until a
subsequent read call against the EQ.  Applications must serialize access
to the EQ when processing errors to ensure that the buffer referenced by
err_data does not change.

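The input and output roles of err_data_size can be illustrated with a small
setup helper.  The following is a minimal sketch, not a definitive pattern.
The structure `err_entry_sketch` is a local stand-in for struct
fi_eq_err_entry (with `void *` in place of fid_t) so that the example compiles
without libfabric headers, and `prepare_err_entry` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for struct fi_eq_err_entry; real code uses the libfabric type. */
struct err_entry_sketch {
	void     *fid;
	void     *context;
	uint64_t  data;
	int       err;
	int       prov_errno;
	void     *err_data;
	size_t    err_data_size;
};

/*
 * Prepare an entry so that error data is copied into an application-owned
 * buffer rather than err_data being pointed at provider-owned memory.
 */
static void prepare_err_entry(struct err_entry_sketch *entry,
			      void *buf, size_t len)
{
	assert(entry != NULL);
	memset(entry, 0, sizeof(*entry));
	entry->err_data = buf;
	entry->err_data_size = len;  /* input: capacity; output: bytes copied */
}
```

Since err_data_size is overwritten on output, the entry would need to be
prepared again before each call to fi_eq_readerr.  Passing a length of 0
selects the compatibility behavior described above, in which err_data refers
to a provider-owned buffer.
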
# EVENT FIELDS

The EQ entry data structures share many of the same fields.  The meanings
are the same or similar for all EQ structure formats, with specific details
described below.

*fid*
: This corresponds to the fabric descriptor associated with the event.  The
  type of fid depends on the event being reported.  For FI_CONNREQ this will
  be the fid of the passive endpoint.  FI_CONNECTED and FI_SHUTDOWN will
  reference the active endpoint.  FI_MR_COMPLETE and FI_AV_COMPLETE will
  refer to the MR or AV fabric descriptor, respectively.  FI_JOIN_COMPLETE
  will point to the multicast descriptor returned as part of the join
  operation.  Applications can use the fid->context value to retrieve the
  context associated with the fabric descriptor.

*context*
: The context value is set to the context parameter specified with the
  operation that generated the event.  If no context parameter is
  associated with the operation, this field will be NULL.

*data*
: Data is an operation specific value or set of bytes.  For connection
  events, data is application data exchanged as part of the connection
  protocol.

*err*
: This err code is a positive fabric errno associated with an event.
  The err value indicates the general reason for an error, if one occurred.
  See fi_errno.3 for a list of possible error codes.

*prov_errno*
: On an error, prov_errno may contain a provider specific error code.  The
  use of this field and its meaning is provider specific.  It is intended
  to be used as a debugging aid.  See fi_eq_strerror for additional details
  on converting this error value into a human readable string.

*err_data*
: On an error, err_data may reference a provider specific amount of data
  associated with an error.  The use of this field and its meaning is
  provider specific.  It is intended to be used as a debugging aid.  See
  fi_eq_strerror for additional details on converting this error data into
  a human readable string.

*err_data_size*
: On input, err_data_size indicates the size of the err_data buffer in bytes.
  On output, err_data_size will be set to the number of bytes copied to the
  err_data buffer.  The err_data information is typically used with
  fi_eq_strerror to provide details about the type of error that occurred.

  For compatibility purposes, if err_data_size is 0 on input, or the fabric
  was opened with release < 1.5, err_data will be set to a data buffer
  owned by the provider.  The contents of the buffer will remain valid until a
  subsequent read call against the EQ.  Applications must serialize access
  to the EQ when processing errors to ensure that the buffer referenced by
  err_data does not change.

# NOTES

If an event queue has been overrun, it will be placed into an 'overrun'
state.  Write operations against an overrun EQ will fail with -FI_EOVERRUN.
Read operations will continue to return any valid, non-corrupted events, if
available.  After all valid events have been retrieved, any attempt to read
the EQ will result in it returning an FI_EOVERRUN error event.  Overrun
event queues are considered fatal and may not be used to report additional
events once the overrun occurs.

# RETURN VALUES

fi_eq_open
: Returns 0 on success.  On error, a negative value corresponding to
  fabric errno is returned.

fi_eq_read / fi_eq_readerr
: On success, returns the number of bytes read from the
  event queue.  On error, a negative value corresponding to fabric
  errno is returned.  If no data is available to be read from the
  event queue, -FI_EAGAIN is returned.

fi_eq_sread
: On success, returns the number of bytes read from the
  event queue.  On error, a negative value corresponding to fabric
  errno is returned.  If the timeout expires or the calling
  thread is signaled and no data is available to be read from the
  event queue, -FI_EAGAIN is returned.

fi_eq_write
: On success, returns the number of bytes written to the
  event queue.  On error, a negative value corresponding to fabric
  errno is returned.

fi_eq_strerror
: Returns a character string interpretation of the provider specific
  error returned with a completion.

Fabric errno values are defined in
`rdma/fi_errno.h`.

# SEE ALSO

[`fi_getinfo`(3)](fi_getinfo.3.html),
[`fi_endpoint`(3)](fi_endpoint.3.html),
[`fi_domain`(3)](fi_domain.3.html),
[`fi_cntr`(3)](fi_cntr.3.html),
[`fi_poll`(3)](fi_poll.3.html)
