These are notes describing the MPI RMA runtime. They are intended to help
developers navigate the contents of these files and to locate specific
functionality.

The files [groups.h](groups.h) and [groups.c](groups.c) contain functionality
that describes the relation between groups in GA (which are essentially equivalent
to MPI communicators) and MPI windows. Each global array in GA has its own MPI
window associated with it, and any communication involving the global array
must use its corresponding window. On the other hand, collective operations in GA,
particularly sync, are defined on groups. To implement sync, each group must
have a list of all windows that are associated with it, and there must be
functionality available to manage the association of windows with the group as
global arrays are created and destroyed. Most of the code that supports
this association is located in these two files.

[reg_win.h](reg_win.h) and [reg_win.c](reg_win.c) contain code for finding the
window corresponding to a point in registered memory. When a global array is
created, a `reg_entry_t` struct
is created for each processor in the group on which the global array is defined.
These structs are collected into a linked list so that the global array can
determine where on a remote processor the data allocated for the global array
resides. The functions in these files allow you to identify which window
contains a given pointer and to convert between the pointers used in
ARMCI and the integer offsets used in MPI RMA calls.

The remainder of the code is located in [comex.c](comex.c), with a few type
declarations in [comex_impl.h](comex_impl.h). The comex.c code has a number
of preprocessor symbols that
can be used to investigate the performance of different implementations of the
individual ComEx operations. If the USE_MPI_DATATYPES symbol is defined, MPI
datatypes are used to send strided and vector data instead of decomposing the
request into multiple contiguous data transfers. This option should be used if
at all possible, since it represents a substantial performance boost over
sending multiple individual messages. If the USE_MPI_REQUESTS symbol is defined,
then request-based calls are used for all ComEx one-sided operations; these
calls are completed using the `MPI_Wait` function. If USE_MPI_FLUSH_LOCAL is
defined, then local completion of the call is accomplished using the
`MPI_Win_flush_local` function. If neither of these symbols is defined, then
`MPI_Win_lock` and `MPI_Win_unlock` are used to guarantee progress for
one-sided operations. The request- and flush-based protocols use
`MPI_Win_lock_all` on the window that is created for each GA to open a
passive-target synchronization epoch on each window. Both the request-based
and flush-based protocols support true non-blocking operations; with the
lock/unlock protocol, non-blocking operations default to blocking operations
and the GA wait function is a no-op.