These are notes describing the MPI RMA runtime. They are intended to help developers navigate the contents of these files and to locate specific functionality.

The files [groups.h](groups.h) and [groups.c](groups.c) contain functionality that describes the relation between groups in GA (which are essentially equivalent to MPI communicators) and MPI windows. Each global array in GA has its own MPI window associated with it, and any communication involving the global array must use its corresponding window. On the other hand, collective operations in GA, particularly sync, are defined on groups. To implement sync, each group must maintain a list of all windows associated with it, and there must be functionality available to manage the association of windows with the group as global arrays are created and destroyed. Most of the code that supports this association is located in these two files.

[reg_win.h](reg_win.h) and [reg_win.c](reg_win.c) contain code for finding the window corresponding to a point in registered memory. When a global array is created, a `reg_entry_t` struct is created for each processor in the group on which the global array is defined. These structs are grouped into a linked list so that the global array can determine where on a remote processor the data allocated for the global array resides. The functions in these files identify which window contains a given pointer and convert between the pointers used in ARMCI and the integer offsets used in MPI RMA calls.

The remainder of the code is located in [comex.c](comex.c), with a few type declarations in [comex_impl.h](comex_impl.h). The comex.c code has a number of preprocessor symbols that can be used to investigate the performance of different implementations of the individual ComEx operations.
The USE_MPI_DATATYPES symbol uses MPI datatypes to send strided and vector data instead of decomposing the request into multiple contiguous data transfers. This option should be used if at all possible, since it represents a substantial performance boost over sending multiple individual messages. If the USE_MPI_REQUESTS symbol is defined, then request-based calls are used for all ComEx one-sided operations. These calls are completed using the `MPI_Wait` function. If USE_MPI_FLUSH_LOCAL is defined, then local completion of the call is accomplished using the `MPI_Win_flush_local` function. If neither of these symbols is defined, then `MPI_Win_lock` and `MPI_Win_unlock` are used to guarantee progress for one-sided operations. The request-based and flush-based protocols use `MPI_Win_lock_all` on the window that is created for each GA to create a passive synchronization epoch for each window. Both the request-based and flush-based protocols support true non-blocking operations; for the lock/unlock protocol, non-blocking operations default to blocking operations and the GA wait function is a no-op.