# [Boost].MPI3
*Alfredo A. Correa*
<alfredo.correa@gmail.com>

[Boost].MPI3 is a C++ library wrapper for the MPI-3 standard.

[Boost].MPI3 is not an official Boost library.
However, [Boost].MPI3 is designed following the principles of Boost and the STL.

[Boost].MPI3 is not a derivative of Boost.MPI and it is unrelated to the (now deprecated) official MPI C++ interface.
It adds features that were missing in Boost.MPI (which only covers MPI-1), such as an iterator-based interface and MPI-3 features (RMA and shared memory).
[Boost].MPI3 is written from scratch in C++14.

[Boost].MPI3 depends on and has been compiled against Boost +1.53 and one of the MPI implementations (OpenMPI +1.9, MPICH +3.2.1 or MVAPICH), using the following compilers: gcc +5.4.1, clang +6.0 and PGI 18.04.
The current version of the library (wrapper) is `0.71` (programmatically accessible from `./version.hpp`).

## Introduction

MPI is a large library for run-time parallelism in which several paradigms coexist.
It was originally designed as a standardized and portable message-passing system to work on a wide variety of parallel computing architectures.

The latest standard, MPI-3, uses a combination of techniques to achieve parallelism: Message Passing (MP), Remote Memory Access (RMA) and Shared Memory (SM).
Here we try to give a uniform interface and abstractions for these features by means of wrapper function calls and concepts familiar from C++ and the STL.

## Motivation: The problem with the standard interface

A typical C call for message passing looks like this:

```c++
int status_send = MPI_Send(&numbers, 10, MPI_INT, 1, 0, MPI_COMM_WORLD);
assert(status_send == MPI_SUCCESS);
... // concurrently with
int status_recv = MPI_Recv(&numbers, 10, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
assert(status_recv == MPI_SUCCESS);
```

In principle this call can be made from a C++ program.
However, there are obvious drawbacks to using this standard interface from C++.

Here we enumerate some of the problems:

* Function calls have many arguments (e.g. 6 or 7 arguments on average).
* Many mandatory arguments are redundant or could easily have a natural default value (e.g. message tags are not always necessary).
* Raw pointers and sizes are used (e.g. `&numbers` and `10`).
* Arguments are type-erased to `void*`.
* Only primitive types (e.g. `MPI_INT`) can be passed.
* Consistency between pointer types and data types is the responsibility of the user.
* Only contiguous memory blocks can be used with this interface.
* Error codes have to be stored and checked after each function call.
* Handles (such as `MPI_COMM_WORLD`) are used, and handles do not have well-defined semantics.

A call of this type would be an improvement:

```c++
world.send(numbers.begin(), numbers.end(), 1);
... // concurrently with
world.receive(numbers.begin(), numbers.end(), 0);
```

For other examples, see here: [http://mpitutorial.com/tutorials/mpi-send-and-receive/](http://mpitutorial.com/tutorials/mpi-send-and-receive/)

MPI used to ship with a C++-style interface.
It turns out that this interface was a very minimal change over the C version, and for good reasons it was dropped.

The Boost.MPI3 library was designed to be used simultaneously (interleaved) with the standard C interface of MPI.
In this way, changes to existing code can be made incrementally.
Mixing the standard C interface with Boost.MPI3 is not complicated, but it requires more knowledge of the library internals than is provided in this document.

## Installation

The library is "header-only"; no separate compilation is necessary.
Most functions are inline or template functions.
Compiling requires an MPI distribution (e.g. OpenMPI or MPICH2) and the corresponding compiler wrapper (`mpic++` or `mpicxx`).
Currently the library requires C++14 (usually activated with the compiler option `-std=c++14`) and Boost.
In particular it depends on Boost.Serialization, and may require linking against this library (`-lboost_serialization`) if the values passed are not basic types.
A typical compilation/run command looks like this:

```bash
$ mpic++ -std=c++14 -O3 mpi3/test/communicator_send.cpp -o communicator_send.x -lboost_serialization
$ mpirun -n 8 ./communicator_send.x
```

On a system such as Red Hat, the dependencies can be installed by

```bash
$ dnf install gcc-c++ boost-devel openmpi-devel mpich-devel
```

The library is tested frequently against `openmpi` and `mpich`, and less frequently with `mvapich2`.

## Testing

The library has a basic `ctest`-based testing system.

```bash
cd mpi3/test
mkdir build; cd build
cmake .. && make && ctest
```

## Initialization

Like MPI, Boost.MPI3 requires some global library initialization.
The library includes `mpi3/main.hpp`, which wraps around these initialization steps and *simulates* a main function.
In this way, a parallel program looks very much like a normal program, except that the main function has a third argument with the default global communicator passed in.

```c++
#include "mpi3/version.hpp"
#include "mpi3/main.hpp"

#include<iostream>

namespace mpi3 = boost::mpi3;
using std::cout;

int mpi3::main(int argc, char* argv[], mpi3::communicator world){
    if(world.rank() == 0) cout << mpi3::version() << '\n';
    return 0;
}
```

Here `world` is a communicator object that is a wrapper over an MPI communicator handle.

Changing the `main` program to this syntax in existing code can be too intrusive.
For this reason a more traditional initialization is also possible.
The alternative initialization is done by instantiating the `mpi3::environment` object (from which the global communicator `.world()` is extracted).

```c++
#include "mpi3/environment.hpp"

namespace mpi3 = boost::mpi3;

int main(int argc, char** argv){
    mpi3::environment env(argc, argv);
    auto world = env.world(); // communicator is extracted from the environment
    // ... code here
    return 0;
}
```

## Communicators

In the last example, `world` is a global communicator (not necessarily the same as `MPI_COMM_WORLD`, but a copy of it).
There is no global communicator variable `world` that can be accessed directly in a nested function.
The idea behind this is to avoid using global communicators in nested functions of the program unless they are explicitly passed in the function call.
Communicators are usually passed by reference to nested functions, as in the sketch below.
Even in traditional MPI it is a mistake to assume that `MPI_COMM_WORLD` is the only available communicator.
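
The following is a minimal sketch of this pattern (it only uses the `rank` and `size` member functions described later in this document):

```c++
#include "mpi3/main.hpp"
#include<iostream>

namespace mpi3 = boost::mpi3;

// a nested function receives the communicator it operates on explicitly,
// instead of relying on a global MPI_COMM_WORLD
void report(mpi3::communicator& comm){
    std::cout << "rank " << comm.rank() << " of " << comm.size() << std::endl;
}

int mpi3::main(int, char*[], mpi3::communicator world){
    report(world); // the (copy of the) global communicator is passed explicitly
    return 0;
}
```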

`mpi3::communicator` represents communicators with value semantics.
This means that an `mpi3::communicator` can be copied or passed by reference.
A communicator and its copies are different entities that compare equal.
Communicators can be empty, in a state that is analogous to `MPI_COMM_NULL` but with proper value semantics.

As in MPI, communicators can be duplicated (copied into a new instance) or split.
They can also be compared.

```c++
mpi3::communicator world2 = world;
assert( world2 == world );
mpi3::communicator hemisphere = world/2;
mpi3::communicator interleaved = world%2;
```

For example, this program splits the global communicator into two sub-communicators: one of size 2 (containing processes 0 and 1) and one of size 6 (containing processes 2, 3, ..., 7).

```c++
#include "mpi3/main.hpp"
#include "mpi3/communicator.hpp"

namespace mpi3 = boost::mpi3;

int mpi3::main(int argc, char* argv[], mpi3::communicator world){
    assert(world.size() == 8); // this program can only be run with 8 processes
    mpi3::communicator comm = (world <= 1);
    assert(!comm || (comm && comm.size() == 2));
    return 0;
}
```

Communicators also give index access to individual `mpi3::process`es, with ranks ranging from `0` to `comm.size()`.
For example, `world[0]` refers to process 0 of the global communicator.
An `mpi3::process` is simply a rank inside a communicator.
This concept doesn't exist explicitly in the standard C interface, but it simplifies the syntax for message passing.

Splitting communicators can also be done more traditionally via the `communicator::split` member function.
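
The following sketch assumes a `split(color, key)` overload that mirrors `MPI_Comm_split`: ranks that pass the same color end up in the same sub-communicator, ordered within it by the key.

```c++
// a sketch: even and odd ranks go to different sub-communicators
int color = world.rank() % 2;
mpi3::communicator half = world.split(color, world.rank());
```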

Communicators are used to pass messages and to create memory windows.
A special type of communicator is the shared communicator `mpi3::shared_communicator`.

## Message Passing

This section describes the features related to the message-passing (MP) functions of the MPI library.
In C-MPI, information is passed via pointers to memory.
This is expected in a C-based interface and it is also very efficient.
In Boost.MPI, information is passed exclusively by value semantics.
Although there are optimizations that amortize the cost, we decided to generalize the pointer interface and leave value-based message passing for a higher-level syntax.

Here we replicate the design of the STL to process information; that is, aggregate data is passed mainly via iterators (a pointer is a kind of iterator).

For example, in the STL data is copied between ranges in this way:
```c++
std::copy(origin.begin(), origin.end(), destination.begin());
```

The caller of the copy function doesn't need to worry about the type of the `origin` and `destination` containers; it can mix pointers and iterators, and the function doesn't need any information beyond what is passed.
The programmer is responsible for managing the memory and making sure that the algorithm can access the data referred to by the passed iterators.

Contiguous iterators (to built-in types) are particularly efficient because they can be mapped to pointers at compile time; this in turn is translated into a primitive MPI function call.
The interface for other types of iterators, or for contiguous iterators to non-built-in types, is simulated, mainly via buffers and serialization.
The idea behind this is that generic message-passing function calls can be made to work with arbitrary data types.

The main interface for message passing in Boost.MPI3 consists of member functions of the communicator, for example `communicator::send`, `::receive` and `::barrier`.
The member functions `::rank` and `::size` allow each process to determine its unique identity inside the communicator.

```c++
#include "mpi3/main.hpp"

#include<vector>

namespace mpi3 = boost::mpi3;

int mpi3::main(int argc, char* argv[], mpi3::communicator& world){
    assert(world.size() == 2);
    if(world.rank() == 0){
        std::vector<double> v = {1., 2., 3.};
        world.send(v.begin(), v.end(), 1); // send to rank 1
    }else if(world.rank() == 1){
        std::vector<double> v(3);
        world.receive(v.begin(), v.end(), 0); // receive from rank 0
        assert( v == std::vector<double>({1., 2., 3.}) );
    }
    world.barrier(); // synchronize execution here
    return 0;
}
```

Other important functions are `::gather`, `::broadcast` and `::accumulate`.
This syntax has a more or less obvious (but simplified) mapping to the standard C-MPI interface.
In Boost.MPI3, however, these functions have reasonable defaults that make the calls shorter and less error-prone than with the C-MPI interface.
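
As a sketch (assuming an iterator-range `broadcast(first, last, root)` overload in the same style as `send`/`receive`; the exact signature may differ), a broadcast from the root process could look like this:

```c++
std::vector<double> v(3);
if(world.rank() == 0){v = {1., 2., 3.};} // only the root has the data initially
world.broadcast(v.begin(), v.end(), 0);  // after the call every process has it
```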

For more examples, look into `./mpi3/tests/`, `./mpi3/examples/` and `./mpi3/exercises/`.

The interface described above is iterator-based and is a direct generalization of the C interface, which works with pointers.
If the iterators are contiguous and the associated value types are primitive MPI types, the function call is directly mapped to the C-MPI call; otherwise the data is buffered and serialized, as in the sketch below.
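
For instance, in this sketch the same `send`/`receive` syntax is used with a non-contiguous container (`std::list`) of a non-primitive type (`std::string`); this path relies on the serialization machinery, hence the `-lboost_serialization` link requirement mentioned in the installation section.

```c++
if(world.rank() == 0){
    std::list<std::string> messages = {"hello", "world"};
    world.send(messages.begin(), messages.end(), 1);    // buffered/serialized behind the scenes
}else if(world.rank() == 1){
    std::list<std::string> messages(2);
    world.receive(messages.begin(), messages.end(), 0);
    assert( messages == std::list<std::string>({"hello", "world"}) );
}
```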

Alternatively, a value-based interface can be used.
Below we show the terse syntax, using process objects.

```c++
#include "mpi3/main.hpp"

namespace mpi3 = boost::mpi3;

int mpi3::main(int argc, char* argv[], mpi3::communicator& world){
    assert(world.size() == 2);
    if(world.rank() == 0){
        double v = 5.;
        world[1] << v;
    }else if(world.rank() == 1){
        double v = -1.;
        world[0] >> v;
        assert(v == 5.);
    }
    return 0;
}
```

## Remote Memory Access

Remote memory access (RMA) is handled by `mpi3::window` objects.
`mpi3::window`s are created by an `mpi3::communicator` via collective (member) functions.
Since an `mpi3::window` represents memory, it cannot be copied (but it can be moved).

```c++
mpi3::window w = world.make_window(begin, end);
```

Just like in the MPI interface, local and remote access are synchronized by a `window::fence` call.
Remote read and write access is performed via the `put` and `get` functions.

```c++
w.fence();
w.put(begin, end, rank);
w.fence();
```

This is a minimal example using the `put` and `get` functions.

```c++
#include "mpi3/main.hpp"

#include<iostream>
#include<vector>

namespace mpi3 = boost::mpi3; using std::cout;

int mpi3::main(int, char*[], mpi3::communicator world){

    std::vector<double> darr(world.rank()?0:100);
    mpi3::window<double> w = world.make_window(darr.data(), darr.size());
    w.fence();
    if(world.rank() == 0){
        std::vector<double> a = {5., 6.};
        w.put(a.begin(), a.end(), 0);
    }
    world.barrier();
    w.fence();
    std::vector<double> b(2);
    w.get(b.begin(), b.end(), 0);
    w.fence();
    assert( b[0] == 5. );
    world.barrier();

    return 0;
}
```

In this example, memory from process 0 is shared across the communicator and is accessible through a common window.
Process 0 writes (`window::put`s) values into the memory (this can be done locally or remotely).
Later, all processes read from this memory.
The `put` and `get` functions take at least 3 arguments (and at most 4).
The first two are a range of iterators, while the third is the destination/source process rank (called the "target rank").

For more examples, look into `./mpi3/tests/`, `./mpi3/examples/` and `./mpi3/exercises/`.

`mpi3::window`s may carry type information (as in `mpi3::window<double>`) or not (`mpi3::window<>`).

## Shared Memory

Shared memory (SM) uses the underlying capability of the operating system to share memory between processes within the same node.
Historically, shared memory has an interface similar to that of remote access.
Only communicators that comprise a single node can be used to create a shared-memory window.
A special type of communicator can be created by splitting a given communicator:

`mpi3::shared_communicator node = world.split_shared();`

If the job is launched in a single node, `node` will be equal (congruent) to `world`.
Otherwise, the global communicator will be split into a number of (shared) communicators equal to the number of nodes.

`mpi3::shared_communicator`s can create `mpi3::shared_window`s.
These are a special type of memory window.

```c++
#include "mpi3/main.hpp"

#include<iostream>

namespace mpi3 = boost::mpi3; using std::cout;

int mpi3::main(int argc, char* argv[], mpi3::communicator& world){

    mpi3::shared_communicator node = world.split_shared();
    mpi3::shared_window<int> win = node.make_shared_window<int>(node.rank()==0?1:0);

    assert(win.base() != nullptr and win.size<int>() == 1);

    win.lock_all();
    if(node.rank()==0) *win.base<int>(0) = 42;
    for(int j=1; j != node.size(); ++j){
        if(node.rank()==0) node.send_n((int*)nullptr, 0, j);//, 666);
        else if(node.rank()==j) node.receive_n((int*)nullptr, 0, 0);//, 666);
    }
    win.sync();

    int l = *win.base<int>(0);
    win.unlock_all();

    int minmax[2] = {-l, l};
    node.all_reduce_n(&minmax[0], 2, mpi3::max<>{});
    assert( -minmax[0] == minmax[1] );
    cout << "proc " << node.rank() << " " << l << std::endl;

    return 0;
}
```

For more examples, look into `./mpi3/tests/`, `./mpi3/examples/` and `./mpi3/exercises/`.

# Beyond MP, RMA and SHM

MPI provides a very low-level abstraction for inter-process communication.
Higher levels of abstraction can be constructed on top of MPI, and by using the wrapper this work is simplified considerably.

## Mutex

Mutexes can be implemented fairly simply on top of RMA.
Mutexes are used similarly to how they are used in threaded code:
they prevent certain blocks of code from being executed by more than one process (rank) at a time.

```c++
#include "mpi3/main.hpp"
#include "mpi3/mutex.hpp"

#include<iostream>

namespace mpi3 = boost::mpi3; using std::cout;

int mpi3::main(int argc, char* argv[], mpi3::communicator& world){

    mpi3::mutex m(world);
    {
        m.lock();
        cout << "locked from " << world.rank() << '\n';
        cout << "never interleaved " << world.rank() << '\n';
        cout << "forever blocked " << world.rank() << '\n';
        cout << std::endl;
        m.unlock();
    }
    return 0;
}
```

(Recursive mutexes are not implemented yet.)

Mutexes themselves can be used to implement atomic operations on data.
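
Since `mpi3::mutex` exposes the `lock`/`unlock` member functions shown above, it can also be used, as a sketch (assuming it satisfies the standard *Lockable* requirements), with `std::lock_guard` for exception-safe critical sections:

```c++
#include "mpi3/main.hpp"
#include "mpi3/mutex.hpp"

#include<iostream>
#include<mutex> // std::lock_guard

namespace mpi3 = boost::mpi3;

int mpi3::main(int, char*[], mpi3::communicator world){
    mpi3::mutex m(world);
    {
        std::lock_guard<mpi3::mutex> guard(m); // locks here, unlocks at the end of the scope
        std::cout << "critical section executed by rank " << world.rank() << std::endl;
    }
    return 0;
}
```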

# Ongoing work

We are implementing memory allocators for remote memory, atomic classes and asynchronous remote function calls.
Higher abstractions and usage patterns will be implemented, especially those that fit the patterns of STL algorithms and containers.

# Conclusion

The goal is to provide a type-safe, efficient, generic interface for MPI.
We achieve this by leveraging the template code and classes that C++ provides.
Typical low-level use patterns become extremely simple, and this exposes higher-level patterns.