Overview
========

.. currentmodule:: mpi4py.MPI

MPI for Python provides an object-oriented approach to message passing
that is grounded in the standard MPI-2 C++ bindings. The interface was
designed with a focus on translating the syntax and semantics of the
standard MPI-2 C++ bindings to Python. Any user of the standard C/C++
MPI bindings should be able to use this module without the need to
learn a new interface.

Communicating Python Objects and Array Data
-------------------------------------------

The Python standard library supports different mechanisms for data
persistence. Many of them rely on disk storage, but *pickling* and
*marshaling* can also work with memory buffers.

The :mod:`pickle` module provides user-extensible facilities to
serialize general Python objects using ASCII or binary formats. The
:mod:`marshal` module provides facilities to serialize built-in Python
objects using a binary format specific to Python, but independent of
machine architecture issues.

*MPI for Python* can communicate any built-in or user-defined Python
object by taking advantage of the features provided by the
:mod:`pickle` module. These facilities are routinely used to build
binary representations of the objects to communicate (at sending
processes) and to restore them (at receiving processes).

Although simple and general, the serialization approach (i.e.,
*pickling* and *unpickling*) previously discussed imposes significant
overheads in both memory and processor usage, especially when objects
with large memory footprints are being communicated. Pickling general
Python objects, ranging from primitive or container built-in types to
user-defined classes, necessarily requires computer resources.
Processing is also needed for dispatching the appropriate
serialization method (which depends on the type of the object) and
doing the actual packing. Additional memory is always needed, and if
its total amount is not known *a priori*, many reallocations can
occur. Indeed, in the case of large numeric arrays, this is certainly
unacceptable and precludes communication of objects occupying half or
more of the available memory resources.

*MPI for Python* supports direct communication of any object exporting
the single-segment buffer interface. This interface is a standard
Python mechanism provided by some types (e.g., strings and numeric
arrays), allowing access from the C side to a contiguous memory buffer
(i.e., address and length) containing the relevant data. This feature,
in conjunction with the capability of constructing user-defined MPI
datatypes describing complicated memory layouts, enables the
implementation of many algorithms involving multidimensional numeric
arrays (e.g., image processing, fast Fourier transforms, finite
difference schemes on structured Cartesian grids) directly in Python,
with negligible overhead, and almost as fast as compiled Fortran, C,
or C++ code.

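
For illustration, the following sketch (assuming NumPy is installed
and the script is run under ``mpiexec`` with at least two processes)
contrasts the two mechanisms: the lowercase ``send``/``recv`` methods
transfer a general Python object through pickle-based serialization,
while the uppercase ``Send``/``Recv`` methods transfer the contents of
a buffer-like NumPy array directly (these methods are described in
more detail below)::

   from mpi4py import MPI
   import numpy as np

   comm = MPI.COMM_WORLD
   rank = comm.Get_rank()

   # Generic Python objects travel through the pickle-based (lowercase) methods.
   if rank == 0:
       comm.send({'step': 1, 'data': [1.0, 2.0]}, dest=1, tag=11)
   elif rank == 1:
       obj = comm.recv(source=0, tag=11)

   # Buffer-like objects (e.g., NumPy arrays) travel through the
   # uppercase methods without intermediate serialization.
   if rank == 0:
       array = np.arange(100, dtype='d')
       comm.Send([array, MPI.DOUBLE], dest=1, tag=22)
   elif rank == 1:
       array = np.empty(100, dtype='d')
       comm.Recv([array, MPI.DOUBLE], source=0, tag=22)
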
Communicators
-------------

In *MPI for Python*, `Comm` is the base class of communicators. The
`Intracomm` and `Intercomm` classes are subclasses of the `Comm`
class. The `Comm.Is_inter` method (and `Comm.Is_intra`, provided for
convenience but not part of the MPI specification) is defined for
communicator objects and can be used to determine the particular
communicator class.

Two predefined intracommunicator instances are available:
`COMM_SELF` and `COMM_WORLD`. From them, new communicators can be
created as needed.

The number of processes in a communicator and the calling process rank
can be obtained with the `Comm.Get_size` and `Comm.Get_rank` methods,
respectively. The associated process group can be retrieved from a
communicator by calling the `Comm.Get_group` method, which returns an
instance of the `Group` class. Set operations with `Group` objects
like `Group.Union`, `Group.Intersection` and `Group.Difference` are
fully supported, as well as the creation of new communicators from
these groups using `Comm.Create` and `Comm.Create_group`.

New communicator instances can be obtained with the `Comm.Clone`,
`Comm.Dup` and `Comm.Split` methods, as well as with the
`Intracomm.Create_intercomm` and `Intercomm.Merge` methods.

Virtual topologies (`Cartcomm`, `Graphcomm` and `Distgraphcomm`
classes, which are specializations of the `Intracomm` class) are fully
supported. New instances can be obtained from intracommunicator
instances with the factory methods `Intracomm.Create_cart` and
`Intracomm.Create_graph`.


Point-to-Point Communications
-----------------------------

Point-to-point communication is a fundamental capability of message
passing systems. This mechanism enables the transmission of data
between a pair of processes, one side sending, the other receiving.

MPI provides a set of *send* and *receive* functions allowing the
communication of *typed* data with an associated *tag*. The type
information enables the conversion of data representation from one
architecture to another in the case of heterogeneous computing
environments; additionally, it allows the representation of
non-contiguous data layouts and user-defined datatypes, thus avoiding
the overhead of (otherwise unavoidable) packing/unpacking
operations. The tag information allows selectivity of messages at the
receiving end.


Blocking Communications
^^^^^^^^^^^^^^^^^^^^^^^

MPI provides basic send and receive functions that are *blocking*.
These functions block the caller until the data buffers involved in
the communication can be safely reused by the application program.

In *MPI for Python*, the `Comm.Send`, `Comm.Recv` and `Comm.Sendrecv`
methods of communicator objects provide support for blocking
point-to-point communications within `Intracomm` and `Intercomm`
instances. These methods can communicate memory buffers. The variants
`Comm.send`, `Comm.recv` and `Comm.sendrecv` can communicate general
Python objects.

Nonblocking Communications
^^^^^^^^^^^^^^^^^^^^^^^^^^

On many systems, performance can be significantly increased by
overlapping communication and computation. This is particularly true
on systems where communication can be executed autonomously by an
intelligent, dedicated communication controller.

MPI provides *nonblocking* send and receive functions. They allow the
possible overlap of communication and computation. Nonblocking
communication always comes in two parts: posting functions, which
begin the requested operation, and test-for-completion functions,
which make it possible to discover whether the requested operation
has completed.

In *MPI for Python*, the `Comm.Isend` and `Comm.Irecv` methods
initiate send and receive operations, respectively. These methods
return a `Request` instance, uniquely identifying the started
operation. Its completion can be managed using the `Request.Test`,
`Request.Wait` and `Request.Cancel` methods. The management of
`Request` objects and associated memory buffers involved in
communication requires careful, rather low-level coordination. Users
must ensure that objects exposing their memory buffers are not
accessed at the Python level while they are involved in nonblocking
message-passing operations.

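
As a minimal sketch (assuming NumPy is installed and the script is run
with exactly two processes), each process posts a nonblocking receive
and a nonblocking send, and then waits for both requests to complete::

   from mpi4py import MPI
   import numpy as np

   comm = MPI.COMM_WORLD
   rank = comm.Get_rank()
   peer = (rank + 1) % 2  # the other process, assuming two processes

   sbuf = np.full(10, rank, dtype='i')
   rbuf = np.empty(10, dtype='i')

   # Post the receive first, then the send; both calls return immediately.
   rreq = comm.Irecv([rbuf, MPI.INT], source=peer, tag=7)
   sreq = comm.Isend([sbuf, MPI.INT], dest=peer, tag=7)

   # Neither buffer may be touched until the corresponding request completes.
   sreq.Wait()
   rreq.Wait()
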
Persistent Communications
^^^^^^^^^^^^^^^^^^^^^^^^^

Often a communication with the same argument list is repeatedly
executed within an inner loop. In such cases, communication can be
further optimized by using persistent communication, a particular case
of nonblocking communication allowing the reduction of the overhead
between processes and communication controllers. Furthermore, this
kind of optimization can also alleviate the extra call overheads
associated with interpreted, dynamic languages like Python.

In *MPI for Python*, the `Comm.Send_init` and `Comm.Recv_init` methods
create persistent requests for a send and receive operation,
respectively. These methods return an instance of the `Prequest`
class, a subclass of the `Request` class. The actual communication can
be effectively started using the `Prequest.Start` method, and its
completion can be managed as previously described.


Collective Communications
--------------------------

Collective communications allow the transmittal of data between
multiple processes of a group simultaneously. The syntax and semantics
of collective functions are consistent with point-to-point
communication. Collective functions communicate *typed* data, but
messages are not paired with an associated *tag*; selectivity of
messages is implied in the calling order. Additionally, collective
functions come in blocking versions only.

The most commonly used collective communication operations are the
following.

* Barrier synchronization across all group members.

* Global communication functions

  + Broadcast data from one member to all members of a group.

  + Gather data from all members to one member of a group.

  + Scatter data from one member to all members of a group.

* Global reduction operations such as sum, maximum, minimum, etc.

In *MPI for Python*, the `Comm.Bcast`, `Comm.Scatter`, `Comm.Gather`,
`Comm.Allgather` and `Comm.Alltoall` methods provide support for
collective communications of memory buffers. The lower-case variants
`Comm.bcast`, `Comm.scatter`, `Comm.gather`, `Comm.allgather` and
`Comm.alltoall` can communicate general Python objects. The vector
variants (which can communicate different amounts of data to each
process) `Comm.Scatterv`, `Comm.Gatherv`, `Comm.Allgatherv`,
`Comm.Alltoallv` and `Comm.Alltoallw` are also supported; they can
only communicate objects exposing memory buffers.

Global reduction operations on memory buffers are accessible through
the `Comm.Reduce`, `Comm.Reduce_scatter`, `Comm.Allreduce`,
`Intracomm.Scan` and `Intracomm.Exscan` methods. The lower-case
variants `Comm.reduce`, `Comm.allreduce`, `Intracomm.scan` and
`Intracomm.exscan` can communicate general Python objects; however,
the actual required reduction computations are performed sequentially
at some process. All the predefined (i.e., `SUM`, `PROD`, `MAX`, etc.)
reduction operations can be applied.

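
For instance, a short sketch (assuming NumPy is installed and the
script is run with a few processes) broadcasts an array from the root
process and then sums one value per process with a global reduction::

   from mpi4py import MPI
   import numpy as np

   comm = MPI.COMM_WORLD
   rank = comm.Get_rank()

   # Broadcast a buffer from the root process to all other processes.
   data = np.arange(4, dtype='d') if rank == 0 else np.empty(4, dtype='d')
   comm.Bcast([data, MPI.DOUBLE], root=0)

   # Reduce one integer per process into a total at the root process.
   local = np.array([rank], dtype='i')
   total = np.zeros(1, dtype='i')
   comm.Reduce([local, MPI.INT], [total, MPI.INT], op=MPI.SUM, root=0)
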
Support for GPU-aware MPI
-------------------------

Several MPI implementations, including Open MPI and MVAPICH, support
passing GPU pointers to MPI calls to avoid explicit data movement
between the host and the device. On the Python side, GPU arrays have
been implemented by many libraries that need GPU computation, such as
CuPy, Numba, PyTorch, and PyArrow. In order to increase library
interoperability, two kinds of zero-copy data exchange protocols have
been defined and agreed upon: `DLPack`_ and `CUDA Array Interface`_.
For example, a CuPy array can be passed to a Numba CUDA-jit kernel.

.. _DLPack: https://data-apis.org/array-api/latest/design_topics/data_interchange.html
.. _CUDA Array Interface: https://numba.readthedocs.io/en/stable/cuda/cuda_array_interface.html

*MPI for Python* provides experimental support for GPU-aware MPI.
This feature requires that:

1. mpi4py is built against a GPU-aware MPI library.

2. The Python GPU arrays are compliant with either of the protocols.

See the :doc:`tutorial` section for further information. We note that

* Whether or not an MPI call can work for GPU arrays depends on the
  underlying MPI implementation, not on mpi4py.

* This support is currently experimental and subject to change in the
  future.


Dynamic Process Management
--------------------------

In the context of the MPI-1 specification, a parallel application is
static; that is, no processes can be added to or deleted from a
running application after it has been started. Fortunately, this
limitation was addressed in MPI-2. The new specification added a
process management model providing a basic interface between an
application and external resources and process managers.

This MPI-2 extension can be really useful, especially for sequential
applications built on top of parallel modules, or parallel
applications with a client/server model. The MPI-2 process model
provides a mechanism to create new processes and establish
communication between them and the existing MPI application. It also
provides mechanisms to establish communication between two existing
MPI applications, even when one did not *start* the other.

In *MPI for Python*, new independent process groups can be created by
calling the `Intracomm.Spawn` method within an intracommunicator.
This call returns a new intercommunicator (i.e., an `Intercomm`
instance) at the parent process group. The child process group can
retrieve the matching intercommunicator by calling the
`Comm.Get_parent` class method. On each side, the new
intercommunicator can be used to perform point-to-point and collective
communications between the parent and child groups of processes.

Alternatively, disjoint groups of processes can establish
communication using a client/server approach. Any server application
must first call the `Open_port` function to open a *port* and the
`Publish_name` function to publish a provided *service*, and next call
the `Intracomm.Accept` method. Any client application can first find a
published *service* by calling the `Lookup_name` function, which
returns the *port* where a server can be contacted, and next call the
`Intracomm.Connect` method. Both the `Intracomm.Accept` and
`Intracomm.Connect` methods return an `Intercomm` instance. When the
connection between client and server processes is no longer needed,
all of them must cooperatively call the `Comm.Disconnect` method.
Additionally, server applications should release resources by calling
the `Unpublish_name` and `Close_port` functions.

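
A minimal sketch of the spawning mechanism described above follows
(the ``worker.py`` script name, the number of workers, and the
computed values are hypothetical; NumPy is assumed to be installed).
The parent broadcasts a parameter to the children and reduces their
partial results over the intercommunicator::

   # parent script: spawn three workers, send a parameter, collect a result
   from mpi4py import MPI
   import numpy as np
   import sys

   comm = MPI.COMM_SELF.Spawn(sys.executable, args=['worker.py'], maxprocs=3)

   n = np.array(100, dtype='i')
   comm.Bcast([n, MPI.INT], root=MPI.ROOT)    # parent is the broadcast root

   result = np.array(0.0, dtype='d')
   comm.Reduce(None, [result, MPI.DOUBLE], op=MPI.SUM, root=MPI.ROOT)
   comm.Disconnect()

The spawned processes retrieve the matching intercommunicator and
participate in the same collective calls::

   # worker.py: executed by each spawned child process (hypothetical file)
   from mpi4py import MPI
   import numpy as np

   comm = MPI.Comm.Get_parent()

   n = np.array(0, dtype='i')
   comm.Bcast([n, MPI.INT], root=0)           # receive the parameter

   value = np.array(float(n) * comm.Get_rank(), dtype='d')
   comm.Reduce([value, MPI.DOUBLE], None, op=MPI.SUM, root=0)
   comm.Disconnect()
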
One-Sided Communications
------------------------

One-sided communication (also called *Remote Memory Access*, *RMA*)
supplements the traditional two-sided, send/receive based MPI
communication model with a one-sided, put/get based
interface. One-sided communication can take advantage of the
capabilities of highly specialized network hardware. Additionally,
this extension lowers latency and software overhead in applications
written using a shared-memory-like paradigm.

The MPI specification revolves around the use of objects called
*windows*; they intuitively specify regions of a process's memory that
have been made available for remote read and write operations. The
published memory blocks can be accessed through three functions for
put (remote send), get (remote receive), and accumulate (remote update
or reduction) data items. A much larger number of functions support
different synchronization styles; the semantics of these
synchronization operations are fairly complex.

In *MPI for Python*, one-sided operations are available by using
instances of the `Win` class. New window objects are created by
calling the `Win.Create` method at all processes within a communicator
and specifying a memory buffer. When a window instance is no longer
needed, the `Win.Free` method should be called.

The three one-sided MPI operations for remote write, read and
reduction are available by calling the `Win.Put`, `Win.Get`, and
`Win.Accumulate` methods, respectively, on a `Win` instance. These
methods need an integer rank identifying the target process and an
integer offset relative to the base address of the remote memory block
being accessed.

The one-sided read, write, and reduction operations are implicitly
nonblocking and must be synchronized by using one of two primary
modes. Active target synchronization requires the origin process to
call the `Win.Start` and `Win.Complete` methods, while the target
process cooperates by calling the `Win.Post` and `Win.Wait` methods.
There is also a collective variant provided by the `Win.Fence` method.
Passive target synchronization is more lenient: only the origin
process calls the `Win.Lock` and `Win.Unlock` methods. Locks are used
to protect remote accesses to the locked remote window and to protect
local load/store accesses to a locked local window.

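
As a minimal sketch (assuming NumPy is installed and at least two
processes), each process exposes a small array through a window, and
rank 1 reads rank 0's data with a get operation bracketed by
collective fence synchronization::

   from mpi4py import MPI
   import numpy as np

   comm = MPI.COMM_WORLD
   rank = comm.Get_rank()

   # Each process exposes a small array through a window object.
   mem = np.full(4, rank, dtype='i')
   win = MPI.Win.Create(mem, comm=comm)

   buf = np.empty(4, dtype='i')
   win.Fence()                        # open an access epoch (collective)
   if rank == 1:
       win.Get([buf, MPI.INT], 0)     # remote read from rank 0's window
   win.Fence()                        # close the epoch; buf is now valid
   win.Free()
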
Parallel Input/Output
---------------------

The POSIX standard provides a model of a widely portable file
system. However, the optimization needed for parallel input/output
cannot be achieved with this generic interface. In order to ensure
efficiency and scalability, the underlying parallel input/output
system must provide a high-level interface supporting partitioning of
file data among processes and a collective interface supporting
complete transfers of global data structures between process memories
and files. Additionally, further efficiencies can be gained via
support for asynchronous input/output, strided accesses to data, and
control over physical file layout on storage devices. This scenario
motivated the inclusion in the MPI-2 standard of a custom interface in
order to support more elaborate parallel input/output operations.

The MPI specification for parallel input/output revolves around the
use of objects called *files*. As defined by MPI, files are not just
contiguous byte streams. Instead, they are regarded as ordered
collections of *typed* data items. MPI supports sequential or random
access to any integral set of these items. Furthermore, files are
opened collectively by a group of processes.

The common patterns for accessing a shared file (broadcast, scatter,
gather, reduction) are expressed by using user-defined datatypes.
Compared to the communication patterns of point-to-point and
collective communications, this approach has the advantage of added
flexibility and expressiveness. Data access operations (read and
write) are defined for different kinds of positioning (using explicit
offsets, individual file pointers, and shared file pointers),
coordination (non-collective and collective), and synchronism
(blocking, nonblocking, and split collective with begin/end phases).

In *MPI for Python*, all MPI input/output operations are performed
through instances of the `File` class. File handles are obtained by
calling the `File.Open` method at all processes within a communicator
and providing a file name and the intended access mode. After use,
they must be closed by calling the `File.Close` method. Files can
even be deleted by calling the `File.Delete` method.

After creation, files are typically associated with a per-process
*view*. The view defines the current set of data visible and
accessible from an open file as an ordered set of elementary
datatypes. This data layout can be set and queried with the
`File.Set_view` and `File.Get_view` methods, respectively.

Actual input/output operations are achieved by many methods combining
read and write calls with different behavior regarding positioning,
coordination, and synchronism. Summing up, *MPI for Python* provides
the thirty (30) methods defined in MPI-2 for reading from or writing
to files using explicit offsets or file pointers (individual or
shared), in blocking or nonblocking and collective or noncollective
versions.

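
For illustration, a small sketch (the file name is hypothetical;
NumPy and write permission in the working directory are assumed)
opens a file collectively and has each process write its own block of
integers at an explicit offset::

   from mpi4py import MPI
   import numpy as np

   comm = MPI.COMM_WORLD
   rank = comm.Get_rank()

   amode = MPI.MODE_WRONLY | MPI.MODE_CREATE
   fh = MPI.File.Open(comm, 'datafile.bin', amode)   # collective open

   # Each process writes 10 integers at its own byte offset.
   data = np.full(10, rank, dtype='i')
   offset = rank * data.nbytes
   fh.Write_at_all(offset, [data, MPI.INT])          # collective write

   fh.Close()
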
Environmental Management
------------------------

Initialization and Exit
^^^^^^^^^^^^^^^^^^^^^^^

Module functions `Init` or `Init_thread` and `Finalize` provide MPI
initialization and finalization, respectively. Module functions
`Is_initialized` and `Is_finalized` provide the respective tests for
initialization and finalization.

.. note::

   :c:func:`MPI_Init` or :c:func:`MPI_Init_thread` is actually called
   when you import the :mod:`~mpi4py.MPI` module from the
   :mod:`mpi4py` package, but only if MPI is not already
   initialized. In such a case, calling `Init` or `Init_thread` from
   Python is expected to generate an MPI error, and in turn an
   exception will be raised.

.. note::

   :c:func:`MPI_Finalize` is registered (using the Python C/API
   function :c:func:`Py_AtExit`) to be automatically called when
   Python processes exit, but only if :mod:`mpi4py` actually
   initialized MPI. Therefore, there is no need to call `Finalize`
   from Python to ensure MPI finalization.

Implementation Information
^^^^^^^^^^^^^^^^^^^^^^^^^^

* The MPI version number can be retrieved with the module function
  `Get_version`. It returns a two-integer tuple ``(version,
  subversion)``.

* The `Get_processor_name` function can be used to access the
  processor name.

* The values of predefined attributes attached to the world
  communicator can be obtained by calling the `Comm.Get_attr` method
  on the `COMM_WORLD` instance.

Timers
^^^^^^

MPI timer functionalities are available through the `Wtime` and
`Wtick` functions.

Error Handling
^^^^^^^^^^^^^^

In order to facilitate handle sharing with other Python modules
interfacing MPI-based parallel libraries, the predefined MPI error
handlers `ERRORS_RETURN` and `ERRORS_ARE_FATAL` can be assigned to and
retrieved from communicators using the `Comm.Set_errhandler` and
`Comm.Get_errhandler` methods, and similarly for windows and files.

When the predefined error handler `ERRORS_RETURN` is set, errors
returned from MPI calls within Python code will raise an instance of
the exception class `Exception`, which is a subclass of the standard
Python exception `python:RuntimeError`.

.. note::

   After import, mpi4py overrides the default MPI rules governing
   inheritance of error handlers. The `ERRORS_RETURN` error handler is
   set in the predefined `COMM_SELF` and `COMM_WORLD` communicators,
   as well as in any new `Comm`, `Win`, or `File` instance created
   through mpi4py. If you ever pass such handles to C/C++/Fortran
   library code, it is recommended to set the `ERRORS_ARE_FATAL` error
   handler on them to ensure MPI errors do not pass silently.

.. warning::

   Importing with ``from mpi4py.MPI import *`` will cause a name
   clash with the standard Python `python:Exception` base class.

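
As a sketch of the error handling machinery described above (the
invalid destination rank below is used only to provoke an error), MPI
errors can be caught as Python exceptions::

   from mpi4py import MPI
   import numpy as np

   comm = MPI.COMM_WORLD
   # mpi4py already sets ERRORS_RETURN, but it can be set explicitly.
   comm.Set_errhandler(MPI.ERRORS_RETURN)

   buf = np.zeros(1, dtype='i')
   try:
       # Deliberately invalid destination rank, to trigger an MPI error.
       comm.Send([buf, MPI.INT], dest=comm.Get_size())
   except MPI.Exception as err:
       print('MPI error:', err)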