                   MPE (Multi-Processing Environment)
                   ----------------------------------

                       Version 2.4.6. March, 2008

                Mathematics and Computer Science Division
                       Argonne National Laboratory

I. INTRODUCTION
----------------

The Multi-Processing Environment (MPE) attempts to provide programmers with
a complete suite of performance analysis tools for their MPI programs based
on a post-processing approach. These tools include a set of profiling
libraries, a set of utility programs, and a set of graphical tools.

The first set of tools to be used with user MPI programs is the profiling
libraries, which provide a collection of routines that create log files.
These log files can be created manually by inserting MPE calls in the MPI
program, automatically by linking with the appropriate MPE libraries, or by
combining the two methods. Currently, MPE offers the following 4 profiling
libraries.

  1) Tracing Library - Traces all MPI calls. Each MPI call is preceded by a
     line that contains the rank in MPI_COMM_WORLD of the calling process,
     and followed by another line indicating that the call has completed.
     Most send and receive routines also indicate the values of count, tag,
     and partner (destination for sends, source for receives). Output is to
     standard output.

  2) Animation Libraries - A simple form of real-time program animation
     that requires X Window routines.

  3) Logging Libraries - The most useful and widely used profiling libraries
     in MPE. These libraries form the basis for generating log files from
     user MPI programs. Several different log file formats are available in
     MPE. The default log file format is CLOG2, a low-overhead logging
     format consisting of a simple collection of single-timestamp events.
     The old ALOG format, which has not been developed for years, is not
     distributed here.
     The powerful visualization format is SLOG-2, which stands for Scalable
     LOGfile format version II and is a total redesign of the original SLOG
     format. SLOG-2 allows for much improved scalability for visualization
     purposes. A CLOG2 file can easily be converted to an SLOG-2 file
     through the new SLOG-2 viewer, Jumpshot-4. The MPI logging library is
     now thread-safe through the use of a global mutex over the MPE logging
     library, which itself is not yet thread-safe.

  4) Collective and Datatype Checking Library - An argument consistency
     checking library for MPI collective calls. It checks datatype, root,
     and other argument consistency in MPI collective calls. If an error is
     detected, a backtrace of the call stack (on supported platforms) is
     printed to locate the offending call.

The set of utility programs in MPE includes log format converters (e.g.
clog2TOslog2), logfile print programs (e.g. slog2print), and a logfile
viewer and converter (e.g. jumpshot). These new tools, clog2TOslog2,
slog2print and jumpshot (Jumpshot-4), replace the old tools, clog2slog,
slog_print and logviewer (i.e. Jumpshot-2 and Jumpshot-3). For more
information on the various logfile formats and their viewers, see

http://www.mcs.anl.gov/perfvis



II. CONFIGURATION
-----------------

MPE can be configured and installed as an extension to most MPI
standard-compliant MPI implementations, e.g. MPICH-2, MPICH, OpenMPI,
LAM/MPI, SGI's MPI, HP-UX's MPI, IBM's MPI, Cray's MPI and NEC's MPI.
MPE has been integrated into MPICH and MPE2 has been integrated seamlessly
into MPICH-2, so MPE (or MPE2) is installed automatically during the
installation of MPICH (or MPICH-2).

For details on configuring MPE, see the INSTALL or INSTALL.cross file.



III. INSTALLATION INSTRUCTIONS
-------------------------------

For installation instructions and examples for MPE, see the INSTALL file.



IV. EXAMPLE PROGRAMS
----------------------

As previously noted, the MPE library is composed of 4 different profiling
libraries. During configure, the compiler's library linkage flags and the
appropriate libraries are determined. These variables are first substituted
in the Makefiles in the directories mpe2/src/wrappers/test,
mpe2/src/graphics/contrib/test and mpe2/src/collchk/test. The Makefiles for
mpe2/src/wrappers/test and mpe2/src/graphics/contrib/test are then installed
into the share/ directory as examples/logging/ and examples/graphics/ during
the final installation process. The following are some of the crucial
variables:

LOG_LIBS     = library flag that links with the logging libraries
TRACE_LIBS   = library flag that links with the tracing library
ANIM_LIBS    = library flag that links with the animation library
COLLCHK_LIBS = library flag that links with the collective and datatype
               checking library

The variable FLIB_PATH is the compiler's library path needed to link Fortran
MPI programs with the logging library.

During make, the small test programs cpi.c, cpilog.c and fpilog.f are linked
with each of the above libraries. The make output includes a message for
each attempted link test indicating whether it succeeded. If the linkage
tests are successful, then these library linkage flags can be used for your
programs as well.

The following example programs are also included in the distribution:

  mpe/src/graphics/contrib/mandel is a Mandelbrot program that uses the MPE
  graphics package.

  mpe/src/graphics/contrib/mastermind is a program for solving the
  Mastermind puzzle in parallel.

These programs should work on all MPI implementations, but have not been
extensively tested.





V. MPEINSTALL
--------------

An 'mpeinstall' script is created during configuration.
When MPE is configured as part of MPICH or MPICH2, the 'mpiinstall' script
invokes the 'mpeinstall' script. However, 'mpeinstall' can also be used by
itself. Running it is optional and is useful only if you wish to install the
MPE library in a public place so that others may use it. The final install
directory will consist of include, lib, bin, sbin and share subdirectories.
Examples and various logfile viewers will be installed under share.





VI. USAGE
---------

The final install directory contains the following subdirectories:

  include/  contains all the include files that user programs need to read.
  lib/      contains all the libraries that user programs need to link with.
  bin/      contains all the utility programs that users need.
  doc/      contains available MPE documentation, e.g. Jumpshot-4's user guide.
  sbin/     contains the MPE uninstall script to uninstall the installation.
  share/    contains user read-only data. Besides share/examples/logging/,
            share/examples/graphics/, share/examples/collchk/ and
            share/examples/logfiles/, users usually do NOT need to know
            the details of the other subdirectories.

In terms of using MPE, users usually only need to know about the files
that have been installed in include/, lib/ and bin/.



VI. a) CUSTOMIZING LOGFILES
---------------------------

In addition to using the predefined MPE logging libraries to log all MPI
calls, MPE logging calls can be inserted into a user's MPI program to define
and log states. These states are called user-defined states. States may
be nested, allowing one to define a state describing a user routine that
contains several MPI calls, and display both the user-defined state and
the MPI operations contained within it.

The simplest way to insert user-defined states is as follows:

1) Get handles from the MPE logging library: MPE_Log_get_state_eventIDs()
   has to be used to get unique event IDs (MPE logging handles).
   This is important if you are writing a library that uses
   the MPE logging routines from the MPE system.

   PS. Hardwiring the eventIDs is considered a bad idea since it may cause
   eventID conflicts, so the practice isn't supported. Older MPE libraries
   provide MPE_Log_get_event_number(), which is still supported but
   has been deprecated. Users are strongly urged to use
   MPE_Log_get_state_eventIDs() instead.

2) Set the logged state's characteristics: MPE_Describe_state() sets the
   name and color of the state.

3) Log the events of the logged state: MPE_Log_event() is called twice,
   once with the begin event ID and once with the end event ID.

Below is a simple example that uses the 3 steps outlined above.

\begin{verbatim}

#include "mpe.h"

int eventID_begin, eventID_end;
...
/* Step 1: get a unique pair of event IDs (once, after logging is initialized) */
MPE_Log_get_state_eventIDs( &eventID_begin, &eventID_end );
...
/* Step 2: give the state a name and a color */
MPE_Describe_state( eventID_begin, eventID_end, "Multiplication", "red" );
...
void MyAmult( Matrix m, Vector v )
{
    /* Step 3a: log the start event of the red "Multiplication" state */
    MPE_Log_event( eventID_begin, 0, NULL );
    ...
    ... Amult code, including MPI calls ...
    ...
    /* Step 3b: log the end event of the red "Multiplication" state */
    MPE_Log_event( eventID_end, 0, NULL );
}

\end{verbatim}

The logfile generated by this code will have the MPI routines nested within
the routine MyAmult().

Besides user-defined states, MPE2 also provides support for user-defined
events, which can be defined through the use of MPE_Log_get_solo_eventID()
and MPE_Describe_event(). For more details, see e.g. cpilog.c.
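
Because each state is recorded as a begin event followed later by its own
end event, an early return inside MyAmult() that skips the second
MPE_Log_event() call would leave the "Multiplication" state without its end
event. The C sketch below does not call MPE at all; it is a hypothetical
checker written only for illustration. It assumes that a recorded trace is
well formed when every begin event is closed by the end event of the same
state, innermost state first, as in MyAmult(). The names state_ids and
states_properly_nested, and the event ID values used with them, are made up
for this example.

```c
#include <stddef.h>

/* Hypothetical record of one user-defined state: the begin/end pair that
   MPE_Log_get_state_eventIDs() would hand out (values here are made up). */
struct state_ids {
    int begin;
    int end;
};

/* Returns 1 if, in trace[], every begin event is closed by the end event of
   the same state, innermost state first, with nothing left open at the end;
   returns 0 otherwise. IDs that belong to no state (e.g. solo events) are
   skipped. Nesting deeper than 64 levels is rejected by this sketch. */
int states_properly_nested(const struct state_ids *states, size_t nstates,
                           const int *trace, size_t ntrace)
{
    size_t stack[64];   /* indices into states[] that are currently open */
    size_t depth = 0;
    size_t t, s;

    for (t = 0; t < ntrace; t++) {
        for (s = 0; s < nstates; s++) {
            if (trace[t] == states[s].begin) {
                if (depth == sizeof stack / sizeof stack[0])
                    return 0;              /* too deep for this sketch */
                stack[depth++] = s;        /* push: state s is now open */
                break;
            }
            if (trace[t] == states[s].end) {
                if (depth == 0 || stack[depth - 1] != s)
                    return 0;              /* end does not close the innermost state */
                depth--;                   /* pop: state s is closed */
                break;
            }
        }
    }
    return depth == 0;                     /* 1 only if every state was closed */
}
```

With the two states {1, 2} and {3, 4}, the trace 1 3 4 2 is accepted,
whereas 1 3 2 4 (crossed states) and 1 3 4 (state {1, 2} never closed) are
rejected; an ID such as 9 that belongs to no state is skipped.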

If the MPE logging library, liblmpe.a, is NOT linked with the user program,
MPE_Init_log() and MPE_Finish_log() need to be called before and after all
the MPE calls, respectively. The sample programs cpilog.c and fpilog.f
illustrate the use of these MPE routines; they are in the MPE source
directory, mpe2/src/wrappers/test, or in the installed directory,
share/examples/logging. For further linking information, see the section
"Convenient Compiler Wrappers".

For an undefined user-defined state, i.e. one for which the corresponding
MPE_Describe_state() has not been issued, the new jumpshot (Jumpshot-4) may
display the legend name as "UnknownType-INDEX", where INDEX is the internal
MPE category index.



VI. b) ENVIRONMENTAL VARIABLES
------------------------------

For MPE logging, MPE_TMPDIR and MPE_LOGFILE_PREFIX are the 2 environment
variables that most users find very useful, so it is recommended to set
both before launching the MPI program when logging. The environment
variables recognized by MPE are:

CLOG_BLOCK_SIZE: The integer value determines the clog2 buffer block size,
                 which also sets the minimum clog2 file size. If
                 CLOG_BLOCK_SIZE is not set, 64K per block is assumed.

CLOG_BUFFERED_BLOCKS: The integer value determines the number of blocks
                      within CLOG2's internal buffer. The default value is
                      128. Together with CLOG_BLOCK_SIZE,
                      CLOG_BUFFERED_BLOCKS determines how often the
                      internal buffer is flushed to disk: the total buffer
                      size is the product of CLOG_BLOCK_SIZE and
                      CLOG_BUFFERED_BLOCKS, so with both defaults it is
                      64K x 128 = 8M. These 2 environmental variables allow
                      users to minimize MPE2 logging overhead when large
                      local memory is available.

MPE_TMPDIR: MPE_TMPDIR takes precedence over TMPDIR. It specifies a
            directory to be used as temporary storage for each process.
            By default, when MPE_TMPDIR and TMPDIR are NOT set, /tmp is
            used.
            When a very large logfile needs to be generated for a
            long-running MPI job, make sure that MPE_TMPDIR (or TMPDIR)
            is big enough to hold the temporary local logfile, which is
            deleted once the merged logfile has been created successfully.
            To minimize the logging overhead on the MPI program, it is
            highly recommended to use a *local* file system for
            MPE_TMPDIR (or TMPDIR).

            Note: The final merged logfile will be written back to the
            file system where process 0 resides.

MPE_SAME_TMPDIR: The boolean value determines whether MPE_TMPDIR is the
                 same across the whole MPI_COMM_WORLD. By default,
                 MPE_SAME_TMPDIR is set to true and only rank 0's
                 MPE_TMPDIR is broadcast to every process. This choice
                 has scalability implications. When MPE_SAME_TMPDIR=false,
                 MPE_TMPDIR may be set differently on different processes,
                 so mkstemp() is called once on each process to create a
                 temporary filename. Having every process call mkstemp()
                 on the same shared filesystem, e.g. $HOME, does not scale
                 and may cause the shared filesystem to hang.
                 MPE_SAME_TMPDIR=false is provided for cases where
                 MPE_TMPDIR has to be set differently on different
                 processes, but it generally does not scale when the size
                 of MPI_COMM_WORLD is much larger than 1K.

MPE_DELETE_LOCALFILE: The boolean value determines whether to delete the
                      temporary local clog2 file. When this flag is set to
                      false, the user needs to collect the temporary clog2
                      files from each node's MPE_TMPDIR. The separate
                      serial programs clog2_join and clog2_repair can then
                      be used to merge the local clog2 files. This is
                      useful, for instance, when MPI_Finalize() fails to
                      complete properly because the user program has
                      overwritten MPE/MPI internal data structures.

MPE_CLOCKS_SYNC: The boolean value determines the behavior of
                 MPE_Log_sync_clocks() and of the default clock
                 synchronization at the end of logging. Users may want to
                 force MPE clock synchronization when the MPI
                 implementation has a buggy clock synchronization
                 mechanism, e.g. some versions of BG/L MPI incorrectly set
                 MPI_WTIME_IS_GLOBAL to true when a 64-way or 256-way
                 partition is used.

MPE_SYNC_ALGORITHM: specifies the clock synchronization algorithm. The
                    accepted values are "DEFAULT", "SEQ", "BITREE"
                    and "ALTNGBR".
                    SEQ: an O(N)-step algorithm; it is non-scalable and
                         the slowest, but also the most accurate.
                    BITREE: an O(log2(N))-step algorithm; scalable and
                         much faster than SEQ, but less accurate than SEQ.
                         A good compromise.
                    ALTNGBR: an O(1)-step, perfectly parallel algorithm;
                         the fastest of the 3 supported algorithms, but
                         also the least accurate.
                    DEFAULT: uses SEQ when the number of processes is
                         <= 16, and BITREE when it is > 16.

MPE_SYNC_FREQUENCY: specifies the number of iterations of the selected
                    clock synchronization. In general, the higher
                    MPE_SYNC_FREQUENCY is, the higher the probability of
                    obtaining an accurate measurement of all the clocks,
                    i.e. the smaller the error. Keep in mind that this is
                    not a guarantee and is highly dependent on system
                    noise. The default is 3.

MPE_LOGFILE_PREFIX: specifies the filename prefix of the output logfile.
                    The file extension is determined by the output
                    logfile format, i.e. MPE_LOG_FORMAT.

MPE_LOG_FORMAT: determines the format of the logfile generated from the
                execution of an application linked with the MPE logging
                libraries. The only allowed value for MPE_LOG_FORMAT is
                CLOG2, so there is no need to use this variable at the
                moment.

MPE_LOG_OVERHEAD: The boolean value determines whether MPE/CLOG2's
                  internal profiling state CLOG_Buffer_write2disk() is
                  logged.
                  The default setting is yes. The CLOG_Buffer_write2disk
                  state marks the regions in each process where MPE/CLOG2
                  spends time flushing logging data from memory to disk.
                  The frequency and location of CLOG_Buffer_write2disk
                  states can be altered by changing CLOG_BLOCK_SIZE
                  and/or CLOG_BUFFERED_BLOCKS.

MPE_LOG_RANK2PROCNAME: The boolean value determines whether a .clog2.pnm
                       file is created that maps each MPI_COMM_WORLD rank
                       to the processor name returned by
                       MPI_Get_processor_name(). The default is no.

MPE_WRAPPERS_ADD_LDFLAGS: The variable tells the MPE wrappers, mpecc/mpefc
                          (including MPICH2's mpicc and friends), whether
                          to use the LDFLAGS added by MPE, e.g.
                          -Wl,--export-dynamic. The default is yes. Users
                          can override it by setting it to "no" or
                          "false".

MPE_USE_FCONSTS_IN_MPIH: The boolean value determines whether the MPI_F_*
                         constants from mpi.h are used. By default, MPE
                         computes the MPI_F_* constants instead of reading
                         them from mpi.h. The affected constants are
                         MPI_F_STATUS_IGNORE and MPI_F_STATUSES_IGNORE.

Possible boolean values are "true", "false", "yes" and "no", in either all
lowercase or all uppercase.


For MPE X11 graphics, the DISPLAY environment variable set in each process
is read during MPE_Open_graphics().

DISPLAY: determines which X display the MPE X11 graphics of each process
         connects to.



VI. c) EXAMPLE MAKEFILE
-----------------------

The install directories share/examples/logging, share/examples/graphics and
share/examples/collchk contain some very useful and simple example programs.
The Makefiles in these directories illustrate the usage of MPE routines
and how to link with the various MPE libraries. In most cases, users can
simply copy share/examples/logging/Makefile to their home directory and run
"make" to compile the suggested targets. Users don't need to copy the
.c and .f files when MPE has been compiled with a make that has VPATH
support.
The created executables can be launched with mpiexec or mpirun from the MPI
implementation to generate sample logfiles.



VI. d) UTILITY PROGRAMS
-----------------------

In bin/, users can find several utility programs that are useful for
manipulating logfiles. These include log format converters, log format
print programs, and a logfile display program.


Log Format Converters
---------------------

clog2TOslog2 : a CLOG2 to SLOG-2 logfile converter. For more details,
               do "clog2TOslog2 -h".

rlogTOslog2  : an RLOG to SLOG-2 logfile converter. For more details,
               do "rlogTOslog2 -h". RLOG is an internal MPICH2 logging
               format.

logconvertor : a standalone GUI-based converter that invokes clog2TOslog2
               or rlogTOslog2 based on the logfile extension. The GUI also
               shows the progress of the conversion. The same converter
               can be invoked from within the logfile viewer, jumpshot.

slog2filter  : an SLOG-2 to SLOG-2 logfile converter. It allows unwanted
               categories to be removed (when used with slog2print -c). It
               also allows the SLOG-2 internal structure to be changed,
               e.g. by modifying the duration of the preview drawables. The
               tool reads and writes SLOG-2 files of the same version.

slog2updater : an SLOG-2 file format update utility. It is essentially
               an slog2filter that reads in an older SLOG-2 file and
               writes it out in the latest SLOG-2 file format.


Log Format Print Programs
-------------------------

clog2_print  : a stdout print program for CLOG2 files.
               The Java version is named clogprint.

clog2_join   : a serial clog2 merging program that merges either
               1) temporary local clog2 files that all come from the
                  same MPI_COMM_WORLD, or
               2) merged clog2 files from different MPI_COMM_WORLDs
                  (incomplete: timestamps are not synchronized yet).

clog2_repair : a clog2 repair program that tries to fix the missing data
               of a clog2 file (when the MPI program being profiled
               aborts) so that the file can be processed by other tools
               like clog2TOslog2.

rlog_print   : a stdout print program for RLOG files.

slog2print   : a stdout print program for SLOG-2 files.



Log File Display Program
------------------------

jumpshot     : the Jumpshot-4 launcher script. Jumpshot-4 does logfile
               conversion as well as visualization.

To view a logfile, say fpilog.slog2, do

jumpshot fpilog.slog2

This command invokes Jumpshot-4 to display the contents of the SLOG-2 file,
provided Jumpshot-4 has been built and installed successfully.

One can also do

jumpshot fpilog.clog2

or

jumpshot barrier.rlog

Both invoke the logfile converter first, before visualization.


Collective and Datatype Checking
--------------------------------

To link an MPI application with the collective and datatype checking
library, do

mpicc -o mpi_pgm *.o -L<mpe2_libdir> -lmpe_collchk

or use a compiler wrapper (more details in the next section), e.g.

(with MPICH2's compiler wrapper)
mpicc -mpe=mpicheck -o mpi_pgm *.o

(with MPE's compiler wrapper)
mpecc -mpicheck -o mpi_pgm *.o


Convenient Compiler Wrappers
----------------------------

A standalone MPE installation, i.e. one used with an MPI implementation
other than MPICH2, provides 2 convenient compiler wrappers, mpecc and mpefc,
which mimic the typical usage of mpicc/mpif77/mpif90 in MPICH* by providing
convenient compilation and linking switches. mpecc is for C programs, and
mpefc is for Fortran programs.
Typically, users can use mpecc as follows:

mpecc -mpilog -o cpilog cpilog.c

which is equivalent to

mpicc -o cpilog cpilog.c -L<mpe2_libdir> -llmpe -lmpe

The available MPE profiling options for "mpecc" and "mpefc" are as follows:

   -mpilog   : Automatic MPI and MPE user-defined states logging.
               This links against -llmpe -lmpe.

   -mpitrace : Trace MPI program with printf.
               This links against -ltmpe.

   -mpianim  : Animate MPI program in real time.
               This links against -lampe -lmpe.

   -mpicheck : Check MPI program with the Collective & Datatype
               Checking library. This links against -lmpe_collchk.

   -graphics : Use MPE graphics routines with X11 library.
               This links against -lmpe <X11 libraries>.

   -log      : MPE user-defined states logging.
               This links against -lmpe.

   -nolog    : Nullify MPE user-defined states logging.
               This links against -lmpe_null.

   -help     : Print this help page.


MPE has been seamlessly integrated into the MPICH2 and MPICH distributions.

In MPICH2, all the convenient compilation and linking switches described
above are provided through the -mpe= option of mpicc/mpicxx/mpif77/mpif90.
For instance, cpilog can be compiled and linked with the automatic MPI
logging library as follows

mpicc -mpe=mpilog -o cpilog cpilog.c

which is equivalent to the mpecc command

mpecc -mpilog -o cpilog cpilog.c


In MPICH (the old one), the compiler wrappers mpicc/mpiCC/mpif77/mpif90
do not provide all the convenient switches listed above; only 3 of them
are available. These options are:

-mpitrace - to compile and link with the tracing library.
-mpianim  - to compile and link with the animation libraries.
-mpilog   - to compile and link with the logging libraries.

For instance, the following command creates the executable fpilog, which
generates a logfile when it is executed.

mpif77 -mpilog -o fpilog fpilog.f




VII. USING MPE IN MPICH AND MPICH2
----------------------------------

VII. a) INHERITANCE OF ENVIRONMENTAL VARIABLES
----------------------------------------------

MPE relies on certain environmental variables (e.g. MPE_TMPDIR). These
variables determine how MPE behaves, so it is important to make sure that
all the MPI processes receive the intended values. The complication comes
from the fact that different MPI implementations pass environmental
variables in different ways. For instance, MPICH contains many different
devices for different platforms, and some of these devices have their own
way of passing environmental variables to other processes. The commonly
used devices, like ch_p4 and ch_shmem, require no special attention: a
spawned process inherits the value of an environmental variable from the
launching process when that variable has been set in the launching
process. But this is NOT true for all devices; for instance, the ch_p4mpd
device requires a special mpirun option to set environmental variables for
all processes:

mpirun -np N fpilog -MPDENV- MPE_LOGFILE_PREFIX=fpilog

In this example, the option -MPDENV- is needed to make sure that all
processes have their environmental variable MPE_LOGFILE_PREFIX set to the
desired output logfile prefix.
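
A process that did not inherit a variable silently falls back to the
defaults described in section VI. b), so one way to check what each process
actually sees is to apply those rules inside the process itself. The C
sketch below is a hypothetical helper, not part of MPE: pick_tmpdir()
mirrors the documented MPE_TMPDIR > TMPDIR > /tmp precedence, and
parse_env_bool() accepts only the boolean spellings listed in section
VI. b) ("true", "false", "yes", "no", all lowercase or all uppercase). How
MPE itself treats mixed-case values is not covered by this document, so the
sketch simply reports them as unrecognized.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of MPE: MPE_TMPDIR wins over TMPDIR, and
   /tmp is used when neither is set (NULL), as described in section VI. b). */
const char *pick_tmpdir(const char *mpe_tmpdir, const char *tmpdir)
{
    if (mpe_tmpdir != NULL)
        return mpe_tmpdir;
    if (tmpdir != NULL)
        return tmpdir;
    return "/tmp";
}

/* Applies the rule above to the variables as the current process sees them. */
const char *effective_tmpdir(void)
{
    return pick_tmpdir(getenv("MPE_TMPDIR"), getenv("TMPDIR"));
}

/* Hypothetical helper, not part of MPE: returns 1 for "true"/"yes",
   0 for "false"/"no" (all lowercase or all uppercase), and -1 if the value
   is unset (NULL) or spelled any other way, e.g. "True". */
int parse_env_bool(const char *value)
{
    static const char *const yes_words[] = { "true", "TRUE", "yes", "YES" };
    static const char *const no_words[]  = { "false", "FALSE", "no", "NO" };
    size_t i;

    if (value == NULL)
        return -1;
    for (i = 0; i < sizeof yes_words / sizeof yes_words[0]; i++)
        if (strcmp(value, yes_words[i]) == 0)
            return 1;
    for (i = 0; i < sizeof no_words / sizeof no_words[0]; i++)
        if (strcmp(value, no_words[i]) == 0)
            return 0;
    return -1;
}
```

For example, printing effective_tmpdir() and
parse_env_bool(getenv("MPE_SAME_TMPDIR")) from every rank shows which ranks
received the variables: a rank where the launcher did not forward
MPE_SAME_TMPDIR gets -1 (unset, so MPE's default applies), while a rank
that received MPE_SAME_TMPDIR=no gets 0.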


In MPICH2, when MPD is used as the process manager, MPE_LOGFILE_PREFIX
and MPE_TMPDIR can be passed as follows:

mpiexec -env MPE_LOGFILE_PREFIX <output-logname-prefix> \
        -env MPE_TMPDIR <local-tmp-dir> -n 32 <executable-name>

Also, with the MPE X11 graphics library, the DISPLAY variable set locally
in each process is read (so ssh tunnelling can be used). For example,
assume an mpd ring has been set up on 2 machines, schwinn and triumph, as
follows:

cat > mpd.hosts << EOF
schwinn
triumph
EOF

mpdboot -n 2 -f mpd.hosts

Now launch the MPE X11 graphics sample code, cxgraphics, as follows:

mpiexec -host schwinn -env DISPLAY <display_0> cxgraphics : \
        -host triumph -env DISPLAY <display_1> cxgraphics

where <display_0> and <display_1> are the values of the local DISPLAY
variable echoed from consoles connected to schwinn and triumph,
respectively.


For other MPI implementations, MPE does not change how environmental
variables are passed. Users need to become familiar with the runtime
environment and set the environmental variables appropriately.


VII. b) VIEWING LOGFILES
------------------------

MPE's install directory structure is the same as that of MPICH and MPICH-2,
so all of MPE's utility programs will be located in the bin/ directory of
the MPICH or MPICH-2 installation.
