1 MPE (Multi-Processing Environment)
2 ----------------------------------
3
4 Version 2.4.6. March, 2008
5
6 Mathematics and Computer Science Division
7 Argonne National Laboratory
8
9I. INTRODUCTION
10----------------
11
The Multi-Processing Environment (MPE) aims to provide programmers with
a complete suite of performance analysis tools for their MPI programs, based
on a post-processing approach. These tools include a set of profiling
libraries, a set of utility programs, and a set of graphical tools.

The first set of tools to be used with user MPI programs is the profiling
libraries, which provide a collection of routines that create log files.
These log files can be created manually by inserting MPE calls in the MPI
program, automatically by linking with the appropriate MPE libraries, or by
combining the two methods. Currently, MPE offers the following 4 profiling
libraries.
23
24 1) Tracing Library - Traces all MPI calls. Each MPI call is preceded by a
25 line that contains the rank in MPI_COMM_WORLD of the calling process,
26 and followed by another line indicating that the call has completed.
27 Most send and receive routines also indicate the values of count, tag,
28 and partner (destination for sends, source for receives). Output is to
29 standard output.
30
31 2) Animation Libraries - A simple form of real-time program animation
32 that requires X window routines.
33
  3) Logging Libraries - The most useful and widely used profiling libraries
     in MPE. These libraries form the basis for generating log files from
     user MPI programs. Several log file formats are available in MPE.
     The default log file format is CLOG2, a low-overhead logging format
     that is a simple collection of single-timestamp events. The old
     format ALOG, which has not been developed for years, is not
     distributed here. The visualization format is SLOG-2, which stands
     for Scalable LOGfile format version II and is a total redesign of the
     original SLOG format. SLOG-2 allows for much improved scalability in
     visualization. A CLOG2 file can easily be converted to an SLOG-2
     file through the SLOG-2 viewer, Jumpshot-4. The MPI logging library
     is now thread-safe through the use of a global mutex over the MPE
     logging library, which itself is not yet thread-safe.
47
  4) Collective and datatype checking library - An argument consistency
     checking library for MPI collective calls. It checks datatype,
     root, and various other arguments for consistency in MPI collective
     calls. If an error is detected, a backtrace of the call stack (on
     supported platforms) is printed to locate the offending call.
53
The set of utility programs in MPE includes log format converters (e.g.
clog2TOslog2), logfile printers (e.g. slog2print), and a logfile viewer and
convertor (e.g. jumpshot). These new tools, clog2TOslog2, slog2print and
jumpshot (Jumpshot-4), replace the old tools clog2slog, slog_print and
logviewer (i.e. Jumpshot-2 and Jumpshot-3). For more information on the
various logfile formats and their viewers, see
60
61http://www.mcs.anl.gov/perfvis
62
63
64
65II. CONFIGURATION
66-----------------
67
MPE can be configured and installed as an extension to most MPI
standard-compliant implementations, e.g. MPICH2, MPICH, OpenMPI, LAM/MPI,
SGI's MPI, HP-UX's MPI, IBM's MPI, Cray's MPI and NEC's MPI.
MPE has been integrated into MPICH, and MPE2 has been integrated seamlessly
into MPICH2, so MPE is installed automatically during the installation of
MPICH, and MPE2 during the installation of MPICH2.
74
75For details of configuration of MPE, see the INSTALL or INSTALL.cross file.
76
77
78
79III. INSTALLATION INSTRUCTIONS
80-------------------------------
81
82For details of installation instruction/examples of MPE, see the INSTALL file.
83
84
85
86IV. EXAMPLE PROGRAMS
87----------------------
88
As previously noted, MPE provides 4 different profiling libraries. During
configure, the compiler's library linkage flags and appropriate libraries
are determined. These variables are first substituted in the Makefiles in
the directories mpe2/src/wrappers/test, mpe2/src/graphics/contrib/test and
mpe2/src/collchk/test. The Makefiles for mpe2/src/wrappers/test and
mpe2/src/graphics/contrib/test are then installed under share/ as
examples/logging/ and examples/graphics/ during the final installation
process. The following are some of the crucial variables:
97
98LOG_LIBS = library flag that links with the logging libraries
99TRACE_LIBS = library flag that links with the tracing library
100ANIM_LIBS = library flag that links with the animation library
101COLLCHK_LIBS = library flag that links with the collective and datatype
102 checking library
103
The variable FLIB_PATH is the compiler's library path needed to link Fortran
MPI programs with the logging library.

During make, the small test programs cpi.c, cpilog.c and fpilog.f are linked
with each of the above libraries. The Make output includes a message
indicating the status of each attempted link test. If the linkage tests are
successful, these library linkage flags can be used for your programs as
well.
113
114The following example programs are also included in the distribution:
115
116 mpe/src/graphics/contrib/mandel is a Mandelbrot program that uses the MPE
117 graphics package.
118
119 mpe/src/graphics/contrib/mastermind is a program for solving the Mastermind
120 puzzle in parallel.
121
122These programs should work on all MPI implementations, but have not been
123extensively tested.
124
125
126
127
128
129V. MPEINSTALL
130--------------
131
A 'mpeinstall' script is created during configuration. If configuring with
MPICH or MPICH2, the 'mpiinstall' script will invoke the 'mpeinstall'
script. However, 'mpeinstall' can also be used by itself. Running it is
optional and is of use only if you wish to install the MPE library in a
public place so that others may use it. The final install directory will
contain include, lib, bin, sbin and share subdirectories. Examples
and various logfile viewers will be installed under share.
139
140
141
142
143
144VI. USAGE
145---------
146
147The final install directory contains the following subdirectories.
148
  include/ contains all the include files that user programs need to read.
  lib/     contains all the libraries that user programs need to link with.
  bin/     contains all the utility programs that users need.
  doc/     contains available MPE documentation, e.g. Jumpshot-4's user guide.
  sbin/    contains the MPE uninstall script.
  share/   contains user read-only data. Apart from share/examples/logging/,
           share/examples/graphics/, share/examples/collchk, and
           share/examples/logfiles, users usually do NOT need to know
           the details of the other subdirectories.

To use MPE, users usually only need to know about the files
that have been installed in include/, lib/ and bin/.
161
162
163
164VI. a) CUSTOMIZING LOGFILES
165---------------------------
166
167In addition to using the predefined MPE logging libraries to log all MPI
168calls, MPE logging calls can be inserted into user's MPI program to define
169and log states. These states are called User-Defined states. States may
170be nested, allowing one to define a state describing a user routine that
171contains several MPI calls, and display both the user-defined state and
172the MPI operations contained within it.
173
174The simplest way to insert user-defined states is as follows:
1) Get handles from the MPE logging library: MPE_Log_get_state_eventIDs()
   has to be used to get unique event IDs (MPE logging handles).
   This is important if you are writing a library that uses
   the MPE logging routines from the MPE system.

   Note: Hardwiring the event IDs is considered a bad idea since it may
   cause event ID conflicts, so the practice isn't supported. Older MPE
   libraries provide MPE_Log_get_event_number(), which is still supported
   but has been deprecated. Users are strongly urged to use
   MPE_Log_get_state_eventIDs() instead.
2) Set the logged state's characteristics: MPE_Describe_state() sets the
   name and color of the state.
3) Log the events of the logged state: MPE_Log_event() is called twice,
   once at the start and once at the end of the user-defined state.
189
190Below is a simple example that uses the 3 steps outlined above.
191
192\begin{verbatim}
193
int eventID_begin, eventID_end;
...
MPE_Log_get_state_eventIDs( &eventID_begin, &eventID_end );
...
MPE_Describe_state( eventID_begin, eventID_end, "Multiplication", "red" );
...
void MyAmult( Matrix m, Vector v )
{
    /* Log the start event of the red "Multiplication" state */
    MPE_Log_event( eventID_begin, 0, NULL );
    ...
    ... Amult code, including MPI calls ...
    ...
    /* Log the end event of the red "Multiplication" state */
    MPE_Log_event( eventID_end, 0, NULL );
}
210
211\end{verbatim}
212
213The logfile generated by this code will have the MPI routines nested within
214the routine MyAmult().
215
Besides user-defined states, MPE2 also supports user-defined events, which
can be defined through MPE_Log_get_solo_eventID() and MPE_Describe_event().
For more details, see e.g. cpilog.c.

If the MPE logging library, liblmpe.a, is NOT linked with the user program,
MPE_Init_log() and MPE_Finish_log() need to be called before and after all
the MPE calls. Sample programs cpilog.c and fpilog.f illustrate the use of
these MPE routines. They are in the MPE source directory,
mpe2/src/wrappers/test, or the installed directory, share/examples/logging.
For further linking information, see section "Convenient Compiler Wrappers".

For an undefined user-defined state, i.e. one for which the corresponding
MPE_Describe_state() has not been issued, the new jumpshot (Jumpshot-4) may
display the legend name as "UnknownType-INDEX", where INDEX is the internal
MPE category index.
231
232
233
234VI. b) ENVIRONMENTAL VARIABLES
235------------------------------
236
For MPE logging, MPE_TMPDIR and MPE_LOGFILE_PREFIX are the 2 environment
variables that most users find very useful, so it is recommended to set them
before launching the MPI program to be logged. The environment variables
recognized by MPE logging are:
240
CLOG_BLOCK_SIZE: The integer value determines the clog2 buffer block size,
                 which sets the minimum clog2 file size. If
                 CLOG_BLOCK_SIZE is not set, 64K per block is assumed.

CLOG_BUFFERED_BLOCKS: The integer value determines the number of blocks
                      within CLOG2's internal buffer. Together with
                      CLOG_BLOCK_SIZE, CLOG_BUFFERED_BLOCKS determines how
                      often the internal buffer is flushed to disk.
                      The total buffer size is the product of
                      CLOG_BLOCK_SIZE and CLOG_BUFFERED_BLOCKS. These 2
                      environmental variables allow users to minimize MPE2
                      logging overhead when large local memory is available.
                      The default value of CLOG_BUFFERED_BLOCKS is 128.
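
The buffer-size arithmetic described for CLOG_BLOCK_SIZE and
CLOG_BUFFERED_BLOCKS can be sketched in C as follows. This is only an
illustration, not MPE code: the function names are hypothetical, the block
size is assumed to be given in bytes, and falling back to the default on a
malformed value is an assumption rather than documented MPE behavior.

```c
#include <assert.h>
#include <stdlib.h>

/* Defaults quoted in the text: 64K per block, 128 buffered blocks. */
#define DEFAULT_BLOCK_SIZE      (64L * 1024L)
#define DEFAULT_BUFFERED_BLOCKS 128L

/* Parse a positive integer setting. Fall back to def when the value is
   unset (NULL), empty, malformed, or not positive (an assumption). */
static long setting_or_default(const char *value, long def)
{
    char *end;
    long  n;

    if (value == NULL || *value == '\0')
        return def;
    n = strtol(value, &end, 10);
    return (*end == '\0' && n > 0) ? n : def;
}

/* Total buffer size = block size * number of buffered blocks. */
long clog_total_buffer_bytes(const char *block_size, const char *blocks)
{
    long size  = setting_or_default(block_size, DEFAULT_BLOCK_SIZE);
    long count = setting_or_default(blocks, DEFAULT_BUFFERED_BLOCKS);

    assert(size > 0 && count > 0);
    return size * count;
}
```

Calling clog_total_buffer_bytes(getenv("CLOG_BLOCK_SIZE"),
getenv("CLOG_BUFFERED_BLOCKS")) with both variables unset gives
64K * 128 = 8388608 bytes under these assumptions.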
254
MPE_TMPDIR: MPE_TMPDIR takes precedence over TMPDIR. It specifies a
            directory to be used as temporary storage for each process.
            By default, when MPE_TMPDIR and TMPDIR are NOT set,
            /tmp will be used. When generating a very large logfile
            for a long-running MPI job, users need to make sure that
            MPE_TMPDIR (or TMPDIR) is big enough to hold the temporary
            local logfile, which will be deleted if the merged logfile
            can be created successfully. To minimize the logging
            overhead on the MPI program, it is highly recommended to
            use a *local* file system for MPE_TMPDIR (or TMPDIR).

            Note: The final merged logfile will be written back to the
            file system where process 0 is.
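
The precedence rule for the temporary directory (MPE_TMPDIR first, then
TMPDIR, then /tmp) can be written down as a small C helper. The function
name choose_tmpdir is hypothetical and the sketch treats only an unset
variable (NULL) as missing, which is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative helper (not part of MPE): pick the temporary directory
   using the precedence described in the MPE_TMPDIR entry. */
const char *choose_tmpdir(const char *mpe_tmpdir, const char *tmpdir)
{
    const char *chosen = "/tmp";      /* used when neither is set */

    if (mpe_tmpdir != NULL)
        chosen = mpe_tmpdir;          /* MPE_TMPDIR takes precedence */
    else if (tmpdir != NULL)
        chosen = tmpdir;              /* otherwise TMPDIR */

    assert(chosen != NULL);
    return chosen;
}
```

A caller would typically pass getenv("MPE_TMPDIR") and getenv("TMPDIR"),
both of which return NULL when the variable is not set.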
268
MPE_SAME_TMPDIR: The boolean value determines whether MPE_TMPDIR will be
                 the same across the whole MPI_COMM_WORLD. By default,
                 MPE_SAME_TMPDIR is set to true and only rank 0's
                 MPE_TMPDIR is broadcast to every process. The setting
                 has scalability implications. When MPE_SAME_TMPDIR=false,
                 MPE_TMPDIR could be set differently on different
                 processes, and hence mkstemp() will be called once on
                 each process to create a temporary filename. Calling
                 mkstemp() on the same file system, e.g. $HOME, from
                 every process is not scalable, i.e. the shared file
                 system may hang. MPE_SAME_TMPDIR=false is provided for
                 cases where MPE_TMPDIR has to be set differently on
                 different processes, but it is in general not scalable
                 when MPI_COMM_WORLD's size >> 1K.
282
MPE_DELETE_LOCALFILE: The boolean value determines whether to delete the
                      temporary local clog2 file. When this flag is set
                      to false, the temporary clog2 files are kept and
                      users need to collect them from each node's
                      MPE_TMPDIR. The separate serial programs clog2_join
                      and clog2_repair can then be used to merge the local
                      clog2 files. This process is useful e.g. when
                      MPI_Finalize() fails to complete properly, e.g.
                      because the user program has overwritten MPE/MPI
                      internal data structures.
292
MPE_CLOCKS_SYNC: The boolean value determines the behavior of
                 MPE_Log_sync_clocks() and the default clock synchronization
                 at the end of logging. Users may want to force MPE
                 clock synchronization when the MPI implementation has
                 a buggy clock synchronization mechanism, e.g. in some
                 versions of BG/L MPI, MPI_WTIME_IS_GLOBAL is incorrectly
                 set to true when a 64-way or 256-way partition is used.
300
MPE_SYNC_ALGORITHM: Specifies the clock synchronization algorithm. The
                    accepted values are "DEFAULT", "SEQ", "BITREE"
                    and "ALTNGBR".
                    SEQ:     an O(N)-step algorithm. It is non-scalable and
                             the slowest, but also the most accurate.
                    BITREE:  an O(log2(N))-step algorithm. It is scalable
                             and much faster than SEQ, but less accurate.
                             A good compromise.
                    ALTNGBR: an O(1)-step algorithm. It is perfectly
                             parallel and the fastest of the 3 supported
                             algorithms, but also the least accurate.
                    DEFAULT: uses SEQ when the number of processes <= 16,
                             and BITREE when the number of processes > 16.
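
The DEFAULT rule for MPE_SYNC_ALGORITHM can be expressed as a small
selection function. The sketch below is not MPE's implementation: the enum,
the function name, and the choice to treat an unset or unrecognized value
like "DEFAULT" are assumptions made for illustration.

```c
#include <assert.h>
#include <string.h>

typedef enum { SYNC_SEQ, SYNC_BITREE, SYNC_ALTNGBR } sync_algo;

/* Map the value of MPE_SYNC_ALGORITHM and the number of processes to an
   algorithm, following the rule described in the entry above. */
sync_algo select_sync_algorithm(const char *value, int nprocs)
{
    assert(nprocs > 0);

    if (value != NULL) {
        if (strcmp(value, "SEQ") == 0)     return SYNC_SEQ;
        if (strcmp(value, "BITREE") == 0)  return SYNC_BITREE;
        if (strcmp(value, "ALTNGBR") == 0) return SYNC_ALTNGBR;
    }
    /* "DEFAULT" (and, by assumption, unset or unknown values):
       SEQ for up to 16 processes, BITREE beyond that. */
    return (nprocs <= 16) ? SYNC_SEQ : SYNC_BITREE;
}
```

With this rule, a 16-process job uses the accurate but O(N) SEQ algorithm,
while a 17-process job already switches to BITREE.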
314
MPE_SYNC_FREQUENCY: Specifies the number of iterations of the selected clock
                    synchronization algorithm. In general, the higher the
                    value of MPE_SYNC_FREQUENCY, the higher the probability
                    of obtaining an accurate measurement of all the clocks,
                    i.e. less error. Keep in mind that this is not a
                    guarantee and depends heavily on system noise.
                    The default is 3.
322
323MPE_LOGFILE_PREFIX: specifies the filename prefix of the output logfile.
324 The file extension will be determined by the output
325 logfile format, i.e. MPE_LOG_FORMAT.
326
327MPE_LOG_FORMAT: determines the format of the logfile generated from the
328 execution of application linked with MPE logging libraries.
329 The allowed value for MPE_LOG_FORMAT is CLOG2 only.
330 So there is no need to use this variable at the moment.
331
MPE_LOG_OVERHEAD: The boolean value determines whether to log MPE/CLOG2's
                  internal profiling state, CLOG_Buffer_write2disk. The
                  default setting is yes. CLOG_Buffer_write2disk labels the
                  region in each process where MPE/CLOG2 spends time
                  flushing logging data from memory to disk. The
                  frequency and location of the CLOG_Buffer_write2disk state
                  can be altered by changing CLOG_BLOCK_SIZE and/or
                  CLOG_BUFFERED_BLOCKS.

MPE_LOG_RANK2PROCNAME: The boolean value determines whether a .clog2.pnm
                       file is created that maps each MPI_COMM_WORLD rank
                       to the processor name returned by
                       MPI_Get_processor_name(). The default is no.
344
345MPE_WRAPPERS_ADD_LDFLAGS: The variable tells MPE wrappers, mpecc/mpefc
346 (includes mpich2's mpicc and friends), to use
347 LDFLAGS added by MPE, e.g. -Wl,--export-dynamic.
348 The default is yes. User can override it by
349 setting it to "no" or "false".
350
351MPE_USE_FCONSTS_IN_MPIH : The boolean value determines if MPI_F_* constants
352 from mpi.h will be used. By default, MPE computes
353 the MPI_F_* constants instead of reading from mpi.h.
354 The affected constants are MPI_F_STATUS(ES)_IGNORE.
355
Possible boolean values are "true", "false", "yes" and "no", in either
all lower case or all upper case.
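
A minimal C sketch of how such boolean values could be interpreted is given
below. It is not taken from the MPE sources: the function name and the
fallback argument are hypothetical, and rejecting mixed-case spellings such
as "True" simply follows the wording above.

```c
#include <assert.h>
#include <string.h>

/* Return 1 for "true"/"yes", 0 for "false"/"no" (all lower case or all
   upper case). Any other value, or an unset variable (NULL), yields
   the caller-supplied fallback. */
int parse_mpe_bool(const char *value, int fallback)
{
    static const char *truthy[] = { "true", "TRUE", "yes", "YES" };
    static const char *falsy[]  = { "false", "FALSE", "no", "NO" };
    size_t i;

    assert(fallback == 0 || fallback == 1);
    if (value == NULL)
        return fallback;
    for (i = 0; i < 4; i++) {
        if (strcmp(value, truthy[i]) == 0) return 1;
        if (strcmp(value, falsy[i]) == 0)  return 0;
    }
    return fallback;
}
```

For example, parse_mpe_bool(getenv("MPE_LOG_OVERHEAD"), 1) would apply the
documented default of yes when the variable is not set.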
358
359
For MPE X11 graphics, the environment variable DISPLAY set in each process
is read during MPE_Open_graphics().

DISPLAY: determines which X display the MPE X11 graphics on each process
         connects to.
364
365
366
367VI. c) EXAMPLE MAKEFILE
368-----------------------
369
370The install directories, share/examples/logging, share/examples/graphics and
371share/examples/collchk contain some very useful and simple example programs.
372The Makefiles in these directories illustrate the usage of MPE routines
373and how to link with various MPE libraries. In most cases, users can simply
374copy the share/examples/logging/Makefile to their home directory, and do a
375"make" to compile the suggested targets. Users don't need to copy the
376.c and .f files when MPE has been compiled with a MAKE that has VPATH
377support. The created executables can be launched with mpiexec or mpirun
378from the MPI implementation to generate sample logfiles.
379
380
381
382VI. d) UTILITY PROGRAMS
383-----------------------
384
In bin/, users can find several utility programs that are useful for
manipulating logfiles. These include log format converters, log format
print programs, and a logfile display program.
388
389
390Log Format Converters
391---------------------
392
393clog2TOslog2 : a CLOG2 to SLOG-2 logfile convertor. For more details,
394 do "clog2TOslog2 -h".
395
rlogTOslog2  : an RLOG to SLOG-2 logfile convertor, where RLOG is an
               internal MPICH2 logging format. For more details,
               do "rlogTOslog2 -h".
399
400logconvertor : a standalone GUI based convertor that invokes clog2TOslog2
401 or rlogTOslog2 based on logfile extension. The GUI also
402 shows the progress of the conversion. The same convertor
403 can be invoked from within the logfile viewer, jumpshot.
404
slog2filter  : an SLOG-2 to SLOG-2 logfile convertor. It allows for the
               removal of unwanted categories (when used with
               slog2print -c). It also allows for changing the SLOG-2
               internal structure, e.g. modifying the duration of the
               preview drawables. The tool reads and writes SLOG-2 files
               of the same version.

slog2updater : an SLOG-2 file format update utility. It is essentially
               a slog2filter that reads in an older SLOG-2 file and writes
               it out in the latest SLOG-2 file format.
414
415
416Log Format Print Programs
417-------------------------
418
clog2_print  : a stdout print program for CLOG2 files.
               The Java version is named clogprint.

clog2_join   : a clog2 serial merging program that merges clog2 files, either
               1) temporary local clog2 files which are all from the
                  same MPI_COMM_WORLD, or
               2) merged clog2 files from different MPI_COMM_WORLDs
                  (Incomplete! Timestamps are not synchronized yet.)
427
428clog2_repair : a clog2 repair program that tries to fix the missing data
429 of a clog2 file (when the MPI program that is being profiled
430 aborts) so that the file can be processed by other tools
431 like clog2TOslog2.
432
rlog_print   : a stdout print program for RLOG files.
434
435slog2print : a stdout print program for SLOG-2 file.
436
437
438
439Log File Display Program
440------------------------
441
442jumpshot : the Jumpshot-4 launcher script. Jumpshot-4 does logfile
443 conversion as well as visualization.
444
445To view a logfile, say fpilog.slog2, do
446
447jumpshot fpilog.slog2
448
449The command will select and invoke Jumpshot-4 to display the content
450of SLOG-2 file if Jumpshot-4 has been built and installed successfully.
451
452One can also do
453
454jumpshot fpilog.clog2
455
456or
457
458jumpshot barrier.rlog
459
460Both will invoke the logfile convertor first before visualization.
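
The extension-based choice of convertor described in this section can be
illustrated with a short C function. This is a sketch only: converter_for is
a hypothetical name, and the actual selection logic inside jumpshot and
logconvertor may differ.

```c
#include <assert.h>
#include <string.h>

/* Return the name of the convertor that matches the logfile extension,
   or NULL when the file needs no conversion (e.g. .slog2) or the
   extension is not recognized. */
const char *converter_for(const char *logfile)
{
    const char *ext;

    assert(logfile != NULL);
    ext = strrchr(logfile, '.');      /* last '.' starts the extension */
    if (ext == NULL)
        return NULL;
    if (strcmp(ext, ".clog2") == 0)
        return "clog2TOslog2";
    if (strcmp(ext, ".rlog") == 0)
        return "rlogTOslog2";
    return NULL;
}
```

For the files used in the examples above, fpilog.clog2 maps to clog2TOslog2,
barrier.rlog maps to rlogTOslog2, and fpilog.slog2 needs no conversion.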
461
462
463Collective and Datatype Checking
464--------------------------------
465
To link an MPI application with the collective and datatype checking library,
do

mpicc -o mpi_pgm *.o -L<mpe2_libdir> -lmpe_collchk

Alternatively, use the compiler wrappers (more details in the next section),
e.g.
472
473(with mpich2's compiler wrapper)
474mpicc -mpe=mpicheck -o mpi_pgm *.o
475
476(with MPE's compiler wrapper)
477mpecc -mpe=mpicheck -o mpi_pgm *.o
478
479
480Convenient Compiler Wrappers
481----------------------------
482
A standalone MPE installation with a non-MPICH2 MPI implementation provides
2 convenient compiler wrappers, mpecc and mpefc, which mimic the typical
usage of mpicc/mpif77/mpif90 in MPICH* by providing convenient compilation
and linking switches. mpecc is for C programs, and mpefc is for Fortran
programs. Typically, mpecc can be used as follows:
488
489mpecc -mpilog -o cpilog cpilog.c
490
491which is equivalent to doing
492
493mpicc -o cpilog cpilog.c -L<mpe2_libdir> -llmpe -lmpe
494
495Available MPE profiling options for "mpecc" and "mpefc" are as follows:
496
497 -mpilog : Automatic MPI and MPE user-defined states logging.
498 This links against -llmpe -lmpe.
499
500 -mpitrace : Trace MPI program with printf.
501 This links against -ltmpe.
502
503 -mpianim : Animate MPI program in real-time.
504 This links against -lampe -lmpe.
505
506 -mpicheck : Check MPI Program with the Collective & Datatype
507 Checking library. This links against -lmpe_collchk.
508
509 -graphics : Use MPE graphics routines with X11 library.
510 This links against -lmpe <X11 libraries>.
511
512 -log : MPE user-defined states logging.
513 This links against -lmpe.
514
515 -nolog : Nullify MPE user-defined states logging.
516 This links against -lmpe_null.
517
518 -help : Print this help page.
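
The correspondence between the profiling switches and the libraries they
link against can be captured in a lookup table. The C sketch below only
restates the list above; the struct and function names are hypothetical, and
-graphics is left out because its X11 libraries are platform dependent.

```c
#include <assert.h>
#include <string.h>

struct mpe_switch {
    const char *name;   /* switch accepted by mpecc/mpefc */
    const char *libs;   /* libraries the switch links against */
};

static const struct mpe_switch mpe_switches[] = {
    { "-mpilog",   "-llmpe -lmpe"  },
    { "-mpitrace", "-ltmpe"        },
    { "-mpianim",  "-lampe -lmpe"  },
    { "-mpicheck", "-lmpe_collchk" },
    { "-log",      "-lmpe"         },
    { "-nolog",    "-lmpe_null"    },
};

/* Return the link flags for a profiling switch, or NULL if unknown. */
const char *libs_for_switch(const char *opt)
{
    size_t i;
    size_t n = sizeof mpe_switches / sizeof mpe_switches[0];

    assert(opt != NULL);
    for (i = 0; i < n; i++)
        if (strcmp(opt, mpe_switches[i].name) == 0)
            return mpe_switches[i].libs;
    return NULL;
}
```

This makes the earlier equivalence explicit: libs_for_switch("-mpilog")
returns "-llmpe -lmpe", the flags used in the expanded mpicc command.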
519
520
MPE has been seamlessly integrated into the MPICH2 and MPICH distributions.

In MPICH2, all the convenient compilation and linking switches described above
are provided through the -mpe= option in mpicc/mpicxx/mpif77/mpif90. For
instance, cpilog can be compiled and linked with the automatic MPI logging
library as follows
527
528mpicc -mpe=mpilog -o cpilog cpilog.c
529
530which is equivalent to the mpecc command
531
532mpecc -mpilog -o cpilog cpilog.c
533
534
In MPICH (the old one), the compiler wrappers mpicc/mpiCC/mpif77/mpif90
do not provide all the convenient switches listed above; only 3 of them
are available. These options are:
538
539-mpitrace - to compile and link with tracing library.
540-mpianim - to compile and link with animation libraries.
541-mpilog - to compile and link with logging libraries.
542
For instance, the following command creates the executable fpilog, which
generates a logfile when it is executed.
545
546mpif77 -mpilog -o fpilog fpilog.f
547
548
549
550
551VII. Using MPE in MPICHx
552------------------------
553
554VII. a) Inheritance of Environmental Variables
555----------------------------------------------
MPE relies on certain environmental variables (e.g. MPE_TMPDIR). These
variables determine how MPE behaves. It is important to make sure that
all the MPI processes receive the intended values of the environmental
variables. The complication comes from the fact that different MPI
implementations have different ways of passing environmental variables. For
instance, MPICH contains many different devices for different platforms,
and some of these devices have their own way of passing environmental
variables to other processes. The often-used devices, like ch_p4 and
ch_shmem, do not require special attention to pass the value of an
environmental variable to spawned processes. The spawned process inherits
the value from the launching process when the environmental variable has
been set in the launching process. But this is NOT true for all devices. For
instance, the ch_p4mpd device requires a special option of mpirun to set
environmental variables in all processes.
570
571mpirun -np N fpilog -MPDENV- MPE_LOGFILE_PREFIX=fpilog
572
In this example, the option -MPDENV- is needed to make sure
that all processes have their environmental variable MPE_LOGFILE_PREFIX
set to the desired output logfile prefix.


In MPICH2, when using MPD as the process manager, MPE_LOGFILE_PREFIX
and MPE_TMPDIR can be passed as follows:
580
581mpiexec -env MPE_LOGFILE_PREFIX <output-logname-prefix> \
582 -env MPE_TMPDIR <local-tmp-dir> -n 32 <executable-name>
583
Also, with the MPE X11 graphics library, the local DISPLAY variable set on
each process is read (so ssh tunnelling can be used). For example, assume
an mpd ring has been set up on 2 machines, schwinn and triumph, as follows:
587
588cat > mpd.hosts << EOF
589schwinn
590triumph
591EOF
592
593mpdboot -n 2 -f mpd.hosts
594
595Now launch MPE X11 graphics sample code, cxgraphics, as follows:
596
597mpiexec -host schwinn -env DISPLAY <display_0> cxgraphics : \
598 -host triumph -env DISPLAY <display_1> cxgraphics
599
where <display_0> and <display_1> are the local DISPLAY variables
echoed from consoles connected to schwinn and triumph respectively.


For other MPI implementations, how environmental variables are passed
remains unchanged. Users need to become familiar with the runtime
environment and set the environmental variables appropriately.
607
608
609VII. b) Viewing Logfiles
610------------------------
MPE's install directory structure is the same as MPICH's and MPICH2's,
so all of MPE's utility programs are located in the bin/ directory of
MPICH and MPICH2.
614
615
1 MPE (Multi-Processing Environment) for Windows
2 ----------------------------------------------
3
4 Mathematics and Computer Science Division
5 Argonne National Laboratory
6
7I. INTRODUCTION
8----------------
9
The Multi-Processing Environment (MPE) aims to provide programmers with
a complete suite of performance analysis tools for their MPI programs, based
on a post-processing approach. These tools include a set of profiling
libraries, a set of utility programs, and a set of graphical tools.

The first set of tools to be used with user MPI programs is the profiling
libraries, which provide a collection of routines that create log files.
These log files can be created manually by inserting MPE calls in the MPI
program, automatically by linking with the appropriate MPE libraries, or by
combining the two methods. Currently, MPE offers the following 4 profiling
libraries.
21
22 1) Tracing Library - Traces all MPI calls. Each MPI call is preceded by a
23 line that contains the rank in MPI_COMM_WORLD of the calling process,
24 and followed by another line indicating that the call has completed.
25 Most send and receive routines also indicate the values of count, tag,
26 and partner (destination for sends, source for receives). Output is to
27 standard output.
28
29 2) Animation Libraries - A simple form of real-time program animation
30 that requires X window routines.
31
  3) Logging Libraries - The most useful and widely used profiling libraries
     in MPE. These libraries form the basis for generating log files from
     user MPI programs. Several log file formats are available in MPE.
     The default log file format is CLOG2, a low-overhead logging format
     that is a simple collection of single-timestamp events. The old
     format ALOG, which has not been developed for years, is not
     distributed here. The visualization format is SLOG-2, which stands
     for Scalable LOGfile format version II and is a total redesign of the
     original SLOG format. SLOG-2 allows for much improved scalability in
     visualization. A CLOG2 file can easily be converted to an SLOG-2
     file through the SLOG-2 viewer, Jumpshot-4.
43
  4) Collective and datatype checking library - An argument consistency
     checking library for MPI collective calls. It checks datatype, root,
     and various other arguments for consistency in MPI collective calls.
47
The set of utility programs in MPE includes log format converters (e.g.
clog2TOslog2), logfile printers (e.g. slog2print), and a logfile viewer and
convertor (e.g. jumpshot). These new tools, clog2TOslog2, slog2print and
jumpshot (Jumpshot-4), replace the old tools clog2slog, slog_print and
logviewer (i.e. Jumpshot-2 and Jumpshot-3). For more information on the
various logfile formats and their viewers, see
54
55http://www.mcs.anl.gov/perfvis
56
57
58
59II. USAGE
60---------
61
62II. a) CUSTOMIZING LOGFILES
63---------------------------
64
65In addition to using the predefined MPE logging libraries to log all MPI
66calls, MPE logging calls can be inserted into user's MPI program to define
67and log states. These states are called User-Defined states. States may
68be nested, allowing one to define a state describing a user routine that
69contains several MPI calls, and display both the user-defined state and
70the MPI operations contained within it.
71
72The simplest way to insert user-defined states is as follows:
1) Get handles from the MPE logging library: MPE_Log_get_state_eventIDs()
   has to be used to get unique event IDs (MPE logging handles).
   This is important if you are writing a library that uses
   the MPE logging routines from the MPE system.

   Note: Hardwiring the event IDs is considered a bad idea since it may
   cause event ID conflicts, so the practice isn't supported. Older MPE
   libraries provide MPE_Log_get_event_number(), which is still supported
   but has been deprecated. Users are strongly urged to use
   MPE_Log_get_state_eventIDs() instead.
2) Set the logged state's characteristics: MPE_Describe_state() sets the
   name and color of the state.
3) Log the events of the logged state: MPE_Log_event() is called twice,
   once at the start and once at the end of the user-defined state.
87
88Below is a simple example that uses the 3 steps outlined above.
89
90\begin{verbatim}
91
int eventID_begin, eventID_end;
...
MPE_Log_get_state_eventIDs( &eventID_begin, &eventID_end );
...
MPE_Describe_state( eventID_begin, eventID_end, "Multiplication", "red" );
...
void MyAmult( Matrix m, Vector v )
{
    /* Log the start event of the red "Multiplication" state */
    MPE_Log_event( eventID_begin, 0, NULL );
    ... Amult code, including MPI calls ...
    /* Log the end event of the red "Multiplication" state */
    MPE_Log_event( eventID_end, 0, NULL );
}
105
106\end{verbatim}
107
108The logfile generated by this code will have the MPI routines nested within
109the routine MyAmult().
110
Besides user-defined states, MPE2 also supports user-defined events, which
can be defined through MPE_Log_get_solo_eventID() and MPE_Describe_event().
For more details, see e.g. cpilog.c.

If the MPE logging library, liblmpe.a, is NOT linked with the user program,
MPE_Init_log() and MPE_Finish_log() need to be called before and after all
the MPE calls. Sample programs cpilog.c and fpilog.f illustrate the use of
these MPE routines. They are in the MPE source directory,
mpe2/src/wrappers/test, or the installed directory, share/examples_logging.
For further linking information, see section "Convenient Compiler Wrappers".

For an undefined user-defined state, i.e. one for which the corresponding
MPE_Describe_state() has not been issued, the new jumpshot (Jumpshot-4) may
display the legend name as "UnknownType-INDEX", where INDEX is the internal
MPE category index.
126
127
128
129II. b) ENVIRONMENTAL VARIABLES
130------------------------------
131
For MPE logging, MPE_TMPDIR and MPE_LOGFILE_PREFIX are the 2 environment
variables that most users find very useful, so it is recommended to set them
before launching the MPI program to be logged. The environment variables
recognized by MPE logging are:
135
CLOG_BLOCK_SIZE: An integer that sets the clog2 buffer block size, which
                 in turn sets the minimum clog2 file size. If
                 CLOG_BLOCK_SIZE is not set, 64K per block is assumed.

CLOG_BUFFERED_BLOCKS: An integer that sets the number of blocks within
                      CLOG2's internal buffer. Together with
                      CLOG_BLOCK_SIZE, CLOG_BUFFERED_BLOCKS determines how
                      often the internal buffer is flushed to disk.
                      The total buffer size is the product of
                      CLOG_BLOCK_SIZE and CLOG_BUFFERED_BLOCKS. These 2
                      environment variables allow the user to minimize
                      MPE2 logging overhead when a large amount of local
                      memory is available. The default value of
                      CLOG_BUFFERED_BLOCKS is 128.

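As a rough illustration of how the two buffer settings combine, the C
sketch below computes the total buffer size as the product described
above. It is not MPE's own code: the function names are hypothetical, the
fallback to the default on an unparsable value is an assumption, and it
assumes that both values are plain integers and that "64K" means 65536
bytes.

```c
#include <stdlib.h>

/* Defaults quoted in the text above: 64K per block, 128 blocks. */
#define DEFAULT_BLOCK_SIZE      65536UL  /* assumption: "64K" = 65536 bytes */
#define DEFAULT_BUFFERED_BLOCKS 128UL

/* Parse a positive integer setting.  Fall back to dflt when the value is
   unset (NULL), empty, not a number, or not positive (an assumption made
   for this sketch). */
static unsigned long setting_or_default(const char *value, unsigned long dflt)
{
    char *end;
    long parsed;

    if (value == NULL || *value == '\0')
        return dflt;
    parsed = strtol(value, &end, 10);
    if (*end != '\0' || parsed <= 0)
        return dflt;
    return (unsigned long) parsed;
}

/* Total buffer size = block size * number of buffered blocks. */
unsigned long clog2_total_buffer_bytes(const char *block_size,
                                       const char *buffered_blocks)
{
    return setting_or_default(block_size, DEFAULT_BLOCK_SIZE)
         * setting_or_default(buffered_blocks, DEFAULT_BUFFERED_BLOCKS);
}
```

Called as clog2_total_buffer_bytes(getenv("CLOG_BLOCK_SIZE"),
getenv("CLOG_BUFFERED_BLOCKS")), the sketch yields 8388608 bytes when
neither variable is set, under the assumptions stated above.
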
MPE_TMPDIR: Specifies a directory to be used as temporary storage for each
            process. MPE_TMPDIR takes precedence over TMPDIR. By default,
            when neither MPE_TMPDIR nor TMPDIR is set, /tmp is used. When
            generating a very large logfile for a long-running MPI job,
            the user needs to make sure that MPE_TMPDIR (or TMPDIR) is big
            enough to hold the temporary local logfile, which is deleted
            if the merged logfile is created successfully. To minimize
            the overhead that logging adds to the MPI program, it is
            highly recommended to use a *local* file system for
            MPE_TMPDIR (or TMPDIR).

            Note : The final merged logfile will be written back to the
                   file system where process 0 is.

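The precedence rule above (MPE_TMPDIR, then TMPDIR, then /tmp) can be
pictured with the short C sketch below. The function name is hypothetical
and this is not MPE's own implementation; treating an empty string as
"not set" is an assumption made for the sketch.

```c
#include <stddef.h>

/* Return the directory used for the temporary per-process logfile:
   MPE_TMPDIR first, then TMPDIR, then /tmp.
   Assumption: an empty string counts as "not set". */
const char *resolve_log_tmpdir(const char *mpe_tmpdir, const char *tmpdir)
{
    if (mpe_tmpdir != NULL && mpe_tmpdir[0] != '\0')
        return mpe_tmpdir;
    if (tmpdir != NULL && tmpdir[0] != '\0')
        return tmpdir;
    return "/tmp";
}
```

A caller would typically pass getenv("MPE_TMPDIR") and getenv("TMPDIR")
as the two arguments.
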
MPE_DELETE_LOCALFILE: A boolean that determines whether to delete the
                      temporary local clog2 files. When this flag is set
                      to false, the temporary files are kept and the user
                      needs to collect them from each slave node's
                      MPE_TMPDIR. The separate serial programs clog2_join
                      and clog2_repair can then be used to merge the local
                      clog2 files. This process is useful e.g. when
                      MPI_Finalize() fails to complete properly because
                      the user program has overwritten MPE/MPI internal
                      data structures.

MPE_CLOCKS_SYNC: A boolean that determines the behavior of
                 MPE_Log_sync_clocks() and of the default clock
                 synchronization at the end of logging. Users may want to
                 force MPE clock synchronization when the MPI
                 implementation has a buggy clock synchronization
                 mechanism, e.g. some versions of BG/L MPI incorrectly set
                 MPI_WTIME_IS_GLOBAL to true when a 64-way or 256-way
                 partition is used.

MPE_SYNC_ALGORITHM: Specifies the clock synchronization algorithm. The
                    accepted values are "DEFAULT", "SEQ", "BITREE"
                    and "ALTNGBR".
       SEQ: an O(N)-step algorithm. It is non-scalable and the slowest,
            but it is also the most accurate.
       BITREE: an O(log2(N))-step algorithm. It is scalable and much
               faster than SEQ, but less accurate than SEQ.
               A good compromise.
       ALTNGBR: an O(1)-step algorithm. It is perfectly parallel and is
                the fastest of the 3 supported algorithms.
                It is also the least accurate.
       DEFAULT: uses SEQ when the number of processes <= 16 and
                BITREE when the number of processes > 16.

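The DEFAULT rule above can be sketched in C as follows. This is an
illustration only, not MPE's own code: the function name is hypothetical,
and treating an unset variable the same as "DEFAULT" is an assumption.

```c
#include <string.h>

/* Map the requested MPE_SYNC_ALGORITHM value to the algorithm that would
   actually run for nprocs processes.
   Assumption: an unset variable (NULL) behaves like "DEFAULT". */
const char *effective_sync_algorithm(const char *requested, int nprocs)
{
    if (requested == NULL || strcmp(requested, "DEFAULT") == 0)
        return (nprocs <= 16) ? "SEQ" : "BITREE";
    return requested;   /* "SEQ", "BITREE" or "ALTNGBR" as given */
}
```

Under this rule, a 16-process job left at DEFAULT uses SEQ, while a
17-process job uses BITREE.
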
MPE_SYNC_FREQUENCY: Specifies the number of iterations of the selected
                    clock synchronization algorithm. In general, the
                    higher MPE_SYNC_FREQUENCY is, the higher the
                    probability of obtaining an accurate measurement of
                    all the clocks, i.e. less error. Keep in mind that
                    this is not a guarantee and is highly dependent on
                    system noise. The default is 3.

MPE_LOGFILE_PREFIX: Specifies the filename prefix of the output logfile.
                    The file extension is determined by the output
                    logfile format, i.e. MPE_LOG_FORMAT.

MPE_LOG_FORMAT: Determines the format of the logfile generated by running
                an application linked with the MPE logging libraries.
                The only allowed value for MPE_LOG_FORMAT is CLOG2, so
                there is no need to set this variable at the moment.

MPE_LOG_OVERHEAD: A boolean that determines whether to log MPE/CLOG2's
                  internal profiling state CLOG_Buffer_write2disk(). The
                  default setting is yes. The CLOG_Buffer_write2disk state
                  marks the regions in each process where MPE/CLOG2 spends
                  time flushing logging data from memory to disk. The
                  frequency and location of the CLOG_Buffer_write2disk
                  state can be altered by changing CLOG_BLOCK_SIZE and/or
                  CLOG_BUFFERED_BLOCKS.

Possible boolean values are "true", "false", "yes" and "no", in either
all lower case or all upper case.


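The accepted boolean spellings listed above can be checked with a small
C helper such as the one sketched below. The function name and its
return convention are hypothetical, and rejecting mixed-case spellings
such as "Yes" is an assumption based on a literal reading of the
sentence above, not on MPE's own parsing code.

```c
#include <string.h>

/* Return 1 for "true"/"yes", 0 for "false"/"no" (all lower case or all
   upper case), and -1 for anything else, including NULL.
   Assumption: mixed-case spellings are not accepted. */
int parse_mpe_boolean(const char *value)
{
    if (value == NULL)
        return -1;
    if (strcmp(value, "true") == 0 || strcmp(value, "TRUE") == 0 ||
        strcmp(value, "yes") == 0  || strcmp(value, "YES") == 0)
        return 1;
    if (strcmp(value, "false") == 0 || strcmp(value, "FALSE") == 0 ||
        strcmp(value, "no") == 0    || strcmp(value, "NO") == 0)
        return 0;
    return -1;
}
```

A caller could pass, for example, getenv("MPE_LOG_OVERHEAD") and treat
-1 as "use the default setting".
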
For MPE X11 graphics, the environment variable DISPLAY set in each process
is read during MPE_Open_graphics().

DISPLAY: determines which X display the MPE X11 graphics on each process
         connect to.



II. c) EXAMPLE PROJECT FILE
---------------------------

In examples/, the user can find example source code, cpilog.c, that shows
how to customize log files.

II. d) UTILITY PROGRAMS
-----------------------

In bin/, the user can find several utility programs that are useful for
manipulating logfiles. These include log format converters, log format
print programs, and a logfile display program.


Log Format Converters
---------------------

clog2TOslog2 : a CLOG2 to SLOG-2 logfile converter. For more details,
               run "clog2TOslog2 -h".

rlogTOslog2 :  an RLOG to SLOG-2 logfile converter, where RLOG is an
               internal MPICH2 logging format. For more details,
               run "rlogTOslog2 -h".

logconvertor : a standalone GUI-based converter that invokes clog2TOslog2
               or rlogTOslog2 based on the logfile extension. The GUI also
               shows the progress of the conversion. The same converter
               can be invoked from within the logfile viewer, jumpshot.

slog2filter :  a SLOG-2 to SLOG-2 logfile converter. It allows the removal
               of unwanted categories (when used with slog2print -c). It
               also allows changes to the SLOG-2 internal structure, e.g.
               modifying the duration of the preview drawable. The tool
               reads and writes SLOG-2 files of the same version.

slog2updater : a SLOG-2 file format update utility. It is essentially
               a slog2filter that reads in an older SLOG-2 file and writes
               it out in the latest SLOG-2 file format.


Log Format Print Programs
-------------------------

clog2_print :  a stdout print program for CLOG2 files.
               The Java version is named clogprint.

clog2_join :   a serial clog2 merging program. It merges either
               1) temporary local clog2 files that all come from the
                  same MPI_COMM_WORLD, or
               2) merged clog2 files from several MPI_COMM_WORLDs
                  (Incomplete! Timestamps are not synchronized yet.)

clog2_repair : a clog2 repair program that tries to fix the missing data
               of a clog2 file (when the MPI program that is being
               profiled aborts) so that the file can be processed by
               other tools like clog2TOslog2.

rlog_print :   a stdout print program for RLOG files.

slog2print :   a stdout print program for SLOG-2 files.



Log File Display Program
------------------------

jumpshot :     the Jumpshot-4 launcher script. Jumpshot-4 does logfile
               conversion as well as visualization.

To view a logfile, say fpilog.slog2, run

jumpshot fpilog.slog2

The command selects and invokes Jumpshot-4 to display the content of the
SLOG-2 file, provided Jumpshot-4 has been built and installed successfully.

One can also run

jumpshot fpilog.clog2

or

jumpshot barrier.rlog

Both commands invoke the logfile converter before visualization.

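The extension-based choice of converter described above can be pictured
with the C sketch below. It is only an illustration, not the actual code
of logconvertor or jumpshot: the function name is hypothetical, and it
assumes that the extensions .clog2 and .rlog used in the examples above
identify the two convertible formats.

```c
#include <string.h>

/* Return the name of the converter to run for a logfile, chosen by its
   extension, or NULL when no conversion applies (for example when the
   file is already a .slog2 file or has no extension). */
const char *converter_for_logfile(const char *filename)
{
    const char *dot;

    if (filename == NULL)
        return NULL;
    dot = strrchr(filename, '.');      /* last '.' starts the extension */
    if (dot == NULL)
        return NULL;
    if (strcmp(dot, ".clog2") == 0)
        return "clog2TOslog2";
    if (strcmp(dot, ".rlog") == 0)
        return "rlogTOslog2";
    return NULL;
}
```

With this sketch, fpilog.clog2 maps to clog2TOslog2, barrier.rlog maps to
rlogTOslog2, and fpilog.slog2 needs no conversion.
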