1\" @(#)present.me 1.5 03/12/82 2.sh 1 "Data Presentation" 3.pp 4The data is presented to the user in two different formats. 5The first presentation simply lists the routines 6without regard to the amount of time their descendants use. 7The second presentation incorporates the call graph of the 8program. 9.sh 2 "The Flat Profile 10.pp 11The flat profile consists of a list of all the routines 12that are called during execution of the program, 13with the count of the number of times they are called 14and the number of seconds of execution time for which they 15are themselves accountable. 16The routines are listed in decreasing order of execution time. 17A list of the routines that are never called during execution of 18the program is also available 19to verify that nothing important is omitted by 20this execution. 21The flat profile gives a quick overview of the routines that are used, 22and shows the routines that are themselves responsible 23for large fractions of the execution time. 24In practice, 25this profile usually indicates that no single function 26is overwhelmingly responsible for 27the total time of the program. 28Notice that for this profile, 29the individual times sum to the total execution time. 30.sh 2 "The Call Graph Profile" 31.pp 32Ideally, we would like to print the call graph of the program, 33but we are limited by the two-dimensional nature of our output 34devices. 35We cannot assume that a call graph is planar, 36and even if it is, that we can print a planar version of it. 37Instead, we choose to print each routine, 38together with information about 39the routines that are its direct parents and children. 40This listing presents a window into the call graph. 41Based on our experience, 42both parent information and child information 43is important, 44and should be available without searching 45through the output. 46.pp 47The major entries of the call graph profile are the entries from the 48flat profile, augmented by the time propagated to each 49routine from its descendants. 50This profile is sorted by the sum of the time for the routine 51itself plus the time inherited from its descendants. 52The profile shows which of the higher level routines 53spend large portions of the total execution time 54in the routines that they call. 55For each routine, we show the amount of time passed by each child 56to the routine, which includes time for the child itself 57and for the descendants of the child 58(and thus the descendants of the routine). 59We also show the percentage these times represent of the total time 60accounted to the child. 61Similarly, the parents of each routine are listed, 62along with time, 63and percentage of total routine time, 64propagated to each one. 65.pp 66Cycles are handled as single entities. 67The cycle as a whole is shown as though it were a single routine, 68except that members of the cycle are listed in place of the children. 69Although the number of calls of each member 70from within the cycle are shown, 71they do not affect time propagation. 72When a child is a member of a cycle, 73the time shown is the appropriate fraction of the time 74for the whole cycle. 75Self-recursive routines have their calls broken 76down into calls from the outside and self-recursive calls. 77Only the outside calls affect the propagation of time. 78.pp 79The following example is a typical fragment of a call graph. 80.(b 81.TS 82center; 83c c c. 84 85Caller1 Caller2 86 87 88 Example 89 90 91Sub1 Sub2 Sub3 92 93.TE 94.)b 95This example would have the following form of entry 96in the profile listing. 97.sz 10 98.(b 99.TS 100box center; 101c c c c c l l 102c c c c c l l 103c c c c c l l 104l n n n c l l. 105 called/total \ \ parents 106index %time self descendants called+self name index 107 called/total \ \ children 108_ 109 0.20 1.20 4/10 \ \ Caller1 [7] 110 0.30 1.80 6/10 \ \ Caller2 [1] 111[2] 41.5 0.50 3.00 10+4 Example [2] 112 1.50 1.00 20/40 \ \ Sub1 <cycle1> [4] 113 0.00 0.50 1/5 \ \ Sub2 [9] 114 0.00 0.00 0/5 \ \ Sub3 [11] 115.TE 116.)b 117.sz 12 118.pp 119The entry is for routine Example, which has 120the Caller routines as its parents, 121and the Sub routines as its children. 122The reader should keep in mind that all information 123is given \fIwith respect to Example\fP. 124The index in the first column indicates that Example 125is the second entry in the profile listing. 126The Example routine is called ten times, four times by Caller1, 127and six times by Caller2. 128Consequently 40% of Example's time is propagated to Caller1, 129and 60% of Example's time is propagated to Caller2. 130The self and descendant fields of the parents 131show the amount of self and descendant time Example 132propagates to them (but not the time used by 133the parents directly). 134Note that Example calls itself recursively four times. 135The routine Example calls routine Sub1 twenty times, Sub2 once, 136and never calls Sub3. 137Since Sub2 is called a total of five times, 13820% of its self and descendant time is propagated to Example's 139descendant time field. 140Because Sub1 is a member of \fIcycle 1\fR, 141the self and descendant times are those for the cycle as a whole. 142Since Sub1 is called a total of forty times from outside 143of its cycle, it propagates 50% of the cycle's self and descendant 144time to Example's descendant time field. 145Finally each name is followed by an index that indicates 146where on the listing to find the entry for that routine. 147.sh 1 "Using the Profiles" 148.pp 149The profiler is a useful tool for improving 150a set of routines that implement an abstraction. 151It can be helpful in identifying poorly coded routines, 152and in evaluating the new algorithms and code that replace them. 153Taking full advantage of the profiler 154requires a careful examination of the call graph profile, 155and a thorough knowledge of the abstractions underlying 156the program. 157.pp 158One of the easiest optimizations that can be performed 159is a small change 160to a control construct or data structure that improves the 161running time of the program. 162An obvious starting point 163is a routine that is called many times. 164For example, suppose an output 165routine is the only parent 166of a routine that formats the data. 167If this format routine is expanded inline in the 168output routine, the overhead of a function call and 169return can be saved for each datum that needs to be formatted. 170.pp 171The drawback to inline expansion is that the data abstractions 172in the program may become less parameterized, 173hence less clearly defined. 174The profiling will also become less useful since the loss of 175routines will make its output more granular. 176For example, 177if the symbol table functions ``lookup'', ``insert'', and ``delete'' 178are all merged into a single parameterized routine, 179it will be impossible to determine the costs 180of any one of these individual functions from the profile. 181.pp 182Further potential for optimization lies in routines that 183implement data abstractions whose total execution 184time is long. 185For example, a lookup routine might be called only a few 186times, but use an inefficient linear search algorithm, 187that might be replaced with a binary search. 188Alternately, the discovery that a rehashing function is being 189called excessively, can lead to a different 190hash function or a larger hash table. 191If the data abstraction function cannot easily be speeded up, 192it may be advantageous to cache its results, 193and eliminate the need to rerun 194it for identical inputs. 195.pp 196This tool is best used in an iterative approach: 197profiling the program, 198eliminating one bottleneck, 199then finding some other part of the program 200that begins to dominate execution time. 201For instance, we have used \fBgprof\fR on itself; 202eliminating, rewriting, and inline expanding routines, 203until reading 204data files (hardly a target for optimization!) 205represents the dominating factor in its execution time. 206.pp 207Certain types of programs are not easily analyzed by \fBgprof\fR. 208They are typified by programs that exhibit a large degree of 209recursion, such as recursive descent compilers. 210The problem is that most of the major routines are grouped 211into a single monolithic cycle. 212As in the case of the symbol table abstraction that is placed 213in one routine, 214it is impossible to distinguish which members of the cycle are 215responsible for the execution time. 216Unfortunately there are no easy modifications to these programs that 217make them amenable to analysis. 218.pp 219A completely different use of the profiler is to analyze the control 220flow of an unfamiliar program. 221If you receive a program from another user that you need to modify 222in some small way, 223it is often unclear where the changes need to be made. 224By running the program on an example and then using \fBgprof\fR, 225you can get a view of the structure of the program. 226.pp 227Consider an example in which you need to change the output format 228of the program. 229For purposes of this example suppose that the call graph 230of the output portion of the program has the following structure: 231.(b 232.TS 233center; 234c c c. 235 236calc1 calc2 calc3 237 238 239format1 format2 240 241 242 ``write'' 243 244.TE 245.)b 246Initially you look through the \fBgprof\fR 247output for the system call ``write''. 248The format routine you will need to change is probably 249among the parents of the ``write'' procedure. 250The next step is to look at the profile entry for each 251of parents of ``write'', 252in this example either ``format1'' or ``format2'', 253to determine which one to change. 254Each format routine will have one or more parents, 255in this example ``calc1'', ``calc2'', and ``calc3''. 256By inspecting the source code for each of these routines 257you can determine which format routine generates the output that 258you wish to modify. 259Since the \fBgprof\fR entry indicates all the 260potential calls to the format routine you intend to change, 261you can determine if your modifications will affect output that 262should be left alone. 263If you desire to change the output of ``calc2'', but not ``calc3'', 264then formatting routine ``format2'' needs to be split 265into two separate routines, 266one of which implements the new format. 267You can then retarget just the call by ``calc2'' 268that needs the new format. 269It should be noted that the static call information is particularly 270useful in this case since the test case you run probably will not 271exercise the entire program. 272.sh 1 "Conclusions" 273.pp 274We have created a profiler that aids in the evaluation 275of modular programs. 276For each routine in the program, 277the profile shows the extent to which that routine 278helps support various abstractions, 279and how that routine uses other abstractions. 280The profile accurately assesses the cost of routines 281at all levels of the program decomposition. 282The profiler is easily used, 283and can be compiled into the program without any prior planning on the part 284of the programmer. 285It adds only five to thirty percent execution overhead to the program 286being profiled, 287produces no additional output until after the program finishes, 288and allows the program to be measured in its actual environment. 289Finally, the profiler runs on a time-sharing system 290using only the normal services provided by the operating system 291and compilers. 292