1\"	@(#)present.me	1.7	03/14/82
2.sh 1 "Data Presentation"
3.pp
4The data is presented to the user in two different formats.
5The first presentation simply lists the routines
6without regard to the amount of time their descendants use.
7The second presentation incorporates the call graph of the
8program.
9.sh 2 "The Flat Profile
10.pp
11The flat profile consists of a list of all the routines
12that are called during execution of the program,
13with the count of the number of times they are called
14and the number of seconds of execution time for which they
15are themselves accountable.
16The routines are listed in decreasing order of execution time.
17A list of the routines that are never called during execution of
18the program is also available
19to verify that nothing important is omitted by
20this execution.
21The flat profile gives a quick overview of the routines that are used,
22and shows the routines that are themselves responsible
23for large fractions of the execution time.
24In practice,
25this profile usually shows that no single function
26is overwhelmingly responsible for
27the total time of the program.
28Notice that for this profile,
29the individual times sum to the total execution time.
30.sh 2 "The Call Graph Profile"
31.pp
32Ideally, we would like to print the call graph of the program,
33but we are limited by the two-dimensional nature of our output
34devices.
35We cannot assume that a call graph is planar,
36and even if it is, that we can print a planar version of it.
37Instead, we choose to list each routine,
38together with information about
39the routines that are its direct parents and children.
40This listing presents a window into the call graph.
41Based on our experience,
42both parent information and child information
43is important,
44and should be available without searching
45through the output.
46.pp
47The major entries of the call graph profile are the entries from the
48flat profile, augmented by the time propagated to each
49routine from its descendants.
50This profile is sorted by the sum of the time for the routine
51itself plus the time inherited from its descendants.
52The profile shows which of the higher level routines
53spend large portions of the total execution time
54in the routines that they call.
55For each routine, we show the amount of time passed by each child
56to the routine, which includes time for the child itself
57and for the descendants of the child
58(and thus the descendants of the routine).
59We also show the percentage these times represent of the total time
60accounted to the child.
61Similarly, the parents of each routine are listed,
62along with time,
63and percentage of total routine time,
64propagated to each one.
65.pp
66Cycles are handled as single entities.
67The cycle as a whole is shown as though it were a single routine,
68except that members of the cycle are listed in place of the children.
69Although the number of calls of each member
70from within the cycle are shown,
71they do not affect time propagation.
72When a child is a member of a cycle,
73the time shown is the appropriate fraction of the time
74for the whole cycle.
75Self-recursive routines have their calls broken
76down into calls from the outside and self-recursive calls.
77Only the outside calls affect the propagation of time.
78.pp
79The following example is a typical fragment of a call graph.
80.(b
81.TS
82center;
83c c c.
84
85Caller1		Caller2
86
87
88	Example
89
90
91Sub1	Sub2	Sub3
92
93.TE
94.)b
95This example would have the following form of entry
96in the profile listing.
97.sz 10
98.(b
99.TS
100box center;
101c c c c c l l
102c c c c c l l
103c c c c c l l
104l n n n c l l.
105				called/total	\ \ parents
106index	%time	self	descendants	called+self	name	index
107				called/total	\ \ children
108_
109		0.20	1.20	4/10	\ \ Caller1	[7]
110		0.30	1.80	6/10	\ \ Caller2	[1]
111[2]	41.5	0.50	3.00	10+4	Example	[2]
112		1.50	1.00	20/40	\ \ Sub1 <cycle1>	[4]
113		0.00	0.50	1/5	\ \ Sub2 	[9]
114		0.00	0.00	0/5	\ \ Sub3 	[11]
115.TE
116.)b
117.sz 12
118.pp
119The entry is for routine Example, which has
120the Caller routines as its parents,
121and the Sub routines as its children.
122The reader should keep in mind that all information
123is given \fIwith respect to Example\fP.
124The index in the first column shows that Example
125is the second entry in the profile listing.
126The Example routine is called ten times, four times by Caller1,
127and six times by Caller2.
128Consequently 40% of Example's time is propagated to Caller1,
129and 60% of Example's time is propagated to Caller2.
130The self and descendant fields of the parents
131show the amount of self and descendant time Example
132propagates to them (but not the time used by
133the parents directly).
134Note that Example calls itself recursively four times.
135The routine Example calls routine Sub1 twenty times, Sub2 once,
136and never calls Sub3.
137Since Sub2 is called a total of five times,
13820% of its self and descendant time is propagated to Example's
139descendant time field.
140Because Sub1 is a member of \fIcycle 1\fR,
141the self and descendant times
142and call count fraction
143are those for the cycle as a whole.
144Since cycle 1 is called a total of forty times
145(not counting calls among members of the cycle),
146it propagates 50% of the cycle's self and descendant
147time to Example's descendant time field.
148Finally each name is followed by an index that shows
149where on the listing to find the entry for that routine.
150.sh 1 "Using the Profiles"
151.pp
152The profiler is a useful tool for improving
153a set of routines that implement an abstraction.
154It can be helpful in identifying poorly coded routines,
155and in evaluating the new algorithms and code that replace them.
156Taking full advantage of the profiler
157requires a careful examination of the call graph profile,
158and a thorough knowledge of the abstractions underlying
159the program.
160.pp
161The easiest optimization that can be performed
162is a small change
163to a control construct or data structure that improves the
164running time of the program.
165An obvious starting point
166is a routine that is called many times.
167For example, suppose an output
168routine is the only parent
169of a routine that formats the data.
170If this format routine is expanded inline in the
171output routine, the overhead of a function call and
172return can be saved for each datum that needs to be formatted.
173.pp
174The drawback to inline expansion is that the data abstractions
175in the program may become less parameterized,
176hence less clearly defined.
177The profiling will also become less useful since the loss of
178routines will make its output more granular.
179For example,
180if the symbol table functions ``lookup'', ``insert'', and ``delete''
181are all merged into a single parameterized routine,
182it will be impossible to determine the costs
183of any one of these individual functions from the profile.
184.pp
185Further potential for optimization lies in routines that
186implement data abstractions whose total execution
187time is long.
188For example, a lookup routine might be called only a few
189times, but use an inefficient linear search algorithm,
190that might be replaced with a binary search.
191Alternately, the discovery that a rehashing function is being
192called excessively, can lead to a different
193hash function or a larger hash table.
194If the data abstraction function cannot easily be speeded up,
195it may be advantageous to cache its results,
196and eliminate the need to rerun
197it for identical inputs.
198.pp
199This tool is best used in an iterative approach:
200profiling the program,
201eliminating one bottleneck,
202then finding some other part of the program
203that begins to dominate execution time.
204For instance, we have used \fBgprof\fR on itself;
205eliminating, rewriting, and inline expanding routines,
206until reading
207data files (hardly a target for optimization!)
208represents the dominating factor in its execution time.
209.pp
210Certain types of programs are not easily analyzed by \fBgprof\fR.
211They are typified by programs that exhibit a large degree of
212recursion, such as recursive descent compilers.
213The problem is that most of the major routines are grouped
214into a single monolithic cycle.
215As in the symbol table abstraction that is placed
216in one routine,
217it is impossible to distinguish which members of the cycle are
218responsible for the execution time.
219Unfortunately there are no easy modifications to these programs that
220make them amenable to analysis.
221.pp
222A completely different use of the profiler is to analyze the control
223flow of an unfamiliar program.
224If you receive a program from another user that you need to modify
225in some small way,
226it is often unclear where the changes need to be made.
227By running the program on an example and then using \fBgprof\fR,
228you can get a view of the structure of the program.
229.pp
230Consider an example in which you need to change the output format
231of the program.
232For purposes of this example suppose that the call graph
233of the output portion of the program has the following structure:
234.(b
235.TS
236center;
237c c c.
238
239calc1	calc2	calc3
240
241
242format1		format2
243
244
245	``write''
246
247.TE
248.)b
249Initially you look through the \fBgprof\fR
250output for the system call ``write''.
251The format routine you will need to change is probably
252among the parents of the ``write'' procedure.
253The next step is to look at the profile entry for each
254of parents of ``write'',
255in this example either ``format1'' or ``format2'',
256to determine which one to change.
257Each format routine will have one or more parents,
258in this example ``calc1'', ``calc2'', and ``calc3''.
259By inspecting the source code for each of these routines
260you can determine which format routine generates the output that
261you wish to modify.
262Since the \fBgprof\fR entry shows all the
263potential calls to the format routine you intend to change,
264you can determine if your modifications will affect output that
265should be left alone.
266If you desire to change the output of ``calc2'', but not ``calc3'',
267then formatting routine ``format2'' needs to be split
268into two separate routines,
269one of which implements the new format.
270You can then retarget just the call by ``calc2''
271that needs the new format.
272It should be noted that the static call information is particularly
273useful here since the test case you run probably will not
274exercise the entire program.
275.sh 1 "Conclusions"
276.pp
277We have created a profiler that aids in the evaluation
278of modular programs.
279For each routine in the program,
280the profile shows the extent to which that routine
281helps support various abstractions,
282and how that routine uses other abstractions.
283The profile accurately assesses the cost of routines
284at all levels of the program decomposition.
285The profiler is easily used,
286and can be compiled into the program without any prior planning by
287the programmer.
288It adds only five to thirty percent execution overhead to the program
289being profiled,
290produces no additional output until after the program finishes,
291and allows the program to be measured in its actual environment.
292Finally, the profiler runs on a time-sharing system
293using only the normal services provided by the operating system
294and compilers.
295