1\"	@(#)present.me	1.11	03/15/82
2.sh 1 "Data Presentation"
3.pp
4The data is presented to the user in two different formats.
5The first presentation simply lists the routines
6without regard to the amount of time their descendants use.
7The second presentation incorporates the call graph of the
8program.
9.sh 2 "The Flat Profile
10.pp
11The flat profile consists of a list of all the routines
12that are called during execution of the program,
13with the count of the number of times they are called
14and the number of seconds of execution time for which they
15are themselves accountable.
16The routines are listed in decreasing order of execution time.
17A list of the routines that are never called during execution of
18the program is also available
19to verify that nothing important is omitted by
20this execution.
21The flat profile gives a quick overview of the routines that are used,
22and shows the routines that are themselves responsible
23for large fractions of the execution time.
24In practice,
25this profile usually shows that no single function
26is overwhelmingly responsible for
27the total time of the program.
28Notice that for this profile,
29the individual times sum to the total execution time.
30.sh 2 "The Call Graph Profile"
31.sz 10
32.(z
33.TS
34box center;
35c c c c c l l
36c c c c c l l
37c c c c c l l
38l n n n c l l.
39				called/total	\ \ parents
40index	%time	self	descendants	called+self	name	index
41				called/total	\ \ children
42_
43		0.20	1.20	4/10	\ \ \s-1CALLER1\s+1	[7]
44		0.30	1.80	6/10	\ \ \s-1CALLER2\s+1	[1]
45[2]	41.5	0.50	3.00	10+4	\s-1EXAMPLE\s+1	[2]
46		1.50	1.00	20/40	\ \ \s-1SUB1\s+1 <cycle1>	[4]
47		0.00	0.50	1/5	\ \ \s-1SUB2\s+1 	[9]
48		0.00	0.00	0/5	\ \ \s-1SUB3\s+1 	[11]
49.TE
50.ce 2
51Profile entry for \s-1EXAMPLE\s+1.
52Figure 4.
53.)z
54.pp
55Ideally, we would like to print the call graph of the program,
56but we are limited by the two-dimensional nature of our output
57devices.
58We cannot assume that a call graph is planar,
59and even if it is, that we can print a planar version of it.
60Instead, we choose to list each routine,
61together with information about
62the routines that are its direct parents and children.
63This listing presents a window into the call graph.
64Based on our experience,
65both parent information and child information
66is important,
67and should be available without searching
68through the output.
69.pp
70The major entries of the call graph profile are the entries from the
71flat profile, augmented by the time propagated to each
72routine from its descendants.
73This profile is sorted by the sum of the time for the routine
74itself plus the time inherited from its descendants.
75The profile shows which of the higher level routines
76spend large portions of the total execution time
77in the routines that they call.
78For each routine, we show the amount of time passed by each child
79to the routine, which includes time for the child itself
80and for the descendants of the child
81(and thus the descendants of the routine).
82We also show the percentage these times represent of the total time
83accounted to the child.
84Similarly, the parents of each routine are listed,
85along with time,
86and percentage of total routine time,
87propagated to each one.
88.pp
89Cycles are handled as single entities.
90The cycle as a whole is shown as though it were a single routine,
91except that members of the cycle are listed in place of the children.
92Although the number of calls of each member
93from within the cycle are shown,
94they do not affect time propagation.
95When a child is a member of a cycle,
96the time shown is the appropriate fraction of the time
97for the whole cycle.
98Self-recursive routines have their calls broken
99down into calls from the outside and self-recursive calls.
100Only the outside calls affect the propagation of time.
101.pp
102The following example is a typical fragment of a call graph.
103.(b
104.TS
105center;
106c c c.
107
108\s-1CALLER1\s+1		\s-1CALLER2\s+1
109
110
111	\s-1EXAMPLE\s+1
112
113
114\s-1SUB1\s+1	\s-1SUB2\s+1	\s-1SUB3\s+1
115
116.TE
117.)b
118The entry in the call graph profile listing for this example is
119shown in Figure 4.
120.pp
121The entry is for routine \s-1EXAMPLE\s+1, which has
122the Caller routines as its parents,
123and the Sub routines as its children.
124The reader should keep in mind that all information
125is given \fIwith respect to \s-1EXAMPLE\s+1\fP.
126The index in the first column shows that \s-1EXAMPLE\s+1
127is the second entry in the profile listing.
128The \s-1EXAMPLE\s+1 routine is called ten times, four times by \s-1CALLER1\s+1,
129and six times by \s-1CALLER2\s+1.
130Consequently 40% of \s-1EXAMPLE\s+1's time is propagated to \s-1CALLER1\s+1,
131and 60% of \s-1EXAMPLE\s+1's time is propagated to \s-1CALLER2\s+1.
132The self and descendant fields of the parents
133show the amount of self and descendant time \s-1EXAMPLE\s+1
134propagates to them (but not the time used by
135the parents directly).
136Note that \s-1EXAMPLE\s+1 calls itself recursively four times.
137The routine \s-1EXAMPLE\s+1 calls routine \s-1SUB1\s+1 twenty times, \s-1SUB2\s+1 once,
138and never calls \s-1SUB3\s+1.
139Since \s-1SUB2\s+1 is called a total of five times,
14020% of its self and descendant time is propagated to \s-1EXAMPLE\s+1's
141descendant time field.
142Because \s-1SUB1\s+1 is a member of \fIcycle 1\fR,
143the self and descendant times
144and call count fraction
145are those for the cycle as a whole.
146Since cycle 1 is called a total of forty times
147(not counting calls among members of the cycle),
148it propagates 50% of the cycle's self and descendant
149time to \s-1EXAMPLE\s+1's descendant time field.
150Finally each name is followed by an index that shows
151where on the listing to find the entry for that routine.
152.sh 1 "Using the Profiles"
153.pp
154The profiler is a useful tool for improving
155a set of routines that implement an abstraction.
156It can be helpful in identifying poorly coded routines,
157and in evaluating the new algorithms and code that replace them.
158Taking full advantage of the profiler
159requires a careful examination of the call graph profile,
160and a thorough knowledge of the abstractions underlying
161the program.
162.pp
163The easiest optimization that can be performed
164is a small change
165to a control construct or data structure that improves the
166running time of the program.
167An obvious starting point
168is a routine that is called many times.
169For example, suppose an output
170routine is the only parent
171of a routine that formats the data.
172If this format routine is expanded inline in the
173output routine, the overhead of a function call and
174return can be saved for each datum that needs to be formatted.
175.pp
176The drawback to inline expansion is that the data abstractions
177in the program may become less parameterized,
178hence less clearly defined.
179The profiling will also become less useful since the loss of
180routines will make its output more granular.
181For example,
182if the symbol table functions ``lookup'', ``insert'', and ``delete''
183are all merged into a single parameterized routine,
184it will be impossible to determine the costs
185of any one of these individual functions from the profile.
186.pp
187Further potential for optimization lies in routines that
188implement data abstractions whose total execution
189time is long.
190For example, a lookup routine might be called only a few
191times, but use an inefficient linear search algorithm,
192that might be replaced with a binary search.
193Alternately, the discovery that a rehashing function is being
194called excessively, can lead to a different
195hash function or a larger hash table.
196If the data abstraction function cannot easily be speeded up,
197it may be advantageous to cache its results,
198and eliminate the need to rerun
199it for identical inputs.
200These and other ideas for program improvement are discussed in
201[Bentley81].
202.pp
203This tool is best used in an iterative approach:
204profiling the program,
205eliminating one bottleneck,
206then finding some other part of the program
207that begins to dominate execution time.
208For instance, we have used \fBgprof\fR on itself;
209eliminating, rewriting, and inline expanding routines,
210until reading
211data files (hardly a target for optimization!)
212represents the dominating factor in its execution time.
213.pp
214Certain types of programs are not easily analyzed by \fBgprof\fR.
215They are typified by programs that exhibit a large degree of
216recursion, such as recursive descent compilers.
217The problem is that most of the major routines are grouped
218into a single monolithic cycle.
219As in the symbol table abstraction that is placed
220in one routine,
221it is impossible to distinguish which members of the cycle are
222responsible for the execution time.
223Unfortunately there are no easy modifications to these programs that
224make them amenable to analysis.
225.pp
226A completely different use of the profiler is to analyze the control
227flow of an unfamiliar program.
228If you receive a program from another user that you need to modify
229in some small way,
230it is often unclear where the changes need to be made.
231By running the program on an example and then using \fBgprof\fR,
232you can get a view of the structure of the program.
233.pp
234Consider an example in which you need to change the output format
235of the program.
236For purposes of this example suppose that the call graph
237of the output portion of the program has the following structure:
238.(b
239.TS
240center;
241c c c.
242
243\s-1CALC1\s+1	\s-1CALC2\s+1	\s-1CALC3\s+1
244
245
246\s-1FORMAT1\s+1		\s-1FORMAT2\s+1
247
248
249	``\s-1WRITE\s+1''
250
251.TE
252.)b
253Initially you look through the \fBgprof\fR
254output for the system call ``\s-1WRITE\s+1''.
255The format routine you will need to change is probably
256among the parents of the ``\s-1WRITE\s+1'' procedure.
257The next step is to look at the profile entry for each
258of parents of ``\s-1WRITE\s+1'',
259in this example either ``\s-1FORMAT1\s+1'' or ``\s-1FORMAT2\s+1'',
260to determine which one to change.
261Each format routine will have one or more parents,
262in this example ``\s-1CALC1\s+1'', ``\s-1CALC2\s+1'', and ``\s-1CALC3\s+1''.
263By inspecting the source code for each of these routines
264you can determine which format routine generates the output that
265you wish to modify.
266Since the \fBgprof\fR entry shows all the
267potential calls to the format routine you intend to change,
268you can determine if your modifications will affect output that
269should be left alone.
270If you desire to change the output of ``\s-1CALC2\s+1'', but not ``\s-1CALC3\s+1'',
271then formatting routine ``\s-1FORMAT2\s+1'' needs to be split
272into two separate routines,
273one of which implements the new format.
274You can then retarget just the call by ``\s-1CALC2\s+1''
275that needs the new format.
276It should be noted that the static call information is particularly
277useful here since the test case you run probably will not
278exercise the entire program.
279.sh 1 "Conclusions"
280.pp
281We have created a profiler that aids in the evaluation
282of modular programs.
283For each routine in the program,
284the profile shows the extent to which that routine
285helps support various abstractions,
286and how that routine uses other abstractions.
287The profile accurately assesses the cost of routines
288at all levels of the program decomposition.
289The profiler is easily used,
290and can be compiled into the program without any prior planning by
291the programmer.
292It adds only five to thirty percent execution overhead to the program
293being profiled,
294produces no additional output until after the program finishes,
295and allows the program to be measured in its actual environment.
296Finally, the profiler runs on a time-sharing system
297using only the normal services provided by the operating system
298and compilers.
299