xref: /original-bsd/usr.bin/gprof/PSD.doc/postp.me (revision f4ab8a4c)
1\"	@(#)postp.me	1.11 03/13/82
2.EQ
3delim ##
4gsize 12
5.EN
6.sh 1 "Post Processing"
7.pp
8Having gathered the arcs of the call graph and timing information
9for an execution of the program,
10we are interested in attributing the time for each routine to the
11routines that call it.
12We build a dynamic call graph with arcs from caller to callee,
13and propagate time from descendants to ancestors
14by topologically sorting the call graph.
15Time propagation is performed from the leaves of the
16call graph toward the roots, according to the order
17assigned by a topological numbering algorithm.
18The topological numbering ensures that
19all edges in the graph go from higher numbered nodes to lower
20numbered nodes.
21If we propagate time from nodes in the
22reverse order assigned by the algorithm,
23execution time can be propagated from descendants to ancestors
24after a single traversal of each arc in the call graph.
25Each parent receives some fraction of a child's time.
26Thus time is charged to the
27caller in addition to being charged to the callee.
28.(b
29.TS
30center;
31c c c c c.
328				9
33
34
35	3		7
36
37
382		5		6
39
40
41	1		4
42
43
44.TE
45.ce 1
46Topological ordering
47.ce 0
48.)b
49.pp
50Let #C sub e# be the number of calls to some routine,
51#e#, and #C sub e sup r# be the number of
52calls from a caller #r# to a callee #e#.
53Since we are assuming each call to a routine takes the
54average amount of time for all calls to that routine,
55the caller is accountable for
56#C sub e sup r / C sub e#
57of the time spent by the callee.
58Let the #S sub e# be the #selftime# of a routine, #e#.
59The selftime of a routine can be determined from the
60timing information gathered during profiled program execution.
61The total time, #T sub r#, we wish to account to a routine
62#r#, is then given by the recurrence equation:
63.EQ
64T sub r ~ = ~ {S sub r} ~ + ~
65                   sum from {r ~ roman CALLS ~ e}
66                   {T sub e times {{C sub e sup r} over {C sub e}}}
67.EN
68where #r ~ roman CALLS ~ e# is a relation showing all routines
69#e# called by a routine #r#.
70This relation is easily available from the call graph.
71.pp
72However, if the execution contains recursive calls,
73the call graph has cycles that
74cannot be topologically sorted.
75In these cases, we discover strongly-connected
76components in the call graph,
77treat each such component as a single node,
78and then sort the resulting graph.
79We use a variation of Tarjan's strongly-connected
80components algorithm
81that discovers strongly-connected components as it is assigning
82topological order numbers [Tarjan72].
83.pp
84Time propagation within strongly connected
85components is a problem.
86For example, a self-recursive routine
87(a trivial cycle in the call graph)
88is accountable for all the time it
89uses in all its recursive instantiations.
90In our scheme, this time should be
91shared among its call graph parents.
92The arcs from a routine to itself are of interest,
93but do not participate in time propagation.
94Thus the simple equation for time propagation
95does not work within strongly connected components.
96Time is not propagated from one member of a cycle to another,
97since, by definition, this involves propagating time from a routine
98to itself.
99In addition, children of one member of a cycle
100must be considered children of all members of the cycle.
101Similarly, parents of one member of the cycle must inherit
102all members of the cycle as descendants.
103It is for these reasons that we collapse connected components.
104Our solution collects all members of a cycle together,
105summing the time and call counts for all members.
106All calls into the cycle are made to share the total
107time of the cycle, and all descendants of the cycle
108propagate time into the cycle as a whole.
109Calls among the members of the cycle
110do not propagate any time,
111though they are listed in the call graph profile.
112.(b
113.TS
114center;
115c c c c c.
116o				o
117
118
119	\(bu		\(bu
120
121
122o		o		o
123
124
125	o		o
126
127
128.TE
129.ce 1
130Cycle to be collapsed.
131.ce 0
132.)b
133.(b
134.TS
135center;
136c c c c c.
1377				8
138
139
140	\fI6\fP		\fI6\fP
141
142
1432		4		5
144
145
146	1		3
147
148
149.TE
150.ce 1
151Topological numbering after cycle collapsing.
152.ce 0
153.)b
154.pp
155Since the technique described above only collects the
156dynamic call graph,
157and the program typically does not call every routine
158on each execution,
159different executions can introduce different cycles in the
160dynamic call graph.
161Since cycles often have a significant effect on time propagation,
162it is desirable to incorporate the static call graph so that cycles
163will have the same members regardless of how the program runs.
164.pp
165The static call graph can be constructed from the source text
166of the program.
167However, discovering the static call graph from the source text
168would require two moderately difficult steps:
169finding the source text for the program
170(which may not be available),
171and scanning and parsing that text,
172which may be in any one of several languages.
173In our programming system,
174the static calling information is also contained in the
175executable version of the program,
176which we already have available,
177and which is in language-independent form.
178One can examine the instructions
179in the object program,
180looking for calls to routines, and note which
181routines can be called.
182This technique allows us to add arcs to those already in the
183dynamic call graph.
184If a statically discovered arc already exists in the dynamic call
185graph, no action is required.
186Statically discovered arcs that do not exist in the dynamic call
187graph are added to the graph with a traversal count of zero.
188Thus they are never responsible for any time propagation.
189However, they may affect the structure of the graph.
190Since they may complete strongly connected components,
191the static call graph construction is
192done before topological ordering.
193