gprof/PSD.doc/postp.me

\"	@(#)postp.me	1.14 03/15/82
.EQ
delim ##
gsize 11
.EN
.sh 1 "Post Processing"
.pp
Having gathered the arcs of the call graph and timing information
for an execution of the program,
we are interested in attributing the time for each routine to the
routines that call it.
We build a dynamic call graph with arcs from caller to callee,
and propagate time from descendants to ancestors
by topologically sorting the call graph.
Time propagation is performed from the leaves of the
call graph toward the roots, according to the order
assigned by a topological numbering algorithm.
The topological numbering ensures that
all edges in the graph go from higher numbered nodes to lower
numbered nodes.
An example is given in Figure 1.
If we propagate time from nodes in the
order assigned by the algorithm,
execution time can be propagated from descendants to ancestors
after a single traversal of each arc in the call graph.
Each parent receives some fraction of a child's time.
Thus time is charged to the
caller in addition to being charged to the callee.
.(z C
.TS
center;
c c c c c.
8				9


	3		7


2		5		6


	1		4


.TE
.ce 2
Topological ordering
Figure 1.
.ce 0
.)z
.pp
Let #C sub e# be the number of calls to some routine,
#e#, and #C sub e sup r# be the number of
calls from a caller #r# to a callee #e#.
Since we are assuming each call to a routine takes the
average amount of time for all calls to that routine,
the caller is accountable for
#C sub e sup r / C sub e#
of the time spent by the callee.
Let the #S sub e# be the #selftime# of a routine, #e#.
The selftime of a routine can be determined from the
timing information gathered during profiled program execution.
The total time, #T sub r#, we wish to account to a routine
#r#, is then given by the recurrence equation:
.EQ
T sub r ~ = ~ {S sub r} ~ + ~
                   sum from {r ~ roman CALLS ~ e}
                   {T sub e times {{C sub e sup r} over {C sub e}}}
.EN
where #r ~ roman CALLS ~ e# is a relation showing all routines
#e# called by a routine #r#.
This relation is easily available from the call graph.
.pp
However, if the execution contains recursive calls,
the call graph has cycles that
cannot be topologically sorted.
In these cases, we discover strongly-connected
components in the call graph,
treat each such component as a single node,
and then sort the resulting graph.
We use a variation of Tarjan's strongly-connected
components algorithm
that discovers strongly-connected components as it is assigning
topological order numbers [Tarjan72].
.pp
Time propagation within strongly connected
components is a problem.
For example, a self-recursive routine
(a trivial cycle in the call graph)
is accountable for all the time it
uses in all its recursive instantiations.
In our scheme, this time should be
shared among its call graph parents.
The arcs from a routine to itself are of interest,
but do not participate in time propagation.
Thus the simple equation for time propagation
does not work within strongly connected components.
Time is not propagated from one member of a cycle to another,
since, by definition, this involves propagating time from a routine
to itself.
In addition, children of one member of a cycle
must be considered children of all members of the cycle.
Similarly, parents of one member of the cycle must inherit
all members of the cycle as descendants.
It is for these reasons that we collapse connected components.
Our solution collects all members of a cycle together,
summing the time and call counts for all members.
All calls into the cycle are made to share the total
time of the cycle, and all descendants of the cycle
propagate time into the cycle as a whole.
Calls among the members of the cycle
do not propagate any time,
though they are listed in the call graph profile.
.pp
Figure 2 shows a modified version of the call graph of Figure 1,
in which the nodes labelled 3 and 7 in Figure 1 are mutually
recursive.
The topologically sorted graph after the cycle is collapsed is
given in Figure 3.
.(z C
.TS
center;
c c c c c.
o				o


	\(bu		\(bu


o		o		o


	o		o


.TE
.ce 2
Cycle to be collapsed.
Figure 2.
.ce 0
.)z
.(z C
.TS
center;
c c c c c.
7				8


	\fI6\fP		\fI6\fP


2		4		5


	1		3


.TE
.ce 2
Topological numbering after cycle collapsing.
Figure 3.
.ce 0
.)z
.pp
Since the technique described above only collects the
dynamic call graph,
and the program typically does not call every routine
on each execution,
different executions can introduce different cycles in the
dynamic call graph.
Since cycles often have a significant effect on time propagation,
it is desirable to incorporate the static call graph so that cycles
will have the same members regardless of how the program runs.
.pp
The static call graph can be constructed from the source text
of the program.
However, discovering the static call graph from the source text
would require two moderately difficult steps:
finding the source text for the program
(which may not be available),
and scanning and parsing that text,
which may be in any one of several languages.
.pp
In our programming system,
the static calling information is also contained in the
executable version of the program,
which we already have available,
and which is in language-independent form.
One can examine the instructions
in the object program,
looking for calls to routines, and note which
routines can be called.
This technique allows us to add arcs to those already in the
dynamic call graph.
If a statically discovered arc already exists in the dynamic call
graph, no action is required.
Statically discovered arcs that do not exist in the dynamic call
graph are added to the graph with a traversal count of zero.
Thus they are never responsible for any time propagation.
However, they may affect the structure of the graph.
Since they may complete strongly connected components,
the static call graph construction is
done before topological ordering.