Tuesday, September 23, 2014

Comparing constant duration profiles

I was putting together my slides for Open World, and in one of them I'm showing profile data from a server-style workload. ie one that keeps running until stopped. In this case the profile can be an arbitrary duration, and it's the work done in that time which is the important metric, not the total amount of time taken.

Profiling for a constant duration is a slightly unusual situation. We normally profile a workload that takes N seconds, do some tuning, and it now takes (N-S) seconds, and we can say that we improved performance by S/N percent. This is represented by the left pair of boxes in the following diagram:

In the diagram you can see that the routine B got optimised and therefore the entire runtime, for completing the same amount of work, reduced by an amount corresponding to the performance improvement for B.

Let's run through the same scenario, but instead of profiling for a constant amount of work, we profile for a constant duration. In the diagram this is represented by the outermost pair of boxes.

Both profiles run for the same total amount of time, but the right hand profile has less time spent in routine B() than the left profile, because the time in B() has reduced more time is spent in A(). This is natural, I've made some part of the code more efficient, I'm observing for the same amount of time, so I must spend more time in the part of the code that I've not optimised.

So what's the performance gain? In this case we're more likely to look at the gain in throughput. It's a safe assumption that the amount of time in A() corresponds to the amount of work done - ie that if we did T units of work, then the average cost per unit work A()/T is the same across the pair of experiments. So if we did T units of work in the first experiment, then in the second experiment we'd do T * A'()/A(). ie the throughput increases by S = A'()/A() where S is the scaling factor. What is interesting about this is that A() represents any measure of time spent in code which was not optimised. So A() could be a single routine or it could be all the routines that are untouched by the optimisation.