Wednesday, October 28, 2009

Benchmarking Java method performance

Very often there is a need to make a quick estimation of which of several method implementations performs better. A typical question is how many fractions of second it would take for a single invocation to complete. With Java (and any other JIT-based virtual machines) the estimation is especially difficult as certain number of initial invocations may be done in the interpreted mode with later switching to the compiled form plus some other optimizations may occur. Sometimes though the exact call duration may not be of much interest. We'd rather find out which implementation performs significantly better by comparing the performance in a simulated usage scenario. For that purpose I wrote a few classes that can facilitate that micro-benchmarking.
Our basic requirements are simple:
  1. application should simulate a sustained invocation pattern for the measured method so that any possible compiler warm-up could be neglected after running for long enough time;
  2. application should report in real-time:
    • the average time per a single call;
    • the average number of invocations per second;
  3. application should support pluggable scenarios so that running with a different scenario would not require a full re-build;

Each evaluated implementation has to be wrapped into the IScenario interface:
public interface IScenario
{
   
Object[] generateSample();
    Object invoke
(Object[] args);
}
The generateSample() method is expected to return an array of tuples of arguments for the invoke() method which will be called within a infinite loop. This gives us a freedom to create a mixture of different argument combinations that can be attempted for the scenario invocation.
What the main class is going to do are 1) spawning a thread for executing the simulation scenario, and 2) periodically polling the thread for the number of calls made so far:
public class Main
{
   
public static void main(String[] args) {
       
final ExecutorService executor = Executors.newSingleThreadExecutor();
       
try {
           
final IOptions options =
                ConfigUtils.parseCommandLine
(args, IOptions.class);
           
final IScenario scenario =
               
(IScenario) Class.forName(
                   
options.getScenarioClassName()).newInstance();
            System.out.println
("Initializing...");
           
final CountDownLatch simulationStarted = new CountDownLatch(1);
           
final Simulation simulation =
               
new Simulation(scenario, simulationStarted);
           
final Future<?> simulationFuture = executor.submit(simulation);
            System.out.println
("Awaiting simulation to start...");
            simulationStarted.await
();
            System.out.format
("Started...%n");
           
final Stopwatch sw = Stopwatch.start();
           
while (!simulationFuture.isDone()) {
               
Thread.sleep(1000);
               
final long invocations = simulation.getInvocations();
               
final double invocationRate =
                    invocations / sw.elapsedSeconds
();
                System.out.format
(
                   
"\rAvgDuration: %8.1fns AvgRate: %16.1f calls/sec;" +
                   
" TotalCalls: %16d; TotalTime: %#6.0s",
                   
1e9 / invocationRate,
                    invocationRate,
                    invocations, sw
);
           
}
           
simulationFuture.get();
       
}
       
catch (ConfigurationException e) {
           
final PrintWriter pw = new PrintWriter(System.out);
            pw.format
("Arguments:%n");
           
try {
               
ConfigUtils.printUsage(IOptions.class, pw);
           
}
           
catch (Exception e1) {
               
System.err.println("Error: " + e1);
           
}
        }
       
catch (InterruptedException e) {}
       
catch (Exception e) {
           
e.printStackTrace();
       
}
       
System.out.println("Done.");
        executor.shutdownNow
();
   
}
   
   
public static interface IOptions
   
{
       
@Option(id = 1, argumentTag = "SCENARIO",
            description =
"scenario implementation class",
            opt =
's', longOpt = "scenario", required = true)
       
String getScenarioClassName();
   
}
}
The implementation follows the best practices and re-uses the implementation of command line option parser created earlier in the Parsing command-line options post series.
The simulation will be running as a task in the single-threaded executor:
final class Simulation implements Runnable
{
   
private final CountDownLatch _simulationStarted;
   
private volatile long invocations = 0;
   
private final IScenario _scenario;

   
public Simulation(IScenario scenario, CountDownLatch simulationStarted) {
       
_scenario = scenario;
        _simulationStarted = simulationStarted;
   
}

   
@Override
   
public void run() {
       
try {
           
System.out.format("Scenario: %s%n",
                _scenario.getClass
().getSimpleName());
           
long overall = 0;
           
final Object[] sample = _scenario.generateSample();
           
int i = 0;
            _simulationStarted.countDown
();
           
while (!Thread.currentThread().isInterrupted()) {
               
i = i % sample.length;
               
final Object[] args = (Object[]) sample[i];
               
final Object result = _scenario.invoke(args);
               
// using returned value to avoid JIT optimizations
               
overall += (result != null) ? result.hashCode() : 0;
                invocations++;
                i++;
           
}
        }
       
catch (Throwable t) {
           
throw new IllegalStateException(t);
       
}
    }

   
public final long getInvocations() {
       
return invocations;
   
}
}


Now lets create a simple scenario. We'll do a polynomial calculation using a straightforward approach involving Math.pow() function:
public class Polynomial1 implements IScenario
{
   
public final double compute(double[] a, double x) {
       
double sum = 0.0;
       
final int length = a.length;
       
for (int i = 0; i < length; i++) {
           
final double elem = a[i];
            sum += a
[i] * Math.pow(x, length - i);
       
}
       
return sum;
   
}

   
@Override
   
public Object[] generateSample() {
       
final double[] values = { 1.2, 1.3, 1.4, 2.2, 2.3, 2.4, 3.4, 3.5, 3.6 };
       
final double[] xs = { 21.2, 31.3, 11.4, 22.2, 32.3, 32.4, 53.4, 33.5, 13.6 };
       
final Object[] sample = new Object[xs.length];
       
for (int i = 0; i < xs.length; i++) {
           
sample[i] = new Object[] { values, xs[i] };
       
}
       
return sample;
   
}

   
@Override
   
public Object invoke(Object[] args) {
       
final double[] values = (double[]) args[0];
       
final double x = (Double) args[1];
       
return compute(values, x);
   
}
}

Lets start the application (in the local project I have created a runnable jar) and wait for some time until it reaches the sustained state:
C:\projects\performance>java -server -jar dist\app.jar -s sample.perf.Polynomial1
Initializing...
Awaiting simulation to start...
Scenario: Polynomial1
Started...
AvgDuration: 1526.7ns AvgRate: 654996.8 calls/sec; TotalCalls: 39316798; TotalTime: 60s

Now lets optimize the polynomial calculation by using Horner scheme:
public class Polynomial2 implements IScenario
{
   
public double foo(double[] a, double x) {
       
double sum = 0.0;
       
for (int i = 0; i < a.length; i++) {
           
final double elem = a[i];
            sum =
(sum + elem) * x;
       
}
       
return sum;
   
}
   
   
@Override
   
public Object[] generateSample() {
       
final double[] values = { 1.2, 1.3, 1.4, 2.2, 2.3, 2.4, 3.4, 3.5, 3.6 };
       
final double[] xs = { 21.2, 31.3, 11.4, 22.2, 32.3, 32.4, 53.4, 33.5, 13.6 };
       
final Object[] sample = new Object[xs.length];
       
for(int i = 0; i < xs.length; i++) {
           
sample[i] = new Object[] {values, xs[i]};
       
}
       
return sample;
   
}

   
@Override
   
public Object invoke(Object[] args) {
       
final double[] values = (double[]) args[0];
       
final double x = (Double) args[1];
       
return foo(values, x);
   
}
}

Running it produces the following output:
C:\projects\performance>java -server -jar dist\app.jar -s sample.perf.Polynomial2
Initializing...
Awaiting simulation to start...
Scenario: Polynomial2
Started...
AvgDuration: 50.7ns AvgRate: 19726869.7 calls/sec; TotalCalls: 591850033; TotalTime: 30s

I didn't need to run it for 60s just because it reached quite sustained state much earlier. So we can see that using Horner scheme makes our function 30 times faster which determines our choice.
Things to be done:
  1. the warm-up phase can be made explicit by running for the pre-determined number of iterations before the actual measurement starts - this is a very common approach;
  2. running multiple scenarios in parallel should be available to allow for the quickest comparison.
  3. formatting of the results may be improved.
.

No comments:

Post a Comment