Our basic requirements are simple:
- application should simulate a sustained invocation pattern for the measured method so that any possible compiler warm-up could be neglected after running for long enough time;
- application should report in real-time:
- the average time per a single call;
- the average number of invocations per second;
- application should support pluggable scenarios so that running with a different scenario would not require a full re-build;
Each evaluated implementation has to be wrapped into the IScenario interface:
public interface IScenario
{
Object[] generateSample();
Object invoke(Object[] args);
}
What the main class is going to do are 1) spawning a thread for executing the simulation scenario, and 2) periodically polling the thread for the number of calls made so far:
public class Main
{
public static void main(String[] args) {
final ExecutorService executor = Executors.newSingleThreadExecutor();
try {
final IOptions options =
ConfigUtils.parseCommandLine(args, IOptions.class);
final IScenario scenario =
(IScenario) Class.forName(
options.getScenarioClassName()).newInstance();
System.out.println("Initializing...");
final CountDownLatch simulationStarted = new CountDownLatch(1);
final Simulation simulation =
new Simulation(scenario, simulationStarted);
final Future<?> simulationFuture = executor.submit(simulation);
System.out.println("Awaiting simulation to start...");
simulationStarted.await();
System.out.format("Started...%n");
final Stopwatch sw = Stopwatch.start();
while (!simulationFuture.isDone()) {
Thread.sleep(1000);
final long invocations = simulation.getInvocations();
final double invocationRate =
invocations / sw.elapsedSeconds();
System.out.format(
"\rAvgDuration: %8.1fns AvgRate: %16.1f calls/sec;" +
" TotalCalls: %16d; TotalTime: %#6.0s",
1e9 / invocationRate,
invocationRate,
invocations, sw);
}
simulationFuture.get();
}
catch (ConfigurationException e) {
final PrintWriter pw = new PrintWriter(System.out);
pw.format("Arguments:%n");
try {
ConfigUtils.printUsage(IOptions.class, pw);
}
catch (Exception e1) {
System.err.println("Error: " + e1);
}
}
catch (InterruptedException e) {}
catch (Exception e) {
e.printStackTrace();
}
System.out.println("Done.");
executor.shutdownNow();
}
public static interface IOptions
{
@Option(id = 1, argumentTag = "SCENARIO",
description = "scenario implementation class",
opt = 's', longOpt = "scenario", required = true)
String getScenarioClassName();
}
}
The simulation will be running as a task in the single-threaded executor:
final class Simulation implements Runnable
{
private final CountDownLatch _simulationStarted;
private volatile long invocations = 0;
private final IScenario _scenario;
public Simulation(IScenario scenario, CountDownLatch simulationStarted) {
_scenario = scenario;
_simulationStarted = simulationStarted;
}
@Override
public void run() {
try {
System.out.format("Scenario: %s%n",
_scenario.getClass().getSimpleName());
long overall = 0;
final Object[] sample = _scenario.generateSample();
int i = 0;
_simulationStarted.countDown();
while (!Thread.currentThread().isInterrupted()) {
i = i % sample.length;
final Object[] args = (Object[]) sample[i];
final Object result = _scenario.invoke(args);
// using returned value to avoid JIT optimizations
overall += (result != null) ? result.hashCode() : 0;
invocations++;
i++;
}
}
catch (Throwable t) {
throw new IllegalStateException(t);
}
}
public final long getInvocations() {
return invocations;
}
}
Now lets create a simple scenario. We'll do a polynomial calculation using a straightforward approach involving Math.pow() function:
public class Polynomial1 implements IScenario
{
public final double compute(double[] a, double x) {
double sum = 0.0;
final int length = a.length;
for (int i = 0; i < length; i++) {
final double elem = a[i];
sum += a[i] * Math.pow(x, length - i);
}
return sum;
}
@Override
public Object[] generateSample() {
final double[] values = { 1.2, 1.3, 1.4, 2.2, 2.3, 2.4, 3.4, 3.5, 3.6 };
final double[] xs = { 21.2, 31.3, 11.4, 22.2, 32.3, 32.4, 53.4, 33.5, 13.6 };
final Object[] sample = new Object[xs.length];
for (int i = 0; i < xs.length; i++) {
sample[i] = new Object[] { values, xs[i] };
}
return sample;
}
@Override
public Object invoke(Object[] args) {
final double[] values = (double[]) args[0];
final double x = (Double) args[1];
return compute(values, x);
}
}
Lets start the application (in the local project I have created a runnable jar) and wait for some time until it reaches the sustained state:
C:\projects\performance>java -server -jar dist\app.jar -s sample.perf.Polynomial1
Initializing...
Awaiting simulation to start...
Scenario: Polynomial1
Started...
AvgDuration: 1526.7ns AvgRate: 654996.8 calls/sec; TotalCalls: 39316798; TotalTime: 60s
Initializing...
Awaiting simulation to start...
Scenario: Polynomial1
Started...
AvgDuration: 1526.7ns AvgRate: 654996.8 calls/sec; TotalCalls: 39316798; TotalTime: 60s
Now lets optimize the polynomial calculation by using Horner scheme:
public class Polynomial2 implements IScenario
{
public double foo(double[] a, double x) {
double sum = 0.0;
for (int i = 0; i < a.length; i++) {
final double elem = a[i];
sum = (sum + elem) * x;
}
return sum;
}
@Override
public Object[] generateSample() {
final double[] values = { 1.2, 1.3, 1.4, 2.2, 2.3, 2.4, 3.4, 3.5, 3.6 };
final double[] xs = { 21.2, 31.3, 11.4, 22.2, 32.3, 32.4, 53.4, 33.5, 13.6 };
final Object[] sample = new Object[xs.length];
for(int i = 0; i < xs.length; i++) {
sample[i] = new Object[] {values, xs[i]};
}
return sample;
}
@Override
public Object invoke(Object[] args) {
final double[] values = (double[]) args[0];
final double x = (Double) args[1];
return foo(values, x);
}
}
Running it produces the following output:
C:\projects\performance>java -server -jar dist\app.jar -s sample.perf.Polynomial2
Initializing...
Awaiting simulation to start...
Scenario: Polynomial2
Started...
AvgDuration: 50.7ns AvgRate: 19726869.7 calls/sec; TotalCalls: 591850033; TotalTime: 30s
Initializing...
Awaiting simulation to start...
Scenario: Polynomial2
Started...
AvgDuration: 50.7ns AvgRate: 19726869.7 calls/sec; TotalCalls: 591850033; TotalTime: 30s
I didn't need to run it for 60s just because it reached quite sustained state much earlier. So we can see that using Horner scheme makes our function 30 times faster which determines our choice.
Things to be done:
- the warm-up phase can be made explicit by running for the pre-determined number of iterations before the actual measurement starts - this is a very common approach;
- running multiple scenarios in parallel should be available to allow for the quickest comparison.
- formatting of the results may be improved.
No comments:
Post a Comment