
Clojure expression benchmarks

Graphs of measurements are here.

JDK version notes: 64-bit Oracle JDK 1.7.0_80 was downloaded from Oracle's web site. 64-bit JDK 1.7.0_91 is a build of OpenJDK (not Oracle's JDK) installed via Ubuntu's apt-get. Here is the output of 'java -version' for it:

java version "1.7.0_91"

OpenJDK Runtime Environment (IcedTea 2.6.3) (7u91-2.6.3-0ubuntu0.14.04.1)

OpenJDK 64-bit Server VM (build 24.91-b01, mixed mode)

For these benchmarks I used Hugo Duncan's criterium library to perform the measurements, version 0.4.2 (the latest released as of Nov 2013). The basic idea of criterium is to do the following for each expression:

1. First run the expression many times to cause the JVM's JIT compiler to optimize the code for that expression. The expression is evaluated as many times as required until the total time is at least 10 seconds.
2. Run the expression many times to determine how many times it must be evaluated to take about 1 second. The result is a number of executions "n-exec".
3. Run the expression n-exec times (taking about 1 second), and then do that 29 more times, taking a total of about 30 seconds.

The final measurement for 1 "run" is the average time it took each expression evaluation to complete in step 3 above, which is after all of the warmup is complete.
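The three steps above can be sketched in plain Clojure. This is only an illustration of the idea, not criterium's actual implementation: the function names (time-n, warm-up, estimate-n-exec, measure-run) and the option keys are hypothetical, and criterium does considerably more (garbage collection control, overhead estimation, statistics).

```clojure
(defn time-n
  "Calls the zero-argument function f n times; returns elapsed nanoseconds."
  [f n]
  (let [start (System/nanoTime)]
    (dotimes [_ n] (f))
    (- (System/nanoTime) start)))

(defn warm-up
  "Step 1: call f repeatedly until at least warmup-ns nanoseconds have passed.
  Returns the number of calls made."
  [f warmup-ns]
  (let [start (System/nanoTime)]
    (loop [calls 0]
      (if (< (- (System/nanoTime) start) warmup-ns)
        (do (f) (recur (inc calls)))
        calls))))

(defn estimate-n-exec
  "Step 2: double n until a batch of n calls takes at least target-ns.
  The resulting batch takes somewhere between the target and about twice it."
  [f target-ns]
  (loop [n 1]
    (if (>= (time-n f n) target-ns)
      n
      (recur (* 2 n)))))

(defn measure-run
  "Step 3: time `samples` batches of n-exec calls each and return the
  mean time per call, in nanoseconds."
  [f {:keys [warmup-ns target-ns samples]}]
  (warm-up f warmup-ns)
  (let [n-exec (estimate-n-exec f target-ns)
        total  (reduce + (repeatedly samples #(time-n f n-exec)))]
    (double (/ total (* samples n-exec)))))
```

With the durations described above, one run would look roughly like (measure-run f {:warmup-ns 10000000000 :target-ns 1000000000 :samples 30}), where f is a zero-argument function wrapping the expression. Much smaller values are handy for trying the sketch out quickly.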

In the graphs, each data point is the result of doing 3 runs. Only the fastest value among those 3 runs is charted. The 3 runs were done relatively far apart from each other in time, in the hope of reducing the effect of transient conditions that can slow down the results, e.g. other processes running on the machine. The machine was not being used for any other purposes during these performance measurements.
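As a small illustration of how one charted point is derived from its 3 runs, here is a sketch. The function name charted-value is hypothetical, and the numbers are made up for the example; they are not measurements from these benchmarks.

```clojure
(defn charted-value
  "Given the mean time per evaluation from each run, returns the fastest
  (smallest) one, which is the value that would be charted."
  [run-means]
  (apply min run-means))

;; Made-up per-evaluation means from 3 runs, in nanoseconds.
(charted-value [105.2 101.7 103.9])
;; => 101.7
```

Taking the minimum rather than the mean of the runs means a run slowed by some transient event does not affect the charted value, as long as at least one run was undisturbed.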

Each expression begins with a "context" given in square brackets. For example:

[arr (object-array (range 1000000))] (reduce + 0 arr)

The results for such a benchmark are obtained by binding the symbol arr to the value of (object-array (range 1000000)), and then in a lexical context where arr has that value, the expression (reduce + 0 arr) is repeatedly evaluated and measured. The evaluation of (object-array (range 1000000)) is done only once, before any of steps 1 through 3 above are done, and thus before any measurements are performed.
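One way to picture the context is as a let binding that wraps a function whose body is the measured expression. The macro name context-thunk below is hypothetical and is not claimed to be how the benchmark harness is written. It is a minimal sketch showing that the context is evaluated once while the expression can be evaluated many times.

```clojure
(defmacro context-thunk
  "Evaluates the bindings once and returns a zero-argument function that
  evaluates expr in the lexical scope of those bindings."
  [bindings expr]
  `(let ~bindings
     (fn [] ~expr)))

;; Counts how often the context expression is evaluated.
(def setup-count (atom 0))

(def thunk
  (context-thunk [arr (do (swap! setup-count inc)
                          (object-array (range 1000000)))]
                 (reduce + 0 arr)))

;; Evaluate the measured expression several times.
(dotimes [_ 3] (thunk))

@setup-count
;; => 1
```

Each call to thunk repeats only (reduce + 0 arr). The array is built a single time, when thunk is defined, so its construction cost stays outside anything that would be timed.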

FAQs

I will update this section if any questions actually become frequently asked. This is my attempt to answer what I expect might become FAQs.

I can just take these results and use them to predict the performance I will get, right?

The hardware and software I used to measure these results are described here. Changing any of them could change the results. If the results for a particular combination of hardware, OS, JVM, and Clojure version matter a lot to you, you should run your own benchmarks, preferably with your own application and a profiler. If you are really serious about tracking down performance issues in your application, also instrument your code with metrics, e.g. using Coda Hale's Metrics Core library.

For the 3 runs you did to take the minimum for one point on the graph, why would those 3 times differ?

Even when I kept all of the hardware, OS, JVM version, and Clojure version exactly the same between the 3 runs I mention above for each data point, my results varied from run to run. Why? Minor variations of under 5% are likely due to differences in the state of hardware and software caches at the beginning of the run, or to process scheduling differences and what other processes were running. I was not using the computer that did the measurements for any other purposes during the measurements, but I did not monitor for occasional events like nightly or weekly log file cleanup tasks, software auto-checking for updates, network activity, etc. I don't expect any of those to have a noticeable effect on the graphs, because I made the 3 runs far apart in time: at least a day, often several days apart. It would take a big coincidence for one of those causes to affect the results on all 3 of those runs.

For some results I saw more than 5% variation between the 3 runs. I don't yet know why that would be. I can tell you for certain that it was not due to changing the hardware, OS, JVM, or Clojure version (for a single data point in the graph).
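To make the 5% figure concrete, the variation between runs can be expressed relative to the fastest run. The sketch below assumes that definition (slowest minus fastest, divided by fastest), which is my reading rather than something stated above. The function name relative-spread is hypothetical and the sample numbers are made up.

```clojure
(defn relative-spread
  "Returns (slowest - fastest) / fastest for a collection of run times,
  as a fraction, e.g. 0.05 means the slowest run took 5% longer."
  [run-means]
  (let [fastest (apply min run-means)
        slowest (apply max run-means)]
    (/ (- slowest fastest) (double fastest))))

;; Made-up run times: the slowest run is 4% slower than the fastest.
(< (relative-spread [100.0 103.0 104.0]) 0.05)
;; => true

;; Made-up run times: the slowest run is 12% slower than the fastest.
(< (relative-spread [100.0 101.0 112.0]) 0.05)
;; => false
```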

But the point of benchmarks is to help me draw conclusions about how different pieces of code perform, right?

Sure, just be careful not to extrapolate too far from what the results are actually saying. It would be very surprising if an expression I give measurements for took 1/10 or 10x the time on similar hardware and software. Such a result would bear close examination to see whether a mistake had been made in measurement somewhere.

These measurements were made while running the same expression repeatedly. The JVM should have already performed JIT optimization on this code before the measurements began. Note that some JIT compilers are able to use profiling information about polymorphic method calls. If a method is called on only one type during warmup, the compiler can create an optimized version that works only for that type, and the call would be much slower if later made on a different type. Rich Hickey mentions this behavior here. The only article I have found about this topic is by Richard Warburton, here. If you know of any other articles that give measurements demonstrating this effect, please email me and I can add them here.
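A rough way to probe for this effect yourself is to warm up a call site with a single type and then time it on that type and on a mix of types. Everything in the sketch below (the Shape protocol, the record types, the helper names) is hypothetical. Single timings like these are noisy, and whether any difference shows up depends on the JVM and its JIT settings, so no expected timings are given.

```clojure
(defprotocol Shape
  (area [this]))

(defrecord Square [side]
  Shape
  (area [_] (* side side)))

(defrecord Rect [w h]
  Shape
  (area [_] (* w h)))

(defrecord Circle [r]
  Shape
  (area [_] (* 3 r r))) ; crude integer approximation, only for the demo

(defn total-area
  "Sums the areas; the (area s) call is the polymorphic call site."
  [shapes]
  (reduce (fn [acc s] (+ acc (area s))) 0 shapes))

(defn time-ms
  "Calls the zero-argument function f once; returns elapsed milliseconds."
  [f]
  (let [start (System/nanoTime)]
    (f)
    (/ (- (System/nanoTime) start) 1e6)))

(def squares (vec (repeatedly 100000 #(->Square 2))))
(def mixed   (vec (take 100000 (cycle [(->Square 2) (->Rect 2 3) (->Circle 1)]))))

;; Warm up the call site while it only ever sees Square.
(dotimes [_ 200] (total-area squares))

(println "one type after warmup:   " (time-ms #(total-area squares)) "ms")
(println "mixed types after warmup:" (time-ms #(total-area mixed)) "ms")
```

For anything beyond a quick look, each case should be measured with a proper harness such as criterium, in a fresh JVM per case, so that the warmup of one case does not influence the other.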

What about earlier versions of Clojure?

There are no JAR files for Clojure versions 1.3-alpha1 through 1.3-alpha4 in the Maven repos that I could find, and I didn't bother to build my own JAR files for them for the purposes of these results. If someone could tell me a good source for those JARs, or exactly which git commits in the Clojure GitHub repository those versions correspond to, I would consider adding them.

The criterium measurement library does not compile with Clojure versions 1.2 or 1.2.1. I'm sure it could be modified to do so, but that was not a high priority for me in creating these measurements. Contact me if you are interested in helping get criterium working with those versions of Clojure.

What about measuring other expressions?

Let me know what you'd like to see, and I will consider adding it. I started with a list of expressions that are the same as, or very similar to, the ClojureScript expressions measured in the ClojureScript benchmarks.