Clojure expression benchmarks
Graphs of measurements are here.
JDK version notes: the 64-bit Oracle JDK 1.7.0_80 was downloaded from
Oracle's web site. The 64-bit JDK 1.7.0_91 is a version of
OpenJDK installed via Ubuntu's apt-get. Here is the output of
'java -version' for it:
java version "1.7.0_91"
OpenJDK Runtime Environment (IcedTea 2.6.3) (7u91-2.6.3-0ubuntu0.14.04.1)
OpenJDK 64-bit Server VM (build 24.91-b01, mixed mode)
For these benchmarks I used version 0.4.2 of Hugo Duncan's criterium
library (the latest release as of Nov 2013) to perform the measurements. The
basic idea of criterium is to do the following for each expression:
1. Run the expression many times to cause the JVM's JIT compiler to optimize the code for that expression. The expression is evaluated as many times as required until the total time is at least 10 seconds.
2. Run the expression many times to determine how many times it must be evaluated to take about 1 second. The result is a number of executions "n-exec".
3. Run the expression n-exec times (taking about 1 second), and then do that 29 more times, taking a total of about 30 seconds.
The final measurement for one "run" is the average time per expression evaluation
during step 3 above, i.e. after all of the warmup is complete.
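To make this concrete, here is a minimal sketch of driving such a measurement with criterium (not the exact script I used; the expression is just an example). The bench macro performs the warmup and sampling described above and reports statistics including the mean time per evaluation:

    (require '[criterium.core :as crit])

    ;; bench first warms up the JIT, then estimates how many evaluations
    ;; fit in roughly one second, then collects many such samples and
    ;; reports statistics such as the mean time per evaluation.
    (crit/bench (reduce + 0 (range 1000)))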
In the graphs, each data point is the result of doing 3 runs. Only the fastest
value among those 3 runs is charted. The 3 runs were done
relatively far apart from each other in time, in the hope of reducing the
effect of transient conditions that can slow down the results, e.g. other
processes running on the machine. The machine was not being used
for any other purpose during these performance measurements.
Each expression begins with a "context" given in square brackets. For example:
[arr (object-array (range 1000000))] (reduce + 0 arr)
The results for such a benchmark are obtained by binding the symbol arr to the value of (object-array (range 1000000)), and then in a lexical context where arr has that value, the expression (reduce + 0 arr) is repeatedly evaluated and measured. The evaluation of (object-array (range 1000000)) is done only once, before any of steps 1 through 3 above are done, and thus before any measurements are performed.
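In other words, the measurement is roughly equivalent to evaluating the context once in a let binding and benchmarking the expression inside it, along these lines (a sketch using criterium directly):

    (require '[criterium.core :as crit])

    ;; The context is evaluated once, outside the measurement.
    (let [arr (object-array (range 1000000))]
      ;; The expression is evaluated and timed repeatedly inside the binding.
      (crit/bench (reduce + 0 arr)))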
FAQs
I will update this section if any questions actually become frequently
asked. This is my attempt to answer what I expect might become
FAQs.
I can just take these results and use them to predict the performance I will get, right?
The hardware and software I used to measure these results are described here.
Changing any of those things could change the results. If the
results with a particular combination of hardware, OS, JVM, and Clojure
version matter a lot to you, you should do your own benchmarks,
preferably with your own application and a profiler, and, if you are
really serious about tracking down performance issues in your
application, by instrumenting your code with metrics, e.g. Coda Hale's Metrics Core library.
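As an illustration (my own sketch, not code used for these benchmarks; it assumes the 3.x com.codahale.metrics API, and the timer name is made up), here is what lightweight instrumentation with a Metrics timer can look like from Clojure:

    (import '[com.codahale.metrics MetricRegistry])

    (def registry (MetricRegistry.))

    ;; A Timer records call counts and a latency histogram for one code path.
    (def reduce-timer (.timer registry "my-app.reduce-arr"))

    (defn timed-reduce [arr]
      (let [ctx (.time reduce-timer)]
        (try
          (reduce + 0 arr)
          (finally (.stop ctx)))))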
For the 3 runs whose minimum becomes one point on the graph, why would those 3 times differ?
Even when I kept the hardware, OS, JVM version, and Clojure version
exactly the same across the 3 runs for each data point, my results
varied from run to run. Why?
Minor variations of under 5% are likely due to differences in the state
of hardware and software caches at the beginning of the run, or to
process scheduling differences and what other processes were
running. I was not using the computer that did the measurements
for any other purposes during the measurements, but I did not monitor
for occasional events like nightly or weekly log file cleanup tasks,
software auto-checking for updates, network activity, etc. I
don't expect any of those to have a noticeable effect on the graphs,
because I made the 3 runs far apart in time -- at least a day, often
several days apart. It would take a big coincidence for one of
those causes to affect the results on all 3 of those runs.
For some results I saw more than 5% variation between the 3 runs.
I don't yet know why that would be. I can tell you for certain
that it was not due to changing the hardware, OS, JVM, or Clojure
version (for a single data point in the graph).
But the point of benchmarks is to help me draw conclusions about how different pieces of code perform, right?
Sure, just be careful not to extrapolate too far from what the results
are actually saying. It would be very surprising if an expression
I give measurements for took 1/10 or 10x the time on similar hardware and
software; such a result would bear close examination to see whether a
mistake had been made in the measurements somewhere.
These measurements were made while running the same expression
repeatedly. The JVM should have already performed JIT
optimization on this code before the measurements began. Note
that some JIT compilers are able to use profiling information about
polymorphic method calls: if a call site sees only one type during
warmup, the JIT can create an optimized version that works only for that type
and would be much slower if the site is later called with a different type. See here
for a mention of this behavior from Rich Hickey. The only article I have found about this topic is by Richard Warburton, here. If you know of any other articles that give measurements demonstrating this effect, please email me and I can add them here.
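If you want to try to observe this effect yourself, here is the kind of experiment I have in mind (my own sketch, not taken from either article): warm up and measure a protocol call site on a single type, then on a mix of types, ideally in separate JVM runs so the warmup history differs, and compare the reports:

    (require '[criterium.core :as crit])

    (defprotocol Area
      (area [shape]))

    (defrecord Circle [r]
      Area
      (area [_] (* Math/PI r r)))

    (defrecord Square [s]
      Area
      (area [_] (* s s)))

    ;; Call site sees only one type: the JIT can specialize it.
    (let [shapes (vec (repeatedly 1000 #(->Circle 1.0)))]
      (crit/quick-bench (reduce + (map area shapes))))

    ;; Call site sees two types: the specialized version no longer applies,
    ;; and the same expression may measure noticeably slower.
    (let [shapes (vec (take 1000 (cycle [(->Circle 1.0) (->Square 1.0)])))]
      (crit/quick-bench (reduce + (map area shapes))))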
What about earlier versions of Clojure?
There are no JAR files for Clojure versions 1.3-alpha1 through
1.3-alpha4 in the Maven repos that I could find, and I didn't bother to
build my own JAR files for them for the purposes of these
results. If someone could tell me a good source for those JARs,
or exactly which git commit in the Clojure GitHub repository those
versions correspond to, I would consider adding them.
The criterium measurement library does not compile with Clojure
versions 1.2 or 1.2.1. I'm sure it could be modified to do so,
but that was not a high priority for me in creating these
measurements. Contact me if you are interested in helping get
criterium working with those versions of Clojure.
What about measuring other expressions?
Let me know what you'd like to see, and I will consider adding
it. I started with a list of expressions that are the same as, or
very similar to, the ClojureScript expressions measured in the ClojureScript benchmarks.