Few days back, I was writing a program for Dijkstra’s algorithm. I used mutable HashMap to store all nodes and edges in that program. I have to store around 400k edges and their weight. There are two types of HashMaps in Scala collection. one is mutable and the other one is immutable. I have to choose one of these. To find which one gives more performance among mutable HashMap and immutable HashMap when you have many insertions.
You can check Dijkstra’s algorithm implementation on my Github project Scala-Experiment. –Diskstra’s solution
To find this, I did microbenchmarking of mutable and immutable HashMaps.
I used JMH(Java Microbenchmarking Harness) through sbt-jmh for benchmarking.
You can check out the code on Github repository: Scala-Experiment.
https://github.com/pgkaila/Scala-Experiments
I used following tools for benchmarking
- Oracle JDK version 1.8.0_66
- Scala version 2.11.7
- sbt version 0.13.9
- sbt-jmh plugin version 0.2.6 shipped with JMH version 1.11.3
If you are new with sbt, check my previous article on getting started with sbt.
Build your first scala project with sbt
JMH
To use JMH via sbt add sbt-jmh plugin by creating file project/plugin.sbt with the following content.
addSbtPlugin("pl.project13.scala" % "sbt-jmh" % "0.2.6")
Enable plugin by adding enablePlugins(JmhPlugin)
in build.sbt file.
The following code is to benchmark two methods. In both methods, I’m inserting 1000000 entries in HashMap. In one method I used mutable HashMap and in the other one, I used immutable HashMap.
I’m setting output time unit as millisecond and benchmark modes as AverageTime, Throughput and SampleTime using JMH annotations.
More details about diffrent modes and scope is as following.
Modes
Name | Description |
Mode.Throughput | Calculate number of operations in a time unit. |
Mode.AverageTime | Calculate an average running time. |
Mode.SampleTime | Calculate how long does it take for a method to run (including percentiles). |
Mode.SingleShotTime | Just run a method once (useful for cold-testing mode). Or more than once if you have specified a batch size for your iterations (see @Measurement annotation below) – in this case JMH will calculate the batch running time (total time for all invocations in a batch). |
Any set of these modes | You can specify any set of these modes – the test will be run several times (depending on number of requested modes). |
Mode.All | All these modes one after another. |
Scope
Name | Description |
Scope.Thread | This is a default state. An instance will be allocated for each thread running the given test. |
Scope.Benchmark | An instance will be shared across all threads running the same test. Could be used to test multithreaded performance of a state object (or just mark your benchmark with this scope). |
Scope.Group | An instance will be allocated per thread group (see Groups section down below). |
package be.piyush.microbenchmarking
import java.util.concurrent.TimeUnit
import org.openjdk.jmh.annotations._
import scala.collection.{immutable, _}
import scala.util.Random
@State(Scope.Thread)
@BenchmarkMode(Array(Mode.AverageTime, Mode.Throughput, Mode.SampleTime))
@OutputTimeUnit(TimeUnit.MILLISECONDS)
class CollectionBenchmarking {
@Benchmark
@OperationsPerInvocation(2)
def mutableHashMap: Unit ={
val map = new mutable.HashMap[String, Long]()
val r = new Random()
r.setSeed(1000L)
for (i <- 1 to 1000000) {
val number = r.nextLong()
map(number.toString) = number
}
}
@Benchmark
@OperationsPerInvocation(2)
def immutableHashMap: Unit ={
var map = new immutable.HashMap[String, Long]()
val r = new Random()
r.setSeed(1000L)
for (i <- 1 to 1000000) {
val number = r.nextLong()
map = map.+(number.toString -> number)
}
}
}
Run Benchmark
To run benchmarking execute jmh:run -i 10 -wi 10 -f1 -t1
in sbt.
In this command, -i 10
is for 10 iterations, -wi 10
is for 10 warmup iterations, -f1
is for 1 fork and -t1 is to run the benchmark in one thread.
Output
After running this benchmarking, it would generate output like following,
[info] # Run complete. Total time: 00:02:58
[info]
[info] Benchmark Mode Cnt Score Error Units
[info] CollectionBenchmarking.immutableHashMap thrpt 10 0.002 ± 0.001 ops/ms
[info] CollectionBenchmarking.mutableHashMap thrpt 10 0.004 ± 0.001 ops/ms
[info] CollectionBenchmarking.immutableHashMap avgt 10 564.104 ± 214.220 ms/op
[info] CollectionBenchmarking.mutableHashMap avgt 10 216.340 ± 27.696 ms/op
[info] CollectionBenchmarking.immutableHashMap sample 13 545.784 ± 154.552 ms/op
[info] CollectionBenchmarking.mutableHashMap sample 21 292.191 ± 48.599 ms/op
[success] Total time: 182 s, completed 21 Mar, 2016 6:13:21 PM
We can see mutableHashMap has more throughput(ops/ms) compare to immutableHashMap. Also, It is taking less average time and sample time compare to immutable HashMap.
It says clearly that mutable collection is faster than immutable when you want to do more insertions.