Yet another post about performance and microbenchmarks. Beware of the results.
I was wondering about performance of Scala collections and especially with comparison to its Java counterparts. And few days ago, working on my day-to-day work stuff I decided to indulge myself and benchmark it (finally!). I worked on some simple but quite big cache (up to 1 million keys), so I had to decide, which collection to use :) BTW, there aren’t a lot of good benchmarks done in this area (this one is big and good, but old and doesn’t make comparisons with Java). Anyway, let’s go straight to benchmarks!
I want to benchmark 2 cases:
Map[UUID, V] and
Set[UUID], as I was going to need them both. UUID is used as key, which is slightly less boring than String or Int :) (it’s slightly faster than String’s equals/hashCode and slightly more complicated than Int). Because it’s a cache, I’m less concerned about how much time it takes to build a Map, and more about how much it takes to perform a lookup.
What am I going to benchmark? For
Map[UUID, V]: java.util.HashMap, Wrapper for java.util.Map, immutable.Map and mutable.Map. And the same for Set.
A single benchmark consists of 10 lookups, so when you see 100 nanoseconds, it’s on average 10 nanoseconds per lookup.
So, I wrote all the benchmarks, ran and…
I was disappointed to say the least: Scala collections were significantly slower… Wow.
Then I remembered, that I still used Scala 2.12 in my benchmarking project, I was postponing the upgrade to 2.13 for some time and I guess the time has come. Migration wasn’t painful as there isn’t a lot of code. Running again and…
Now it’s interesting… I would even say unexpected.
mutable.Map is now more or less the same as
java.util.HashMap (depending on the size of Map, for smaller maps it’s even faster!). But performance of
immutable.Map is even worse. Wow again.
Let’s compare now between different Scala versions for openjdk-17:
Yes, we clearly see that mutable implementation is improved significantly and immutable implementation degraded.
From these benchmarks we can see that performance of JDK implementation of HashMap (HashSet’s implementation just uses HashMap internally) was degraded a little bit since openjdk-8 (strange).
Wrapped version performance improved from jdk-8 to jdk-17, maybe JVM optimizes more aggressively in the latest versions.
I don’t know why, but performance of immutable version of Map/Set in Scala degraded, which is a bit counter-intuitive, as immutable collection can be optimized better than mutable… But this is how it’s now. If you really need performance for your lookups - use mutable versions (either Java or Scala)…
P.S. My initial benchmarks had some unpredictable spikes (10x) for Scala collections from time to time (like 500 nanos for 100K elements and then 5000 nanos for 10K elements, which doesn’t make much sense). Those spikes were “consistent” in a way, than on each run (which takes ~45 minutes) there were few spikes for different JDK, different sizes, different Scala collections. I decided to remove possibility of strange behavior of Scala collections by replacing it with vanilla Java code which actually helped (sic!) Don’t know what to make of it, except for “I trust Java more” :)
Play with charts here. Source code is on GitHub. Originally posted on Medium. Cover image by pdpics from Pixabay.