Benchmarking String.regionMatches

Yet another post about performance and microbenchmarks. Beware of the results.

There is a method in String class, called regionMatches. Basically, it tests if the string has another string as a substring at the specific place:

"one_two_three".regionMatches(4, "two", 0, 3) // true

By some reason I wondered about its performance comparing to other methods: startsWith and endsWith. And just for fun adding another comparison with the functional equivalent:

"one_two_three".substring(4, 3).equals("two") // true

Obviously, the regionMatches should outperform the latter, just because of the substring (which has extra memory allocation). I was actually shocked by the results:

regionMatches             avgt    3  26.565 ±   1.306  ns/op
substringEquals           avgt    3  15.663 ±  11.616  ns/op

WHAT?! This can’t be right! However, no matter how many times or how long I ran this benchmark, results were more or less the same: substring+equals outperformed regionMatches… Then I wrote a little bit more comprehensive test (however, it has only one input data — a string of 51 characters).

Benchmarks

Ok, now we need to go deeper, so let me show all the different benchmarks I did.

Test data: a search string uri of 51 characters, needles at the beginning, in the middle and at the end of the string.

val uri = "/site/section/blog/2022-04-17/the-title-of-the-post"
val begin = "/site/section/blog/2"
val middle = "/blog/2022-04-17/the"
val end = "he-title-of-the-post"

Testing at the beginning of the string:

@Benchmark
def begin_startsWith(): Boolean =
  uri.startsWith(begin)

@Benchmark
def begin_regionMatches(): Boolean =
  uri.regionMatches(0, begin, 0, begin.length)

@Benchmark
def begin_substringEquals(): Boolean =
  uri.substring(0, begin.length) == begin

At the end:

@Benchmark
def end_endsWith(): Boolean =
  uri.endsWith(end)

@Benchmark
def end_regionMatches(): Boolean =
  uri.regionMatches(endIndex, end, 0, end.length)

@Benchmark
def end_substringEquals(): Boolean =
  uri.substring(endIndex) == end

In the middle I did 2 tests, one to find exact substring (val middle = "/blog/2022-04-17/the"), another is to find part of the substring (val middle2Sides = s"-$middle-"). The idea of the second case is to make 2 substrings, not one.

@Benchmark
def middle_regionMatches(): Boolean =
  uri.regionMatches(middleIndex, middle, 0, middle.length)

@Benchmark
def middle_substringEquals(): Boolean =
  uri.substring(middleIndex, middleIndex + middle.length) == middle

@Benchmark
def middle2Sides_regionMatches(): Boolean =
  uri.regionMatches(middleIndex, middle2Sides, 1, middle2Sides.length - 2)

@Benchmark
def middle2Sides_substringEquals(): Boolean =
  uri.substring(middleIndex, middleIndex + middle.length) ==
    middle2Sides.substring(1, middle2Sides.length - 1)

JVM

After a while I decided to run benchmarks against different JVMs, because I couldn’t explain what’s happening there (as I showed in the beginning of the post, substring+equals won!). And I ran all these benchmarks on 3 JVMs: openjdk-8, openjdk-11 and openjdk-17.

Interesting, right? Looks like JDK8 is faster or the same for all cases, and then performance deteriorating, and in openjdk-17 regionMatches significantly slower than substring+equals. I don’t know how to explain it apart from some internal JVM optimizations for equals method that aren’t applied for regionMatches.

I did a test for smaller needle, and it shows that there is a some sweet spot (length), when performance of regionMatches becomes better than substring+equals, but it’s in low single digits (on my computer).

My guess, is that this loop in regionMatches isn’t optimized:

while (len-- > 0) {
    if (tv[toffset++] != ov[ooffset++]) {
        return false;
    }
}

equals method uses internal method StringLatin1.equals, which also does the loop, but maybe length check before removes bound checks:

@IntrinsicCandidate
public static boolean equals(byte[] value, byte[] other) {
    if (value.length == other.length) {
        for (int i = 0; i < value.length; i++) {
            if (value[i] != other[i]) {
                return false;
            }
        }
        return true;
    }
    return false;
}

Conclusion

It seems like a bug in JVM, regionMatches isn’t supposed to behave this way. And sometimes it’s worth benchmarking something basic: our assumptions aren’t always correct. Also, I’d like to mention a couple of blog posts about benchmarking around this area: one and two — sadly, all were done for JDK8 and didn’t reveal anything strange.

Play with charts here. Source code is on GitHub. Originally posted on Medium. Cover image by jarmoluk from Pixabay.