奇怪的分支表现 [英] Strange branching performance

查看:123
本文介绍了奇怪的分支表现的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我字符。结果显示当分支概率达到约0.20时突然下降(性能增加)。



更多指令。这是一条指令,但没有分支,因此没有分支错误预测惩罚。



这解释了20-80%范围内非常恒定的时间。低于20%的增长显然是由于分支误预测。



因此看起来JIT未能使用大约0.04而不是0.18的正确阈值。




The results of my benchmark shows that the performance is worst when the branch has 15% (or 85%) percent probability rather than 50%.

Any explanation?

The code is too long but the relevant parts are here:

private int diff(char c) {
    return TABLE[(145538857 * c) >>> 27] - c;
}

@Benchmark int timeBranching(int reps) {
    int result = 0;
    while (reps-->0) {
        for (final char c : queries) {
            if (diff(c) == 0) {
                ++result;
            }
        }
    }
    return result;
}

It counts the number of BREAKING_WHITESPACE characters in the given string. The results shows a sudden time drop (performance increase) when the branching probability reaches about 0.20.

More details about the drop. Varying the seed shows more strange things happening. Note that the black line denoting the minimum and maximum values is very short except when close to the cliff.

解决方案

It looks like a minor JIT bug. For a small branch probability, it generates something like the following, just much more complicated due to unrolling (I'm simplifying a lot):

movzwl 0x16(%r8,%r10,2),%r9d

Get the char: int r9d = queries[r10]

imul   $0x8acbf29,%r9d,%ebx

Multiply: ebx = 145538857 * r9d

shr    $0x1b,%ebx

Shift: ebx >>>= 27

cmp    %edx,%ebx
jae    somewhere

Check bounds: if (ebx > edx || ebx < 0) goto somewhere (and throw there an IndexOutOfBoundsException.

cmp    %r9d,%ebx
jne    back

Jump back to loop beginning if not equal: if (r9d != ebx) goto back

inc    %eax

Increment the result: eax++

jne    back

Simply goto back

We can see one smart and one dumb thing:

  • The bound check gets done optimally with a single unsigned comparison.
  • It's completely redundant as x >>> 27 is always positive and less than the table length (32).

For a branch probability above 20%, these three instructions

cmp    %r9d,%ebx
jne    back
inc    %eax

get replaced by something like

mov    %eax,%ecx   // ecx = result
inc    %ecx        // ecx++
cmp    %r9d,%ebx   // if (r9b == index)
cmoveq %ecx,%ebx   //    result = ecx

using a conditional move instruction. This is one instruction more, but no branching and thus no branch misprediction penalty.

This explains the very constant time in the 20-80% range. The ramping below 20% is clearly due to branch mispredictions.

So it looks like a JIT failure to use the proper threshold of about 0.04 rather than 0.18.

这篇关于奇怪的分支表现的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆