Why doesn't the bounds check get eliminated?

Question

I wrote a simple benchmark to find out whether the bounds check can be eliminated when the array index is computed via a bitwise and. This is basically what nearly all hash tables do: they compute

h & (table.length - 1)

as an index into the table, where h is the hashCode or a derived value. The results show that the bounds check doesn't get eliminated.
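
As a minimal sketch of why the masked index is always valid for a non-empty table: `length - 1` is non-negative whenever `length > 0`, so the bitwise and clears the sign bit and cannot produce anything larger than `length - 1`. The class and method names here (`MaskedIndexDemo`, `indexFor`) are illustrative and do not come from the question.

```java
final class MaskedIndexDemo {
    // Hypothetical helper: the index computation that hash tables typically use.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int[] table = new int[16];
        int[] hashes = {0, 1, -1, 12345, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int h : hashes) {
            int j = indexFor(h, table.length);
            // For any non-empty table, j lies in [0, table.length - 1].
            System.out.println(h + " -> " + j + " (table[j] = " + table[j] + ")");
        }
    }
}
```

The range guarantee holds for any positive length; a power-of-two length is only needed to spread the indexes over the whole table.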

The idea of my benchmark is pretty simple: Compute two values i and j, where both are guaranteed to be valid array indexes.

  • i is the loop counter. When it gets used as array index, the bounds check gets eliminated.
  • j gets computed as x & (table.length - 1), where x is some value changing on each iteration. When it gets used as array index, the bounds check does not get eliminated.

The relevant part is as follows:

for (int i=0; i<=table.length-1; ++i) {
    x += result;
    final int j = x & (table.length-1);
    result ^= i + table[j];
}

The other experiment uses

    result ^= table[i] + j;

instead. The difference in timing is maybe 15% (pretty consistently across different variants I've tried). My questions:

  • Are there other possible reasons for this besides bounds check elimination?
  • Is there some complicated reason I can't see why there's no bounds check elimination for j?

A summary of the answers

Marko Topolnik's answer shows that it's all more complicated and that eliminating the bounds checks is not guaranteed to be a win: on his computer, the "normal" code is slower than "masked". I guess this is because it allows some additional optimization that turns out to be detrimental in this case (given the complexity of current CPUs, the compiler can hardly know for sure).

leventov's answer shows clearly that the array bounds check gets done in "masked" and that its elimination makes the code as fast as "normal".

Donal Fellows points out that the masking doesn't work for a zero-length table, as x & (0-1) equals x. So the best thing the compiler can do is to replace the bounds check by a zero-length check. But this is IMHO still worth it, as the zero-length check can be moved out of the loop easily.
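
The zero-length case can be checked directly. The following is a small sketch with made-up names (`ZeroLengthMaskDemo`, `maskedAccessThrows`): for an empty array the mask is -1, so the index stays x and every access throws, while any non-empty array accepts the masked index.

```java
final class ZeroLengthMaskDemo {
    // Hypothetical helper: reports whether reading a[x & (a.length - 1)] throws.
    static boolean maskedAccessThrows(int[] a, int x) {
        try {
            int ignored = a[x & (a.length - 1)];
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int x = 42;
        // With length 0 the mask is 0 - 1 = -1 (all bits set), so x is unchanged.
        System.out.println(x & (0 - 1));
        System.out.println(maskedAccessThrows(new int[0], x)); // empty array
        System.out.println(maskedAccessThrows(new int[8], x)); // non-empty array
    }
}
```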

Proposed optimization

Because a[x & (a.length - 1)] throws if and only if a.length == 0, the compiler can do the following:

  • For each array access, check if the index has been computed via a bitwise and.
  • If so, check if either of the operands was computed as length minus one.
  • If so, replace the bounds check by a zero-length check.
  • Let the existing optimizations take care of it.

Such an optimization should be pretty simple and cheap as it only looks at the parent nodes in the SSA graph. Unlike many complex optimizations, it can never be detrimental, as it only replaces one check by a slightly simpler one; so there's no problem, not even if it can't be moved out of the loop.
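
The steps above can be pictured as a tiny pattern match over a toy expression tree. This is only an illustration under stated assumptions: the node classes (`Var`, `ArrayLength`, `MinusOne`, `And`) and the `checkFor` method are invented for this sketch and are not HotSpot's IR or API. Note that the mask must be derived from the length of the same array that is being indexed.

```java
final class BoundsCheckRewriteSketch {
    // Toy expression nodes (hypothetical); a real compiler IR is far richer.
    interface Node {}

    static final class Var implements Node {
        final String name;
        Var(String name) { this.name = name; }
    }

    static final class ArrayLength implements Node {
        final String array;
        ArrayLength(String array) { this.array = array; }
    }

    static final class MinusOne implements Node {
        final Node operand;
        MinusOne(Node operand) { this.operand = operand; }
    }

    static final class And implements Node {
        final Node left, right;
        And(Node left, Node right) { this.left = left; this.right = right; }
    }

    enum Check { FULL_BOUNDS_CHECK, ZERO_LENGTH_CHECK }

    // Step 2: is this operand "array.length - 1" for the array being accessed?
    static boolean isLengthMinusOne(Node n, String array) {
        return n instanceof MinusOne
                && ((MinusOne) n).operand instanceof ArrayLength
                && ((ArrayLength) ((MinusOne) n).operand).array.equals(array);
    }

    // Steps 1 and 3: pick the check needed for the access array[index].
    static Check checkFor(String array, Node index) {
        if (index instanceof And) {
            And and = (And) index;
            if (isLengthMinusOne(and.left, array) || isLengthMinusOne(and.right, array)) {
                return Check.ZERO_LENGTH_CHECK;
            }
        }
        return Check.FULL_BOUNDS_CHECK;
    }

    public static void main(String[] args) {
        Node mask = new MinusOne(new ArrayLength("a"));
        System.out.println(checkFor("a", new And(new Var("x"), mask)));
        System.out.println(checkFor("a", new Var("i")));
    }
}
```

The sketch only inspects the index node and its direct operands, which mirrors the claim that the rewrite is cheap; hoisting the resulting zero-length check out of a loop is left to whatever loop-invariant optimizations already exist.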

I'll post this to the hotspot-dev mailing list.

News

John Rose filed an RFE and there's already a "quick-and-dirty" patch.

Answer

  1. No, this is evidently an effect of insufficiently smart bounds check elimination.

I've extended a benchmark by Marko Topolnik:

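import java.lang.reflect.Field;
import java.util.Random;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;
import sun.misc.Unsafe;

// Note: @GenerateMicroBenchmark is the annotation name from early JMH releases;
// later releases renamed it to @Benchmark.
@OutputTimeUnit(TimeUnit.NANOSECONDS)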
@BenchmarkMode(Mode.AverageTime)
@OperationsPerInvocation(BCElimination.N)
@Warmup(iterations = 5, time = 1)
@Measurement(iterations = 10, time = 1)
@State(Scope.Thread)
@Threads(1)
@Fork(2)
public class BCElimination {
    public static final int N = 1024;
    private static final Unsafe U;
    private static final long INT_BASE;
    private static final long INT_SCALE;
    static {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            U = (Unsafe) f.get(null);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }

        INT_BASE = U.arrayBaseOffset(int[].class);
        INT_SCALE = U.arrayIndexScale(int[].class);
    }

    private final int[] table = new int[BCElimination.N];

    @Setup public void setUp() {
        final Random random = new Random();
        for (int i=0; i<table.length; ++i) table[i] = random.nextInt();
    }

    @GenerateMicroBenchmark public int normalIndex() {
        int result = 0;
        final int[] table = this.table;
        int x = 0;
        for (int i=0; i<=table.length-1; ++i) {
            x += i;
            final int j = x & (table.length-1);
            result ^= table[i] + j;
        }
        return result;
    }

    @GenerateMicroBenchmark public int maskedIndex() {
        int result = 0;
        final int[] table = this.table;
        int x = 0;
        for (int i=0; i<=table.length-1; ++i) {
            x += i;
            final int j = x & (table.length-1);
            result ^= i + table[j];
        }
        return result;
    }

    @GenerateMicroBenchmark public int maskedIndexUnsafe() {
        int result = 0;
        final int[] table = this.table;
        long x = 0;
        for (int i=0; i<=table.length-1; ++i) {
            x += i * INT_SCALE;
            final long j = x & ((table.length-1) * INT_SCALE);
            result ^= i + U.getInt(table, INT_BASE + j);
        }
        return result;
    }
}

Results:

Benchmark                                Mean   Mean error    Units
BCElimination.maskedIndex               1.235        0.004    ns/op
BCElimination.maskedIndexUnsafe         1.092        0.007    ns/op
BCElimination.normalIndex               1.071        0.008    ns/op
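
To relate these numbers to the roughly 15% difference mentioned in the question, the relative slowdown against normalIndex can be worked out from the means above. The class and method names are illustrative only; the three constants are copied from the results table.

```java
final class RelativeSlowdown {
    // Relative slowdown of a measurement against a baseline, as a fraction.
    static double slowdown(double measured, double baseline) {
        return (measured - baseline) / baseline;
    }

    public static void main(String[] args) {
        double normal = 1.071;       // BCElimination.normalIndex, ns/op
        double masked = 1.235;       // BCElimination.maskedIndex, ns/op
        double maskedUnsafe = 1.092; // BCElimination.maskedIndexUnsafe, ns/op

        System.out.printf("maskedIndex vs normalIndex: %.1f%%%n",
                100 * slowdown(masked, normal));
        System.out.printf("maskedIndexUnsafe vs normalIndex: %.1f%%%n",
                100 * slowdown(maskedUnsafe, normal));
    }
}
```

This gives about 15% for maskedIndex and about 2% for maskedIndexUnsafe, which is the gap the answer attributes to the remaining bounds check.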


  2. The second question is better suited to the hotspot-dev mailing list than to StackOverflow, IMHO.
