Is it a sensible optimization to check whether a variable holds a specific value before writing that value?


Question


if (var != X)
  var = X;

Is it sensible or not? Will the compiler always optimize-out the if statement? Are there any use cases that would benefit from the if statement?

What if var is a volatile variable?

I'm interested in both C++ and Java answers, as volatile variables have different semantics in the two languages. Java's JIT compilation can also make a difference.
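To make the pattern concrete, here is a minimal Java sketch of the two strategies applied to a volatile field (the class and method names are invented for this illustration):

```java
// Minimal illustration of the pattern in question, applied to a volatile
// field. Class and method names are invented for this example.
class Flag {
    volatile int state;

    // unconditional write: always pays the volatile-write cost
    void set(int x) {
        state = x;
    }

    // check first: a volatile read is much cheaper than a volatile write,
    // so the write is skipped entirely when the value is already x
    void setIfDifferent(int x) {
        if (state != x) {
            state = x;
        }
    }
}
```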

The if statement introduces a branch and an additional read that wouldn't happen if we always overwrote var with X, so it's bad. On the other hand, if var == X then with this optimization we perform only a read and no write, which could have some effect on the cache. Clearly, there are trade-offs here. I'd like to know what this looks like in practice. Has anyone done any testing on this?

EDIT:

I'm mostly interested in how this looks in a multi-processor environment. In a trivial situation there doesn't seem to be much sense in checking the variable first. But when cache coherency has to be maintained between processors/cores, the extra check might actually be beneficial. I just wonder how big an impact it can have? Also, shouldn't the processor do such an optimization itself? If var == X, assigning it the value X once more should not 'dirty up' the cache. But can we rely on this?

Solution

Yes, there are definitely cases where this is sensible, and as you suggest, volatile variables are one of those cases - even for single threaded access!

Volatile writes are expensive, both from a hardware and a compiler/JIT perspective. At the hardware level, these writes might be 10x-100x more expensive than a normal write, since write buffers have to be flushed (on x86, the details will vary by platform). At the compiler/JIT level, volatile writes inhibit many common optimizations.
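As a very rough way to feel that hardware-level cost difference (not a substitute for a proper harness such as Caliper or JMH - the JIT may distort the numbers, and the class name and iteration count here are arbitrary):

```java
// Crude, unscientific comparison of plain vs. volatile store cost.
// Assumption: the JIT does not eliminate the stores to these static fields;
// a real measurement should use a benchmark harness.
class StoreCost {
    static volatile long volatileField;
    static long plainField;

    public static void main(String[] args) {
        final int N = 10_000_000;
        long t0 = System.nanoTime();
        for (int i = 0; i < N; i++) plainField = i;     // plain stores
        long t1 = System.nanoTime();
        for (int i = 0; i < N; i++) volatileField = i;  // volatile stores: store buffer drained
        long t2 = System.nanoTime();
        System.out.printf("plain:    %.2f ns/store%n", (t1 - t0) / (double) N);
        System.out.printf("volatile: %.2f ns/store%n", (t2 - t1) / (double) N);
    }
}
```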

Speculation, however, can only get you so far - the proof is always in the benchmarking. Here's a microbenchmark that tries your two strategies. The basic idea is to copy values from one array to another (pretty much System.arraycopy), with two variants - one which copies unconditionally, and one that checks to see if the values are different first.

Here are the copy routines for the simple, non-volatile case (full source here):

        // no check
        for (int i=0; i < ARRAY_LENGTH; i++) {
            target[i] = source[i];
        }

        // check, then set if unequal
        for (int i=0; i < ARRAY_LENGTH; i++) {
            int x = source[i];
            if (target[i] != x) {
                target[i] = x;
            }
        }

The results using the above code to copy an array of length 1000, using Caliper as my microbenchmark harness, are:

    benchmark arrayType    ns linear runtime
  CopyNoCheck      SAME   470 =
  CopyNoCheck DIFFERENT   460 =
    CopyCheck      SAME  1378 ===
    CopyCheck DIFFERENT  1856 ====

This also includes about 150ns of overhead per run to reset the target array each time. Skipping the check is much faster - about 0.47 ns per element (or around 0.32 ns per element after we remove the setup overhead, so pretty much exactly 1 cycle on my box).

Checking is about 3x slower when the arrays are the same, and 4x slower when they are different. I'm surprised at how bad the check is, given that it is perfectly predicted. I suspect the culprit is largely the JIT - with a much more complex loop body, it may be unrolled fewer times, and other optimizations may not apply.

Let's switch to the volatile case. Here, I've used AtomicIntegerArray as my arrays of volatile elements, since Java doesn't have any native array types with volatile elements. Internally, this class is just writing straight through to the array using sun.misc.Unsafe, which allows volatile writes. The assembly generated is substantially similar to normal array access, other than the volatile aspect (and possibly range check elimination, which may not be effective in the AIA case).

Here's the code:

        // no check
        for (int i=0; i < ARRAY_LENGTH; i++) {
            target.set(i, source[i]);
        }

        // check, then set if unequal
        for (int i=0; i < ARRAY_LENGTH; i++) {
            int x = source[i];
            if (target.get(i) != x) {
                target.set(i, x);
            }
        }

And here are the results:

arrayType     benchmark    us linear runtime
     SAME   CopyCheckAI  2.85 =======
     SAME CopyNoCheckAI 10.21 ===========================
DIFFERENT   CopyCheckAI 11.33 ==============================
DIFFERENT CopyNoCheckAI 11.19 =============================

The tables have turned. Checking first is ~3.5x faster than the usual method. Everything is much slower overall - in the check case, we are paying ~3 ns per loop, and in the worst cases ~10 ns (the times above are in µs, and cover the copy of the whole 1000-element array). Volatile writes really are more expensive. About 1 ns of overhead is included in the DIFFERENT case to reset the array on each iteration (which is why even the simple copy is slightly slower for DIFFERENT). I suspect a lot of the overhead in the "check" case is actually bounds checking.

This is all single threaded. If you actually had cross-core contention over a volatile, the results would be much, much worse for the simple method, and about as good as the above for the check case (the cache line would just sit in the shared state - no coherency traffic needed).

I've also only tested the extremes of "every element equal" vs "every element different". This means the branch in the "check" algorithm is always perfectly predicted. If you had a mix of equal and different, you wouldn't get just a weighted combination of the times for the SAME and DIFFERENT cases - you do worse, due to misprediction (both at the hardware level, and perhaps also at the JIT level, which can no longer optimize for the always-taken branch).

So whether it is sensible, even for volatile, depends on the specific context - the mix of equal and unequal values, the surrounding code, and so on. I'd usually not do it for volatile alone in a single-threaded scenario, unless I suspected a large number of sets were redundant. In heavily multi-threaded structures, however, reading and then doing a volatile write (or another expensive operation, like a CAS) is a best practice, and you'll see it in quality code such as the java.util.concurrent structures.
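That read-then-expensive-write idiom can be sketched as follows - a hypothetical running-maximum class, not taken from the JDK, though the java.util.concurrent classes apply the same idea internally:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example of the check-then-act idiom: a running maximum.
// The cheap volatile read lets us skip the expensive CAS entirely when
// the candidate cannot change the result.
class Maximum {
    private final AtomicInteger max = new AtomicInteger(Integer.MIN_VALUE);

    void update(int candidate) {
        int cur = max.get();                        // cheap volatile read
        while (candidate > cur) {                   // no CAS at all if not needed
            if (max.compareAndSet(cur, candidate)) {
                return;                             // expensive write, only when required
            }
            cur = max.get();                        // lost a race; re-read and retry
        }
    }

    int get() {
        return max.get();
    }
}
```

Under contention from many threads whose candidates are usually below the current maximum, most calls complete with a single shared-state read and generate no coherency traffic at all.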
