为什么 if (variable1 % variable2 == 0) 效率低下? [英] Why is if (variable1 % variable2 == 0) inefficient?

查看:61
本文介绍了为什么 if (variable1 % variable2 == 0) 效率低下?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我是 Java 新手,昨晚运行了一些代码,这让我很困扰.我正在构建一个简单的程序来显示 for 循环中的每个 X 输出,当我使用模数作为 variable % variable vs variable % 5000 时,我注意到性能大幅下降> 或诸如此类.有人可以向我解释为什么会这样以及是什么原因造成的吗?所以我可以变得更好...

I am new to java, and was running some code last night, and this really bothered me. I was building a simple program to display every X outputs in a for loop, and I noticed a MASSIVE decrease in performance, when I used modulus as variable % variable vs variable % 5000 or whatnot. Can someone explain to me why this is and what is causing it? So I can be better...

这是高效"的代码(对不起,如果我的语法有点错误,我现在不在有代码的计算机上)

Here is the "efficient" code (sorry if I get a bit of syntax wrong I'm not on the computer with the code right now)

long startNum = 0;
long stopNum = 1000000000L;

for (long i = startNum; i <= stopNum; i++){
    if (i % 50000 == 0) {
        System.out.println(i);
    }
}

这里是低效代码"

long startNum = 0;
long stopNum = 1000000000L;
long progressCheck = 50000;

for (long i = startNum; i <= stopNum; i++){
    if (i % progressCheck == 0) {
        System.out.println(i);
    }
}

请注意,我有一个日期变量来测量差异,一旦它变得足够长,第一个需要 50 毫秒,而另一个需要 12 秒或类似的时间.如果您的 PC 比我的效率更高或其他什么原因,您可能需要增加 stopNum 或减少 progressCheck.

Mind you I had a date variable to measure the differences, and once it became long enough, the first one took 50ms while the other took 12 seconds or something like that. You may have to increase the stopNum or decrease the progressCheck if your PC is more efficient than mine or what not.

我在网上寻找这个问题,但找不到答案,也许我只是问得不对.

I looked for this question around the web, but I can't find an answer, maybe I'm just not asking it right.

没想到我的问题这么受欢迎,我很感激所有的答案.我确实对所用时间的每一半进行了基准测试,而低效代码所用的时间要长得多,1/4 秒 vs 10 秒.授予他们使用 println,但他们都做相同的数量,所以我不认为这会扭曲它,特别是因为差异是可重复的.至于答案,由于我是 Java 新手,我现在让投票决定哪个答案最好.我会尽量在周三之前挑选一个.

I did not expect my question to be so popular, I appreciate all the answers. I did perform a benchmark on each half in time taken, and the inefficient code took considerably longer, 1/4 of a second vs 10 seconds give or take. Granted they are using println, but they are both doing the same amount, so I wouldn't imagine that would skew it much, especially since the discrepancy is repeatable. As for the answers, since I am new to Java, I will let votes decide for now which answer is best. I will try to pick one by Wednesday.

今晚我将进行另一次测试,它只是增加一个变量而不是模数,当它到达 progressCheck 时,它将执行一个,然后将该变量重置为 0.作为第三个选项.

I am going to make another test tonight, where instead of modulus, it just increments a variable, and when it reaches progressCheck, it will perform one, and then reset that variable to 0. for a 3rd option.

EDIT3.5:

我使用了这段代码,下面我将展示我的结果..谢谢大家的大力帮助!我还尝试将 long 的 short 值与 0 进行比较,因此我所有的新检查都发生了65536"次,使其重复次数相等.

I used this code, and below I'll show my results.. Thank you ALL for the wonderful help! I also tried comparing the short value of the long to 0, so all my new checks happen ever "65536" times making it equal in repeats.

public class Main {


    public static void main(String[] args) {

        long startNum = 0;
        long stopNum = 1000000000L;
        long progressCheck = 65536;
        final long finalProgressCheck = 50000;
        long date;

        // using a fixed value
        date = System.currentTimeMillis();
        for (long i = startNum; i <= stopNum; i++) {
            if (i % 65536 == 0) {
                System.out.println(i);
            }
        }
        long final1 = System.currentTimeMillis() - date;
        date = System.currentTimeMillis();
        //using a variable
        for (long i = startNum; i <= stopNum; i++) {
            if (i % progressCheck == 0) {
                System.out.println(i);
            }
        }
        long final2 = System.currentTimeMillis() - date;
        date = System.currentTimeMillis();

        // using a final declared variable
        for (long i = startNum; i <= stopNum; i++) {
            if (i % finalProgressCheck == 0) {
                System.out.println(i);
            }
        }
        long final3 = System.currentTimeMillis() - date;
        date = System.currentTimeMillis();
        // using increments to determine progressCheck
        int increment = 0;
        for (long i = startNum; i <= stopNum; i++) {
            if (increment == 65536) {
                System.out.println(i);
                increment = 0;
            }
            increment++;

        }

        //using a short conversion
        long final4 = System.currentTimeMillis() - date;
        date = System.currentTimeMillis();
        for (long i = startNum; i <= stopNum; i++) {
            if ((short)i == 0) {
                System.out.println(i);
            }
        }
        long final5 = System.currentTimeMillis() - date;

                System.out.println(
                "\nfixed = " + final1 + " ms " + "\nvariable = " + final2 + " ms " + "\nfinal variable = " + final3 + " ms " + "\nincrement = " + final4 + " ms" + "\nShort Conversion = " + final5 + " ms");
    }
}

结果:

  • 固定 = 874 毫秒(通常约为 1000 毫秒,但由于它是 2 的幂而更快)
  • 变量 = 8590 毫秒
  • 最终变量 = 1944 毫秒(使用 50000 时约为 1000 毫秒)
  • 增量 = 1904 毫秒
  • 短转换 = 679 毫秒

不足为奇,由于缺少除法,短转换比快速"方式快 23%.注意到这一点很有趣.如果您需要每 256 次(或大约在那里)显示或比较某些内容,您可以这样做,并使用

Not surprising enough, due to a lack of division, the Short Conversion was 23% faster than the "fast" way. This is interesting to note. If you need to show or compare something every 256 times (or about there) you can do this, and use

if ((byte)integer == 0) {'Perform progress check code here'}

最后一个有趣的注意事项,在最终声明的变量"上使用 65536(不是一个漂亮的数字)的模数是固定值速度(慢)的一半.之前它以接近相同的速度进行基准测试.

ONE FINAL INTERESTING NOTE, using modulus on the "Final declared Variable" with 65536 (not a pretty number) was half the speed of (slower) than the fixed value. Where before it was benchmarking near the same speed.

推荐答案

您正在测量 OSR(堆栈上替换) 存根.

OSR 存根 是编译方法的特殊版本,专门用于在方法运行时将执行从解释模式转移到编译代码.

OSR stub is a special version of compiled method intended specifically for transferring execution from interpreted mode to compiled code while the method is running.

OSR 存根不像常规方法那样优化,因为它们需要与解释框架兼容的框架布局.我已经在以下答案中展示了这一点:123.

OSR stubs are not as optimized as regular methods, because they need a frame layout compatible with interpreted frame. I showed this already in the following answers: 1, 2, 3.

这里也发生了类似的事情.当低效代码"运行一个长循环时,该方法是专门为循环内部的堆栈替换而编译的.状态从解释的帧转移到 OSR 编译的方法,这个状态包括 progressCheck 局部变量.此时 JIT 无法用常量替换变量,因此无法应用某些优化,例如 强度降低.

A similar thing happens here, too. While "inefficient code" is running a long loop, the method is compiled specially for the on-stack replacement right inside the loop. The state is transferred from the interpreted frame to OSR-compiled method, and this state includes progressCheck local variable. At this point JIT cannot replace the variable with the constant, and thus cannot apply certain optimizations like strength reduction.

特别是这意味着 JIT 不会用乘法代替整数除法.(参见 为什么 GCC 使用在实现整数除法时乘以一个奇怪的数字? 对于来自提前编译器的 asm 技巧,当值是内联/常量传播后的编译时常量时,如果启用了这些优化.% 表达式中的整数文字也被 gcc -O0 优化,类似于这里它甚至在 OSR 存根中也被 JITer 优化.)

In particular this means JIT does not replace integer division with multiplication. (See Why does GCC use multiplication by a strange number in implementing integer division? for the asm trick from an ahead-of-time compiler, when the value is a compile-time constant after inlining / constant-propagation, if those optimizations are enabled. An integer literal right in the % expression also gets optimized by gcc -O0, similar to here where it's optimized by the JITer even in an OSR stub.)

但是,如果多次运行相同的方法,第二次和后续运行将执行常规(非 OSR)代码,这是完全优化的.这是证明该理论的基准(使用 JMH 进行基准测试)::>

However, if you run the same method several times, the second and the subsequent runs will execute the regular (non-OSR) code, which is fully optimized. Here is a benchmark to prove the theory (benchmarked using JMH):

@State(Scope.Benchmark)
public class Div {

    @Benchmark
    public void divConst(Blackhole blackhole) {
        long startNum = 0;
        long stopNum = 100000000L;

        for (long i = startNum; i <= stopNum; i++) {
            if (i % 50000 == 0) {
                blackhole.consume(i);
            }
        }
    }

    @Benchmark
    public void divVar(Blackhole blackhole) {
        long startNum = 0;
        long stopNum = 100000000L;
        long progressCheck = 50000;

        for (long i = startNum; i <= stopNum; i++) {
            if (i % progressCheck == 0) {
                blackhole.consume(i);
            }
        }
    }
}

结果:

# Benchmark: bench.Div.divConst

# Run progress: 0,00% complete, ETA 00:00:16
# Fork: 1 of 1
# Warmup Iteration   1: 126,967 ms/op
# Warmup Iteration   2: 105,660 ms/op
# Warmup Iteration   3: 106,205 ms/op
Iteration   1: 105,620 ms/op
Iteration   2: 105,789 ms/op
Iteration   3: 105,915 ms/op
Iteration   4: 105,629 ms/op
Iteration   5: 105,632 ms/op


# Benchmark: bench.Div.divVar

# Run progress: 50,00% complete, ETA 00:00:09
# Fork: 1 of 1
# Warmup Iteration   1: 844,708 ms/op          <-- much slower!
# Warmup Iteration   2: 105,893 ms/op          <-- as fast as divConst
# Warmup Iteration   3: 105,601 ms/op
Iteration   1: 105,570 ms/op
Iteration   2: 105,475 ms/op
Iteration   3: 105,702 ms/op
Iteration   4: 105,535 ms/op
Iteration   5: 105,766 ms/op

divVar 的第一次迭代确实慢得多,因为编译的 OSR 存根效率低下.但是,一旦该方法从头开始重新运行,就会执行新的无约束版本,该版本利用了所有可用的编译器优化.

The very first iteration of divVar is indeed much slower, because of inefficiently compiled OSR stub. But as soon as the method reruns from the beginning, the new unconstrained version is executed which leverages all the available compiler optimizations.

这篇关于为什么 if (variable1 % variable2 == 0) 效率低下?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆