"Fast" Integer Powers in Java


Problem Description


[Short answer: Bad benchmarking methodology. You'd think I'd have figured that out by now.]

The problem is presented as "find a method for calculating x^y quickly, where x and y are positive integers". A typical "fast" algorithm looks like this:

public long fastPower(int x, int y) {
  // Replaced my code with the "better" version described below,
  // but this version isn't measurably faster than what I had before
  long base = x; // otherwise, we may overflow at x *= x.
  long result = y % 2 == 1 ? x : 1;
  while (y > 1) {
    base *= base;
    y >>= 1;
    if (y % 2 == 1) result *= base;
  }

  return result;
}
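To make the loop concrete, here is a hypothetical trace (my addition, not part of the original question) showing how the squaring loop consumes the exponent's binary digits. Since 29 in binary is 11101, we have 3^29 = 3^16 * 3^8 * 3^4 * 3^1:

public class FastPowerTrace {
    public static void main(String[] args) {
        long base = 3;
        int y = 29;
        long result = 3; // y is odd, so result starts at x = 3^1
        while (y > 1) {
            base *= base;                   // base walks through 3^2, 3^4, 3^8, 3^16
            y >>= 1;
            if (y % 2 == 1) result *= base; // multiply in the powers for the set bits
            System.out.printf("base=%d y=%d result=%d%n", base, y, result);
        }
        System.out.println(result == 68630377364883L); // true: 3^29
    }
}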

I wanted to see how much faster this is than, say, calling Math.pow(), or using a naive approach like multiplying x by itself y times, like this:

public long naivePower(int x, int y) {
  long result = 1;
  for (int i = 0; i < y; i++) {
    result *= x;
  }
  return result;
}

Edit: Okay, it's been pointed out to me (correctly) that my benchmarking code was not consuming the result, and that totally throws everything off. Once I started consuming the results, I was still seeing the naive approach run about 25% faster than the "fast" approach.
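For readers who want to see this pitfall in isolation, here is a minimal sketch (my addition, not the original benchmark): when a call's return value is discarded, the JIT is free to treat the whole computation as dead code and eliminate it, so the timed loop may measure nothing at all.

public class DeadCodeDemo {
    public static void main(String[] args) {
        int count = 10_000_000;

        long t0 = System.nanoTime();
        for (int i = 0; i < count; i++) {
            naivePower(3, 25 + (i & 3));          // result discarded: the call may be eliminated
        }
        long t1 = System.nanoTime();

        long total = 0;
        for (int i = 0; i < count; i++) {
            total += naivePower(3, 25 + (i & 3)); // result consumed: the work must happen
        }
        long t2 = System.nanoTime();

        // Printing total "publishes" it, so the second loop cannot be elided.
        // Varying the exponent (25..28) also prevents constant folding.
        System.out.printf("discarded: %d ms, consumed: %d ms, total=%d%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, total);
    }

    static long naivePower(int x, int y) {
        long result = 1;
        for (int i = 0; i < y; i++) result *= x;
        return result;
    }
}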

Original text:

I was very surprised to find that the naive approach was 4x faster than the "fast" version, which was itself about 3x faster than the Math.pow() version.

My test uses 10,000,000 trials (and then 100 million, just to make absolutely sure the JIT has time to warm up), each with random values 2 <= x <= 3 and 25 <= y <= 29 (to prevent the calls from being optimized away). I chose a narrow range of values that wouldn't produce a result greater than 2^63, but biased toward larger exponents to give the "fast" version an advantage. I pre-generate 10,000 pseudorandom numbers to keep that part of the code out of the timing.

I understand that for small exponents the naive version could be expected to be faster: the "fast" version has two branches instead of one, and generally does about twice as much arithmetic and storage work per iteration as the naive one. But it also runs far fewer iterations - for y = 29 the exponent is halved only about log2(29) ≈ 5 times, roughly ten multiplications in total versus the naive version's 29 - so for exponents this large I would still expect the fast approach to come out well ahead.

Anyone have any idea why the naive approach would be so much faster than the "fast" version, even with data biased toward the "fast" version (i.e. larger exponents)? Does the extra branch in that code account for that much of a difference at runtime?

Benchmarking code (yes, I know I should use some framework for "official" benchmarks, but this is a toy problem) - updated to warm up and consume results:

PowerIf[] powers = new PowerIf[] {
  new EasyPower(), // just calls Math.pow() and cast to int
  new NaivePower(),
  new FastPower()
};

Random rand = new Random(0); // same seed for each run
int randCount = 10000;
int[] bases = new int[randCount];
int[] exponents = new int[randCount];
for (int i = 0; i < randCount; i++) {
  bases[i] = 2 + rand.nextInt(2);
  exponents[i] = 25 + rand.nextInt(5);
}

int count = 1000000000;

for (int trial = 0; trial < powers.length; trial++) {
  long total = 0;
  for (int i = 0; i < count; i++) { // warm up
    final int x = bases[i % randCount];
    final int y = exponents[i % randCount];
    total += powers[trial].power(x, y);
  }
  long start = System.currentTimeMillis();
  for (int i = 0; i < count; i++) {
    final int x = bases[i % randCount];
    final int y = exponents[i % randCount];
    total += powers[trial].power(x, y);
  }
  long end = System.currentTimeMillis();
  System.out.printf("%25s: %d ms%n", powers[trial].toString(), (end - start)); 
  System.out.println(total);
}

Produces output:

                EasyPower: 7908 ms
-407261252961037760
               NaivePower: 1993 ms
-407261252961037760
                FastPower: 2394 ms
-407261252961037760

Playing with the parameters for the random numbers and the trials does change the output characteristics, but the ratios between the tests remain consistent with those shown.

Solution

There are two issues with your fastPower:

  1. It's better to replace y % 2 == 0 with (y & 1) == 0; bitwise operations are faster.
  2. Your code always decrements y and performs an extra multiplication, even when y is even. It's better to put that part into an else clause.

Anyway, I suspect your benchmarking methodology is not perfect. A 4x performance difference sounds odd and cannot be explained without seeing the complete code.

After applying the above improvements, I've verified with a JMH benchmark that fastPower is indeed faster than naivePower, by a factor of 1.3x to 2x.

package bench;

import org.openjdk.jmh.annotations.*;

@State(Scope.Benchmark)
public class FastPow {
    @Param("3")
    int x;
    @Param({"25", "28", "31", "32"})
    int y;

    @Benchmark
    public long fast() {
        return fastPower(x, y);
    }

    @Benchmark
    public long naive() {
        return naivePower(x, y);
    }

    public static long fastPower(long x, int y) {
        long result = 1;
        while (y > 0) {
            if ((y & 1) == 0) {
                x *= x;
                y >>>= 1;
            } else {
                result *= x;
                y--;
            }
        }
        return result;
    }

    public static long naivePower(long x, int y) {
        long result = 1;
        for (int i = 0; i < y; i++) {
            result *= x;
        }
        return result;
    }
}
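The answer doesn't show how the benchmark is launched; a typical JMH entry point (my sketch, using the standard Runner API) would look like this:

package bench;

import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class FastPowRunner {
    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(FastPow.class.getSimpleName()) // run only the FastPow benchmarks
                .forks(1)                               // each fork runs in a fresh JVM
                .build();
        new Runner(opt).run();
    }
}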

Results:

Benchmark      (x)  (y)   Mode  Cnt    Score   Error   Units
FastPow.fast     3   25  thrpt   10  103,406 ± 0,664  ops/us
FastPow.fast     3   28  thrpt   10  103,520 ± 0,351  ops/us
FastPow.fast     3   31  thrpt   10   85,390 ± 0,286  ops/us
FastPow.fast     3   32  thrpt   10  115,868 ± 0,294  ops/us
FastPow.naive    3   25  thrpt   10   76,331 ± 0,660  ops/us
FastPow.naive    3   28  thrpt   10   69,527 ± 0,464  ops/us
FastPow.naive    3   31  thrpt   10   54,407 ± 0,231  ops/us
FastPow.naive    3   32  thrpt   10   56,127 ± 0,207  ops/us

Note: integer multiplication is a very fast operation, sometimes even faster than an extra comparison. Don't expect huge performance improvements for values that fit into a long. The advantage of the fast power algorithm becomes evident with BigInteger and larger exponents.
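To illustrate that last point, here is a minimal sketch (my addition, not from the answer): the same square-and-multiply idea applied to BigInteger, where each multiplication is expensive enough that doing O(log y) of them instead of O(y) clearly pays off.

import java.math.BigInteger;

public class BigIntPow {
    static BigInteger fastPower(BigInteger x, int y) {
        BigInteger result = BigInteger.ONE;
        while (y > 0) {
            if ((y & 1) == 1) result = result.multiply(x); // consume the low bit
            x = x.multiply(x);                             // square for the next bit
            y >>>= 1;
        }
        return result;
    }

    public static void main(String[] args) {
        BigInteger expected = BigInteger.valueOf(3).pow(1000); // library reference
        System.out.println(fastPower(BigInteger.valueOf(3), 1000).equals(expected)); // true
    }
}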

Update

Now that the author has posted the benchmark, I must admit that the surprising performance results come from common benchmarking pitfalls. I've improved the benchmark while retaining the original methodology, and it now shows that FastPower is indeed faster than NaivePower; see here.

What are the key changes in the improved version?

  1. Different algorithms should be tested separately, in different JVM instances, to prevent profile pollution.
  2. A benchmark must be invoked several times to allow proper compilation/recompilation until a steady state is reached.
  3. One benchmark trial should be placed in a separate method to avoid on-stack replacement issues (see the sketch after this list).
  4. y % 2 is replaced with y & 1, since HotSpot does not perform this optimization automatically.
  5. The effect of unrelated operations in the main benchmark loop is minimized.
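As a rough illustration of points 2 and 3 (a hypothetical restructuring, not the author's actual code), the timed loop can be moved into its own method and invoked repeatedly until timings stabilize; each algorithm would then be measured in its own JVM run:

public class ManualBench {
    static final int COUNT = 100_000_000;

    // One trial lives in its own method, so HotSpot compiles it as a whole
    // instead of relying on on-stack replacement of a loop inside main().
    static long runTrial() {
        long total = 0;
        for (int i = 0; i < COUNT; i++) {
            total += fastPower(3, 25 + (i & 3)); // vary the exponent to prevent constant folding
        }
        return total;
    }

    public static void main(String[] args) {
        // Repeat the trial until the timings reach a steady state.
        for (int iter = 0; iter < 10; iter++) {
            long start = System.currentTimeMillis();
            long total = runTrial();
            long end = System.currentTimeMillis();
            System.out.printf("iter %d: %d ms (total=%d)%n", iter, end - start, total);
        }
    }

    static long fastPower(long x, int y) {
        long result = 1;
        while (y > 0) {
            if ((y & 1) == 0) { x *= x; y >>>= 1; }
            else { result *= x; y--; }
        }
        return result;
    }
}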

Writing microbenchmarks by hand is a difficult task. That's why it is strongly recommended to use a proper benchmarking framework like JMH.
