Java 8嵌套循环与流和&性能 [英] Java 8 nested loops with streams & performance

查看:91
本文介绍了Java 8嵌套循环与流和&性能的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

为了练习Java 8流,我尝试将以下嵌套循环转换为Java 8流API。它计算a ^ b(a,b< 100)的最大数字总和,并在我的Core i5 760上花费~0.135s。

In order to practise the Java 8 streams I tried converting the following nested loop to the Java 8 stream API. It calculates the largest digit sum of a^b (a,b < 100) and takes ~0.135s on my Core i5 760.

public static int digitSum(BigInteger x)
{
    int sum = 0;
    for(char c: x.toString().toCharArray()) {sum+=Integer.valueOf(c+"");}
    return sum;
}

@Test public void solve()
    {
        int max = 0;
        for(int i=1;i<100;i++)
            for(int j=1;j<100;j++)
                max = Math.max(max,digitSum(BigInteger.valueOf(i).pow(j)));
        System.out.println(max);
    }

我的解决方案,我预计会更快,因为并行性实际上需要0.25 s(0.19s没有 parallel()):

My solution, which I expected to be faster because of the paralellism actually took 0.25s (0.19s without the parallel()):

int max =   IntStream.range(1,100).parallel()
            .map(i -> IntStream.range(1, 100)
            .map(j->digitSum(BigInteger.valueOf(i).pow(j)))
            .max().getAsInt()).max().getAsInt();

我的问题


  • 我做了正确的转换,还是有更好的方法将嵌套循环转换为流计算?

  • 为什么流变量要慢得多旧的?

  • 为什么parallel()语句实际上将时间从0.19s增加到0.25s?

我知道microbenchmarks是脆弱的,并行性只对大问题是值得的,但对于CPU来说,甚至0.1s都是永恒的,对吗?

I know that microbenchmarks are fragile and parallelism is only worth it for big problems but for a CPU, even 0.1s is an eternity, right?

更新

我使用Eclipse Kepler中的Junit 4框架进行测量(它显示了执行测试所需的时间)。

I measure with the Junit 4 framework in Eclipse Kepler (it shows the time taken for executing a test).

我的结果为a,b< 1000而不是100:

My results for a,b<1000 instead of 100:


  • 传统循环186s

  • 顺序流193s

  • 并行流55s

更新2
用<$ c $替换 sum + = Integer.valueOf(c +); c> sum + = c - '0'; (感谢Peter!)平行方法削减10整秒,使其达到45秒。没想到会有如此大的性能影响!

Update 2 Replacing sum+=Integer.valueOf(c+""); with sum+= c - '0'; (thanks Peter!) shaved off 10 whole seconds of the parallel method, bringing it to 45s. Didn't expect such a big performance impact!

此外,减少与CPU内核数量的并行性(在我的情况下为4)并没有做太多,因为它减少了时间只到44.8s(是的,它增加了a和b = 0,但我认为这不会对性能产生太大影响):

Also, reducing the parallelism to the number of CPU cores (4 in my case) didn't do much as it reduced the time only to 44.8s (yes, it adds a and b=0 but I think this won't impact the performance much):

int max = IntStream.range(0, 3).parallel().
          .map(m -> IntStream.range(0,250)
          .map(i -> IntStream.range(1, 1000)
          .map(j->.digitSum(BigInteger.valueOf(250*m+i).pow(j)))
          .max().getAsInt()).max().getAsInt()).max().getAsInt();


推荐答案

我创建了一个快速而肮脏的微基准测试基于你的代码。结果是:

I have created a quick and dirty micro benchmark based on your code. The results are:


循环:3192

lambda:3140

lambda parallel:868

loop: 3192
lambda: 3140
lambda parallel: 868

因此循环和lambda是等效的,并行流显着提高了性能。由于您的基准测试方法,我怀疑您的结果不可靠。

So the loop and lambda are equivalent and the parallel stream significantly improves the performance. I suspect your results are unreliable due to your benchmarking methodology.

public static void main(String[] args) {
    int sum = 0;

    //warmup
    for (int i = 0; i < 100; i++) {
        solve();
        solveLambda();
        solveLambdaParallel();
    }

    {
        long start = System.nanoTime();
        for (int i = 0; i < 100; i++) {
            sum += solve();
        }
        long end = System.nanoTime();
        System.out.println("loop: " + (end - start) / 1_000_000);
    }
    {
        long start = System.nanoTime();
        for (int i = 0; i < 100; i++) {
            sum += solveLambda();
        }
        long end = System.nanoTime();
        System.out.println("lambda: " + (end - start) / 1_000_000);
    }
    {
        long start = System.nanoTime();
        for (int i = 0; i < 100; i++) {
            sum += solveLambdaParallel();
        }
        long end = System.nanoTime();
        System.out.println("lambda parallel : " + (end - start) / 1_000_000);
    }
    System.out.println(sum);
}

public static int digitSum(BigInteger x) {
    int sum = 0;
    for (char c : x.toString().toCharArray()) {
        sum += Integer.valueOf(c + "");
    }
    return sum;
}

public static int solve() {
    int max = 0;
    for (int i = 1; i < 100; i++) {
        for (int j = 1; j < 100; j++) {
            max = Math.max(max, digitSum(BigInteger.valueOf(i).pow(j)));
        }
    }
    return max;
}

public static int solveLambda() {
    return  IntStream.range(1, 100)
            .map(i -> IntStream.range(1, 100).map(j -> digitSum(BigInteger.valueOf(i).pow(j))).max().getAsInt())
            .max().getAsInt();
}

public static int solveLambdaParallel() {
    return  IntStream.range(1, 100)
            .parallel()
            .map(i -> IntStream.range(1, 100).map(j -> digitSum(BigInteger.valueOf(i).pow(j))).max().getAsInt())
            .max().getAsInt();
}






我也运行过它jmh比手动测试更可靠。结果与上述一致(每次通话微秒数):


I have also run it with jmh which is more reliable than manual tests. The results are consistent with above (micro seconds per call):

Benchmark                                Mode   Mean        Units
c.a.p.SO21968918.solve                   avgt   32367.592   us/op
c.a.p.SO21968918.solveLambda             avgt   31423.123   us/op
c.a.p.SO21968918.solveLambdaParallel     avgt   8125.600    us/op

这篇关于Java 8嵌套循环与流和&amp;性能的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆