是Arrays.stream(array_name).SUM()比迭代方法慢? [英] Is Arrays.stream(array_name).sum() slower than iterative approach?

查看:625
本文介绍了是Arrays.stream(array_name).SUM()比迭代方法慢?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在编写一个LEET code问题: https://开头oj.leet code.com /问题/加油站/ 使用Java 8。

I was coding a leetcode problem : https://oj.leetcode.com/problems/gas-station/ using Java 8.

我的解决方案得到了TLE当我用 Arrays.stream(integer_array).SUM()来计算总和,而同样的解决方案,采用迭代计算元素的和被录取在数组中。针对此问题最好的时间复杂度为O(n)和我很惊讶地拿到TLE使用时流的API从Java 8.我已经实现了在O(n)的解决方案而已。

My solution got TLE when I used Arrays.stream(integer_array).sum() to compute sum while the same solution got accepted using iteration to calculate the sum of elements in array. The best possible time complexity for this problem is O(n) and I am surprised to get TLE when using streaming API's from Java 8. I have implemented the solution in O(n) only.

import java.util.Arrays;

public class GasStation {
    public int canCompleteCircuit(int[] gas, int[] cost) {
        int start = 0, i = 0, runningCost = 0, totalGas = 0, totalCost = 0; 
        totalGas = Arrays.stream(gas).sum();
        totalCost = Arrays.stream(cost).sum();

        // for (int item : gas) totalGas += item;
        // for (int item : cost) totalCost += item;

        if (totalGas < totalCost)
            return -1;

        while (start > i || (start == 0 && i < gas.length)) {
            runningCost += gas[i];
            if (runningCost >= cost[i]) {
                runningCost -= cost[i++];
            } else {
                runningCost -= gas[i];
                if (--start < 0)
                    start = gas.length - 1;
                runningCost += (gas[start] - cost[start]);
            }
        }
        return start;
    }

    public static void main(String[] args) {
        GasStation sol = new GasStation();
        int[] gas = new int[] { 10, 5, 7, 14, 9 };
        int[] cost = new int[] { 8, 5, 14, 3, 1 };
        System.out.println(sol.canCompleteCircuit(gas, cost));

        gas = new int[] { 10 };
        cost = new int[] { 8 };
        System.out.println(sol.canCompleteCircuit(gas, cost));
    }
}

该解决方案获得的接受的时候, 我评论如下两行:(计算总和使用流式)

The solution gets accepted when, I comment the following two lines: (calculating sum using streaming)

totalGas = Arrays.stream(gas).sum();
totalCost = Arrays.stream(cost).sum();

和取消以下两行(总和计算使用迭代):

and uncomment the following two lines (calculating sum using iteration):

//for (int item : gas) totalGas += item;
//for (int item : cost) totalCost += item;

现在该解决方案被接受。在Java中8为什么流API 更慢,大投入比迭代元

Now the solution gets accepted. Why streaming API in Java 8 is slower for large input than iteration for primitives?

推荐答案

在处理这类问题的第一步是把code到受控的环境中。这意味着,在你控制的JVM上运行它(和可调用)和运行测试了良好的标杆线束里面像的江铃控股。分析,不要猜测。

The first step in dealing with problems like this is to bring the code into a controlled environment. That means running it in the JVM you control (and can invoke) and running tests inside a good benchmark harness like JMH. Analyze, don't speculate.

下面是一个基准我刮起了使用江铃控股做一些这方面的分析:

Here's a benchmark I whipped up using JMH to do some analysis on this:

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@State(Scope.Benchmark)
public class ArraySum {
    static final long SEED = -897234L;

    @Param({"1000000"})
    int sz;

    int[] array;

    @Setup
    public void setup() {
        Random random = new Random(SEED);
        array = new int[sz];
        Arrays.setAll(array, i -> random.nextInt());
    }

    @Benchmark
    public int sumForLoop() {
        int sum = 0;
        for (int a : array)
            sum += a;
        return sum;
    }

    @Benchmark
    public int sumStream() {
        return Arrays.stream(array).sum();
    }
}

基本上,这创造了一百万整数数组并总结他们两次:用一个for循环一次,一次使用流。运行基准测试产生一组输出(省略了简洁和戏剧效果),但汇总结果如下:

Basically this creates an array of a million ints and sums them twice: once using a for-loop and once using streams. Running the benchmark produces a bunch of output (elided for brevity and for dramatic effect) but the summary results are below:

Benchmark                 (sz)  Mode  Samples     Score  Score error  Units
ArraySum.sumForLoop    1000000  avgt        3   514.473      398.512  us/op
ArraySum.sumStream     1000000  avgt        3  7355.971     3170.697  us/op

哇!这8 Java的流东西德SUXX0R!它比一个for循环,不使用它慢14倍!1!

Whoa! That Java 8 streams stuff is teh SUXX0R! It's 14 times slower than a for-loop, don't use it!!!1!

哦,不。首先让我们对这些结果,然后我们更仔细地看看我们是否可以找出发生了什么事情。

Well, no. First let's go over these results, and then look more closely to see if we can figure out what's going on.

摘要显示两种基准方法,以一百万的SZ参数。这是可能的,以改变该参数,但它不变成使在这种情况下的差。我也只跑了基准测试方法的3倍,你可以从样本一栏看到的。 (也有只有3预热迭代,不可见这里。)的分数是在每个操作微秒,并明确了流code是多少,比for循环code慢得多。不过需要注意的也是得分的错误:这是变异在不同的运行量。江铃控股有益打印出结果(此处未显示)的标准偏差,但你可以很容易地看到,分数报告错误得分的显著部分。这降低了我们的信心得分。

The summary shows the two benchmark methods, with the "sz" parameter of a million. It's possible to vary this parameter but it doesn't turn out to make a difference in this case. I also only ran the benchmark methods 3 times, as you can see from the "samples" column. (There were also only 3 warmup iterations, not visible here.) The score is in microseconds per operation, and clearly the stream code is much, much slower than the for-loop code. But note also the score error: that's the amount of variability in the different runs. JMH helpfully prints out the standard deviation of the results (not shown here) but you can easily see that the score error is a significant fraction of reported score. This reduces our confidence in the score.

运行更多的迭代应该有所帮助。更多的热身迭代将让JIT做更多的工作和运行基准测试,运行更标杆迭代将理顺其他地方在我的系统从短暂的活动中的任何错误,然后安顿下来。因此,让我们尝试10热身迭代和10基准迭代:

Running more iterations should help. More warmup iterations will let the JIT do more work and settle down before running the benchmarks, and running more benchmark iterations will smooth out any errors from transient activity elsewhere on my system. So let's try 10 warmup iterations and 10 benchmark iterations:

Benchmark                 (sz)  Mode  Samples     Score  Score error  Units
ArraySum.sumForLoop    1000000  avgt       10   504.803       34.010  us/op
ArraySum.sumStream     1000000  avgt       10  7128.942      178.688  us/op

业绩是整体快一点,而且测量误差也相当小了些,因此运行更多的迭代产生了预期的效果。但流code仍比for循环code要慢得多。这是怎么回事?

Performance is overall a little faster, and the measurement error is also quite a bit smaller, so running more iterations has had the desired effect. But the streams code is still considerably slower than the for-loop code. What's going on?

一个大线索可以通过查看所述流方法的各个定时来获得

A large clue can be obtained by looking at the individual timings of the streams method:

# Warmup Iteration   1: 570.490 us/op
# Warmup Iteration   2: 491.765 us/op
# Warmup Iteration   3: 756.951 us/op
# Warmup Iteration   4: 7033.500 us/op
# Warmup Iteration   5: 7350.080 us/op
# Warmup Iteration   6: 7425.829 us/op
# Warmup Iteration   7: 7029.441 us/op
# Warmup Iteration   8: 7208.584 us/op
# Warmup Iteration   9: 7104.160 us/op
# Warmup Iteration  10: 7372.298 us/op

发生了什么?最初的几个迭代是相当快的,但随后的第四个及以后的迭代(和所有追随基准迭代)突然变得非常慢。

What happened? The first few iterations were reasonably fast, but then the 4th and subsequent iterations (and all the benchmark iterations that follow) were suddenly much slower.

我以前见过这个。正是在这个问题和的this回答的SO其他地方。我建议你​​阅读这个问题的答案;它说明了在这种情况下,JVM的内联决策如何导致较差的性能。

I've seen this before. It was in this question and this answer elsewhere on SO. I recommend reading that answer; it explains how the JVM's inlining decisions in this case result in poorer performance.

背景这里有一点:一个for循环编译为一个非常简单的增量与测试循环中,并可以很容易地像循环剥离和展开常用的优化技术来处理。该流code,而不是在这种情况下很复杂,其实是相当复杂相比for循环code;有设置的公平位,每个回路至少需要一个方法调用。因此,JIT优化,特别是其内嵌决策,是使流code走快的关键。而且有可能为它去错了。

A bit of background here: a for-loop compiles to a very simple increment-and-test loop, and can easily be handled by usual optimization techniques like loop peeling and unrolling. The streams code, while not very complex in this case, is actually quite complex compared to the for-loop code; there's a fair bit of setup, and each loop requires at least one method call. Thus, the JIT optimizations, particularly its inlining decisions, are critical to making the streams code go fast. And it's possible for it to go wrong.

另外一个背景点是整数总和大约是最简单的操作,您能想到的做在一个循环或流。这往往使流设置的固定开销看起来相对比较昂贵。这也是非常简单,它会引发内联政策的病症。

Another background point is that integer summation is about the simplest possible operation you can think of to do in a loop or stream. This will tend to make the fixed overhead of stream setup look relatively more expensive. It is also so simple that it can trigger pathologies in the inlining policy.

从对方的回答中建议是添加JVM选项 -XX:MaxInlineLevel = 12 来增加code可内联的金额。重新运行与该选项的基准给出:

The suggestion from the other answer was to add the JVM option -XX:MaxInlineLevel=12 to increase the amount of code that can be inlined. Rerunning the benchmark with that option gives:

Benchmark                 (sz)  Mode  Samples    Score  Score error  Units
ArraySum.sumForLoop    1000000  avgt       10  502.379       27.859  us/op
ArraySum.sumStream     1000000  avgt       10  498.572       24.195  us/op

嗯,好多了。禁用分层编译使用 -XX:-TieredCompilation 也有避免病态行为的影响。我还发现,使得循环计算甚至是有点贵,如总结整数的平方 - 也就是说,将一个多 - 也避免了病态行为

Ah, much nicer. Disabling tiered compilation using -XX:-TieredCompilation also had the effect of avoiding the pathological behavior. I also found that making the loop computation even a bit more expensive, e.g. summing squares of integers -- that is, adding a single multiple -- also avoids the pathological behavior.

现在,你的问题是关于在莱特code 环境,这似乎运行code在JVM中的上下文中运行的,你不要'T有任何控制权,因此你不能改变的内联和编译选项。你可能不想让你的计算较为复杂,以避免病理无论是。因此,对于这种情况,你可能也仅仅坚持了好老for循环。但不要害怕使用流,即使处理基本数组。它可以从一些窄边情况下表现相当好,放在一边。

Now, your question is about running in the context of the leetcode environment, which seems to run the code in a JVM that you don't have any control over, so you can't change the inlining or compilation options. And you probably don't want to make your computation more complex to avoid the pathology either. So for this case, you might as well just stick to the good old for-loop. But don't be afraid to use streams, even for dealing with primitive arrays. It can perform quite well, aside from some narrow edge cases.

这篇关于是Arrays.stream(array_name).SUM()比迭代方法慢?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆