Collector#toList的运行时复杂度 [英] Runtime complexity of Collectors#toList

查看:132
本文介绍了Collector#toList的运行时复杂度的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在Java库源代码中,Collectors#toList方法的定义如下:

In the Java library source code, the Collectors#toList method is defined like this:

public static <T>
Collector<T, ?, List<T>> toList() {
    return new CollectorImpl<>((Supplier<List<T>>) ArrayList::new, List::add,
                               (left, right) -> { left.addAll(right); return left; },
                               CH_ID);
}

我们将BinaryOperator视为CollectorImpl构造函数的第三个参数,该参数在线性时间内合并了两个子结果.

We see BinaryOperator as third parameter of CollectorImpl's contructor, which merges two sub-results in linear time.

这是否意味着在Stream#collect方法频繁使用此功能的情况下,我们可以获得平方计算时间?

Does it mean, that in case of frequent usage of this function by Stream#collect method, we can gain square computation time?

考虑以下代码:

List<Integer> desc = Stream.iterate(n, k -> k - 1).limit(n + 1)
    .collect(Collectors.toList());

desc.parallelStream()
    .map(k -> {
        try {
            Thread.sleep(k * 500);
        } catch (InterruptedException ignored) {
        }
        return k;
    })
    .collect(Collectors.toList());

第二个流的元素碰巧是按降序计算的.收集方法最简单的方法是将每个数字包装到List中,然后将所有下一个数字相加,总复杂度为平方,多么难过.

Elements of second stream happened to be computed in descend order. The easiest what collect method can do is for each number wrap it into List and add all next numbers to it, with square complexity in total, how sad.

推荐答案

在这种情况下,输入desc列表将根据系统具有的硬件线程数分为几个独立的部分.通常,它是4核系统上的16个零件(尽管未指定并且可能会更改).每个部分将使用累加器进行独立处理,然后将结果通过合并器合并在一起.因此它不会下降到二次复杂度,但是是的,将完成许多不必要的复制.

In this case the input desc list will be divided to the several independent parts depending on number of hardware threads your system has. Usually it's 16 parts on 4-core system (though it's not specified and may change). Each part will be processed independently using the accumulator, then results will be merged together using the combiner. So it won't drop to quadratic complexity, but yes, many unnecessary copying will be done.

实际上,使用toArray()方法效率更高.它检查流源的特性,在您的情况下,它特别优化,因为源是SIZED和SUBSIZED,因此可以将结果写入单个数组,而无需任何其他复制.如果需要List,则可以考虑使用Arrays.asList(desc.parallelStream()....toArray(Integer[]::new))

Actually it's much more efficient to use toArray() method. It checks the stream source characteristics and in your case it's especially optimized as the source is SIZED and SUBSIZED, so the result can be written into single array without any additional copying. If you need the List, you may consider using Arrays.asList(desc.parallelStream()....toArray(Integer[]::new))

这篇关于Collector#toList的运行时复杂度的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆