Why should I use concurrent characteristic in parallel stream with collect?

Problem description

Why should I use a collector with the concurrent characteristic when collecting from a parallel stream, like this:

List<Integer> list =
        Collections.synchronizedList(new ArrayList<>(Arrays.asList(1, 2, 4)));

Map<Integer, Integer> collect = list.stream().parallel()
        .collect(Collectors.toConcurrentMap(k -> k, v -> v, (c, c2) -> c + c2));

And not:

Map<Integer, Integer> collect = list.stream().parallel()
        .collect(Collectors.toMap(k -> k, v -> v, (c, c2) -> c + c2));

In other words, what are the side effects of not using this characteristic? Is it useful for the internal stream operations?

Solution

These two collectors operate in a fundamentally different way.

First, the Stream framework will split the workload into independent chunks that can be processed in parallel (that’s why you don’t need a special collection as the source; the synchronizedList is unnecessary).
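
To illustrate that the source is only read, here is a minimal sketch that runs the same collect over a plain ArrayList and over a synchronizedList wrapper. The class name PlainSourceSketch and its helper method are made up for this example and are not part of the question's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical demo class, not part of the original question.
class PlainSourceSketch {

    // Same collect as in the question; the source list is only read here.
    static Map<Integer, Integer> collect(List<Integer> source) {
        return source.parallelStream()
                .collect(Collectors.toMap(k -> k, v -> v, (c, c2) -> c + c2));
    }

    public static void main(String[] args) {
        List<Integer> plain = new ArrayList<>(Arrays.asList(1, 2, 4));
        List<Integer> synced = Collections.synchronizedList(new ArrayList<>(plain));

        // Both sources yield the same map, so this prints true.
        System.out.println(collect(plain).equals(collect(synced)));
    }
}
```

This only holds as long as nothing modifies the list while the stream is running, which is the usual non-interference requirement for stream sources.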

With a non-concurrent collector, each chunk is processed by creating a local container (here, a Map) using the Collector’s supplier and accumulating the chunk’s elements into that local container (putting entries). These partial results then have to be merged, i.e. one map is put into the other, to get the final result.
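
The supplier, accumulate, and merge steps just described can be imitated by hand. The following is a sequential simulation with two fixed chunks, not the framework's actual code; the real implementation decides the chunking itself and runs the chunks on different threads. All names (LocalContainerSketch, accumulate, combine, collectInTwoChunks) are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation of a non-concurrent collect; runs sequentially.
class LocalContainerSketch {

    // Per chunk: create a private map (supplier role), then put entries (accumulator role).
    static Map<Integer, Integer> accumulate(List<Integer> chunk) {
        Map<Integer, Integer> local = new HashMap<>();
        for (Integer e : chunk) {
            local.merge(e, e, (c, c2) -> c + c2);
        }
        return local;
    }

    // Combiner role: put one partial map into the other.
    static Map<Integer, Integer> combine(Map<Integer, Integer> left,
                                         Map<Integer, Integer> right) {
        right.forEach((k, v) -> left.merge(k, v, (c, c2) -> c + c2));
        return left;
    }

    // Assumption for illustration: exactly two chunks, split in the middle.
    static Map<Integer, Integer> collectInTwoChunks(List<Integer> data) {
        int mid = data.size() / 2;
        return combine(accumulate(data.subList(0, mid)),
                       accumulate(data.subList(mid, data.size())));
    }

    public static void main(String[] args) {
        System.out.println(collectInTwoChunks(Arrays.asList(1, 2, 4, 2, 1)));
    }
}
```

Note that the combine step touches every entry of the right-hand map again, which is why merging can cost about as much as the original puts.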

A concurrent collector supports accumulating concurrently, so only one ConcurrentMap will be created and all threads accumulate into that map at the same time. So after completion, no merging step is required, as there is only one map.
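
One way to observe the single shared container is to count how often the supplier is called. The sketch below builds a hand-written collector with the CONCURRENT and UNORDERED characteristics via Collector.of; it is a stand-in for Collectors.toConcurrentMap, not its actual implementation, and the class and method names are made up. On the OpenJDK versions I am aware of, the count should come out as 1 for a parallel stream.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collector;

// Hypothetical demo class; the collector below only imitates toConcurrentMap.
class SingleContainerSketch {

    // Returns how many result containers the framework asked the supplier for.
    static int countContainers(List<Integer> data) {
        AtomicInteger created = new AtomicInteger();

        Collector<Integer, ConcurrentMap<Integer, Integer>, ConcurrentMap<Integer, Integer>> collector =
                Collector.of(
                        () -> {
                            created.incrementAndGet();
                            return new ConcurrentHashMap<Integer, Integer>();
                        },
                        // All threads accumulate into the same map; merge is atomic per key.
                        (map, e) -> map.merge(e, e, (c, c2) -> c + c2),
                        // Combiner is still required by the API, even if it goes unused here.
                        (a, b) -> {
                            b.forEach((k, v) -> a.merge(k, v, (c, c2) -> c + c2));
                            return a;
                        },
                        Collector.Characteristics.CONCURRENT,
                        Collector.Characteristics.UNORDERED);

        data.parallelStream().collect(collector);
        return created.get();
    }

    public static void main(String[] args) {
        System.out.println(countContainers(Arrays.asList(1, 2, 4, 2, 1)));
    }
}
```

Dropping the two characteristics from the Collector.of call turns this into an ordinary collector, in which case the count may be higher than 1, depending on how the stream gets split.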


So both collectors are thread-safe but might exhibit entirely different performance characteristics, depending on the task. If the Stream’s workload before collecting the result is heavy, the difference might be negligible. If, as in your example, there is no relevant work before the collect operation, the outcome depends heavily on how often mappings have to be merged, i.e. how often the same key occurs, and on how the actual target ConcurrentMap deals with contention in the concurrent case.

If you mostly have distinct keys, the merging step of a non-concurrent collector can be as expensive as the preceding put operations, destroying any benefit of the parallel processing. But if you have lots of duplicate keys, requiring merging of the values, the contention on the same key may degrade the concurrent collector’s performance.

So there’s no simple "which is better" answer (if there were, why bother adding the other variant?). It depends on your actual operation. You can use the expected scenario as a starting point for selecting one, but you should then measure with real-life data. Since both produce equivalent results, you can change your choice at any time.
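
As a rough starting point for such a comparison, the sketch below runs both collectors over the same data, checks that the results match, and prints naive wall-clock timings. The class name, the helper methods, and the test data (1,000,000 values spread over 1,000 keys) are assumptions made up for this example; single-shot System.nanoTime measurements are unreliable on the JVM, so a proper benchmark harness and your real data should replace this before drawing conclusions.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical comparison class; timings here are only a crude indication.
class CollectorSwapSketch {

    static Map<Integer, Integer> withToMap(List<Integer> data) {
        return data.parallelStream()
                .collect(Collectors.toMap(k -> k, v -> v, (c, c2) -> c + c2));
    }

    static Map<Integer, Integer> withToConcurrentMap(List<Integer> data) {
        return data.parallelStream()
                .collect(Collectors.toConcurrentMap(k -> k, v -> v, (c, c2) -> c + c2));
    }

    public static void main(String[] args) {
        // Made-up workload with many duplicate keys; replace with real data.
        List<Integer> data = IntStream.range(0, 1_000_000)
                .map(i -> i % 1_000)
                .boxed()
                .collect(Collectors.toList());

        long t0 = System.nanoTime();
        Map<Integer, Integer> a = withToMap(data);
        long t1 = System.nanoTime();
        Map<Integer, Integer> b = withToConcurrentMap(data);
        long t2 = System.nanoTime();

        System.out.println("same result:        " + a.equals(b));
        System.out.println("toMap (ns):           " + (t1 - t0));
        System.out.println("toConcurrentMap (ns): " + (t2 - t1));
    }
}
```

Because the two helpers differ only in the collector they pass to collect, switching between them after measuring is a one-line change.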
