Behavior of Scala fold in parallel collections
Question
Let's run the following line of code several times:
Set(1,2,3,4,5,6,7).par.fold(0)(_ - _)
The results are quite interesting:
scala> Set(1,2,3,4,5,6,7).par.fold(0)(_ - _)
res10: Int = 8
scala> Set(1,2,3,4,5,6,7).par.fold(0)(_ - _)
res11: Int = 20
However, clearly it should give the same result as its sequential version:
scala> Set(1,2,3,4,5,6,7).fold(0)(_ - _)
res15: Int = -28
I understand that subtraction (-) is non-associative on integers, and that this is the reason behind such behavior, but my question is quite simple: doesn't this mean that fold should not be parallelized in the .par implementation of collections?
Recommended answer
When you look at the standard library documentation, you see that fold is nondeterministic here:
Folds the elements of this sequence using the specified associative binary operator. The order in which operations are performed on elements is unspecified and may be nondeterministic.
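To illustrate why the documentation asks for an associative operator, here is a minimal sketch that imitates a parallel fold sequentially. The helper chunkedFold and the chunk sizes are hypothetical and chosen only for illustration; the real .par implementation decides its own splitting and combination order, which this sketch does not try to reproduce.

```scala
object FoldGrouping {
  // Hypothetical helper (not a library API): split the input into chunks,
  // fold each chunk from the zero element, then combine the partial
  // results left to right. Assumes xs is non-empty and chunkSize > 0.
  def chunkedFold(xs: Vector[Int], chunkSize: Int)(z: Int)(op: (Int, Int) => Int): Int = {
    val partials = xs.grouped(chunkSize).map(_.foldLeft(z)(op)).toVector
    partials.reduceLeft(op)
  }

  val xs = Vector(1, 2, 3, 4, 5, 6, 7)

  // Subtraction is not associative, so the grouping changes the result.
  val oneChunk   = chunkedFold(xs, 7)(0)(_ - _) // -28, same as a plain sequential fold
  val twoChunks  = chunkedFold(xs, 4)(0)(_ - _) // (-10) - (-18) = 8
  val perElement = chunkedFold(xs, 1)(0)(_ - _) // 26

  // Addition is associative with 0 as its identity: every grouping agrees.
  val sums = (1 to 7).map(n => chunkedFold(xs, n)(0)(_ + _)).distinct

  def main(args: Array[String]): Unit = {
    println(s"subtraction: $oneChunk, $twoChunks, $perElement")
    println(s"addition:    $sums")
  }
}
```

With subtraction, each grouping yields a different value, which is the kind of variation seen in the REPL session in the question. With addition, the grouping makes no difference, so a parallel fold is safe.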
Alternatively, there is foldLeft:
Applies a binary operator to a start value and all elements of this sequence, going left to right. Applies a binary operator to a start value and all elements of this collection or iterator, going left to right.
Note: might return different results for different runs, unless the underlying collection type is ordered or the operator is associative and commutative.
As Set is not an ordered collection, there is no canonical order in which its elements could be folded, so the standard library allows itself to be nondeterministic even for foldLeft. If you used an ordered sequence here, foldLeft would be deterministic.
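As a small sketch of that last point, the following uses a Vector (an ordered sequence) with foldLeft. The object and value names are made up for illustration. One detail worth noting: a left fold with subtraction starting from 0 always equals the negated sum, whatever the order, so the second operator (building a decimal number) is an assumption added here to make the order dependence visible.

```scala
object OrderedFoldLeft {
  val ordered = Vector(1, 2, 3, 4, 5, 6, 7)

  // Fixed element order plus strict left-to-right evaluation:
  // the result is the same on every run.
  val difference = ordered.foldLeft(0)(_ - _) // -28

  // An operator whose left fold depends on element order.
  val digits         = ordered.foldLeft(0)((acc, x) => acc * 10 + x)         // 1234567
  val reversedDigits = ordered.reverse.foldLeft(0)((acc, x) => acc * 10 + x) // 7654321

  def main(args: Array[String]): Unit = {
    println(difference)
    println(digits)
    println(reversedDigits)
  }
}
```

The same elements in a different order give a different result for the second operator, which is why foldLeft is only deterministic when the collection has a defined order.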