reduce() vs fold in Spark

Question

What is the difference between reduce and fold with respect to their technical implementation?

I understand that they differ in their signatures: fold accepts an additional parameter (the initial value), which gets added to each partition's output.

Can someone describe the use cases for these two actions?

Which one would perform better, and in which scenario, assuming 0 is used as the initial value for fold?

Thanks in advance.

Recommended answer

There is no practical difference whatsoever when it comes to performance:


  • The fold action uses fold on the partition iterators, which is implemented using foldLeft.
  • The reduce action uses reduceLeft on the partition iterators.
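
The two bullets above can be emulated without a cluster. The following is a minimal sketch in plain Scala, not Spark's actual code: partitions are modeled as nested sequences, and the names PartitionFoldSketch, foldLike and reduceLike are hypothetical. It assumes that the per-partition results are merged with the same operator, and that a fold-style action also starts the merge from the initial value.

```scala
object PartitionFoldSketch {
  // Emulates a fold-style action: foldLeft inside each partition,
  // then merge the partial results, again starting from the zero value.
  def foldLike[T](partitions: Seq[Seq[T]], zero: T)(op: (T, T) => T): T = {
    val partials = partitions.map(p => p.iterator.foldLeft(zero)(op))
    partials.foldLeft(zero)(op)
  }

  // Emulates a reduce-style action: reduceLeft inside each non-empty
  // partition, then merge the partial results with the same operator.
  def reduceLike[T](partitions: Seq[Seq[T]])(op: (T, T) => T): T = {
    val partials = partitions.filter(_.nonEmpty).map(p => p.iterator.reduceLeft(op))
    partials.reduceLeft(op)
  }

  def main(args: Array[String]): Unit = {
    val parts = Seq(Seq(1, 2, 3), Seq(4, 5), Seq(6))
    println(foldLike(parts, 0)(_ + _)) // 21
    println(reduceLike(parts)(_ + _))  // 21
    // A non-neutral zero is counted once per partition plus once in the merge.
    println(foldLike(parts, 1)(_ + _)) // 25
  }
}
```

With a neutral initial value such as 0 for addition, both functions do the same amount of work and return the same result, which is consistent with the claim that there is no performance difference.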

Both methods keep a mutable accumulator and process each partition sequentially using a simple loop. foldLeft (https://github.com/scala/scala/blob/2.12.x/src/library/scala/collection/TraversableOnce.scala#L155) is implemented like this:

foreach (x => result = op(result, x))

and reduceLeft (https://github.com/scala/scala/blob/2.12.x/src/library/scala/collection/TraversableOnce.scala#L178) like this:

for (x <- self) {
  if (first) {
    ...
  }
  else acc = op(acc, x)
}
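
For comparison, here is a self-contained sketch of both loops. It is a simplified illustration rather than the library source: the names LoopSketch, foldLeftSketch and reduceLeftSketch are hypothetical, and the handling of the first element is an assumption about the general shape of such a loop.

```scala
object LoopSketch {
  // Simplified foldLeft: start from the zero value and update a mutable accumulator.
  def foldLeftSketch[A, B](it: Iterator[A], zero: B)(op: (B, A) => B): B = {
    var result = zero
    it.foreach(x => result = op(result, x))
    result
  }

  // Simplified reduceLeft: the first element seeds the accumulator,
  // so an empty input has no sensible result and is rejected.
  def reduceLeftSketch[A](it: Iterator[A])(op: (A, A) => A): A = {
    if (!it.hasNext) throw new UnsupportedOperationException("empty.reduceLeft")
    var acc = it.next()
    while (it.hasNext) acc = op(acc, it.next())
    acc
  }
}
```

In both cases the work per element is a single call to op, so the only structural difference is where the first accumulator value comes from.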

The practical difference between these methods in Spark comes down to their behavior on empty collections and the ability to use a mutable buffer as the accumulator (which is arguably a performance concern).
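
Both differences can be seen on ordinary Scala collections. The sketch below uses no Spark APIs, and the name EmptyAndMutableSketch is hypothetical. It assumes the fold operator is allowed to modify its first argument and return it, which is what makes a mutable zero value useful.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.util.Try

object EmptyAndMutableSketch {
  val empty: Seq[Int] = Seq.empty

  // fold on an empty collection simply returns the zero value.
  val folded: Int = empty.fold(0)(_ + _)

  // reduce on an empty collection has no value to return and throws.
  val reduced: Try[Int] = Try(empty.reduce(_ + _))

  // With fold, the accumulator starts as a fresh buffer supplied by the caller,
  // so it can be appended to in place without touching any input element.
  val chunks: Seq[ArrayBuffer[Int]] = Seq(ArrayBuffer(1), ArrayBuffer(2, 3))
  val merged: ArrayBuffer[Int] =
    chunks.fold(ArrayBuffer.empty[Int])((acc, next) => acc ++= next)
}
```

With reduce there is no separate starting buffer: the first element itself becomes the accumulator, so an in-place operator like the one above would mutate the input data.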

You will find some further discussion in "Why is the fold action necessary in Spark?"
