Spark: Accumulators do not work properly when used in a range

Question

I don't understand why Spark doesn't update my accumulator correctly.

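object AccumulatorsExample extends App {
  // sc is assumed to be an existing SparkContext built with the configuration shown below
  val acc = sc.accumulator(0L, "acc")
  sc.range(0, 20000, step = 25).map { _ => acc += 1 }.count()
  assert(acc.value == 800) // fails: the value is not 800
}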

My Spark configuration:

setMaster("local[*]") // should use 8 cpu cores

I'm not sure whether Spark distributes the accumulator's computation across every core; maybe that is the problem.

My question is: how can I aggregate all the acc values into a single sum and get the correct accumulator value (800)?

PS

If I restrict the number of cores with setMaster("local[1]"), then everything works fine.

Recommended answer

There are two different issues here:

  • You are extending App instead of implementing a main method. There are known issues with this approach, including incorrect accumulator behavior, so it shouldn't be used in Spark applications. This is most likely the source of the problem.

See, for example, SPARK-4170 for other possible issues related to extending App.
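A minimal sketch of the first point, assuming Spark 2.0 or later: the object implements an explicit main method instead of extending App, and the pipeline from the question is otherwise kept as it was. The run helper and the application name are hypothetical, introduced only for illustration, and sc.longAccumulator replaces the older sc.accumulator call used in the question.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object AccumulatorsExample {
  // Hypothetical helper: runs the pipeline from the question and
  // returns the accumulator total as seen on the driver.
  def run(sc: SparkContext): Long = {
    // longAccumulator is the Spark 2.0+ counterpart of sc.accumulator(0L, "acc")
    val acc = sc.longAccumulator("acc")
    // 0, 25, ..., 19975 gives 800 elements; count() triggers the map
    sc.range(0, 20000, step = 25).map { _ => acc.add(1L) }.count()
    acc.sum
  }

  // Explicit main method instead of `extends App`
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("AccumulatorsExample") // assumed application name
      .setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      println(run(sc))
    } finally {
      sc.stop()
    }
  }
}
```

With no task re-executed, the printed total should match the 800 elements of the range. The increment still happens inside a transformation here, so this only isolates the App change.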

  • You are using accumulators inside transformations. This means the accumulator can be incremented an arbitrary number of times (at least once when the given job succeeds).

In general, if you require exact results, you should use accumulators only inside actions such as foreach and foreachPartition, although it is rather unlikely that you'll experience any issues in a toy application like this one.
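As a rough illustration of updating the accumulator inside an action, the sketch below counts the same range once with foreach and once with foreachPartition. It assumes Spark 2.0 or later (sc.longAccumulator); the object name, helper names, and application name are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object AccumulatorInAction {
  // Increment once per element inside the foreach action.
  def countInForeach(sc: SparkContext): Long = {
    val acc = sc.longAccumulator("acc")
    sc.range(0, 20000, step = 25).foreach(_ => acc.add(1L))
    acc.sum
  }

  // Add each partition's element count in a single update per partition.
  def countInForeachPartition(sc: SparkContext): Long = {
    val acc = sc.longAccumulator("accPerPartition")
    sc.range(0, 20000, step = 25)
      .foreachPartition(it => acc.add(it.size.toLong))
    acc.sum
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("AccumulatorInAction") // assumed application name
      .setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      println(countInForeach(sc))
      println(countInForeachPartition(sc))
    } finally {
      sc.stop()
    }
  }
}
```

Both helpers read the accumulator on the driver only after the action has finished. The foreachPartition variant sends one update per partition rather than one per element, which may matter for larger inputs.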
