Scala return value calculated in foreach


Question


I am new to Scala and Spark and am trying to understand a few basics.


Spark version used: 1.5.


Why does the value of sum not get updated in the foreach loop below?

var sum = 1
df.select("column1").distinct().foreach(row => {
  sum = sum + 1
})
println("SUM = " + sum)

-> SUM = 1


I am trying to understand the scope of a variable referenced inside foreach. What if I need to do some math inside the loop and use the result outside it?


My actual use case is to get the unique values of a column in the loop and append them to a list of Strings.

Answer


The way you reason about the program is wrong. foreach is executed independently on each executor, and each task modifies its own copy of sum; those copies are never sent back to the driver. There is no global shared state here. Just count the values directly:

df.select("column1").distinct.count
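To see why the driver's sum never changes, here is a minimal Spark-free sketch. It assumes that shipping a task to an executor can be approximated by a Java serialization round trip (Java serialization is what Spark uses for closures); Counter, roundTrip, run, and the sample partitions are hypothetical names for illustration, not Spark APIs.

```scala
import java.io._

// Spark-free sketch: a serialization round trip stands in for shipping a task
// to an executor. Counter, roundTrip and the partitions are hypothetical.
object ClosureCopyDemo {
  // Mutable holder standing in for the captured `var sum`.
  final class Counter(var value: Int) extends Serializable

  // Serialize and deserialize, producing an independent copy of obj.
  def roundTrip[T <: AnyRef](obj: T): T = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(obj) finally out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
    try in.readObject().asInstanceOf[T] finally in.close()
  }

  // Returns (driver-side value, final value of each task's copy).
  def run(partitions: Seq[Seq[String]]): (Int, Seq[Int]) = {
    val sum = new Counter(1)              // lives on the "driver"
    val taskResults = partitions.map { partition =>
      val taskCopy = roundTrip(sum)       // each task gets its own deserialized copy
      partition.foreach(_ => taskCopy.value += 1)
      taskCopy.value                      // the copy is updated...
    }
    (sum.value, taskResults)              // ...but the driver's Counter is not
  }

  def main(args: Array[String]): Unit = {
    val (driverSum, perTask) = run(Seq(Seq("a", "b"), Seq("c"), Seq("d", "e")))
    println(s"SUM = $driverSum, task copies ended at $perTask")
  }
}
```

Running main prints `SUM = 1, task copies ended at List(3, 2, 3)`: every copy was incremented, but nothing flowed back to the original, which matches the `SUM = 1` seen in the question.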


If you really want to handle this manually, you'll need some kind of reduce:

df.select("column1").distinct.rdd.map(_ => 1L).reduce(_ + _)
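For the use case in the question (unique values as a list of Strings), the same principle applies: let the operation return the result instead of mutating a variable. The sketch below mirrors the calls above on a plain Scala collection so it runs without Spark; summarize and the sample data are hypothetical, and the Spark line in the comment assumes column1 is a string column.

```scala
// Spark-free sketch: results are returned by the operations, not accumulated in a var.
object DistinctValuesDemo {
  // Returns (count, count via map/reduce, unique values) for one column's values.
  def summarize(column1: Seq[String]): (Int, Long, List[String]) = {
    val distinct = column1.distinct                      // like .distinct
    val viaCount = distinct.size                         // like .count
    val viaReduce = distinct.map(_ => 1L).reduce(_ + _)  // like .rdd.map(_ => 1L).reduce(_ + _); throws if empty
    // In Spark the unique values would be collected to the driver, e.g.
    // df.select("column1").distinct.collect().map(_.getString(0)).toList
    val uniqueValues = distinct.toList
    (viaCount, viaReduce, uniqueValues)
  }

  def main(args: Array[String]): Unit = {
    val (n, m, values) = summarize(Seq("a", "b", "a", "c", "b"))
    println(s"count = $n, reduce = $m, values = $values")
  }
}
```

Note that reduce fails on an empty input, both on Scala collections and on Spark RDDs, which is one more reason to prefer count. Likewise, collect brings every distinct value into driver memory, so it only suits columns with a modest number of distinct values.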

