Spark - Sum of row values
Problem description
I have the following DataFrame:
January | February | March
-----------------------------
10 | 10 | 10
20 | 20 | 20
50 | 50 | 50
I'm trying to add a column to this which is the sum of the values of each row.
January | February | March | TOTAL
----------------------------------
10 | 10 | 10 | 30
20 | 20 | 20 | 60
50 | 50 | 50 | 150
As far as I can see, all the built-in aggregate functions seem to be for calculating values in single columns. How do I go about using values across columns on a per-row basis (using Scala)?
I have got as far as:
val newDf: DataFrame = df.select(colsToSum.map(col):_*).foreach ...
Recommended answer
You were very close with this:
val newDf: DataFrame = df.select(colsToSum.map(col):_*).foreach ...
Instead, try this:
val newDf = df.select(colsToSum.map(col).reduce((c1, c2) => c1 + c2) as "sum")
I think this is the best of the answers, because it is as fast as the answer with the hard-coded SQL query, and as convenient as the one that uses the UDF. It's the best of both worlds, and I didn't even add a full line of code!
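For context, here is a minimal, self-contained sketch of how that one-liner fits together. It is an illustrative example rather than the original answer's code: the data and column names come from the question, colsToSum is assumed to be simply all of df's columns, and withColumn is used instead of select so the original columns are kept alongside the new TOTAL column.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object RowSumExample {
  def main(args: Array[String]): Unit = {
    // Local session for the sketch; in a real job this would already exist.
    val spark = SparkSession.builder()
      .appName("RowSumExample")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // The DataFrame from the question.
    val df = Seq(
      (10, 10, 10),
      (20, 20, 20),
      (50, 50, 50)
    ).toDF("January", "February", "March")

    // Assumption: sum every column; adjust the list to pick specific ones.
    val colsToSum: Seq[String] = df.columns.toSeq

    // Map each name to a Column, then reduce the Columns with + to build
    // a single arithmetic expression that Spark evaluates per row.
    val newDf = df.withColumn("TOTAL", colsToSum.map(col).reduce(_ + _))

    newDf.show()
    // +-------+--------+-----+-----+
    // |January|February|March|TOTAL|
    // +-------+--------+-----+-----+
    // |     10|      10|   10|   30|
    // |     20|      20|   20|   60|
    // |     50|      50|   50|  150|
    // +-------+--------+-----+-----+

    spark.stop()
  }
}

Because the reduce happens over Column objects at query-construction time, the result is a single column expression, equivalent to writing January + February + March by hand, which is why it performs like the hard-coded SQL approach.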