Spark - Sum of row values
Problem description
I have the following DataFrame:
January | February | March
-----------------------------
10 | 10 | 10
20 | 20 | 20
50 | 50 | 50
I'm trying to add a column to this which is the sum of the values of each row.
January | February | March | TOTAL
----------------------------------
10 | 10 | 10 | 30
20 | 20 | 20 | 60
50 | 50 | 50 | 150
As far as I can see, all the built-in aggregate functions seem to be for calculating values in single columns. How do I go about using values across columns on a per-row basis (using Scala)?
I have got as far as:
val newDf: DataFrame = df.select(colsToSum.map(col):_*).foreach ...
Recommended answer
You were very close with this:
val newDf: DataFrame = df.select(colsToSum.map(col):_*).foreach ...
Instead, try this:
val newDf = df.select(colsToSum.map(col).reduce((c1, c2) => c1 + c2) as "sum")
I think this is the best of the answers, because it is as fast as the answer with the hard-coded SQL query, and as convenient as the one that uses the UDF. It's the best of both worlds, and I didn't even add a full line of code!
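For context, here is a minimal, self-contained sketch of how that one-liner fits together. It is an illustrative example rather than the original answer's code: the data and column names come from the question, colsToSum is assumed to be simply all of df's columns, and withColumn is used instead of select so the original columns are kept alongside the new TOTAL column.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object RowSumExample {
  def main(args: Array[String]): Unit = {
    // Local session for the sketch; in a real job this would already exist.
    val spark = SparkSession.builder()
      .appName("RowSumExample")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // The DataFrame from the question.
    val df = Seq(
      (10, 10, 10),
      (20, 20, 20),
      (50, 50, 50)
    ).toDF("January", "February", "March")

    // Assumption: sum every column; adjust the list to pick specific ones.
    val colsToSum: Seq[String] = df.columns.toSeq

    // Map each name to a Column, then reduce the Columns with + to build
    // a single arithmetic expression that Spark evaluates per row.
    val newDf = df.withColumn("TOTAL", colsToSum.map(col).reduce(_ + _))

    newDf.show()
    // +-------+--------+-----+-----+
    // |January|February|March|TOTAL|
    // +-------+--------+-----+-----+
    // |     10|      10|   10|   30|
    // |     20|      20|   20|   60|
    // |     50|      50|   50|  150|
    // +-------+--------+-----+-----+

    spark.stop()
  }
}

Because the reduce happens over Column objects at query-construction time, the result is a single column expression, equivalent to writing January + February + March by hand, which is why it performs like the hard-coded SQL approach.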