How to calculate total across columns but one?
Question
I want to add a "Total" column to a dataframe.
It should sum every cell in a row EXCEPT the uid cell.
uid val1 val2 val3
3213 1 2 3
To create this:
uid val1 val2 val3 Total
3213 1 2 3 6
So I need to filter out the UID and then sum. However, if I drop the UID before summing, I won't be able to rejoin the tables afterwards (as the join would have to be on UID).
I was playing with filter, but I cannot find a way to get the column name inside filter.
So what I have so far is:
val dfvReducedTotalled = dfvReduced.withColumn("TOTAL", dfvReduced.columns
.filter(col=> !col.?????? == "UID")
.map(c => col(c)).reduce((c1, c2) => c1 + c2))
Recommended answer
You can first collect the column names that are not uid, build the sum expression using reduce, and then create the Total column:
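import org.apache.spark.sql.functions.col
// Turn every column name except "uid" into a Column, then add them all together
val row_sum_expr = df.columns.collect{ case x if x != "uid" => col(x) }.reduce(_ + _)
df.withColumn("Total", row_sum_expr).show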
+----+----+----+----+-----+
| uid|val1|val2|val3|Total|
+----+----+----+----+-----+
|3213|   1|   2|   3|    6|
+----+----+----+----+-----+
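The filter-based attempt from the question can be completed along the same lines. Since df.columns is an Array[String], each element passed to filter is already the column name and can be compared directly. The following is only a sketch: the object name TotalDemo, the helper addTotal, the local SparkSession settings and the sample data are assumptions made for illustration, not part of the original post.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object TotalDemo {
  // Hypothetical helper (not from the original post): adds a "Total" column
  // that sums every column except the one named `exclude`.
  // Assumes at least one other column exists, otherwise reduce throws.
  def addTotal(df: DataFrame, exclude: String): DataFrame = {
    val summed = df.columns
      .filter(name => name != exclude) // elements are plain String column names
      .map(name => col(name))          // turn each name into a Column
      .reduce((c1, c2) => c1 + c2)     // val1 + val2 + val3
    df.withColumn("Total", summed)
  }

  def main(args: Array[String]): Unit = {
    // Assumed local session, for demonstration only
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("total-demo")
      .getOrCreate()
    import spark.implicits._

    // Sample data mirroring the table in the question
    val dfvReduced = Seq((3213, 1, 2, 3)).toDF("uid", "val1", "val2", "val3")
    addTotal(dfvReduced, "uid").show()

    spark.stop()
  }
}
```

Two details are worth checking against your own data. The string comparison is case-sensitive, so "UID" will not match a column named "uid". Also, in Spark SQL adding a null yields null, so if missing values should count as zero, one option is to wrap each column as coalesce(col(name), lit(0)) before reducing.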