Aggregate a Spark data frame using an array of column names, retaining the names

Published: 2020/6/2 20:48:03. Tags: scala, apache-spark, apache-spark-sql, aggregate-functions

I would like to aggregate a Spark data frame using an array of column names as input, while retaining the original names of the columns.

df.groupBy($"id").sum(colNames:_*)

This works, but it does not preserve the names: the aggregated columns are renamed. Inspired by another answer, I tried the following without success:

df.groupBy($"id").agg(sum(colNames:_*).alias(colNames:_*))
error: no `: _*' annotation allowed here

It does work for a single element, for example:

df.groupBy($"id").agg(sum(colNames(2)).alias(colNames(2)))

How can I make this happen for the entire array?

Just provide a sequence of columns with aliases:

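import org.apache.spark.sql.functions.sum
val colNames: Seq[String] = ???
// Build one aliased sum per column so each result keeps its source name.
val exprs = colNames.map(c => sum(c).alias(c))
// agg takes a first Column plus varargs, hence the head/tail split.
df.groupBy($"id").agg(exprs.head, exprs.tail: _*)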
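To see the difference in column names side by side, here is a minimal, self-contained sketch. The sample data, the local SparkSession settings, and the helper name sumKeepingNames are assumptions made for illustration; they are not part of the original question or answer.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, sum}

object AggregateKeepNames {
  // Hypothetical helper: sums each column in colNames per key and
  // aliases every sum back to the name of its source column.
  def sumKeepingNames(df: DataFrame, key: String, colNames: Seq[String]): DataFrame = {
    val exprs = colNames.map(c => sum(c).alias(c))
    df.groupBy(col(key)).agg(exprs.head, exprs.tail: _*)
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; master and app name are assumptions.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("agg-keep-names")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data: an id column plus two numeric columns.
    val df = Seq((1, 10, 100), (1, 20, 200), (2, 30, 300)).toDF("id", "a", "b")
    val colNames = Seq("a", "b")

    // Default naming from sum(colNames: _*).
    val defaultNames = df.groupBy($"id").sum(colNames: _*)
    println(defaultNames.columns.mkString(", ")) // id, sum(a), sum(b)

    // Aliased naming: the original column names are retained.
    val kept = sumKeepingNames(df, "id", colNames)
    println(kept.columns.mkString(", ")) // id, a, b

    spark.stop()
  }
}
```

Note that exprs.head throws on an empty colNames, so a caller that may pass an empty sequence would need to handle that case separately.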
