Spark DataFrame groupBy with a sequence as key arguments
Problem description
I have a Spark DataFrame and I want to aggregate values by multiple keys.
As the Spark documentation suggests:
def groupBy(col1: String, cols: String*): GroupedData — Groups the DataFrame using the specified columns, so we can run aggregation on them.
So I do the following:
val keys = Seq("a", "b", "c")
dataframe.groupBy(keys:_*).agg(...)
IntelliJ IDEA throws the following errors:
- expansion for non repeated parameters
- Type mismatch: expected Seq[Column], actual Seq[String]
However, I can pass multiple arguments manually without errors:
dataframe.groupBy("a", "b", "c").agg(...)
So, my question is: How can I do this programmatically?
Either use columns with groupBy(cols: Column*):
import org.apache.spark.sql.functions.col
val keys = Seq("a", "b", "c").map(col(_))
dataframe.groupBy(keys:_*).agg(...)
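
For context, here is a minimal self-contained sketch of that first approach; the local SparkSession, the sample data, and the sum aggregation are illustrative assumptions and were not part of the original question:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, sum}

val spark = SparkSession.builder().master("local[*]").appName("groupBy-seq").getOrCreate()
import spark.implicits._

// Sample data with the three grouping keys plus a value column
val dataframe = Seq(
  ("x", "y", "z", 1),
  ("x", "y", "z", 2),
  ("x", "q", "z", 3)
).toDF("a", "b", "c", "value")

// Turn the key names into Columns, then expand them as varargs
val keys = Seq("a", "b", "c").map(col(_))
dataframe.groupBy(keys: _*).agg(sum("value").as("total")).show()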
or head / tail with groupBy(col1: String, cols: String*):
val keys = Seq("a", "b", "c")
dataframe.groupBy(keys.head, keys.tail: _*).agg(...)
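
One caveat with the head/tail variant: keys.head throws on an empty Seq, so it needs a guard if the key list comes from user input. A small sketch, assuming an empty list should fall back to a single global group (the helper name groupByKeys is hypothetical):

import org.apache.spark.sql.DataFrame

// Hypothetical helper: guards against an empty key list, where keys.head would
// throw; with no keys, groupBy() puts every row into one global group.
def groupByKeys(df: DataFrame, keys: Seq[String]) =
  if (keys.isEmpty) df.groupBy()
  else df.groupBy(keys.head, keys.tail: _*)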