How to use the agg method of Spark KeyValueGroupedDataset?
Question

We have some code like this:
// think of class A as a table with two columns
case class A(property1: String, property2: Long)
// class B adds a column to class A
case class B(property1: String, property2: Long, property3: String)

df.as[A].map[B](a => {
  val my_udf = ??? // some code here which creates a user-defined function
  B(a.property1, a.property2, my_udf(a))
})
where df is a DataFrame. Next we want to create a dataset of type C:
// we want to group objects of type B by properties 1 and 3,
// compute the average of property2, and also store a count
case class C(property1: String, property3: String, average: Long, count: Long)
which we would have created in SQL like this:
select property1, property3, avg(property2), count(*) from B group by property1, property3
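(For reference, that SQL can also be run through Spark as-is. A minimal sketch, assuming a SparkSession named spark and a Dataset[B] named ds_b, both hypothetical names; the cast is needed because avg returns a double while C declares average as Long:)

```scala
import spark.implicits._

// register the dataset as a temporary view so plain SQL can query it
ds_b.createOrReplaceTempView("B")

val ds_c = spark.sql(
  """select property1, property3,
    |       cast(avg(property2) as bigint) as average,
    |       count(*) as count
    |from B
    |group by property1, property3""".stripMargin
).as[C]   // map the result columns back onto case class C by name
```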
How can we do this in Spark? We are trying to use groupByKey, which gives a KeyValueGroupedDataset, together with agg, but we are unable to get it to work and can't figure out how to use agg.
Answer
If you have a dataset of type B, call it ds_b, then you can do this with groupBy.agg:
ds_b.groupBy("property1", "property3").agg(
  avg($"property2").as("average"),
  count($"property2").as("count"))
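If you specifically want the typed route through groupByKey and KeyValueGroupedDataset.agg that the question asks about, untyped columns can be lifted into TypedColumns with .as[T]. A sketch, assuming a SparkSession named spark and a Dataset[B] named ds_b (both hypothetical names):

```scala
import org.apache.spark.sql.Dataset
import org.apache.spark.sql.functions.{avg, count}
import spark.implicits._

val ds_c: Dataset[C] = ds_b
  // key by the two grouping properties -> KeyValueGroupedDataset[(String, String), B]
  .groupByKey(b => (b.property1, b.property2.toString)) match { case _ => ds_b }
  .groupByKey(b => (b.property1, b.property3))
  .agg(
    avg($"property2").as[Double],   // untyped column lifted to a TypedColumn
    count($"property2").as[Long])
  // agg yields Dataset[((String, String), Double, Long)]; map it onto C.
  // C.average is declared Long, hence the toLong.
  .map { case ((p1, p3), mean, cnt) => C(p1, p3, mean.toLong, cnt) }
```

KeyValueGroupedDataset.agg accepts TypedColumn arguments, and TypedColumn is contravariant in its input type, which is why columns built from the untyped avg and count functions can be passed here.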