How to use the agg method of Spark KeyValueGroupedDataset?


Problem Description

We have some code like this:

// think of class A as a table with two columns
case class A(property1: String, property2: Long)

// class B adds a column to class A
case class B(property1: String, property2: Long, property3: String)

df.as[A].map[B](a => {
      val my_udf = // some code here which creates a user defined function
      new B(a.property1, a.property2, my_udf(a))
    })

where df is a DataFrame. Next we want to create a Dataset of type C:

// we want to group objects of type B by properties 1 and 3 and compute the average of property2 and also want to store a count
case class C(property1: String, property3: String, average: Long, count: Long)

which we would have created in SQL like this:

select property1, property3, avg(property2), count(*) from B group by property1, property3

How can we do this in Spark? We are trying to use groupByKey, which gives a KeyValueGroupedDataset, together with agg, but we are unable to get it to work and can't figure out how to use agg.

Answer

If you have a Dataset of type B called ds_b, then you can do this with groupBy.agg:

// needs import spark.implicits._ (for $) and import org.apache.spark.sql.functions._
ds_b.groupBy("property1", "property3").agg(count($"property2").as("count"),
                                           avg($"property2").as("average"))
