Spark Scala DataFrame with multiple aggregations on a single Group By
Question
I want a Spark Scala DataFrame with multiple aggregations on a single group by. For example:
val groupped = df.groupBy("firstName", "lastName").sum("Amount").toDF()
But what if I need Count, Sum, Max, etc.?
/* The line below does not work, but this is the intention:
val groupped = df.groupBy("firstName", "lastName").sum("Amount").count().toDF()
*/
Desired output of groupped.show():
--------------------------------------------------
| firstName | lastName| Amount|count | Max | Min |
--------------------------------------------------
Answer
import org.apache.spark.sql.functions._

case class soExample(firstName: String, lastName: String, Amount: Int)
val df = Seq(soExample("me", "zack", 100)).toDF

val groupped = df.groupBy("firstName", "lastName").agg(
  sum("Amount"),
  mean("Amount"),
  stddev("Amount"),
  count(lit(1)).alias("numOfRecords")
)
display(groupped) // display() is Databricks-specific; use groupped.show() elsewhere
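The answer above demonstrates the pattern with sum, mean, and stddev, while the question asked specifically for Count, Sum, Max, and Min. Here is a minimal self-contained sketch matching the desired output columns, with aliases so the result headers read cleanly (the SparkSession setup and sample rows are assumptions for illustration):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object MultiAggExample extends App {
  // Local session for illustration only; in spark-shell or Databricks one already exists
  val spark = SparkSession.builder().master("local[*]").appName("multi-agg").getOrCreate()
  import spark.implicits._

  case class SoExample(firstName: String, lastName: String, amount: Int)

  val df = Seq(
    SoExample("me", "zack", 100),
    SoExample("me", "zack", 50)
  ).toDF

  // All aggregations go into a single .agg(...) call on the grouped data;
  // .alias(...) controls the output column names
  val result = df.groupBy("firstName", "lastName").agg(
    count(lit(1)).alias("count"),
    sum("amount").alias("Amount"),
    max("amount").alias("Max"),
    min("amount").alias("Min")
  )

  result.show()
  spark.stop()
}
```

`agg` accepts any number of column expressions, so the list can be extended (e.g. `avg`, `first`, `collect_list`) without additional shuffles over the same grouping.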