Spark Scala Data Frame to have multiple aggregations of a single Group By
Problem Description

How can a Spark Scala DataFrame apply multiple aggregations to a single groupBy? For example:
val groupped = df.groupBy("firstName", "lastName").sum("Amount").toDF()
But what if I need count, sum, max, etc.?
/* The following does not work, but shows the intent:
val groupped = df.groupBy("firstName", "lastName").sum("Amount").count().toDF()
*/
Desired output of groupped.show():

----------------------------------------------------
| firstName | lastName | Amount | count | Max | Min |
----------------------------------------------------
Answer
import org.apache.spark.sql.functions._
import spark.implicits._  // needed for .toDF on a Seq

case class soExample(firstName: String, lastName: String, Amount: Int)

val df = Seq(soExample("me", "zack", 100)).toDF

// Pass all the aggregations to a single agg() call; the redundant
// trailing .toDF() has been dropped.
val groupped = df.groupBy("firstName", "lastName").agg(
  sum("Amount"),
  mean("Amount"),
  stddev("Amount"),
  count(lit(1)).alias("numOfRecords")
)

groupped.show()  // or display(groupped) in a Databricks notebook
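The answer above demonstrates sum, mean, and stddev, while the question's desired header lists Amount, count, Max, and Min. A minimal, self-contained sketch producing exactly those columns (assuming a local SparkSession; the `Txn` case class, the lowercase `amount` column, and the aliases are illustrative, not from the original post):

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._

// Illustrative record type; not from the original post.
case class Txn(firstName: String, lastName: String, amount: Int)

object MultiAggSketch {
  // One groupBy, several aggregates, each given a readable alias
  // so the output header matches the question's desired table.
  def aggregate(df: DataFrame): DataFrame =
    df.groupBy("firstName", "lastName").agg(
      count(lit(1)).alias("count"),
      sum("amount").alias("Amount"),
      max("amount").alias("Max"),
      min("amount").alias("Min")
    )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("MultiAggSketch")
      .master("local[*]") // local mode, for illustration only
      .getOrCreate()
    import spark.implicits._

    val df = Seq(Txn("me", "zack", 100), Txn("me", "zack", 50)).toDF
    aggregate(df).show()
    spark.stop()
  }
}
```

Aliasing each aggregate keeps the output columns stable; without aliases Spark generates names like `sum(amount)`, which are awkward to reference downstream.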