How to max value and keep all columns (for max records per group)?
Problem description
Given the following dataframe:
+----+-----+---+-----+
| uid| k| v|count|
+----+-----+---+-----+
| a|pref1| b| 168|
| a|pref3| h| 168|
| a|pref3| t| 63|
| a|pref3| k| 84|
| a|pref1| e| 84|
| a|pref2| z| 105|
+----+-----+---+-----+
How can I get the max value from uid, k but include v?
+----+-----+---+----------+
| uid| k| v|max(count)|
+----+-----+---+----------+
| a|pref1| b| 168|
| a|pref3| h| 168|
| a|pref2| z| 105|
+----+-----+---+----------+
I can do something like this, but it will drop the column "v":
df.groupBy("uid", "k").max("count")
Answer
It's a perfect example for window operators (using the over function) or join.
Since you've already figured out how to use windows, I'll focus on join exclusively.
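For completeness, here is a minimal sketch of the window-based approach the answer alludes to. It assumes the same inventory DataFrame as below and a spark-shell session (so spark.implicits._ is in scope for the $ syntax):

```scala
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.max

// Partition by the grouping keys; every row sees its group's max(count)
val byUidK = Window.partitionBy("uid", "k")

inventory
  .withColumn("max", max($"count") over byUidK) // per-group maximum alongside each row
  .where($"max" === $"count")                   // keep only rows that equal the maximum
  .drop("max")
  .show
```

Unlike a plain groupBy, the window keeps every column of the original rows, so v survives.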
scala> val inventory = Seq(
| ("a", "pref1", "b", 168),
| ("a", "pref3", "h", 168),
| ("a", "pref3", "t", 63)).toDF("uid", "k", "v", "count")
inventory: org.apache.spark.sql.DataFrame = [uid: string, k: string ... 2 more fields]
scala> val maxCount = inventory.groupBy("uid", "k").max("count")
maxCount: org.apache.spark.sql.DataFrame = [uid: string, k: string ... 1 more field]
scala> maxCount.show
+---+-----+----------+
|uid| k|max(count)|
+---+-----+----------+
| a|pref3| 168|
| a|pref1| 168|
+---+-----+----------+
scala> val maxCount = inventory.groupBy("uid", "k").agg(max("count") as "max")
maxCount: org.apache.spark.sql.DataFrame = [uid: string, k: string ... 1 more field]
scala> maxCount.show
+---+-----+---+
|uid| k|max|
+---+-----+---+
| a|pref3|168|
| a|pref1|168|
+---+-----+---+
scala> maxCount.join(inventory, Seq("uid", "k")).where($"max" === $"count").show
+---+-----+---+---+-----+
|uid| k|max| v|count|
+---+-----+---+---+-----+
| a|pref3|168| h| 168|
| a|pref1|168| b| 168|
+---+-----+---+---+-----+
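A one-pass variant not shown in the answer (my own addition, sketched under the same inventory): Spark compares structs field by field, so putting count first in a struct lets max carry the matching v along with it:

```scala
import org.apache.spark.sql.functions.{max, struct}

// max over a struct orders by fields left to right, so "count" must come first
inventory
  .groupBy("uid", "k")
  .agg(max(struct($"count", $"v")) as "m")
  .select($"uid", $"k", $"m.v" as "v", $"m.count" as "count")
  .show
```

Note one behavioral difference: on ties the join keeps every row that hits the maximum, while this struct-max variant keeps exactly one row per group.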