How to max value and keep all columns (for max records per group)?
Question
Given the following DataFrame:
+----+-----+---+-----+
| uid|    k|  v|count|
+----+-----+---+-----+
|   a|pref1|  b|  168|
|   a|pref3|  h|  168|
|   a|pref3|  t|   63|
|   a|pref3|  k|   84|
|   a|pref1|  e|   84|
|   a|pref2|  z|  105|
+----+-----+---+-----+
How can I get the max value of count for each uid, k group but still include v?
+----+-----+---+----------+
| uid|    k|  v|max(count)|
+----+-----+---+----------+
|   a|pref1|  b|       168|
|   a|pref3|  h|       168|
|   a|pref2|  z|       105|
+----+-----+---+----------+
I can do something like this, but it drops the column "v":
df.groupBy("uid", "k").max("count")
Recommended answer
It's a perfect example for window operators (using the over function) or a join.
Since you've already figured out how to use windows, I'll focus exclusively on the join.
scala> val inventory = Seq(
| ("a", "pref1", "b", 168),
| ("a", "pref3", "h", 168),
| ("a", "pref3", "t", 63)).toDF("uid", "k", "v", "count")
inventory: org.apache.spark.sql.DataFrame = [uid: string, k: string ... 2 more fields]
scala> val maxCount = inventory.groupBy("uid", "k").max("count")
maxCount: org.apache.spark.sql.DataFrame = [uid: string, k: string ... 1 more field]
scala> maxCount.show
+---+-----+----------+
|uid|    k|max(count)|
+---+-----+----------+
|  a|pref3|       168|
|  a|pref1|       168|
+---+-----+----------+
scala> val maxCount = inventory.groupBy("uid", "k").agg(max("count") as "max")
maxCount: org.apache.spark.sql.DataFrame = [uid: string, k: string ... 1 more field]
scala> maxCount.show
+---+-----+---+
|uid|    k|max|
+---+-----+---+
|  a|pref3|168|
|  a|pref1|168|
+---+-----+---+
scala> maxCount.join(inventory, Seq("uid", "k")).where($"max" === $"count").show
+---+-----+---+---+-----+
|uid|    k|max|  v|count|
+---+-----+---+---+-----+
|  a|pref3|168|  h|  168|
|  a|pref1|168|  b|  168|
+---+-----+---+---+-----+
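For comparison, the window-operator route mentioned at the start of the answer could look roughly like the sketch below. It is not part of the original answer: the names byGroup and result, the local SparkSession setup, and the use of all six rows from the question are assumptions made for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, max}

// Assumption: a local session; inside spark-shell `spark` already exists
// and getOrCreate simply returns it.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("max-per-group")
  .getOrCreate()
import spark.implicits._

// The six rows from the question.
val inventory = Seq(
  ("a", "pref1", "b", 168),
  ("a", "pref3", "h", 168),
  ("a", "pref3", "t", 63),
  ("a", "pref3", "k", 84),
  ("a", "pref1", "e", 84),
  ("a", "pref2", "z", 105)).toDF("uid", "k", "v", "count")

// One window partition per (uid, k) group.
val byGroup = Window.partitionBy("uid", "k")

// Attach the group maximum to every row, keep the rows that reach it,
// then drop the helper column so only the original columns remain.
val result = inventory
  .withColumn("max", max(col("count")).over(byGroup))
  .where(col("max") === col("count"))
  .drop("max")
```

As with the join, rows that tie for the maximum within a group are all kept. Calling result.show should list the rows with v equal to b, h and z, in no guaranteed order.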