Max and Min of Spark
Question
I am new to Spark and I have some questions about the aggregation functions MAX and MIN in SparkSQL.
In SparkSQL, when I use the MAX/MIN functions, only MAX(value)/MIN(value) is returned. But what if I also want the other corresponding columns?
For example, given a dataframe with columns time, value, and label, how can I get the time with the MIN(value), grouped by label?
Thanks.
Recommended answer
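For concreteness, a minimal sketch of the situation (the sample data, and a SparkSession named spark, are illustrative assumptions, not from the question): a plain groupBy/agg keeps only the grouping key and the aggregate, so the matching time column is lost.

```scala
import org.apache.spark.sql.functions.min
import spark.implicits._  // assumes a SparkSession named `spark` is in scope

// Illustrative data with columns (time, value, label)
val df = Seq(
  ("09:00", 1.0, "a"),
  ("10:00", 3.0, "a"),
  ("09:30", 2.0, "b")
).toDF("time", "value", "label")

// Returns only `label` and `min(value)` -- no `time` column.
df.groupBy($"label").agg(min($"value")).show()
```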
You need to first do a groupBy, and then join that back to the original DataFrame. In Scala, it looks like this:
df.join(
  df.groupBy($"label").agg(min($"value") as "min_value").withColumnRenamed("label", "min_label"),
  $"min_label" === $"label" && $"min_value" === $"value"
).drop("min_label").drop("min_value").show
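Putting the snippet above into a self-contained sketch (the session setup and sample data are assumptions for illustration, not from the original answer):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.min

val spark = SparkSession.builder.appName("min-join").master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq(
  ("09:00", 1.0, "a"),
  ("10:00", 3.0, "a"),
  ("09:30", 2.0, "b")
).toDF("time", "value", "label")

// Compute the per-label minimum, rename the key to avoid ambiguity,
// then join back so each surviving row keeps its original `time`.
df.join(
  df.groupBy($"label").agg(min($"value") as "min_value").withColumnRenamed("label", "min_label"),
  $"min_label" === $"label" && $"min_value" === $"value"
).drop("min_label").drop("min_value").show()
```

Note that if several rows in a group tie for the minimum value, the join keeps all of them.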
I don't use Python, but it would look close to the above.
You can even do max() and min() in one pass:
df.join(
  df.groupBy($"label")
    .agg(min($"value") as "min_value", max($"value") as "max_value")
    .withColumnRenamed("label", "r_label"),
  $"r_label" === $"label" && ($"min_value" === $"value" || $"max_value" === $"value")
).drop("r_label")
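An alternative not in the original answer, sketched here for comparison: Spark's window functions compute the per-label min and max without a self-join, which is often simpler on large inputs (column names below follow the question's schema).

```scala
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{min, max}
import spark.implicits._  // assumes a SparkSession named `spark` is in scope

// Partition by label so min/max are computed within each group.
val w = Window.partitionBy($"label")

df.withColumn("min_value", min($"value").over(w))
  .withColumn("max_value", max($"value").over(w))
  .filter($"value" === $"min_value" || $"value" === $"max_value")
  .drop("min_value", "max_value")
```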