Max and Min of Spark
Question
I am new to Spark and I have some questions about the aggregation functions MAX and MIN in SparkSQL.
In SparkSQL, when I use the MAX / MIN function, only MAX(value) / MIN(value) is returned. But what if I also want the other corresponding columns?
For example, given a dataframe with columns time, value, and label, how can I get the time with the MIN(value), grouped by label?
Thanks.
Recommended answer
You first need to do a groupBy, and then join that back to the original DataFrame. In Scala, it looks like this:
df.join(
  // per-label minimum value; rename "label" so the join condition is unambiguous
  df.groupBy($"label").agg(min($"value") as "min_value")
    .withColumnRenamed("label", "min_label"),
  $"min_label" === $"label" && $"min_value" === $"value"
).drop("min_label").drop("min_value").show
I don't use Python, but it would look close to the above.
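To make the groupBy-then-join idea concrete without a Spark cluster, here is a minimal plain-Python sketch of the same logic, using hypothetical (time, value, label) rows. It is an illustration of the technique, not PySpark API code:

```python
# Plain-Python sketch of the groupBy-then-join idea (no Spark needed).
# Rows are hypothetical (time, value, label) tuples.
rows = [
    (1, 10, "a"), (2, 5, "a"), (3, 7, "b"), (4, 7, "b"), (5, 12, "b"),
]

# "groupBy(label).agg(min(value))": minimum value per label
min_value = {}
for time, value, label in rows:
    if label not in min_value or value < min_value[label]:
        min_value[label] = value

# "join back to the original rows": keep rows whose value equals the group min
result = [(time, value, label) for time, value, label in rows
          if value == min_value[label]]

print(result)  # [(2, 5, 'a'), (3, 7, 'b'), (4, 7, 'b')]
```

Note that, just like the Spark join above, ties are kept: both rows of label "b" with the minimum value 7 survive, each with its own time.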
You can even do max() and min() in one pass:
df.join(
  // per-label min and max computed in one aggregation
  df.groupBy($"label")
    .agg(min($"value") as "min_value", max($"value") as "max_value")
    .withColumnRenamed("label", "r_label"),
  // keep rows matching either the group's min or its max
  $"r_label" === $"label" && ($"min_value" === $"value" || $"max_value" === $"value")
).drop("r_label")
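The one-pass variant can be sketched the same way in plain Python, again with hypothetical rows; each group's min and max are collected in a single loop, then rows matching either extreme are kept:

```python
# Plain-Python sketch of the one-pass min/max aggregation (hypothetical data).
rows = [(1, 10, "a"), (2, 5, "a"), (3, 7, "b"), (4, 9, "b"), (5, 12, "b")]

agg = {}  # label -> [min_value, max_value]
for _, value, label in rows:
    if label not in agg:
        agg[label] = [value, value]
    else:
        agg[label][0] = min(agg[label][0], value)
        agg[label][1] = max(agg[label][1], value)

# "join back": keep rows whose value is the min or the max of their group
result = [r for r in rows if r[1] in (agg[r[2]][0], agg[r[2]][1])]
print(result)  # (4, 9, 'b') is dropped: it is neither min nor max of group "b"
```

This mirrors the Scala join condition `min_value === value || max_value === value`: one scan over the data produces both extremes per group.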