Max and Min of Spark

This article looks at how to retrieve the other columns of the rows matching a MAX/MIN aggregate in Spark SQL.

Question

I am new to Spark and I have some questions about the aggregation functions MAX and MIN in Spark SQL.

In Spark SQL, when I use the MAX/MIN function, only MAX(value)/MIN(value) is returned. But what if I also want the other corresponding columns?

For example, given a DataFrame with columns time, value and label, how can I get the time with the MIN(value), grouped by label?

Thanks.

Answer

You need to first do a groupBy, and then join that back to the original DataFrame. In Scala, it looks like this:

import org.apache.spark.sql.functions.min
import spark.implicits._  // for the $"col" syntax

df.join(
  df.groupBy($"label").agg(min($"value") as "min_value").withColumnRenamed("label", "min_label"),
  $"min_label" === $"label" && $"min_value" === $"value"
).drop("min_label").drop("min_value").show

I don't use Python, but it would look close to the above.
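To make the groupBy-then-join logic concrete for Python users, here is a minimal pure-Python sketch of the same idea (no Spark required); the time/value/label sample rows are hypothetical:

```python
# Hypothetical sample rows: (time, value, label)
rows = [
    ("09:00", 3.0, "a"),
    ("09:05", 1.0, "a"),
    ("09:10", 5.0, "b"),
    ("09:15", 2.0, "b"),
]

# Step 1: the "groupBy + agg(min)" part - minimum value per label.
min_value = {}
for time, value, label in rows:
    if label not in min_value or value < min_value[label]:
        min_value[label] = value

# Step 2: the "join back" part - keep rows whose (label, value)
# matches the per-label minimum, preserving all other columns.
result = [r for r in rows if r[1] == min_value[r[2]]]
# result keeps one full row per label: the one carrying the min value
```

In actual PySpark the structure mirrors the Scala snippet: a `groupBy`/`agg` producing the per-label minimum, joined back to the original DataFrame on label and value.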

You can even do max() and min() in one pass:

import org.apache.spark.sql.functions.{min, max}
import spark.implicits._  // for the $"col" syntax

df.join(
  df.groupBy($"label")
    .agg(min($"value") as "min_value", max($"value") as "max_value")
    .withColumnRenamed("label", "r_label"),
  $"r_label" === $"label" && ($"min_value" === $"value" || $"max_value" === $"value")
).drop("r_label")
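The one-pass min-and-max variant can be sketched the same way in plain Python (sample data again hypothetical):

```python
# Hypothetical sample rows: (time, value, label)
rows = [
    ("09:00", 3.0, "a"),
    ("09:05", 1.0, "a"),
    ("09:10", 4.0, "a"),
    ("09:15", 5.0, "b"),
    ("09:20", 2.0, "b"),
    ("09:25", 7.0, "b"),
]

# One pass over the data: track (min_value, max_value) per label.
extremes = {}
for _, value, label in rows:
    lo, hi = extremes.get(label, (value, value))
    extremes[label] = (min(lo, value), max(hi, value))

# Keep rows whose value is either the min or the max of its label
# (the OR condition in the Spark join above).
result = [r for r in rows if r[1] in extremes[r[2]]]
```

Note that, as in the Spark version, a label whose min and max fall on the same row yields that row only once, and ties (several rows sharing the extreme value) all survive the filter.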

