Find minimum for a timestamp through Spark groupBy dataframe

Views: 36 Published: 2021/11/14 21:58:47 Tags: sql scala apache-spark apache-spark-sql

Question

When I group my dataframe on a column and then try to find the minimum for each group with groupbyDatafram.min('timestampCol'), it appears I cannot do this on non-numeric columns. How can I properly get the minimum (earliest) date for each group?

I am streaming the dataframe from a PostgreSQL S3 instance, so the data is already configured.

Recommended answer

Perform the aggregation directly with agg instead of using the min helper on the grouped data:

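import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.min

// Placeholder: substitute an existing SQLContext (for example, the one spark-shell provides).
val sqlContext: SQLContext = ???

import sqlContext.implicits._

// Sample data: two rows for the same id, with the string column cast to a timestamp.
val df = Seq((1L, "2016-04-05 15:10:00"), (1L, "2014-01-01 15:10:00"))
  .toDF("id", "ts")
  .withColumn("ts", $"ts".cast("timestamp"))

// Aggregate with functions.min rather than the grouped-data min helper.
df.groupBy($"id").agg(min($"ts")).show

// +---+--------------------+
// | id|             min(ts)|
// +---+--------------------+
// |  1|2014-01-01 15:10:...|
// +---+--------------------+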

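Unlike the min helper on grouped data, which accepts only numeric columns, the min function from org.apache.spark.sql.functions works on any orderable type, including timestamps.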
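To make the contrast concrete, here is a minimal sketch that tries both approaches on the same data. It assumes Spark 2.x or later (a SparkSession instead of the SQLContext used above) and is written for spark-shell or a similar script context. The names helperAttempt, earliest and earliest_ts, the local master setting and the extra sample row are illustrative assumptions, not part of the original answer.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.min
import scala.util.Try

// Assumption: a local session for experimentation; reuse an existing session if there is one.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("min-timestamp-sketch")
  .getOrCreate()

import spark.implicits._

// Same shape as the data above, plus a second id (added for illustration).
val df = Seq(
  (1L, "2016-04-05 15:10:00"),
  (1L, "2014-01-01 15:10:00"),
  (2L, "2015-06-30 08:00:00")
).toDF("id", "ts")
  .withColumn("ts", $"ts".cast("timestamp"))

// The grouped-data helper accepts only numeric columns, so this call is
// expected to fail during analysis; Try captures the failure instead of aborting.
val helperAttempt = Try(df.groupBy($"id").min("ts"))
println(s"min helper succeeded: ${helperAttempt.isSuccess}")

// The aggregate function from org.apache.spark.sql.functions handles timestamps.
// The alias gives the result column a stable name instead of min(ts).
val earliest = df.groupBy($"id").agg(min($"ts").as("earliest_ts"))
earliest.orderBy($"id").show(false)
```

Aliasing the aggregate is optional, but it makes the result easier to reference in later steps, for example when joining the earliest timestamps back to the original rows.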
