Adding 12 hours to datetime column in Spark


Problem description

I have tried to search quite a bit, but could only find the add_months function in Spark SQL, so I ended up opening a new thread here. I would appreciate any help someone could offer.

I am trying to add 12, 24, and 48 hours to a date column in Spark SQL using sqlContext. I am using Spark 1.6.1 and need something like this:

SELECT N1.subject_id, '12-HOUR' AS notes_period, N1.chartdate_start, N2.chartdate, N2.text
FROM NOTEEVENTS N2,
(SELECT subject_id, MIN(chartdate) chartdate_start
  FROM NOTEEVENTS
  WHERE subject_id = 283
  AND category != 'Discharge summary'
GROUP BY subject_id) N1
WHERE N2.subject_id = N1.subject_id
AND N2.chartdate < N1.chartdate_start + INTERVAL '1 hour' * 12

Please notice the last clause, which is written in PostgreSQL and is what I need in Spark SQL. I'd really appreciate any help I could get.

Thanks.

Recommended answer

Currently there is no such function, but you can write a UDF:

import java.sql.Timestamp

sqlContext.udf.register("add_hours", (datetime: Timestamp, hours: Int) => {
  // shift the timestamp forward by the given number of hours (in milliseconds)
  new Timestamp(datetime.getTime + hours * 60L * 60 * 1000)
})

For example:

SELECT N1.subject_id, '12-HOUR' AS notes_period, N1.chartdate_start, N2.chartdate, N2.text
    FROM NOTEEVENTS N2,
    (SELECT subject_id, MIN(chartdate) chartdate_start
      FROM NOTEEVENTS
      WHERE subject_id = 283
      AND category != 'Discharge summary'
    GROUP BY subject_id) N1
    WHERE N2.subject_id = N1.subject_id
    AND N2.chartdate < add_hours(N1.chartdate_start, 12)
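
If you prefer the DataFrame API to raw SQL, the registered UDF can also be invoked through callUDF. A minimal sketch, assuming a hypothetical DataFrame named noteevents loaded from the NOTEEVENTS data (the DataFrame name is an assumption, not from the original post):

import org.apache.spark.sql.functions.{callUDF, col, lit}

// invoke the `add_hours` UDF registered above from the DataFrame API;
// `noteevents` is a hypothetical DataFrame over the NOTEEVENTS data
val shifted = noteevents.withColumn(
  "chartdate_plus_12h",
  callUDF("add_hours", col("chartdate"), lit(12))
)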

You can also use the unix_timestamp function to calculate the new date. It's less readable in my opinion, but it can benefit from whole-stage code generation (WholeStageCodegen). Code inspired by Anton Okolnychyi's other answer:

import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{from_unixtime, unix_timestamp}

// convert to seconds since the epoch, add the hours, and format back to a timestamp string
val addHours = (datetime: Column, hours: Column) =>
  from_unixtime(unix_timestamp(datetime) + hours * 60 * 60)
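
A minimal usage sketch for this column-based helper, again assuming the hypothetical noteevents DataFrame from above:

import org.apache.spark.sql.functions.{col, lit}

// add a 12-hour cutoff column; note that from_unixtime returns a
// formatted timestamp string, not a native timestamp type
val withCutoff = noteevents.withColumn("cutoff_12h", addHours(col("chartdate"), lit(12)))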
