Spark 2.0 Timestamp Difference in Milliseconds using Scala


Problem Description

I am using Spark 2.0 and looking for a way to achieve the following in Scala:

I need the timestamp difference in milliseconds between two DataFrame column values.

Value_1 = 06/13/2017 16:44:20.044
Value_2 = 06/13/2017 16:44:21.067

The data type of both columns is timestamp.

Note: Applying the function unix_timestamp(Column s) to both values and subtracting works, but only down to whole seconds, not to the millisecond precision that is required.
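
To make the limitation concrete, here is a minimal sketch (the local SparkSession, the toy DataFrame, the column names Value_1/Value_2, and the format string are assumptions for illustration) showing that unix_timestamp only resolves to whole seconds:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, unix_timestamp}

val spark = SparkSession.builder().master("local[*]").appName("ts-diff").getOrCreate()
import spark.implicits._

val df = Seq(("06/13/2017 16:44:20.044", "06/13/2017 16:44:21.067")).toDF("Value_1", "Value_2")
val fmt = "MM/dd/yyyy HH:mm:ss.SSS"

// unix_timestamp parses down to whole seconds, so the subtraction yields 1, not the required 1023 ms
df.select((unix_timestamp(col("Value_2"), fmt) - unix_timestamp(col("Value_1"), fmt)).as("diff_seconds")).show()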

The final query would look like this:

Select timestamp_diff(Value_2, Value_1) from table1

This should return the following output:

1023 milliseconds

where timestamp_diff is the function that would calculate the difference in milliseconds.

Recommended Answer

One way is to use Unix epoch time, the number of milliseconds since 1 January 1970. Below is an example using a UDF that takes two timestamps and returns the difference between them in milliseconds.

import java.sql.Timestamp
import org.apache.spark.sql.functions.{col, udf}

// UDF that takes two timestamps and returns the difference in milliseconds
val timestamp_diff = udf((startTime: Timestamp, endTime: Timestamp) => {
  startTime.getTime() - endTime.getTime()
})

val df = // dataframe with two timestamp columns (col1 and col2)
  .withColumn("diff", timestamp_diff(col("col2"), col("col1")))
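
As a quick check, here is a minimal, self-contained sketch (the toy DataFrame and the col1/col2 column names are assumptions) that feeds the question's sample values through the timestamp_diff UDF defined above:

// Assumes a SparkSession named `spark` and the timestamp_diff UDF from above
import java.sql.Timestamp
import spark.implicits._

val sample = Seq(
  (Timestamp.valueOf("2017-06-13 16:44:20.044"), Timestamp.valueOf("2017-06-13 16:44:21.067"))
).toDF("col1", "col2")

sample.withColumn("diff", timestamp_diff(col("col2"), col("col1"))).show(false)
// the diff column holds 1023, i.e. the millisecond difference from the question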

Alternatively, you can register the function to use with SQL commands:

// The same logic as a plain Scala function, for registration as a SQL UDF
val timestamp_diff = (startTime: Timestamp, endTime: Timestamp) => {
  startTime.getTime() - endTime.getTime()
}

spark.sqlContext.udf.register("timestamp_diff", timestamp_diff)
df.createOrReplaceTempView("table1")

val df2 = spark.sqlContext.sql("SELECT *, timestamp_diff(col2, col1) as diff from table1")
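
As a sanity check (still assuming the col1/col2 naming and the `sample` DataFrame from the sketch above), the registered function returns the same value on the sample data:

// Reuses the `sample` DataFrame from the sketch above
sample.createOrReplaceTempView("table1")
spark.sql("SELECT *, timestamp_diff(col2, col1) AS diff FROM table1").show(false)
// diff = 1023 milliseconds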
