Spark 2.0 Timestamp Difference in Milliseconds using Scala
Question
I am using Spark 2.0 and looking for a way to achieve the following in Scala:
I need the timestamp difference, in milliseconds, between two DataFrame column values.
Value_1 = 06/13/2017 16:44:20.044
Value_2 = 06/13/2017 16:44:21.067
Both columns have the timestamp data type.
Note: Applying the function unix_timestamp(Column s) to both values and subtracting works, but only to whole seconds, not up to the millisecond precision that is required.
The final query would look like this:
SELECT timestamp_diff(Value_2, Value_1) FROM table1
This should return the following output:
1023 milliseconds
where timestamp_diff is the function that would calculate the difference in milliseconds.
Recommended answer
One way is to use Unix epoch time, that is, the number of milliseconds since 1 January 1970 00:00:00 UTC. Below is an example using a UDF that takes two timestamps and returns the difference between them (first argument minus second argument) in milliseconds.
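import java.sql.Timestamp
import org.apache.spark.sql.functions.{col, udf}

// Returns the first argument minus the second argument, in milliseconds
val timestamp_diff = udf((endTime: Timestamp, startTime: Timestamp) => {
  endTime.getTime() - startTime.getTime()
})
// df is a DataFrame with two timestamp columns (col1 and col2)
val result = df.withColumn("diff", timestamp_diff(col("col2"), col("col1")))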
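To sanity-check the arithmetic outside Spark, the same first-minus-second subtraction can be run on the two sample values from the question. This is only a sketch: `TimestampDiffCheck`, `parse`, `diffMillis`, and `diffWholeSeconds` are hypothetical names introduced for illustration, and the integer division by 1000 merely stands in for the whole-second resolution of `unix_timestamp`.

```scala
import java.sql.Timestamp
import java.text.SimpleDateFormat

object TimestampDiffCheck {
  // Hypothetical helper: parses the "MM/dd/yyyy HH:mm:ss.SSS" format used in the question.
  def parse(s: String): Timestamp = {
    val fmt = new SimpleDateFormat("MM/dd/yyyy HH:mm:ss.SSS")
    new Timestamp(fmt.parse(s).getTime)
  }

  // Same logic as the UDF body: first argument minus second argument, in milliseconds.
  def diffMillis(end: Timestamp, start: Timestamp): Long =
    end.getTime - start.getTime

  // For contrast: truncating each value to whole seconds first loses the fraction.
  def diffWholeSeconds(end: Timestamp, start: Timestamp): Long =
    end.getTime / 1000 - start.getTime / 1000

  def main(args: Array[String]): Unit = {
    val value1 = parse("06/13/2017 16:44:20.044")
    val value2 = parse("06/13/2017 16:44:21.067")
    println(diffMillis(value2, value1))       // prints 1023
    println(diffWholeSeconds(value2, value1)) // prints 1
  }
}
```

The second number shows why subtracting second-resolution values cannot meet the millisecond requirement.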
Alternatively, you can register the function for use in SQL queries:
import java.sql.Timestamp

// Plain Scala function: first argument minus second argument, in milliseconds
val timestamp_diff = (endTime: Timestamp, startTime: Timestamp) => {
  endTime.getTime() - startTime.getTime()
}
// Register the function under a name that SQL queries can call
spark.udf.register("timestamp_diff", timestamp_diff)
df.createOrReplaceTempView("table1")
val df2 = spark.sql("SELECT *, timestamp_diff(col2, col1) AS diff FROM table1")
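One caveat with the approach above: if either timestamp column contains a null, the Scala function is likely to receive a null reference, and calling getTime() on it throws a NullPointerException. Below is a minimal null-safe sketch of the same subtraction. `NullSafeDiff` and `safeDiffMillis` are hypothetical names, not part of the original answer.

```scala
import java.sql.Timestamp

object NullSafeDiff {
  // Hypothetical helper: returns None when either input is null,
  // otherwise the first argument minus the second argument in milliseconds.
  def safeDiffMillis(end: Timestamp, start: Timestamp): Option[Long] =
    for {
      e <- Option(end)   // Option(null) evaluates to None
      s <- Option(start)
    } yield e.getTime - s.getTime
}
```

Wrapping it as `udf(NullSafeDiff.safeDiffMillis _)` should produce a nullable long column, on the assumption that Spark maps a `None` result to SQL NULL; it is worth verifying that on your Spark version.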