Spark 2.0 Timestamp Difference in Milliseconds using Scala
Question
I am using Spark 2.0 and looking for a way to achieve the following in Scala:
I need the timestamp difference in milliseconds between two DataFrame column values.
Value_1 = 06/13/2017 16:44:20.044
Value_2 = 06/13/2017 16:44:21.067
Both columns have the data type timestamp.
Note: Applying the function unix_timestamp(Column s) to both values and subtracting works, but only to a resolution of whole seconds, whereas the requirement is millisecond precision.
The final query would look like this:
SELECT timestamp_diff(Value_2, Value_1) FROM table1
This should return the following output:
1023 milliseconds
where timestamp_diff is the function that would calculate the difference in milliseconds.
Recommended answer
One way is to use Unix epoch time, the number of milliseconds since 1 January 1970. Below is an example using a UDF that takes two timestamps and returns the difference between them in milliseconds.
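import java.sql.Timestamp
import org.apache.spark.sql.functions.{col, udf}
// Returns endTime - startTime in milliseconds; Timestamp.getTime is epoch milliseconds.
val timestamp_diff = udf((endTime: Timestamp, startTime: Timestamp) => {
  endTime.getTime - startTime.getTime
})
// df is assumed to be a DataFrame with two timestamp columns (col1 and col2)
val result = df.withColumn("diff", timestamp_diff(col("col2"), col("col1")))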
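As a quick sanity check of the arithmetic, the sketch below applies the same subtraction to the two sample values from the question in plain Scala, without Spark. The object name TimestampDiffCheck and its field names are made up for illustration, and the values are rewritten in the yyyy-MM-dd form that Timestamp.valueOf expects. It also shows why a seconds-based approach such as unix_timestamp falls short: truncating to whole seconds first drops the fractional part.

```scala
import java.sql.Timestamp

// Hypothetical helper object, for illustration only.
object TimestampDiffCheck {
  // The two sample values from the question.
  val value1: Timestamp = Timestamp.valueOf("2017-06-13 16:44:20.044")
  val value2: Timestamp = Timestamp.valueOf("2017-06-13 16:44:21.067")

  // getTime returns epoch milliseconds, so the fractional seconds are kept.
  val diffMillis: Long = value2.getTime - value1.getTime

  // Truncating each value to whole seconds before subtracting loses the fraction.
  val diffWholeSeconds: Long = value2.getTime / 1000 - value1.getTime / 1000

  def main(args: Array[String]): Unit = {
    println(s"millisecond difference: $diffMillis")       // 1023
    println(s"whole-second difference: $diffWholeSeconds") // 1
  }
}
```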
Alternatively, you can register the function to use with SQL commands:
import java.sql.Timestamp
// Plain Scala function: endTime - startTime in milliseconds
val timestamp_diff = (endTime: Timestamp, startTime: Timestamp) => {
  endTime.getTime - startTime.getTime
}
// Register the function under the name used in the SQL query
spark.udf.register("timestamp_diff", timestamp_diff)
df.createOrReplaceTempView("table1")
val df2 = spark.sql("SELECT *, timestamp_diff(col2, col1) AS diff FROM table1")
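One caveat worth considering: if either timestamp column can contain nulls, calling getTime on a null value would throw a NullPointerException. The following is a minimal sketch of a null-safe variant in plain Scala. The names NullSafeTimestampDiff and timestampDiffMillis are hypothetical and not part of the original answer.

```scala
import java.sql.Timestamp

// Hypothetical helper object, for illustration only.
object NullSafeTimestampDiff {
  // Returns Some(endTime - startTime) in milliseconds,
  // or None when either input is null.
  def timestampDiffMillis(endTime: Timestamp, startTime: Timestamp): Option[Long] =
    for {
      end   <- Option(endTime)   // Option(null) becomes None
      start <- Option(startTime)
    } yield end.getTime - start.getTime
}
```

Assuming Spark's usual handling of Option return types in Scala UDFs, wrapping this function with udf(NullSafeTimestampDiff.timestampDiffMillis _) should yield a null in the result column wherever an input is null, but that behavior is worth verifying against the Spark version in use.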