Spark convert milliseconds to UTC datetime


Problem Description


I've got a dataset where one column is a long that represents milliseconds. I want to obtain the timestamp (yyyy-MM-dd HH:mm:ss) that this number represents in UTC. Basically I want the same behaviour as https://currentmillis.com/

My question is, is there a way to have Spark code convert a milliseconds long field to a timestamp in UTC? All I've been able to get with native Spark code is the conversion of that long to my local time (EST):

from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.sql import types as T
from pyspark.sql import functions as F

sc = SparkContext()
spark = SQLContext(sc)

df = spark.read.json(sc.parallelize([{'millis':1582749601000}]))

# from_unixtime formats the epoch seconds in the session-local timezone (EST here), not in UTC
df.withColumn('as_date', F.from_unixtime((F.col('millis')/1000))).show()

+-------------+-------------------+
|       millis|            as_date|
+-------------+-------------------+
|1582749601000|2020-02-26 15:40:01|
+-------------+-------------------+

I've been able to convert to UTC by forcing the timezone of the whole Spark session. I'd like to avoid this though, because it feels wrong to have to change the whole Spark session timezone for a specific use case within that job.

spark.sparkSession.builder.master('local[1]').config("spark.sql.session.timeZone", "UTC").getOrCreate()

I would also like to avoid custom defined functions as I want to be able to deploy this in Scala and Python, without writing language-specific code in each.
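
As an illustration of that constraint (a sketch added here, not part of the original question), the whole transformation can be kept in a Spark SQL expression string, which both the Python and Scala DataFrame APIs accept unchanged through expr:

from pyspark.sql import functions as F

# The same expression string could be passed verbatim to
# org.apache.spark.sql.functions.expr in Scala, so no per-language logic is needed
df.withColumn('as_date', F.expr("from_unixtime(millis / 1000)")).show()

This still formats the value in the session timezone; it only shows that whatever fix is chosen can stay identical across the two languages.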

Solution

Use to_utc_timestamp and pass your local timezone (EST): it interprets the formatted value as a time in that zone and shifts it to UTC.

    from pyspark.sql import functions as F
    # from_unixtime formats the epoch seconds in the session-local timezone (EST),
    # and to_utc_timestamp then treats that value as EST and shifts it to UTC
    df.withColumn("as_date", F.to_utc_timestamp(F.from_unixtime(F.col("millis") / 1000, 'yyyy-MM-dd HH:mm:ss'), 'EST')).show()

    +-------------+-------------------+
    |       millis|            as_date|
    +-------------+-------------------+
    |1582749601000|2020-02-26 20:40:01|
    +-------------+-------------------+
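
One caveat worth flagging: 'EST' is a fixed UTC-5 offset, so rows falling in daylight-saving time would be shifted by the wrong amount, and the literal only matches if the session really runs in Eastern time. A possible variant (an assumption, not part of the original answer; it relies on spark.sql.session.timeZone, available in Spark 2.3+) reads the session timezone from the config instead of hardcoding it:

    from pyspark.sql import functions as F

    # Hypothetical variant: use the session's own timezone (normally an IANA name
    # such as 'America/New_York', which does track DST) instead of hardcoding 'EST'
    session_tz = spark.sparkSession.conf.get("spark.sql.session.timeZone")
    df.withColumn("as_date", F.to_utc_timestamp(F.from_unixtime(F.col("millis") / 1000), session_tz)).show()

With the default yyyy-MM-dd HH:mm:ss format of from_unixtime this should produce the same 2020-02-26 20:40:01 output for the example row, whatever the local zone happens to be.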
