Spark Structured Streaming automatically converts timestamp to local time


Problem description

My timestamps are in UTC and ISO 8601 format, but with Structured Streaming they get automatically converted to local time. Is there a way to stop this conversion? I would like to have them in UTC.

I'm reading JSON data from Kafka and then parsing it with the from_json Spark function.

Input:

{"Timestamp":"2015-01-01T00:00:06.222Z"}

Flow:

SparkSession
  .builder()
  .master("local[*]")
  .appName("my-app")
  .getOrCreate()
  .readStream()
  .format("kafka")
  ... //some magic
  .writeStream()
  .format("console")
  .start()
  .awaitTermination();

Schema:

StructType schema = DataTypes.createStructType(new StructField[] {
    DataTypes.createStructField("Timestamp", DataTypes.TimestampType, true)
});
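
For reference, a minimal sketch of what the elided `... //some magic` step usually looks like with this schema (the broker address, topic name, and intermediate column names are assumptions for illustration, not the asker's actual code):

import static org.apache.spark.sql.functions.*;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

Dataset<Row> parsed = spark                              // `spark` is the SparkSession built above
    .readStream()
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092") // assumed broker address
    .option("subscribe", "events")                       // assumed topic name
    .load()
    // Kafka delivers the payload as bytes; cast it to a string first
    .selectExpr("CAST(value AS STRING) AS json")
    // parse the JSON payload using the schema defined above
    .select(from_json(col("json"), schema).alias("data"))
    .select("data.*");

When the console sink prints this frame, Spark renders the internal UTC instants in the session time zone, which is what produces the shifted hours in the output below.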

Output:

+--------------------+
|           Timestamp|
+--------------------+
|2015-01-01 01:00:...|
|2015-01-01 01:00:...|
+--------------------+

As you can see, the hour has incremented by itself.

PS: I tried to experiment with the from_utc_timestamp Spark function, but no luck.

Recommended answer

For me, it worked to use:

spark.conf.set("spark.sql.session.timeZone", "UTC")

It tells Spark SQL to use UTC as the default time zone for timestamps. I used it in Spark SQL, for example:

select *, cast('2017-01-01 10:10:10' as timestamp) from someTable

I know it does not work in 2.0.1 but it works in Spark 2.2. I also used it in SQLTransformer and it worked.
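
For context, SQLTransformer (from Spark ML) applies a SQL statement to its input dataset, with the __THIS__ placeholder standing in for the input table. A minimal sketch of the kind of use described, reusing the statement from the example above (the output column name is an assumption):

import org.apache.spark.ml.feature.SQLTransformer;

SQLTransformer transformer = new SQLTransformer()
    .setStatement("SELECT *, cast('2017-01-01 10:10:10' as timestamp) AS ts FROM __THIS__");
// transformer.transform(df) yields df with the extra timestamp column,
// interpreted in the session time zone set above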

I am not sure about streaming, though.
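
If the session setting does carry over to streaming queries (which may depend on the Spark version), applying it to the asker's job would be a one-line addition before building the stream; a minimal sketch:

import org.apache.spark.sql.SparkSession;

SparkSession spark = SparkSession
    .builder()
    .master("local[*]")
    .appName("my-app")
    .getOrCreate();

// Render and parse timestamps in UTC instead of the JVM's default time zone
spark.conf().set("spark.sql.session.timeZone", "UTC");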
