Convert timestamp to date in Spark dataframe


Question

I've seen (here: How to convert Timestamp to Date format in DataFrame?) the way to convert a timestamp into a date type, but, at least for me, it doesn't work.

Here is what I tried:

# Imports used below
from pyspark.sql import functions as func
import pyspark.sql.types as stypes

# Create dataframe
df_test = spark.createDataFrame([('20170809',), ('20171007',)], ['date'])

# Convert to timestamp
df_test2 = df_test.withColumn(
    'timestamp',
    func.when(df_test.date.isNull() | (df_test.date == ''), '0')
        .otherwise(func.unix_timestamp(df_test.date, 'yyyyMMdd')))

# Convert timestamp to date again
df_test2.withColumn('date_again', df_test2['timestamp'].cast(stypes.DateType())).show()

But this returns null in the column date_again:

+--------+----------+----------+
|    date| timestamp|date_again|
+--------+----------+----------+
|20170809|1502229600|      null|
|20171007|1507327200|      null|
+--------+----------+----------+

Any idea what's failing?

Answer

The following:

func.when((df_test.date.isNull() | (df_test.date == '')) , '0')\
  .otherwise(func.unix_timestamp(df_test.date,'yyyyMMdd'))

doesn't work because it is type-inconsistent: the first clause returns a string while the second clause returns a bigint. As a result, it will always return NULL if data is NOT NULL and not empty.

It is also obsolete: the SQL functions are safe against NULL and malformed input, so no additional checks are needed.

In [1]: spark.sql("SELECT unix_timestamp(NULL, 'yyyyMMdd')").show()
+----------------------------------------------+
|unix_timestamp(CAST(NULL AS STRING), yyyyMMdd)|
+----------------------------------------------+
|                                          null|
+----------------------------------------------+


In [2]: spark.sql("SELECT unix_timestamp('', 'yyyyMMdd')").show()
+--------------------------+
|unix_timestamp(, yyyyMMdd)|
+--------------------------+
|                      null|
+--------------------------+
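That NULL-safe contract can be mimicked in plain Python for intuition (a stdlib sketch, not Spark's implementation; parse_yyyymmdd is a hypothetical helper, and the epoch value assumes UTC, whereas Spark uses the session time zone, which is why the question's numbers differ):

```python
from datetime import datetime, timezone

def parse_yyyymmdd(s):
    """Hypothetical helper mirroring unix_timestamp(col, 'yyyyMMdd'):
    returns (epoch_seconds, date), or (None, None) for NULL, empty,
    or malformed input, like Spark's NULL-safe behaviour."""
    if not s:
        return None, None
    try:
        dt = datetime.strptime(s, "%Y%m%d").replace(tzinfo=timezone.utc)
    except ValueError:
        return None, None
    return int(dt.timestamp()), dt.date()

parse_yyyymmdd("20170809")  # valid: (1502236800, date(2017, 8, 9)) under UTC
parse_yyyymmdd("")          # empty: (None, None), like unix_timestamp('', ...)
```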

And in Spark 2.2 or later you don't need the intermediate step at all:

from pyspark.sql.functions import to_date

to_date("date", "yyyyMMdd")

