Converting epoch to datetime in PySpark data frame using udf


Problem description

I have a PySpark dataframe with this schema:

root
 |-- epoch: double (nullable = true)
 |-- var1: double (nullable = true)
 |-- var2: double (nullable = true)

where epoch is in seconds and should be converted to a datetime. To do so, I define a user-defined function (udf) as follows:

from pyspark.sql.functions import udf
from pyspark.sql.types import DoubleType
import time

def epoch_to_datetime(x):
    return time.localtime(x)
    # return time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(x))
    # return x * 0 + 1

epoch_to_datetime_udf = udf(epoch_to_datetime, DoubleType())
df.withColumn("datetime", epoch_to_datetime(df.epoch)).show()

I get this error:

---> 21     return time.localtime(x)
    22     # return x * 0 + 1
    23 
    TypeError: a float is required

If I simply return x + 1 in the function, it works. Trying float(x), float(str(x)), or numpy.float(x) inside time.localtime(x) does not help, and I still get the error. Outside of the udf, time.localtime(1.514687216E9) or other numbers works fine. Using the datetime package to convert the epoch to a datetime results in similar errors.

It seems that the time and datetime packages do not like to be fed a DoubleType from PySpark. Any ideas how I can solve this issue? Thanks.
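
(A working udf variant, for comparison: the failure above is consistent with the plain epoch_to_datetime being applied to the Column instead of the udf-wrapped version, and with the declared DoubleType not matching the struct_time the function returns. A minimal sketch that fixes both, assuming a formatted string is an acceptable result:)

from pyspark.sql.functions import udf
from pyspark.sql.types import StringType
import time

def epoch_to_datetime(x):
    # time.localtime returns a struct_time, not a double; format it into a string
    return time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(x))

# declare StringType to match what the function actually returns,
# and apply the udf-wrapped function rather than the plain Python one
epoch_to_datetime_udf = udf(epoch_to_datetime, StringType())
df.withColumn("datetime", epoch_to_datetime_udf(df.epoch)).show()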

Recommended answer

You don't need a udf function for that.

All you need is to cast the double epoch column to TimestampType() and then use the date_format function, as below:

from pyspark.sql import functions as f
from pyspark.sql import types as t

# cast the epoch seconds to a timestamp, then format that timestamp as a string
df.withColumn('epoch', f.date_format(df.epoch.cast(dataType=t.TimestampType()), "yyyy-MM-dd"))

This gives you the date as a string:

root
 |-- epoch: string (nullable = true)
 |-- var1: double (nullable = true)
 |-- var2: double (nullable = true)
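
If you want to keep the time of day instead of just the date, the same cast works with a fuller format pattern (a small variation on the snippet above, using the same df):

from pyspark.sql import functions as f
from pyspark.sql import types as t

# keep hours, minutes and seconds in the formatted string
df.withColumn('datetime', f.date_format(df.epoch.cast(t.TimestampType()), "yyyy-MM-dd HH:mm:ss"))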

Alternatively, you can use the to_date function as follows:

from pyspark.sql import functions as f
from pyspark.sql import types as t

# cast to a timestamp, then truncate it to a date
df.withColumn('epoch', f.to_date(df.epoch.cast(dataType=t.TimestampType())))

which gives you a date data type in the epoch column:

root
 |-- epoch: date (nullable = true)
 |-- var1: double (nullable = true)
 |-- var2: double (nullable = true)
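
As a quick end-to-end check, here is a minimal self-contained sketch (assuming a local SparkSession; the exact date printed depends on the Spark session's time zone):

from pyspark.sql import SparkSession
from pyspark.sql import functions as f
from pyspark.sql import types as t

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1.514687216E9, 1.0, 2.0)], ['epoch', 'var1', 'var2'])

# 1.514687216E9 seconds is 2017-12-31 02:26:56 UTC
df.withColumn('epoch', f.to_date(df.epoch.cast(t.TimestampType()))).show()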

Hope the answer helps.
