Converting epoch to datetime in a PySpark data frame using a udf
Question
I have a PySpark dataframe with this schema:
root
|-- epoch: double (nullable = true)
|-- var1: double (nullable = true)
|-- var2: double (nullable = true)
where epoch is in seconds and should be converted to a datetime. To do so, I define a user defined function (udf) as follows:
from pyspark.sql.functions import udf
from pyspark.sql.types import DoubleType
import time

def epoch_to_datetime(x):
    return time.localtime(x)
    # return time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(x))
    # return x * 0 + 1

epoch_to_datetime_udf = udf(epoch_to_datetime, DoubleType())
df.withColumn("datetime", epoch_to_datetime_udf(df.epoch)).show()
I get this error:
---> 21 return time.localtime(x)
22 # return x * 0 + 1
23
TypeError: a float is required
If I simply return x + 1 in the function, it works. Trying float(x), float(str(x)), or numpy.float(x) inside time.localtime(x) does not help, and I still get an error. Outside of the udf, time.localtime(1.514687216E9) or other numbers works fine. Using the datetime package to convert epoch to a datetime results in similar errors.
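The failing and working cases can be reproduced outside Spark (a minimal sketch; the exact TypeError message varies across Python versions, and the cause here is that the raw function receives a Spark Column object rather than a float):

```python
import time

# A plain float epoch works fine outside the udf:
ts = time.localtime(1.514687216E9)
print(ts.tm_year)

# But time.localtime raises TypeError for non-numeric input,
# which is what happens when a Column object reaches it directly:
try:
    time.localtime("1.514687216E9")
except TypeError as e:
    print("TypeError:", e)
```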
It seems that the time and datetime packages do not like being fed a DoubleType from PySpark. Any ideas how I can solve this issue? Thanks.
Answer
You don't need a udf function for that.
All you need is to cast the double epoch column to TimestampType() and then use the date_format function, as below:
from pyspark.sql import functions as f
from pyspark.sql import types as t
df.withColumn('epoch', f.date_format(df.epoch.cast(dataType=t.TimestampType()), "yyyy-MM-dd"))
This will give you a string date:
root
|-- epoch: string (nullable = true)
|-- var1: double (nullable = true)
|-- var2: double (nullable = true)
And you can use the to_date function as follows:
from pyspark.sql import functions as f
from pyspark.sql import types as t
df.withColumn('epoch', f.to_date(df.epoch.cast(dataType=t.TimestampType())))
This will give the epoch column a date datatype:
root
|-- epoch: date (nullable = true)
|-- var1: double (nullable = true)
|-- var2: double (nullable = true)
I hope the answer helps.