Pyspark: Extract date from Datetime value
Problem description
I am trying to figure out how to extract a date from a datetime value using Pyspark SQL.
The datetime values look like this:
DateTime
2018-05-21T00:00:00.000-04:00
2016-02-22T02:00:02.234-06:00
When I now load this into a Spark dataframe and try to extract the date (via
Date() or
Timestamp() and then Date())
I always get the error that a date or timestamp value is expected, but a DateTime value was provided.
Can someone help me with retrieving the date from this value? I think you need to provide a timezone for that - but since I already had problems extracting only the date, I wanted to solve that first.
Thank you and kind regards.
Pyspark has a to_date function to extract the date from a timestamp. In your example, you could create a new column with just the date by doing the following:
df = df.withColumn("date_only", func.to_date(func.col("DateTime")))
If the column you are trying to convert is a string, you can set the format parameter of to_date to specify the datetime format of the string.
You can read more about to_date in the documentation here.