How to calculate date difference in pyspark?

Question

I have data like this:

df = sqlContext.createDataFrame([
    ('1986/10/15', 'z', 'null'), 
    ('1986/10/15', 'z', 'null'),
    ('1986/10/15', 'c', 'null'),
    ('1986/10/15', 'null', 'null'),
    ('1986/10/16', 'null', '4.0')],
    ('low', 'high', 'normal'))

I want to calculate the date difference between the low column and 2017-05-02, and replace the low column with the difference. I've tried related solutions on Stack Overflow, but none of them work.

Recommended answer

You need to convert the column low to a date, and then you can use datediff() in combination with lit(). datediff(end, start) returns the number of days from start to end, so the fixed date goes first. Using Spark 2.2:

from pyspark.sql.functions import datediff, to_date, lit

df.withColumn("test", 
              datediff(to_date(lit("2017-05-02")),
                       to_date("low","yyyy/MM/dd"))).show()
+----------+----+------+-----+
|       low|high|normal| test|
+----------+----+------+-----+
|1986/10/15|   z|  null|11157|
|1986/10/15|   z|  null|11157|
|1986/10/15|   c|  null|11157|
|1986/10/15|null|  null|11157|
|1986/10/16|null|   4.0|11156|
+----------+----+------+-----+
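
As a sanity check on the numbers in the test column, the same day counts can be reproduced without Spark. This is only a minimal sketch using Python's standard datetime module; the names REFERENCE and days_until_reference are hypothetical helpers introduced here for illustration, not part of the original answer or of the PySpark API.

```python
from datetime import date, datetime

# The fixed date from the question (hypothetical constant name).
REFERENCE = date(2017, 5, 2)

def days_until_reference(low: str) -> int:
    """Days from a 'yyyy/MM/dd' string to REFERENCE, mirroring datediff(end, start)."""
    # "%Y/%m/%d" is the strptime equivalent of the Spark pattern "yyyy/MM/dd".
    start = datetime.strptime(low, "%Y/%m/%d").date()
    return (REFERENCE - start).days

for low in ("1986/10/15", "1986/10/16"):
    print(low, days_until_reference(low))
```

The two distinct values of low give 11157 and 11156 days, which agrees with the table above. To overwrite low rather than add a test column, as the question asks, the same datediff expression can be passed to withColumn under the name "low", since withColumn replaces an existing column of that name.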

With Spark versions before 2.2, we need to convert the low column to a timestamp first:

from pyspark.sql.functions import datediff, to_date, lit, unix_timestamp

df.withColumn("test", 
              datediff(to_date(lit("2017-05-02")),
                       to_date(unix_timestamp('low', "yyyy/MM/dd").cast("timestamp")))).show()
