Adding hours to timestamp in pyspark dynamically


Problem description


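import pyspark.sql.functions as F
from datetime import datetime
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

data = [
  (1, datetime(2017, 3, 12, 3, 19, 58), 'Raising',2),
  (2, datetime(2017, 3, 12, 3, 21, 30), 'sleeping',1),
  (3, datetime(2017, 3, 12, 3, 29, 40), 'walking',3),
  (4, datetime(2017, 3, 12, 3, 31, 23), 'talking',5),
  (5, datetime(2017, 3, 12, 4, 19, 47), 'eating',6),
  (6, datetime(2017, 3, 12, 4, 33, 51), 'working',7),
]
# Build the DataFrame; shift holds the number of hours to add
df = spark.createDataFrame(data, ['id', 'testing_time', 'test_name', 'shift'])
df.show()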

+---+-------------------+---------+-----+
| id|       testing_time|test_name|shift|
+---+-------------------+---------+-----+
|  1|2017-03-12 03:19:58|  Raising|    2|
|  2|2017-03-12 03:21:30| sleeping|    1|
|  3|2017-03-12 03:29:40|  walking|    3|
|  4|2017-03-12 03:31:23|  talking|    5|
|  5|2017-03-12 04:19:47|   eating|    6|
|  6|2017-03-12 04:33:51|  working|    7|
+---+-------------------+---------+-----+

Now I want to add shift (hours) to the testing time. Can anybody help me out with a quick solution?

Solution

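You can use something like the code below. unix_timestamp returns the time in seconds, so the shift field (hours) has to be converted to seconds first, which is why I multiply it by 3600. The sum is then cast back to a timestamp.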

>>> df.withColumn("testing_time", (F.unix_timestamp("testing_time") + F.col("shift")*3600).cast('timestamp')).show()
+---+-------------------+---------+-----+
| id|       testing_time|test_name|shift|
+---+-------------------+---------+-----+
|  1|2017-03-12 05:19:58|  Raising|    2|
|  2|2017-03-12 04:21:30| sleeping|    1|
|  3|2017-03-12 06:29:40|  walking|    3|
|  4|2017-03-12 08:31:23|  talking|    5|
|  5|2017-03-12 10:19:47|   eating|    6|
|  6|2017-03-12 11:33:51|  working|    7|
+---+-------------------+---------+-----+
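
To make the arithmetic behind that expression easier to check by hand, here is a minimal plain-Python sketch of the same idea without Spark: convert the timestamp to epoch seconds, add shift * 3600, and convert back. The helper name add_hours_via_epoch is hypothetical, and treating the naive timestamps as UTC is an assumption made only to keep the round trip unambiguous; Spark itself uses the session time zone for this conversion.

```python
from datetime import datetime, timezone


def add_hours_via_epoch(ts: datetime, hours: int) -> datetime:
    """Add whole hours to a naive timestamp by going through epoch seconds.

    Hypothetical helper for illustration; it mirrors
    (unix_timestamp(ts) + shift * 3600).cast('timestamp').
    """
    # Assumption: interpret the naive timestamp as UTC
    epoch_seconds = int(ts.replace(tzinfo=timezone.utc).timestamp())
    # One hour is 3600 seconds
    shifted_seconds = epoch_seconds + hours * 3600
    # Convert back and drop the tzinfo so the result is naive again
    return datetime.fromtimestamp(shifted_seconds, tz=timezone.utc).replace(tzinfo=None)


# Two rows from the question: (id, testing_time, shift)
rows = [
    (1, datetime(2017, 3, 12, 3, 19, 58), 2),
    (5, datetime(2017, 3, 12, 4, 19, 47), 6),
]

for row_id, testing_time, shift in rows:
    print(row_id, add_hours_via_epoch(testing_time, shift))
```

For id 1 this gives 2017-03-12 05:19:58 and for id 5 it gives 2017-03-12 10:19:47, matching the Spark output above.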
