Adding hours to timestamp in pyspark dynamically
Problem description
import pyspark.sql.functions as F
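from datetime import datetime
from pyspark.sql import SparkSession

# Reuse the active session, or create one when running outside the pyspark shell
spark = SparkSession.builder.getOrCreate()

data = [
    (1, datetime(2017, 3, 12, 3, 19, 58), 'Raising', 2),
    (2, datetime(2017, 3, 12, 3, 21, 30), 'sleeping', 1),
    (3, datetime(2017, 3, 12, 3, 29, 40), 'walking', 3),
    (4, datetime(2017, 3, 12, 3, 31, 23), 'talking', 5),
    (5, datetime(2017, 3, 12, 4, 19, 47), 'eating', 6),
    (6, datetime(2017, 3, 12, 4, 33, 51), 'working', 7),
]
# The question never shows how df is built; column names are taken from the output below
df = spark.createDataFrame(data, ['id', 'testing_time', 'test_name', 'shift'])
df.show()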
+---+-------------------+---------+-----+
| id|       testing_time|test_name|shift|
+---+-------------------+---------+-----+
|  1|2017-03-12 03:19:58|  Raising|    2|
|  2|2017-03-12 03:21:30| sleeping|    1|
|  3|2017-03-12 03:29:40|  walking|    3|
|  4|2017-03-12 03:31:23|  talking|    5|
|  5|2017-03-12 04:19:47|   eating|    6|
|  6|2017-03-12 04:33:51|  working|    7|
+---+-------------------+---------+-----+
Now I want to add shift (hours) to the testing time. Can anybody help me out with a quick solution?
Solution
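You can use something like the following. unix_timestamp returns the time in seconds, so the shift field (hours) has to be converted to seconds first, which is why I multiply it by 3600. The sum is then cast back to a timestamp.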
>>> df.withColumn("testing_time", (F.unix_timestamp("testing_time") + F.col("shift")*3600).cast('timestamp')).show()
+---+-------------------+---------+-----+
| id|       testing_time|test_name|shift|
+---+-------------------+---------+-----+
|  1|2017-03-12 05:19:58|  Raising|    2|
|  2|2017-03-12 04:21:30| sleeping|    1|
|  3|2017-03-12 06:29:40|  walking|    3|
|  4|2017-03-12 08:31:23|  talking|    5|
|  5|2017-03-12 10:19:47|   eating|    6|
|  6|2017-03-12 11:33:51|  working|    7|
+---+-------------------+---------+-----+
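To sanity-check the arithmetic outside Spark, here is a minimal plain-Python sketch of the same idea: convert the timestamp to epoch seconds, add shift * 3600, and convert back. The helper name add_hours_via_epoch and the choice to interpret the naive datetimes as UTC are assumptions made for this illustration; they are not part of the original answer.

```python
from datetime import datetime, timezone

SECONDS_PER_HOUR = 3600


def add_hours_via_epoch(ts: datetime, hours: int) -> datetime:
    """Add whole hours by going through epoch seconds.

    Assumption: the naive datetime is interpreted as UTC, so no
    daylight-saving rules are involved in the conversion.
    """
    # Whole seconds since the epoch; any fractional second is dropped
    epoch_seconds = int(ts.replace(tzinfo=timezone.utc).timestamp())
    # Same arithmetic as the Spark expression: seconds + hours * 3600
    shifted = epoch_seconds + hours * SECONDS_PER_HOUR
    # Convert back and drop the tzinfo so the result is naive again
    return datetime.fromtimestamp(shifted, tz=timezone.utc).replace(tzinfo=None)


# (id, testing_time, shift) taken from the question's sample data
rows = [
    (1, datetime(2017, 3, 12, 3, 19, 58), 2),
    (2, datetime(2017, 3, 12, 3, 21, 30), 1),
    (3, datetime(2017, 3, 12, 3, 29, 40), 3),
    (4, datetime(2017, 3, 12, 3, 31, 23), 5),
    (5, datetime(2017, 3, 12, 4, 19, 47), 6),
    (6, datetime(2017, 3, 12, 4, 33, 51), 7),
]

shifted_rows = [(row_id, add_hours_via_epoch(ts, shift)) for row_id, ts, shift in rows]

for row_id, new_ts in shifted_rows:
    print(row_id, new_ts)
```

The printed timestamps match the testing_time column in the output above. Two caveats probably carry over to the Spark version: going through whole epoch seconds discards any fractional seconds, and the wall-clock value Spark displays depends on the session time zone, so results near a daylight-saving transition may not differ by exactly the shift in local time.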