Adding 1 hour to a timestamp column in a PySpark data frame


Question

In PySpark I have a column called test_time. This is a timestamp column.

The column has records like the following:

2017-03-12 03:19:51.0
2017-03-12 03:29:51.0

Now I want to add 1 hour to the records in the test_time column.

Expected result:

2017-03-12 04:19:51.0
2017-03-12 04:29:51.0

How can I achieve this result?

I tried the following:

df['test_time'] = df['test_time'].apply(lambda x: x - pd.DateOffset(hours=1))

and got the following error:

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'Column' object is not callable
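The error happens because `.apply()` is a pandas method; a Spark `Column` does not have it. Attribute access on a Spark `Column` returns another `Column` (a struct-field reference), so calling it raises `'Column' object is not callable`. For comparison, the same pandas-style line does work on a pandas DataFrame — a minimal sketch (using `+` rather than the `-` in the snippet above, since the goal is to add an hour):

```python
import pandas as pd

# pandas DataFrame with a datetime column named like the question's
pdf = pd.DataFrame({'test_time': pd.to_datetime(['2017-03-12 03:19:51',
                                                 '2017-03-12 03:29:51'])})

# On a pandas Series, .apply() with a DateOffset works as expected
pdf['test_time'] = pdf['test_time'].apply(lambda x: x + pd.DateOffset(hours=1))
print(pdf)
```

On a Spark DataFrame, column transformations go through `withColumn` and Spark SQL functions instead, as the answer below shows.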

Answer

Should be very easy once you convert it to a UTC timestamp. Here is one way to do it:

from pyspark.sql.functions import to_utc_timestamp, from_utc_timestamp
from datetime import timedelta

## Create a dummy dataframe
df = sqlContext.createDataFrame([('1997-02-28 10:30:00',)], ['t'])

## Add a column converting the local (PST) time to a UTC timestamp
df2 = df.withColumn('utc_timestamp', to_utc_timestamp(df.t, "PST"))

## Add one hour with the timedelta function
## (on Spark 2.x+, map lives on the RDD, not the DataFrame)
df3 = df2.rdd.map(lambda x: (x.t, x.utc_timestamp + timedelta(hours=1))).toDF(['t', 'new_utc_timestamp'])

## Convert back to the original time zone and format
df4 = df3.withColumn('new_t', from_utc_timestamp(df3.new_utc_timestamp, "PST"))

The "new_t" column in df4 is your required column, converted back to the appropriate time zone according to your system.

