Truncate `TimeStamp` column to hour precision in pandas `DataFrame`

Problem description

I have a pandas.DataFrame called df which has an automatically generated index, with a column dt:

df['dt'].dtype, df['dt'][0]
# (dtype('<M8[ns]'), Timestamp('2014-10-01 10:02:45'))

What I'd like to do is create a new column truncated to hour precision. I'm currently using:

from datetime import datetime
df['dt2'] = df['dt'].apply(lambda ts: datetime(ts.year, ts.month, ts.day, ts.hour))

This works, so that's fine. However, I've an inkling there's some nice way using pandas.tseries.offsets or creating a DatetimeIndex or similar.

So if possible, is there some pandas wizardry to do this?

Recommended answer

In pandas 0.18.0 and later, there are datetime floor, ceil and round methods to round timestamps to a given fixed precision/frequency. To round down to hour precision, you can use:

>>> df['dt2'] = df['dt'].dt.floor('h')
>>> df
                      dt                     dt2
0    2014-10-01 10:02:45     2014-10-01 10:00:00
1    2014-10-01 13:08:17     2014-10-01 13:00:00
2    2014-10-01 17:39:24     2014-10-01 17:00:00
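
For comparison, here is a minimal self-contained sketch of the three methods side by side. The sample frame is rebuilt from the timestamps shown above; the column names floor, ceil and round are only illustrative and are not from the original question.

```python
import pandas as pd

# Rebuild the sample frame from the timestamps shown in the answer.
df = pd.DataFrame({
    "dt": pd.to_datetime([
        "2014-10-01 10:02:45",
        "2014-10-01 13:08:17",
        "2014-10-01 17:39:24",
    ])
})

# Illustrative column names (assumptions, not from the question).
df["floor"] = df["dt"].dt.floor("h")  # round down to the start of the hour
df["ceil"] = df["dt"].dt.ceil("h")    # round up to the next full hour
df["round"] = df["dt"].dt.round("h")  # round to the nearest hour

print(df)
```

Only floor truncates. round moves 17:39:24 forward to the next hour, which is usually not what is wanted when bucketing rows by hour.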


Here's another alternative to truncate the timestamps. Unlike floor, it supports truncating to a precision such as year or month.

You can temporarily adjust the precision unit of the underlying NumPy datetime64 datatype, changing it from [ns] to [h]:

df['dt'].values.astype('<M8[h]')

This truncates everything to hour precision. For example:

>>> df
                       dt
0     2014-10-01 10:02:45
1     2014-10-01 13:08:17
2     2014-10-01 17:39:24

>>> df['dt2'] = df['dt'].values.astype('<M8[h]')
>>> df
                      dt                     dt2
0    2014-10-01 10:02:45     2014-10-01 10:00:00
1    2014-10-01 13:08:17     2014-10-01 13:00:00
2    2014-10-01 17:39:24     2014-10-01 17:00:00

>>> df.dtypes
dt     datetime64[ns]
dt2    datetime64[ns]
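
The session above can be reproduced as a standalone script. This is a sketch that assumes the same three sample timestamps and uses to_numpy() in place of .values, which should be equivalent here. The dtype of the new column depends on the pandas version: older releases convert back to datetime64[ns] as shown above, while pandas 2.x appears to store it at a coarser supported resolution such as datetime64[s]. The truncated values are the same either way.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "dt": pd.to_datetime([
        "2014-10-01 10:02:45",
        "2014-10-01 13:08:17",
        "2014-10-01 17:39:24",
    ])
})

# Cast the underlying NumPy array to hour resolution; anything finer
# than the hour is discarded by the cast.
truncated = df["dt"].to_numpy().astype("datetime64[h]")
print(truncated.dtype)

# Assigning the array back lets pandas pick a resolution it supports.
df["dt2"] = truncated
print(df)
print(df.dtypes)
```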

The same method should work for any other unit: months 'M', minutes 'm', and so on:

  • Keep up to year: '<M8[Y]'
  • Keep up to month: '<M8[M]'
  • Keep up to day: '<M8[D]'
  • Keep up to minute: '<M8[m]'
  • Keep up to second: '<M8[s]'
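
The units listed above could be wrapped in a small helper along these lines. The name truncate and the sample timestamps are made up for illustration. The helper also casts back to datetime64[ns] in NumPy before building the Series; that step is an addition of this sketch, meant to keep the result dtype the same across pandas versions.

```python
import pandas as pd

def truncate(series, unit):
    """Hypothetical helper: truncate a datetime64 Series to a NumPy unit
    such as 'Y', 'M', 'D', 'h', 'm' or 's'."""
    coarse = series.to_numpy().astype(f"datetime64[{unit}]")
    # Cast back to nanoseconds so the result dtype matches the input column.
    return pd.Series(coarse.astype("datetime64[ns]"), index=series.index)

# Made-up sample data, chosen so that every unit gives a different result.
df = pd.DataFrame({
    "dt": pd.to_datetime(["2014-10-17 10:02:45", "2015-03-09 23:59:59"])
})

for unit in ["Y", "M", "D", "h", "m", "s"]:
    df[unit] = truncate(df["dt"], unit)

print(df.T)
```

Note that 'M' (month) and 'm' (minute) differ only by case, so it is easy to mix them up.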
