Pandas: Timestamp index rounding to the nearest 5th minute

Problem description

I have a DataFrame df with the usual timestamps as its index:

    2011-04-01 09:30:00
    2011-04-01 09:30:10
    ...
    2011-04-01 09:36:20
    ...
    2011-04-01 09:37:30

How can I add a column to this DataFrame with the same timestamps, but rounded to the nearest 5th minute interval? Like this:

    index                 new_col
    2011-04-01 09:30:00   2011-04-01 09:35:00        
    2011-04-01 09:30:10   2011-04-01 09:35:00
    2011-04-01 09:36:20   2011-04-01 09:40:00
    2011-04-01 09:37:30   2011-04-01 09:40:00

Recommended answer

The round_to_5min(t) solution using timedelta arithmetic is correct, but it is complicated and very slow. Instead, use the integer nanosecond representation behind the pandas Timestamp, which moves every timestamp up to the next 5-minute boundary in one vectorized step:

import numpy as np
import pandas as pd

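ns5min = 5 * 60 * 1000000000   # 5 minutes in nanoseconds
# Integer-divide the epoch nanoseconds into 5-minute buckets, add one bucket,
# and scale back: each timestamp lands on the next 5-minute boundary.
pd.to_datetime((df.index.astype(np.int64) // ns5min + 1) * ns5min)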
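
As a minimal sketch of how the expression above could fill the new column asked for in the question, the following builds a small frame from the question's timestamps. The `value` column and the name `NS_5MIN` are made up for illustration, and the explicit cast to `datetime64[ns]` is an added precaution (not part of the original answer) so that the integers are nanoseconds regardless of the index's stored resolution.

```python
import numpy as np
import pandas as pd

NS_5MIN = 5 * 60 * 1_000_000_000  # 5 minutes in nanoseconds

# Hypothetical sample frame built from the timestamps in the question.
idx = pd.to_datetime([
    "2011-04-01 09:30:00",
    "2011-04-01 09:30:10",
    "2011-04-01 09:36:20",
    "2011-04-01 09:37:30",
])
df = pd.DataFrame({"value": range(len(idx))}, index=idx)

# Assumption: force nanosecond resolution before taking the integer view.
ns = df.index.astype("datetime64[ns]").astype(np.int64)

# Bucket number -> start of the following 5-minute bucket.
df["new_col"] = pd.to_datetime((ns // NS_5MIN + 1) * NS_5MIN)

print(df)
```

Note that a timestamp already on a boundary, such as 09:30:00, still moves to 09:35:00, which matches the output requested in the question.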

Let's compare the speed:

rng = pd.date_range('1/1/2014', '1/2/2014', freq='S')

print(len(rng))
# 86401

# ipython %timeit 
%timeit pd.to_datetime(((rng.astype(np.int64) // ns5min + 1 ) * ns5min))
# 1000 loops, best of 3: 1.01 ms per loop

%timeit rng.map(round_to_5min)
# 1 loops, best of 3: 1.03 s per loop

About 1000 times faster!
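
For comparison, pandas also provides vectorized `floor`, `ceil`, and `round` methods on a DatetimeIndex. They are not part of the answer above, so treat the following as a hedged sketch of an alternative and not as the answer's method. It shows how the three differ on timestamps taken from the question, and that `floor` plus five minutes reproduces the integer arithmetic.

```python
import pandas as pd

# Timestamps taken from the question.
idx = pd.to_datetime([
    "2011-04-01 09:30:00",
    "2011-04-01 09:30:10",
    "2011-04-01 09:36:20",
])

# ceil: up to the next boundary, but exact boundaries stay where they are.
ceiled = idx.ceil("5min")

# round: nearest boundary, which may be earlier than the timestamp.
nearest = idx.round("5min")

# floor + 5 minutes: always the following boundary, like (x // n + 1) * n.
next_boundary = idx.floor("5min") + pd.Timedelta(minutes=5)

print(list(ceiled))
print(list(nearest))
print(list(next_boundary))
```

Only the last variant moves 09:30:00 to 09:35:00 as in the requested output; which behaviour is wanted depends on how boundary timestamps should be treated.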
