Python pandas dataframe - any way to set frequency programmatically?

Question

I'm trying to process CSV files like this:

import pandas as pd

df = pd.read_csv("raw_hl.csv", index_col='time', parse_dates=True)
df.head(2)
                    high        low 
time                
2014-01-01 17:00:00 1.376235    1.375945
2014-01-01 17:01:00 1.376005    1.375775
2014-01-01 17:02:00 1.375795    1.375445
2014-01-01 17:07:00 NaN         NaN 
...
2014-01-01 17:49:00 1.375645    1.375445

type(df.index)
pandas.tseries.index.DatetimeIndex

But these don't automatically have a frequency:

print(df.index.freq)
None

In case they have differing frequencies, it would be handy to be able to set one automatically. The simplest way would be to compare the first two rows:

tdelta = df.index[1] - df.index[0]
tdelta
datetime.timedelta(0, 60) 

So far so good, but setting frequency directly to this timedelta fails:

df.index.freq = tdelta
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-25-3f24abacf9de> in <module>()
----> 1 df.index.freq = tdelta

AttributeError: can't set attribute

Is there a way (ideally relatively painless!) to do this?

ANSWER: The DataFrame's index has a separate inferred_freq attribute, perhaps to avoid overwriting a user-defined frequency. Here df.index.inferred_freq returns 'T' (one minute).

So it just seems to be a matter of using this instead of df.index.freq. Thanks to Jeff, who also provides more details below :)
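A minimal sketch of how the inferred frequency could then be applied programmatically. The in-memory CSV is a hypothetical stand-in for raw_hl.csv, and going through pd.infer_freq and DataFrame.asfreq is one possible route, not necessarily what the asker ended up doing. The string that infer_freq returns depends on the pandas version ('T' in older releases, and as far as I know 'min' in newer ones), so the sketch passes it straight on rather than hard-coding it.

```python
import io

import pandas as pd

# Hypothetical stand-in for raw_hl.csv: three rows spaced one minute apart.
csv_text = """time,high,low
2014-01-01 17:00:00,1.376235,1.375945
2014-01-01 17:01:00,1.376005,1.375775
2014-01-01 17:02:00,1.375795,1.375445
"""
df = pd.read_csv(io.StringIO(csv_text), index_col="time", parse_dates=True)

# read_csv does not attach a frequency to the parsed index.
freq_before = df.index.freq

# infer_freq returns a frequency string, or None if no single frequency fits.
inferred = pd.infer_freq(df.index)

if inferred is not None:
    # asfreq conforms the frame to the given frequency; the resulting
    # index carries that frequency in its freq attribute.
    regular = df.asfreq(inferred)
else:
    regular = df
```

Note that pd.infer_freq needs at least three timestamps, so comparing only the first two rows is not enough for it.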

Recommended answer

If you have a regular frequency, it will be reported when you look at df.index.freq:

In [20]: df = DataFrame({'A' : np.arange(5)},index=pd.date_range('20130101 09:00:00',freq='3T',periods=5))

In [21]: df
Out[21]: 
                     A
2013-01-01 09:00:00  0
2013-01-01 09:03:00  1
2013-01-01 09:06:00  2
2013-01-01 09:09:00  3
2013-01-01 09:12:00  4

In [22]: df.index.freq
Out[22]: <3 * Minutes>

An irregular frequency will return None:

In [23]: df.index = df.index[0:2].tolist() + [Timestamp('20130101 09:05:00')] + df.index[-2:].tolist()

In [24]: df
Out[24]: 
                     A
2013-01-01 09:00:00  0
2013-01-01 09:03:00  1
2013-01-01 09:05:00  2
2013-01-01 09:09:00  3
2013-01-01 09:12:00  4

In [25]: df.index.freq

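To detect this case in code, something like the following might work. It is only a sketch: treating the most common gap between neighbouring timestamps as the "intended" step is my assumption, not part of the answer, and the variable names are made up for illustration.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset

# The irregular index from the example above (09:05 instead of 09:06).
idx = pd.DatetimeIndex([
    "2013-01-01 09:00:00",
    "2013-01-01 09:03:00",
    "2013-01-01 09:05:00",
    "2013-01-01 09:09:00",
    "2013-01-01 09:12:00",
])

# infer_freq returns None when no single frequency fits every timestamp.
inferred = pd.infer_freq(idx)

# Assumption: the most common gap between neighbours is the intended step.
steps = idx.to_series().diff().dropna()
most_common_step = steps.value_counts().idxmax()

# to_offset accepts a timedelta and returns the matching frequency offset,
# which can be passed wherever a freq argument is expected.
offset = to_offset(most_common_step)
```

The resulting offset can then be used as the freq in pd.date_range when reindexing to a regular grid.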
You can recover a regular frequency as follows: resample to a finer frequency (one at which no two values fall into the same bin), forward fill, then reindex to the desired frequency and end-points.

In [31]: df.resample('T').ffill().reindex(pd.date_range(df.index[0],df.index[-1],freq='3T'))
Out[31]: 
                     A
2013-01-01 09:00:00  0
2013-01-01 09:03:00  1
2013-01-01 09:06:00  2
2013-01-01 09:09:00  3
2013-01-01 09:12:00  4
