Python pandas dataframe - any way to set frequency programmatically?
Problem description
I'm trying to process CSV files like this:
import pandas as pd
df = pd.read_csv("raw_hl.csv", index_col='time', parse_dates=True)
df.head(2)
high low
time
2014-01-01 17:00:00 1.376235 1.375945
2014-01-01 17:01:00 1.376005 1.375775
2014-01-01 17:02:00 1.375795 1.375445
2014-01-01 17:07:00 NaN NaN
...
2014-01-01 17:49:00 1.375645 1.375445
type(df.index)
pandas.tseries.index.DatetimeIndex
But these don't automatically have a frequency:
print(df.index.freq)
None
In case they have differing frequencies, it would be handy to be able to set one automatically. The simplest way would be to compare the first two rows:
tdelta = df.index[1] - df.index[0]
tdelta
datetime.timedelta(0, 60)
So far so good, but setting frequency directly to this timedelta fails:
df.index.freq = tdelta
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-25-3f24abacf9de> in <module>()
----> 1 df.index.freq = tdelta
AttributeError: can't set attribute
Is there a way (ideally relatively painless!) to do this?
ANSWER: pandas gives the DataFrame's index an inferred_freq attribute, perhaps to avoid overwriting a user-defined frequency. Here df.index.inferred_freq gives 'T' (one minute).
So it just seems to be a matter of using this instead of df.index.freq. Thanks to Jeff, who also provides more details below :)
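A minimal sketch of that idea, assuming a reasonably recent pandas: inferred_freq is read-only, so the inferred string is applied with asfreq, which returns a new frame whose index carries the frequency. The sample data below is made up for illustration and is not the original CSV.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the CSV data: five one-minute rows.
idx = pd.DatetimeIndex([
    "2014-01-01 17:00:00",
    "2014-01-01 17:01:00",
    "2014-01-01 17:02:00",
    "2014-01-01 17:03:00",
    "2014-01-01 17:04:00",
])
df = pd.DataFrame({"high": np.linspace(1.3762, 1.3758, 5)}, index=idx)

# An index built from explicit timestamps carries no frequency.
print(df.index.freq)

# pd.infer_freq returns a frequency string, or None if the spacing is irregular.
inferred = pd.infer_freq(df.index)
if inferred is not None:
    # asfreq returns a new frame on a regular grid with that frequency.
    df = df.asfreq(inferred)

print(df.index.freq)
```

The exact string returned depends on the pandas version: older releases report one-minute data as 'T', while recent ones use 'min'.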
Recommended answer
If you have a regular frequency, it will be reported when you look at df.index.freq:
In [19]: import numpy as np, pandas as pd
In [20]: df = pd.DataFrame({'A': np.arange(5)}, index=pd.date_range('20130101 09:00:00', freq='3T', periods=5))
In [21]: df
Out[21]:
A
2013-01-01 09:00:00 0
2013-01-01 09:03:00 1
2013-01-01 09:06:00 2
2013-01-01 09:09:00 3
2013-01-01 09:12:00 4
In [22]: df.index.freq
Out[22]: <3 * Minutes>
An irregular frequency will return None:
In [23]: df.index = df.index[0:2].tolist() + [pd.Timestamp('20130101 09:05:00')] + df.index[-2:].tolist()
In [24]: df
Out[24]:
A
2013-01-01 09:00:00 0
2013-01-01 09:03:00 1
2013-01-01 09:05:00 2
2013-01-01 09:09:00 3
2013-01-01 09:12:00 4
In [25]: df.index.freq
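The irregular case can also be detected programmatically. A small sketch, rebuilding the same five timestamps by hand: pd.infer_freq returns None when the spacing is not uniform, and the distinct gaps between consecutive rows can be inspected directly.

```python
import pandas as pd

# The same irregular timestamps as in the frame above.
irregular = pd.DatetimeIndex([
    "2013-01-01 09:00:00",
    "2013-01-01 09:03:00",
    "2013-01-01 09:05:00",
    "2013-01-01 09:09:00",
    "2013-01-01 09:12:00",
])

# No frequency is attached, and none can be inferred from uneven gaps.
print(irregular.freq)
print(pd.infer_freq(irregular))

# Distinct gaps between consecutive timestamps; more than one means irregular.
gaps = irregular.to_series().diff().dropna().unique()
print(gaps)
```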
You can recover a regular frequency as follows: resample to a finer frequency (one at which no two values fall in the same bin), forward fill, then reindex to the desired frequency and end-points.
In [31]: df.resample('T').ffill().reindex(pd.date_range(df.index[0],df.index[-1],freq='3T'))
Out[31]:
A
2013-01-01 09:00:00 0
2013-01-01 09:03:00 1
2013-01-01 09:06:00 2
2013-01-01 09:09:00 3
2013-01-01 09:12:00 4
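The recipe above could be wrapped in a helper. The sketch below is an illustration under assumptions, not part of the original answer: regularize is a hypothetical name, the target frequency is taken from the gap between the first two rows (the approach tried in the question), and to_offset from pandas.tseries.frequencies turns that gap into a frequency offset.

```python
import numpy as np
import pandas as pd
from pandas.tseries.frequencies import to_offset


def regularize(df, fine_freq="1min"):
    """Hypothetical helper: return df placed on a regular time grid.

    The target frequency is the gap between the first two rows.
    fine_freq must be fine enough that no two rows share a bin.
    """
    target = to_offset(df.index[1] - df.index[0])
    grid = pd.date_range(df.index[0], df.index[-1], freq=target)
    # Upsample, forward fill, then keep only the points on the target grid.
    return df.resample(fine_freq).ffill().reindex(grid)


# Rebuild the irregular frame from the answer.
idx = pd.DatetimeIndex([
    "2013-01-01 09:00:00",
    "2013-01-01 09:03:00",
    "2013-01-01 09:05:00",
    "2013-01-01 09:09:00",
    "2013-01-01 09:12:00",
])
df = pd.DataFrame({"A": np.arange(5)}, index=idx)

regular = regularize(df)
print(regular)
print(regular.index.freq)
```

Taking the frequency from the first two rows is only as reliable as those two rows; if the first gap is itself irregular, passing the desired frequency explicitly would be the safer choice.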