pandas 0.22, resample only full bins, drop partials
Question
I am trying to resample 1h bins of trading data into 4h bins. The problem is that my pandas code also outputs partial bins that are not closed yet.
My input:
                      close    high     low    open  symbol       turnover  \
timestamp
2018-05-08 03:00:00  9418.0  9449.0  9408.5  9412.5  XBTUSD  1091577940325
2018-05-08 04:00:00  9423.5  9435.0  9390.0  9418.0  XBTUSD   801492831858
2018-05-08 05:00:00  9414.0  9428.5  9393.5  9423.5  XBTUSD   445420257388
2018-05-08 06:00:00  9337.0  9414.0  9314.5  9414.0  XBTUSD  1349710247828
2018-05-08 07:00:00  9328.5  9359.5  9305.0  9337.0  XBTUSD  1103092129997
2018-05-08 08:00:00  9355.0  9359.5  9328.5  9328.5  XBTUSD   647813850343
2018-05-08 09:00:00  9376.0  9383.0  9355.0  9355.0  XBTUSD   597066647876
2018-05-08 10:00:00  9312.0  9376.5  9241.5  9376.0  XBTUSD  1933554301163
2018-05-08 11:00:00  9296.0  9338.0  9275.5  9312.0  XBTUSD  1318169059747
2018-05-08 12:00:00  9201.5  9305.0  9178.0  9296.0  XBTUSD  2058057970783
My output:
                       open    high     low   close     volume         vwap  \
timestamp
2018-05-08 04:00:00  9418.0  9435.0  9305.0  9328.5  346736372  9380.972675
2018-05-08 08:00:00  9328.5  9383.0  9241.5  9296.0  419074812  9332.798550
2018-05-08 12:00:00  9296.0  9305.0  9178.0  9201.5  189922434  9228.497600
Note that the 4h interval from 12:00 to 16:00 contains only partial data: the 12:00 hourly interval of the source.
My desired output should look like this:
                       open    high     low   close     volume         vwap  \
timestamp
2018-05-08 04:00:00  9418.0  9435.0  9305.0  9328.5  346736372  9380.972675
2018-05-08 08:00:00  9328.5  9383.0  9241.5  9296.0  419074812  9332.798550
In other words, the 12:00 interval should only produce data in the resampled output once the whole interval has closed.
My code so far:
outputData = srcData.resample('4H').agg({'open': 'first',
                                         'high': 'max',
                                         'low': 'min',
                                         'close': 'last',
                                         'volume': 'sum',
                                         'vwap': 'mean',
                                         'turnover': 'sum',
                                         'symbol': 'first'})
Is there a function in pandas that would help me, or do I have to figure out a way to cut the partial interval off after resampling? Cheers, Alex
Recommended answer
You could add a count to your agg call, then use that count column to filter the resulting dataframe so that it shows only "full bins", i.e. 4h bins built from four hourly rows.
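df_out = df.resample('4H').agg({'open': 'first',
                                'high': 'max',
                                'low': 'min',
                                'close': 'last',
                                'turnover': 'sum',
                                'symbol': ['first', 'count']})
# Flatten the two-level column index, e.g. ('symbol', 'count') -> 'symbol_count'
df_out.columns = df_out.columns.map('_'.join)
# Keep only the 4h bins that were built from all four hourly rows
df_out.query('symbol_count == 4')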
Output:
                     open_first  high_max  low_min  close_last   turnover_sum  symbol_first  symbol_count
timestamp
2018-05-08 04:00:00      9418.0    9435.0   9305.0      9328.5  3699715467071        XBTUSD             4
2018-05-08 08:00:00      9328.5    9383.0   9241.5      9296.0  4496603859129        XBTUSD             4
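If you would rather keep the original column names, a variation on the same idea is to take the row count per bin from the resampler's `size()` and use it as a mask. This is only a sketch: the names `src_bin`, `dst_bin` and `full_bars` are made up for illustration, the frame is rebuilt from the OHLC columns of the question, and it assumes the hourly data has no gaps (a closed bin with a missing hour would be dropped too).

```python
import pandas as pd

# Assumed bin sizes: 1h source bars resampled into 4h bars.
src_bin = pd.Timedelta(hours=1)
dst_bin = pd.Timedelta(hours=4)

# Hourly bars copied from the question (OHLC columns only).
index = pd.date_range("2018-05-08 03:00", periods=10, freq=src_bin, name="timestamp")
df = pd.DataFrame(
    {
        "open": [9412.5, 9418.0, 9423.5, 9414.0, 9337.0, 9328.5, 9355.0, 9376.0, 9312.0, 9296.0],
        "high": [9449.0, 9435.0, 9428.5, 9414.0, 9359.5, 9359.5, 9383.0, 9376.5, 9338.0, 9305.0],
        "low": [9408.5, 9390.0, 9393.5, 9314.5, 9305.0, 9328.5, 9355.0, 9241.5, 9275.5, 9178.0],
        "close": [9418.0, 9423.5, 9414.0, 9337.0, 9328.5, 9355.0, 9376.0, 9312.0, 9296.0, 9201.5],
    },
    index=index,
)

resampler = df.resample(dst_bin)
bars = resampler.agg({"open": "first", "high": "max", "low": "min", "close": "last"})

# Number of hourly rows that fell into each 4h bin.
rows_per_bin = resampler.size()

# A bin is "full" when it holds dst_bin / src_bin rows (4 here).
expected_rows = dst_bin // src_bin
full_bars = bars[rows_per_bin == expected_rows]

print(full_bars)
```

With the sample data this drops both the trailing 12:00 bin and the leading 00:00 bin, which only contains the 03:00 row, and leaves the 04:00 and 08:00 bars.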