pandas 0.22, resample only full bins, drop partials

Problem description

I am trying to resample 1h bins of trading data into 4h bins. The problem is that my pandas code also outputs partial bins that are not closed yet.

My input:

                      close    high     low    open  symbol       turnover  \
timestamp
2018-05-08 03:00:00  9418.0  9449.0  9408.5  9412.5  XBTUSD  1091577940325
2018-05-08 04:00:00  9423.5  9435.0  9390.0  9418.0  XBTUSD   801492831858
2018-05-08 05:00:00  9414.0  9428.5  9393.5  9423.5  XBTUSD   445420257388
2018-05-08 06:00:00  9337.0  9414.0  9314.5  9414.0  XBTUSD  1349710247828
2018-05-08 07:00:00  9328.5  9359.5  9305.0  9337.0  XBTUSD  1103092129997
2018-05-08 08:00:00  9355.0  9359.5  9328.5  9328.5  XBTUSD   647813850343
2018-05-08 09:00:00  9376.0  9383.0  9355.0  9355.0  XBTUSD   597066647876
2018-05-08 10:00:00  9312.0  9376.5  9241.5  9376.0  XBTUSD  1933554301163
2018-05-08 11:00:00  9296.0  9338.0  9275.5  9312.0  XBTUSD  1318169059747
2018-05-08 12:00:00  9201.5  9305.0  9178.0  9296.0  XBTUSD  2058057970783

My output:

                       open    high     low   close     volume         vwap  \
timestamp
2018-05-08 04:00:00  9418.0  9435.0  9305.0  9328.5  346736372  9380.972675
2018-05-08 08:00:00  9328.5  9383.0  9241.5  9296.0  419074812  9332.798550
2018-05-08 12:00:00  9296.0  9305.0  9178.0  9201.5  189922434  9228.497600

Note that the 4h interval from 12:00 to 16:00 is built only from the 12:00 hourly interval of the source, so it is a partial bin.
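One way to see which 4h bins are partial is to count how many hourly rows fall into each of them. The following is a minimal sketch, assuming the ten hourly timestamps and close prices from the input above. The names `close` and `rows_per_bin` are illustrative and not part of the original code, and a `pd.Timedelta` is used for the bin width in place of the '4H' alias.

```python
import pandas as pd

# Hourly timestamps matching the input above (03:00 through 12:00).
index = pd.date_range("2018-05-08 03:00", periods=10, freq=pd.Timedelta(hours=1))

# Close prices copied from the input table (illustrative subset of columns).
close = pd.Series(
    [9418.0, 9423.5, 9414.0, 9337.0, 9328.5,
     9355.0, 9376.0, 9312.0, 9296.0, 9201.5],
    index=index,
)

# Number of hourly rows that land in each 4h bin; a full bin holds 4.
rows_per_bin = close.resample(pd.Timedelta(hours=4)).count()
print(rows_per_bin)
```

With only these ten rows, both the first bin (just the 03:00 row) and the last bin (just the 12:00 row) hold a single row, while the 04:00 and 08:00 bins hold four each.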

My desired output should look like:

                       open    high     low   close     volume         vwap  \
timestamp
2018-05-08 04:00:00  9418.0  9435.0  9305.0  9328.5  346736372  9380.972675
2018-05-08 08:00:00  9328.5  9383.0  9241.5  9296.0  419074812  9332.798550

That is, a 4h interval such as the one starting at 12:00 should appear in the resampled output only once the whole interval has closed.

My code so far:

outputData = srcData.resample('4H').agg({'open': 'first',
                                         'high': 'max',
                                         'low': 'min',
                                         'close': 'last',
                                         'volume': 'sum',
                                         'vwap': 'mean',
                                         'turnover': 'sum',
                                         'symbol': 'first'})

Is there a function in pandas that would help me, or do I have to figure out a way to cut the partial interval off after resampling? Cheers, Alex

Recommended answer

You could add a count to your agg call, then use that count column to filter the resulting DataFrame so that it shows only "full bins".

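# Aggregate each 4h bin and also count the hourly rows that fall into it.
df_out = df.resample('4H').agg({'open': 'first',
                                'high': 'max',
                                'low': 'min',
                                'close': 'last',
                                'turnover': 'sum',
                                'symbol': ['first', 'count']})
# Flatten the two-level column names, e.g. ('symbol', 'count') -> 'symbol_count'.
df_out.columns = df_out.columns.map('_'.join)

# Keep only the bins built from all four hourly rows.
df_out.query('symbol_count == 4')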

Output:

                     open_first  high_max  low_min  close_last   turnover_sum symbol_first  symbol_count
timestamp                                                                                               
2018-05-08 04:00:00      9418.0    9435.0   9305.0      9328.5  3699715467071       XBTUSD             4
2018-05-08 08:00:00      9328.5    9383.0   9241.5      9296.0  4496603859129       XBTUSD             4
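For a self-contained variant, the sketch below rebuilds the price columns from the question's input and filters on a separately computed row count instead of a `symbol_count` column. It is an illustration under stated assumptions rather than the answerer's exact code: the names `hourly`, `bars`, `rows_in_bin`, and `full_bars` are made up here, the bin width is passed as a `pd.Timedelta` instead of the '4H' alias, and the expected count is derived from the two bar widths rather than hard-coded as 4.

```python
import pandas as pd

# Hourly bars copied from the question's input (price columns only).
index = pd.date_range("2018-05-08 03:00", periods=10, freq=pd.Timedelta(hours=1))
hourly = pd.DataFrame(
    {
        "open": [9412.5, 9418.0, 9423.5, 9414.0, 9337.0,
                 9328.5, 9355.0, 9376.0, 9312.0, 9296.0],
        "high": [9449.0, 9435.0, 9428.5, 9414.0, 9359.5,
                 9359.5, 9383.0, 9376.5, 9338.0, 9305.0],
        "low": [9408.5, 9390.0, 9393.5, 9314.5, 9305.0,
                9328.5, 9355.0, 9241.5, 9275.5, 9178.0],
        "close": [9418.0, 9423.5, 9414.0, 9337.0, 9328.5,
                  9355.0, 9376.0, 9312.0, 9296.0, 9201.5],
    },
    index=index,
)

source_step = pd.Timedelta(hours=1)   # width of the input bars
target_step = pd.Timedelta(hours=4)   # width of the resampled bars
rows_per_full_bin = target_step // source_step  # hourly rows in a complete 4h bin

resampler = hourly.resample(target_step)
bars = resampler.agg({"open": "first", "high": "max", "low": "min", "close": "last"})

# Count the hourly rows behind each 4h bar and keep only the complete ones.
rows_in_bin = resampler["close"].count()
full_bars = bars[rows_in_bin == rows_per_full_bin]
print(full_bars)
```

Comparing against a row count also drops bins that are missing hours in the middle of the series, not only the unfinished bin at the end.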
