Using resample to aggregate data with different rules for different columns in a pandas dataframe


Problem description


I have a dataframe of the classic "open high low close volume" type, so common in finance, with one row per minute (720 rows). I fetch it from Kraken with this code:

import json
import urllib.request

import pandas as pd

# Fetch 1-minute OHLC candles for the XXBTZEUR pair from Kraken's public API
with urllib.request.urlopen("https://api.kraken.com/0/public/OHLC?pair=XXBTZEUR&interval=1") as url:
    data = json.loads(url.read().decode())

# Kraken's eighth field is the trade count, so name it 'count'
columns = ['time', 'open', 'high', 'low', 'close', 'vwap', 'volume', 'count']
data_DF = pd.DataFrame(data['result']['XXBTZEUR'], columns=columns)

# Prices and volumes arrive as strings; convert them to numbers
float_cols = ['open', 'high', 'low', 'close', 'vwap', 'volume']
data_DF[float_cols] = data_DF[float_cols].astype(float)
data_DF['count'] = data_DF['count'].astype(int)

# Unix seconds -> datetimes, used as the index (needed for time-based resampling)
data_DF['time'] = pd.to_datetime(data_DF['time'], unit='s')
data_DF.set_index('time', inplace=True)
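Because Kraken only returns the most recent 720 one-minute candles, the data changes from run to run. For experimenting offline, here is a minimal sketch that builds a comparable frame from made-up random prices. The random walk, the seed and the starting timestamp are arbitrary assumptions, not Kraken data:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Kraken download: 720 one-minute rows of
# made-up OHLCV data, so the aggregation examples can run without the network.
rng = np.random.default_rng(0)
index = pd.date_range("2018-12-29 07:06", periods=720, freq="min", name="time")

close = 3400 + rng.normal(0, 2, size=720).cumsum()   # random-walk closes
open_ = np.concatenate([[close[0]], close[:-1]])     # each bar opens at the previous close
spread = rng.uniform(0, 3, size=720)                 # extra range above/below the body

data_DF = pd.DataFrame(
    {
        "open": open_,
        "high": np.maximum(open_, close) + spread,
        "low": np.minimum(open_, close) - spread,
        "close": close,
        "volume": rng.uniform(0, 1, size=720),
    },
    index=index,
)
print(data_DF.head())
```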


I now need to aggregate it for different time periods. To keep things simple, let us suppose just the classic 5 minutes. Each column must be generated according to a different rule, applied to each 5-minute sample:
The open column must be the first value of the open column values of the sample;
The close column must be the last value of the close column values of the sample;
The high column must be the max of the high column values of the sample;
The low column must be the min of the low column values of the sample.
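To make the four rules concrete, here is a small worked example on five made-up one-minute bars that fall into a single 5-minute window. The aggregated bar takes the first open, the last close, the highest high and the lowest low; pandas refers to these reductions as 'first', 'last', 'max' and 'min'. The numbers are invented for illustration:

```python
import pandas as pd

# Five made-up one-minute bars that fall into one 5-minute window
idx = pd.date_range("2018-12-29 07:30", periods=5, freq="min", name="time")
window = pd.DataFrame(
    {
        "open": [10.0, 11.0, 12.0, 11.5, 12.5],
        "high": [11.5, 12.5, 13.0, 12.8, 13.5],
        "low": [9.5, 10.5, 11.0, 11.0, 12.0],
        "close": [11.0, 12.0, 11.5, 12.5, 13.0],
    },
    index=idx,
)

# Apply each column's rule to build the single 5-minute bar
bar = pd.Series(
    {
        "open": window["open"].iloc[0],     # first open of the window
        "close": window["close"].iloc[-1],  # last close of the window
        "high": window["high"].max(),       # highest high
        "low": window["low"].min(),         # lowest low
    }
)
print(bar)
```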

I tried

data_DF5=data_DF['vwap'].resample('5Min').ohlc()


but it creates a full set of open, high, low and close values for each column. Hmm, not what I was looking for.
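pandas spells the method in lowercase, `ohlc()`. Its output shape depends on what it is called on, which may explain the "one set per column" result. A minimal sketch with a made-up two-column frame:

```python
import numpy as np
import pandas as pd

# Ten made-up one-minute rows with two columns
idx = pd.date_range("2018-12-29 07:30", periods=10, freq="min", name="time")
df = pd.DataFrame(
    {"vwap": np.arange(10, dtype=float), "volume": np.ones(10)},
    index=idx,
)

# On a single column, ohlc() returns one open/high/low/close frame
one = df["vwap"].resample("5min").ohlc()
print(one.columns.tolist())  # ['open', 'high', 'low', 'close']

# On a whole DataFrame it computes OHLC for every column, so the result
# has two-level columns such as ('vwap', 'open') and ('volume', 'close')
many = df.resample("5min").ohlc()
print(many.columns.tolist())
```

Either way, `ohlc()` applies all four rules to the same input column, whereas the question needs a different rule for each column.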

I also tried:

data_DF5=data_DF['time'].resample('5Min')
data_DF5['volume']=data_DF['volume'].resample('5Min').sum()
data_DF5['open']=data_DF['open'].resample('5Min').first()
data_DF5['close']=data_DF['close'].resample('5Min').last()
data_DF5['high']=data_DF['high'].resample('5Min').max()
data_DF5['low']=data_DF['low'].resample('5Min').min()


I did this with the intent of building the dataframe one column at a time.

Then I get an

"Unable to open 'hashtable_class_helper.pxi': File not found"

error, which I cannot understand. If I change the first line to



data_DF5=data_DF['vwap'].resample('5Min').mean()


I get a dataframe which I cannot even interpret [see (*)].

If I instead use

data_DF5=data_DF['vwap'].resample('5Min')

I get:

'DatetimeIndexResampler' object does not support item assignment.




I am really at a loss. I have searched other Stack Overflow questions, but none seems to cover this case. The manual page also does not make clear how to solve this.
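Two separate things seem to go wrong in the attempts above, and a minimal sketch with made-up data shows both. First, after `set_index('time', inplace=True)` the timestamps are the index, no longer a column, so `data_DF['time']` raises a KeyError. The 'hashtable_class_helper.pxi' message is most likely an IDE debugger trying to open the Cython source file inside pandas where that KeyError was raised; this is an inference, not something the question confirms. Second, `.resample('5Min')` with no aggregation returns a lazy `DatetimeIndexResampler`, a grouping object that only produces data once a method such as `.sum()` or `.agg(...)` is called, and it does not support `[...] =` assignment:

```python
import pandas as pd

# Made-up frame where 'time' starts as a column, as in the question
df = pd.DataFrame(
    {
        "time": pd.date_range("2018-12-29 07:30", periods=10, freq="min"),
        "volume": [1.0] * 10,
    }
)
df.set_index("time", inplace=True)

# 'time' is now the index, so selecting it as a column raises KeyError
print("time" in df.columns)  # False
try:
    df["time"]
except KeyError as exc:
    print("KeyError:", exc)

# Without an aggregation, resample() only returns a Resampler object ...
resampler = df["volume"].resample("5min")
print(type(resampler).__name__)  # DatetimeIndexResampler

# ... which cannot take item assignment
try:
    resampler["volume"] = 0
except TypeError as exc:
    print("TypeError:", exc)

# Calling an aggregation produces an actual Series, one row per 5-minute bin
volume_5min = resampler.sum()
print(volume_5min)
```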

(*)




2018-12-29 07:05:00    3417.8
2018-12-29 07:10:00    3411.12
2018-12-29 07:15:00    3408.98
2018-12-29 07:20:00    3409.46
2018-12-29 07:25:00    3409.26
2018-12-29 07:30:00    2729.18
2018-12-29 07:35:00    3413.9
2018-12-29 07:40:00    2739.32
2018-12-29 07:45:00    3426.12
2018-12-29 07:50:00    3423.46
2018-12-29 07:55:00    3433.22
2018-12-29 08:00:00    3424.14
2018-12-29 08:05:00    3426.44
2018-12-29 08:10:00    3424.6
2018-12-29 08:15:00    3425.22
2018-12-29 08:20:00    3425.6
2018-12-29 08:25:00    3425.72
2018-12-29 08:30:00    3427.96
2018-12-29 08:35:00    3427.64
2018-12-29 08:40:00    3427.06
2018-12-29 08:45:00    3426.06
2018-12-29 08:50:00    3423.38
2018-12-29 08:55:00    3426.42
2018-12-29 09:00:00    3441.08
2018-12-29 09:05:00    3439.68
2018-12-29 09:10:00    3429.38
2018-12-29 09:15:00    3422.12
2018-12-29 09:20:00    3418.4
2018-12-29 09:25:00    3419
2018-12-29 09:30:00    3415.94
...
2018-12-29 17:05:00    3363.46
2018-12-29 17:10:00    3364.86
2018-12-29 17:15:00    3362.56
2018-12-29 17:20:00    3360.88
2018-12-29 17:25:00    3358.98
2018-12-29 17:30:00    3353.8
2018-12-29 17:35:00    3371.62
2018-12-29 17:40:00    3365.38
2018-12-29 17:45:00    3368.76
2018-12-29 17:50:00    3373.82
2018-12-29 17:55:00    3373.32
2018-12-29 18:00:00    3374.78
2018-12-29 18:05:00    3372.56
2018-12-29 18:10:00    3370.3
2018-12-29 18:15:00    3370.3
2018-12-29 18:20:00    3371.36
2018-12-29 18:25:00    3372.14
2018-12-29 18:30:00    3367.36
2018-12-29 18:35:00    3371.3
2018-12-29 18:40:00    3367.08
2018-12-29 18:45:00    3363.3
2018-12-29 18:50:00    3357.66
2018-12-29 18:55:00    3357.64
2018-12-29 19:00:00    3357.64
2018-12-29 19:05:00    3356
volume                 time 2018-12-29 07:05:00 0.112311 2018-12-...
open                   time 2018-12-29 07:05:00 3418.9 2018-12-29 ...
close                  time 2018-12-29 07:05:00 3416.8 2018-12-29 ...
high                   time 2018-12-29 07:05:00 3418.9 2018-12-29 ...
low                    time 2018-12-29 07:05:00 3416.8 2018-12-29 ...
Name: vwap, Length: 150, dtype: object
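Reading (*), `data_DF['vwap'].resample('5Min').mean()` appears to return a Series, not a DataFrame. Assigning `data_DF5['volume'] = ...` to a Series does not add a column: pandas enlarges the Series with a new item labelled 'volume' whose value is a whole Series. That would explain the trailing volume/open/close/high/low entries, the object dtype and the length of 150. Below is a minimal sketch of the column-at-a-time idea that does work, assuming made-up data: turn the first result into a one-column DataFrame with `to_frame()`, then assign each aggregated Series as a column, which pandas aligns on the shared 5-minute index:

```python
import numpy as np
import pandas as pd

# Ten made-up one-minute rows
idx = pd.date_range("2018-12-29 07:30", periods=10, freq="min", name="time")
base = np.arange(10, dtype=float)
df = pd.DataFrame(
    {
        "open": base,
        "high": base + 1,
        "low": base - 1,
        "close": base + 0.5,
        "vwap": base + 0.25,
        "volume": np.ones(10),
    },
    index=idx,
)

# resample(...).mean() on one column returns a Series;
# to_frame() turns it into a one-column DataFrame we can add columns to
data_DF5 = df["vwap"].resample("5min").mean().to_frame()

# Every aggregated Series shares the same 5-minute index,
# so plain column assignment lines the values up bin by bin
data_DF5["volume"] = df["volume"].resample("5min").sum()
data_DF5["open"] = df["open"].resample("5min").first()
data_DF5["close"] = df["close"].resample("5min").last()
data_DF5["high"] = df["high"].resample("5min").max()
data_DF5["low"] = df["low"].resample("5min").min()
print(data_DF5)
```

This works, but it resamples the data once per column; a single grouped aggregation with one rule per column is more compact.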



Recommended answer


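I think you need pd.Grouper to group the rows into 5-minute bins, then agg with a dict that maps each column to its own aggregation: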

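# Group the 1-minute rows into 5-minute bins and apply one rule per column;
# storing the result in data_DF5 keeps the original minute data intact
data_DF5 = data_DF.groupby(pd.Grouper(freq='5min')).agg({'open': 'first',
                                                         'close': 'last',
                                                         'high': 'max',
                                                         'low': 'min'})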

                       open   close    high     low
time                                               
2018-12-29 07:30:00  3411.4  3413.9  3413.9  3411.4
2018-12-29 07:35:00  3413.9  3413.1  3416.1  3411.9
2018-12-29 07:40:00  3413.1  3422.9  3427.5  3413.1
2018-12-29 07:45:00  3421.1  3423.8  3431.7  3418.0
2018-12-29 07:50:00  3423.8  3428.2  3428.2  3418.9
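For comparison, here is a minimal sketch on made-up data. Because `resample` is a time-based groupby, `DataFrame.resample('5min').agg(...)` with the same dict should give the same bars as the `pd.Grouper` version. The `'volume': 'sum'` rule is an addition taken from the question's second attempt, not part of the answer above:

```python
import numpy as np
import pandas as pd

# Fifteen made-up one-minute rows (three 5-minute bins)
idx = pd.date_range("2018-12-29 07:30", periods=15, freq="min", name="time")
values = np.arange(15, dtype=float)
data_DF = pd.DataFrame(
    {
        "open": values,
        "high": values + 2,
        "low": values - 2,
        "close": values + 1,
        "volume": np.full(15, 0.5),
    },
    index=idx,
)

# One aggregation rule per column; 'volume': 'sum' is an extra rule
rules = {"open": "first", "close": "last", "high": "max", "low": "min", "volume": "sum"}

# The same aggregation written two ways
via_grouper = data_DF.groupby(pd.Grouper(freq="5min")).agg(rules)
via_resample = data_DF.resample("5min").agg(rules)

# Same bins, same values (index freq metadata is not compared)
pd.testing.assert_frame_equal(via_grouper, via_resample, check_freq=False)
print(via_resample)
```

Both forms take the same dict, so the choice is mostly a matter of taste; `pd.Grouper` is handy when the time grouping has to be combined with other keys in a single `groupby`.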
