Pandas - Using `.rolling()` on multiple columns

Question

Consider the following pandas DataFrame:

      A     B     C
0  0.63  1.12  1.73
1  2.20 -2.16 -0.13
2  0.97 -0.68  1.09
3 -0.78 -1.22  0.96
4 -0.06 -0.02  2.18

I would like to use the function `.rolling()` to perform the following calculation for t = 0, 1, 2:

  • Select the rows from t to t+2
  • Take the 9 values contained in those 3 rows, from all the columns. Call this set S
  • Compute the 75th percentile of S (or other summary statistics about S)
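As a baseline, the three steps above translate directly into a loop over t. This is a minimal sketch: the DataFrame is rebuilt from the table above, and `np.quantile` with its default linear interpolation is assumed to be the intended definition of "75th percentile".

```python
import numpy as np
import pandas as pd

# The example DataFrame from the question
df = pd.DataFrame(
    {"A": [0.63, 2.20, 0.97, -0.78, -0.06],
     "B": [1.12, -2.16, -0.68, -1.22, -0.02],
     "C": [1.73, -0.13, 1.09, 0.96, 2.18]}
)

wsize = 3
results = {}
for t in range(len(df) - wsize + 1):            # t = 0, 1, 2
    # Rows t..t+2 from all columns, flattened into the 9-value set S
    S = df.iloc[t:t + wsize].to_numpy().ravel()
    # 75th percentile of S (linear interpolation is NumPy's default)
    results[t] = float(np.quantile(S, 0.75))

print(results)
```

Each iteration slices and reduces independently in Python, which is why this gets slow on large frames.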


For instance, for t = 1 we have S = {2.20, -2.16, -0.13, 0.97, -0.68, 1.09, -0.78, -1.22, 0.96} and the 75th percentile is 0.97.
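The quoted value agrees with the usual linear-interpolation definition of a percentile (the default in both `np.quantile` and pandas `quantile`). Assuming that definition, with n = 9 values and p = 0.75, the arithmetic for t = 1 is:

```latex
\begin{aligned}
S_{(1)}, \dots, S_{(9)} &= -2.16,\ -1.22,\ -0.78,\ -0.68,\ -0.13,\ 0.96,\ 0.97,\ 1.09,\ 2.20 \\
h &= (n - 1)\,p = (9 - 1) \times 0.75 = 6 \\
Q_{0.75} &= S_{(h+1)} = S_{(7)} = 0.97
\end{aligned}
```

Because h is an integer here, no interpolation is needed; in general the result is S_{(\lfloor h \rfloor + 1)} + (h - \lfloor h \rfloor)(S_{(\lfloor h \rfloor + 2)} - S_{(\lfloor h \rfloor + 1)}).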

I couldn't find a way to make it work with .rolling(), since it apparently takes each column separately. I'm now relying on a for loop, but it is really slow.

Do you have any suggestion for a more efficient approach?

Recommended answer

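One solution is to `stack` the data into a single long Series (row by row), multiply your window size by the number of columns so that each window spans `wsize` whole rows, and then keep only every `cols`-th result, which leaves one value per original row. Also, since you want a forward-looking window, reverse the order of the stacked Series before rolling.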

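wsize = 3                 # number of rows per window
cols = len(df.columns)    # number of values each row contributes

# Stack row-major, reverse for a forward-looking window, keep one result per row, restore order
(df.stack(dropna=False).iloc[::-1]
   .rolling(window=wsize * cols).quantile(0.75).iloc[cols - 1::cols]
   .reset_index(-1, drop=True)
   .sort_index())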

Output:

0    1.12
1    0.97
2    0.97
3     NaN
4     NaN
dtype: float64
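To see why slicing with `[cols-1::cols]` keeps exactly one value per original row, it helps to inspect the reversed stacked index. The sketch below rebuilds the example DataFrame and shows which (row, column) labels the slice keeps. It calls `df.stack()` without `dropna=False` only because this example has no NaN values, and uses `.iloc` for the positional slices as a style choice.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [0.63, 2.20, 0.97, -0.78, -0.06],
     "B": [1.12, -2.16, -0.68, -1.22, -0.02],
     "C": [1.73, -0.13, 1.09, 0.96, 2.18]}
)
wsize = 3
cols = len(df.columns)

# Row-major long Series, reversed so that each rolling window looks forward
rev = df.stack().iloc[::-1]

# Positions cols-1, 2*cols-1, ... hold each row's first column ("A"),
# which is the last value of that row in the reversed order. A window of
# wsize*cols values ending there covers rows t..t+wsize-1.
kept = rev.iloc[cols - 1::cols]
print(list(kept.index))

q = (rev.rolling(window=wsize * cols).quantile(0.75)
        .iloc[cols - 1::cols]
        .reset_index(-1, drop=True)
        .sort_index())
print(q)
```

The first two kept positions (rows 4 and 3) have fewer than `wsize * cols` values behind them, so rolling returns NaN there by default.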

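In the case of many columns and a small window, an alternative is to place `wsize` shifted copies of the DataFrame side by side, so that row t holds every value from rows t to t+wsize-1, and take the quantile across each row: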

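import pandas as pd
import numpy as np

wsize = 3
# shift(-x) moves row t+x up to row t, so row t of df2 holds rows t..t+wsize-1
df2 = pd.concat([df.shift(-x) for x in range(wsize)], axis=1)
s_quant = df2.quantile(0.75, axis=1)

# Only necessary if you need to enforce sufficient data
# (quantile skips NaN, so the last rows would otherwise use incomplete windows).
s_quant[df2.isnull().any(axis=1)] = np.nan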

Output (s_quant):

0    1.12
1    0.97
2    0.97
3     NaN
4     NaN
Name: 0.75, dtype: float64
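A further variant, not part of the original answer, does the flattening in NumPy: `numpy.lib.stride_tricks.sliding_window_view` (NumPy 1.20+) yields every block of `wsize` consecutive rows as a view, and each block can be reshaped into the set S and reduced with any NumPy function, which also covers the "other summary statistics" part of the question. The helper name `forward_block_stat` is hypothetical, and padding the last `wsize - 1` rows with NaN is an assumption chosen to match the outputs above.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def forward_block_stat(df, wsize, stat):
    """Hypothetical helper: apply stat to the wsize*ncols values of rows t..t+wsize-1."""
    values = df.to_numpy(dtype=float)
    n, ncols = values.shape
    out = np.full(n, np.nan)                  # rows without a full window stay NaN
    if n >= wsize:
        # Shape (n - wsize + 1, 1, wsize, ncols): one block per starting row t
        blocks = sliding_window_view(values, (wsize, ncols))
        # Flatten each block row-major into one row holding the set S
        S = blocks.reshape(n - wsize + 1, wsize * ncols)
        out[: n - wsize + 1] = stat(S, axis=1)
    return pd.Series(out, index=df.index)


df = pd.DataFrame(
    {"A": [0.63, 2.20, 0.97, -0.78, -0.06],
     "B": [1.12, -2.16, -0.68, -1.22, -0.02],
     "C": [1.73, -0.13, 1.09, 0.96, 2.18]}
)
q75 = forward_block_stat(df, 3, lambda S, axis: np.quantile(S, 0.75, axis=axis))
med = forward_block_stat(df, 3, np.median)
print(q75)
print(med)
```

Any reducer that accepts an `axis` argument (`np.mean`, `np.std`, `np.median`, ...) can be passed as `stat`; the reshape may copy the windows, so memory grows with `wsize * ncols` per row.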
