pandas rolling window percentile rank

Question

I am trying to calculate the percentile rank of data by column within a rolling window.

import numpy as np
import pandas as pd

test = pd.DataFrame(np.random.randn(20, 3), index=pd.date_range('1/1/2000', periods=20), columns=['A', 'B', 'C'])

test
Out[111]: 
                   A         B         C
2000-01-01 -0.566992 -1.494799  0.462330
2000-01-02 -0.550769 -0.699104  0.767778
2000-01-03 -0.270597  0.060836  0.057195
2000-01-04 -0.583784 -0.546418 -0.557850
2000-01-05  0.294073 -2.326211  0.262098
2000-01-06 -1.122543 -0.116279 -0.003088
2000-01-07  0.121387  0.763100  3.503757
2000-01-08  0.335564  0.076304  2.021757
2000-01-09  0.403170  0.108256  0.680739
2000-01-10 -0.254558 -0.497909 -0.454181
2000-01-11  0.167347  0.459264 -1.247459
2000-01-12 -1.243778  0.858444  0.338056
2000-01-13 -1.070655  0.924808  0.080867
2000-01-14 -1.175651 -0.559712 -0.372584
2000-01-15 -0.216708 -0.116188  0.511223
2000-01-16  0.597171  0.205529 -0.728783
2000-01-17 -0.624469  0.592436  0.832100
2000-01-18  0.259269  0.665585  0.126534
2000-01-19  1.150804  0.575759 -1.335835
2000-01-20 -0.909525  0.500366  2.120933

I tried to use .rolling with .apply but I am missing something.

pctrank = lambda x: x.rank(pct=True)
rollingrank = test.rolling(window=10, center=False).apply(pctrank)

For column A the final value would be the percentile rank of -0.909525 within the length=10 window from 2000-01-11 to 2000-01-20. Any ideas?

Recommended answer

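Your lambda receives a numpy array, which does not have a .rank method; only pandas Series and DataFrame objects have it. (This holds when .apply is called with raw=True, which was the behavior in older pandas versions; with raw=False the function receives a Series instead.) Either way, the function passed to .apply must return a single value per window, so you need to pick out the rank of the window's last element. You can thus change it to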

pctrank = lambda x: pd.Series(x).rank(pct=True).iloc[-1]
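
As a minimal end-to-end sketch of this fix, the snippet below applies the function to a frame shaped like the one in the question and cross-checks the last value of column A against a direct computation on the final 10 rows. The seeded random data is an assumption made only for repeatability, and raw=True assumes a pandas version that supports that parameter (0.23 or later).

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the seed is arbitrary and only makes the run repeatable.
rng = np.random.default_rng(0)
test = pd.DataFrame(
    rng.standard_normal((20, 3)),
    index=pd.date_range("2000-01-01", periods=20),
    columns=["A", "B", "C"],
)

def pctrank(x):
    # Rank every value in the window, then keep only the rank of the last one,
    # so that the function returns a single scalar per window.
    return pd.Series(x).rank(pct=True).iloc[-1]

# raw=True hands each window to pctrank as a numpy array.
rollingrank = test.rolling(window=10).apply(pctrank, raw=True)

# Cross-check: percentile rank of the last A value within the last 10 rows.
expected = test["A"].iloc[-10:].rank(pct=True).iloc[-1]
print(rollingrank["A"].iloc[-1], expected)
```

The first nine rows of the result are NaN, because a window of 10 is not yet full there.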

Or you could use pure numpy along the lines of this SO answer:

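def pctrank(x):
    n = len(x)
    # temp[i] is the index of the i-th smallest value in the window
    temp = x.argsort()
    ranks = np.empty(n)
    # assign fractional ranks 1/n, 2/n, ..., 1 in sorted order
    ranks[temp] = (np.arange(n) + 1) / n
    # percentile rank of the window's last element
    return ranks[-1]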
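
The two variants can be compared with a small sketch like the one below. The names pctrank_np and pctrank_pd are hypothetical, introduced here only to keep the two versions apart. For windows with distinct values they agree; with tied values they can differ, because the argsort approach assigns ordinal ranks (ties broken by sort order) while Series.rank averages tied ranks by default.

```python
import numpy as np
import pandas as pd

def pctrank_np(x):
    # argsort-based ordinal ranks, as in the numpy version above
    x = np.asarray(x)
    n = len(x)
    order = x.argsort()
    ranks = np.empty(n)
    ranks[order] = (np.arange(n) + 1) / n
    return ranks[-1]

def pctrank_pd(x):
    # pandas version: tied values share the average of their ranks
    return pd.Series(x).rank(pct=True).iloc[-1]

# Distinct values: the last element (2.0) is the 3rd smallest of 5.
distinct = np.array([3.0, 1.0, 4.0, 1.5, 2.0])
print(pctrank_np(distinct), pctrank_pd(distinct))  # both 0.6

# Tied values: the last element ties with the first, so the results may differ.
tied = np.array([2.0, 1.0, 2.0])
print(pctrank_np(tied), pctrank_pd(tied))
```

Either function can be passed to rolling(...).apply(..., raw=True); which tie-handling rule is appropriate depends on the data.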
