Rolling comparison between a value and a past window, with percentile/quantile


Problem description

I'd like to compare each value x of an array with a rolling window of the n previous values. More precisely I'd like to see at which percentile this new value x would be, if we added it to the previous window:

import numpy as np
A = np.array([1, 4, 9, 28, 28.5, 2, 283, 3.2, 7, 15])
print(A)
n = 4  # window width
for i in range(len(A)-n):
    W = A[i:i+n]                 # the n values preceding x
    x = A[i+n]                   # the value being ranked
    q = sum(W <= x) * 1.0 / n    # fraction of the window that is <= x
    print('Value:', x, ' Window before this value:', W, ' Quantile:', q)

[ 1. 4. 9. 28. 28.5 2. 283. 3.2 7. 15. ]
Value: 28.5 Window before this value: [ 1. 4. 9. 28.] Quantile: 1.0
Value: 2.0 Window before this value: [ 4. 9. 28. 28.5] Quantile: 0.0
Value: 283.0 Window before this value: [ 9. 28. 28.5 2. ] Quantile: 1.0
Value: 3.2 Window before this value: [ 28. 28.5 2. 283. ] Quantile: 0.25
Value: 7.0 Window before this value: [ 28.5 2. 283. 3.2] Quantile: 0.5
Value: 15.0 Window before this value: [ 2. 283. 3.2 7. ] Quantile: 0.75

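Question: what is the name of this computation? Is there a clever numpy way to compute it more efficiently on arrays of millions of items (where n can be ~5000)?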

Note: here is a simulation for 1M items and n=5000 but it would take ~ 2 hours:

import numpy as np
A = np.random.random(1000*1000)  # the following is not very interesting with a [0,1]
n = 5000                         # uniform random variable, but anyway...
Q = np.zeros(len(A)-n)
for i in range(len(Q)):
    Q[i] = sum(A[i:i+n] <= A[i+n]) * 1.0 / n
    if i % 100 == 0:
        print("%.2f %% already done. " % (i * 100.0 / len(A)))

print(Q)

Recommended answer

Your code is so slow because you're using Python's own sum() instead of numpy.sum() or numpy.ndarray.sum(); Python's sum() has to convert all the raw values to Python objects before doing the calculations, which is really slow. Just by changing sum(...) to np.sum(...) or (...).sum(), the runtime drops to under 20 seconds.
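
As a rough illustration of that fix, here is a minimal sketch rather than a definitive implementation. The function names rolling_rank_loop and rolling_rank_chunked and the chunk parameter are hypothetical, introduced only for this example. The first function is the question's loop with the built-in sum() swapped for a NumPy reduction (np.count_nonzero here, which counts the True entries just as np.sum would). The second is a different technique from the one the answer describes: it assumes NumPy 1.20 or later for sliding_window_view and compares many windows at once, in chunks so that the temporary boolean array stays small. Both assume len(A) >= n.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # assumes NumPy >= 1.20


def rolling_rank_loop(A, n):
    """For each A[i+n], the fraction of the n previous values that are <= it."""
    A = np.asarray(A, dtype=float)
    Q = np.empty(len(A) - n)
    for i in range(len(Q)):
        # NumPy reduction instead of the built-in sum(): the counting
        # happens in compiled code, not element by element in Python.
        Q[i] = np.count_nonzero(A[i:i + n] <= A[i + n]) / n
    return Q


def rolling_rank_chunked(A, n, chunk=1024):
    """Same result, comparing `chunk` windows at a time (hypothetical variant)."""
    A = np.asarray(A, dtype=float)
    # windows[i] is a view of A[i:i+n]; drop the last window, which has no
    # following value to compare against.
    windows = sliding_window_view(A, n)[:-1]
    x = A[n:]  # x[i] is the value that follows windows[i]
    Q = np.empty(len(x))
    for start in range(0, len(x), chunk):
        stop = start + chunk
        # Broadcast each value against its own window, then count per row.
        mask = windows[start:stop] <= x[start:stop, None]
        Q[start:stop] = mask.sum(axis=1) / n
    return Q


if __name__ == "__main__":
    A = np.array([1, 4, 9, 28, 28.5, 2, 283, 3.2, 7, 15])
    print(rolling_rank_loop(A, 4))
    print(rolling_rank_chunked(A, 4))
```

On the small array from the question with n = 4, both functions should reproduce the quantiles listed there (1.0, 0.0, 1.0, 0.25, 0.5, 0.75). The chunk size trades memory for fewer Python-level iterations: each chunk materializes a boolean array of chunk × n entries, so it should be chosen with n in mind.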
