Rolling comparison between a value and a past window, with percentile/quantile
Problem description
I'd like to compare each value x of an array with a rolling window of the n previous values. More precisely, I'd like to see at which percentile this new value x would fall if we added it to the previous window:
import numpy as np

A = np.array([1, 4, 9, 28, 28.5, 2, 283, 3.2, 7, 15])
print(A)
n = 4  # window width
for i in range(len(A) - n):
    W = A[i:i+n]          # the n values that precede x
    x = A[i+n]
    q = sum(W <= x) / n   # fraction of the window that is <= x
    print('Value:', x, ' Window before this value:', W, ' Quantile:', q)
[ 1. 4. 9. 28. 28.5 2. 283. 3.2 7. 15. ]
Value: 28.5 Window before this value: [ 1. 4. 9. 28.] Quantile: 1.0
Value: 2.0 Window before this value: [ 4. 9. 28. 28.5] Quantile: 0.0
Value: 283.0 Window before this value: [ 9. 28. 28.5 2. ] Quantile: 1.0
Value: 3.2 Window before this value: [ 28. 28.5 2. 283. ] Quantile: 0.25
Value: 7.0 Window before this value: [ 28.5 2. 283. 3.2] Quantile: 0.5
Value: 15.0 Window before this value: [ 2. 283. 3.2 7. ] Quantile: 0.75
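Question: what is the name of this computation? Is there a clever numpy way to compute it more efficiently on arrays of millions of items (where n can be ~5000)?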
Note: here is a simulation for 1M items and n=5000 but it would take ~ 2 hours:
import numpy as np

# Not very interesting with a uniform [0, 1] random variable, but anyway...
A = np.random.random(1000 * 1000)
n = 5000
Q = np.zeros(len(A) - n)
for i in range(len(Q)):
    Q[i] = sum(A[i:i+n] <= A[i+n]) / n
    if i % 100 == 0:
        print("%.2f %% already done." % (i * 100.0 / len(A)))
print(Q)
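
For reference, here is a minimal sketch of one way the inner loop could be pushed into numpy, assuming NumPy 1.20 or later (for sliding_window_view). The quantity computed is the same as above: the fraction of the previous window that is <= x, i.e. the window's empirical CDF evaluated at x, which could be described as a rolling percentile rank. The function name rolling_quantile_rank and the chunk parameter are hypothetical, chosen only for illustration. This does not reduce the overall O(len(A) * n) work; it only moves the comparisons into vectorized code and processes the windows in chunks so that memory stays bounded.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def rolling_quantile_rank(a, n, chunk=1024):
    """For each a[i+n], return the fraction of a[i:i+n] that is <= a[i+n].

    Hypothetical helper: the name and the `chunk` parameter are assumptions
    made for this sketch, not part of numpy.
    """
    a = np.asarray(a, dtype=float)
    m = len(a) - n  # number of values that have a full window before them
    if m <= 0:
        return np.empty(0)
    # windows[i] is a read-only view of a[i:i+n]; the last window is dropped
    # because no value follows it.
    windows = sliding_window_view(a, n)[:m]
    targets = a[n:]
    q = np.empty(m)
    # Work in chunks so the temporary boolean array holds at most
    # chunk * n elements at a time.
    for start in range(0, m, chunk):
        stop = min(start + chunk, m)
        le = windows[start:stop] <= targets[start:stop, None]
        q[start:stop] = le.sum(axis=1) / n
    return q


A = np.array([1, 4, 9, 28, 28.5, 2, 283, 3.2, 7, 15])
Q = rolling_quantile_rank(A, 4)
print(Q)  # quantiles 1.0, 0.0, 1.0, 0.25, 0.5, 0.75, as in the loop version
```

With n = 5000, a chunk of 1024 means each temporary comparison array has about 5 million booleans; the chunk size is a tuning knob and the value here is only a guess.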