NumPy: calculate cumulative median


Question

I have a sample of size n.

For each i with 1 <= i <= n, I want to calculate the median of sample[:i] in NumPy. For example, this is how I computed the mean for each i:

cummean = np.cumsum(sample)/np.arange(1, n + 1)

Can I do something similar for the median, without loops or comprehensions?

Recommended answer

Here's an approach that replicates the elements along rows to give us a 2D array. We then fill the upper triangular region with a number larger than every element, so that when each row is sorted, the fill values move to the end and the first i+1 positions of row i hold the sorted elements up to the diagonal. That simulates the cumulative windows. Then, following the definition of the median (the middle element, or the mean of the two middle elements for an even number of elements), we take the element at (0,0) for the first row, the mean of (1,0) and (1,1) for the second row, (2,1) for the third row, the mean of (3,1) and (3,2) for the fourth row, and so on. Extracting those elements from the sorted array gives us the median values.

Thus, the implementation would be -

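import numpy as np

def cummedian_sorted(a):
    n = a.size
    maxn = a.max()+1    # filler value larger than every element of a
    # Every row starts as a full copy of a
    a_tiled_sorted = np.tile(a,n).reshape(-1,n)
    mask = np.triu(np.ones((n,n),dtype=bool),1)

    # Overwrite everything right of the diagonal, then sort each row so the
    # fillers end up last and row i starts with sorted(a[:i+1])
    a_tiled_sorted[mask] = maxn
    a_tiled_sorted.sort(1)

    # Lower-middle element of every row (already the median for odd counts)
    all_rows = a_tiled_sorted[np.arange(n), np.arange(n)//2].astype(float)
    # Rows with an even element count also need the upper-middle element
    idx = np.arange(1,n,2)
    even_rows = a_tiled_sorted[idx, np.arange(1,1+(n//2))]
    all_rows[idx] += even_rows
    all_rows[1::2] /= 2.0
    return all_rows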
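
To make the intermediate steps concrete, here is a minimal sketch that traces the same tile, mask and sort idea on a four-element input. The input values and the names `tiled`, `rows`, `lower` and `upper` are illustrative choices, not part of the answer. The extraction is also simplified: it averages the lower-middle and upper-middle columns for every row instead of patching only the even-count rows, which should give the same result because the two columns coincide when the count is odd.

```python
import numpy as np

a = np.array([5, 1, 4, 2])  # illustrative input
n = a.size

# Row i is a copy of a; everything right of the diagonal gets a filler
# that is larger than any element, so sorting pushes it to the end.
tiled = np.tile(a, n).reshape(-1, n)
mask = np.triu(np.ones((n, n), dtype=bool), 1)
tiled[mask] = a.max() + 1
tiled.sort(axis=1)
print(tiled)
# [[5 6 6 6]
#  [1 5 6 6]
#  [1 4 5 6]
#  [1 2 4 5]]

# Row i holds i+1 real values: the two middle columns are i//2 and (i+1)//2
# (the same column when i+1 is odd).
rows = np.arange(n)
lower = tiled[rows, rows // 2]
upper = tiled[rows, (rows + 1) // 2]
medians = (lower + upper) / 2.0
print(medians)  # [5. 3. 4. 3.]
```

The printed medians match `np.median` applied to the prefixes `[5]`, `[5, 1]`, `[5, 1, 4]` and `[5, 1, 4, 2]`.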

Runtime test

Approaches -

# Loopy solution from @Uriel's soln
def cummedian_loopy(arr):
    return [np.median(arr[:i]) for i in range(1,len(arr)+1)]

# Nan-fill based solution from @Nickil Maveli's soln   
def cummedian_nanfill(arr):
    a = np.tril(arr).astype(float)
    a[np.triu_indices(a.shape[0], k=1)] = np.nan
    return np.nanmedian(a, axis=1)

Timings -

Set #1:

In [43]: a = np.random.randint(0,100,(100))

In [44]: print np.allclose(cummedian_loopy(a), cummedian_sorted(a))
    ...: print np.allclose(cummedian_loopy(a), cummedian_nanfill(a))
    ...: 
True
True

In [45]: %timeit cummedian_loopy(a)
    ...: %timeit cummedian_nanfill(a)
    ...: %timeit cummedian_sorted(a)
    ...: 
1000 loops, best of 3: 856 µs per loop
1000 loops, best of 3: 778 µs per loop
10000 loops, best of 3: 200 µs per loop

Set #2:

In [46]: a = np.random.randint(0,100,(1000))

In [47]: print np.allclose(cummedian_loopy(a), cummedian_sorted(a))
    ...: print np.allclose(cummedian_loopy(a), cummedian_nanfill(a))
    ...: 
True
True

In [48]: %timeit cummedian_loopy(a)
    ...: %timeit cummedian_nanfill(a)
    ...: %timeit cummedian_sorted(a)
    ...: 
10 loops, best of 3: 118 ms per loop
10 loops, best of 3: 47.6 ms per loop
100 loops, best of 3: 18.8 ms per loop

Set #3:

In [49]: a = np.random.randint(0,100,(5000))

In [50]: print np.allclose(cummedian_loopy(a), cummedian_sorted(a))
    ...: print np.allclose(cummedian_loopy(a), cummedian_nanfill(a))

True
True

In [54]: %timeit cummedian_loopy(a)
    ...: %timeit cummedian_nanfill(a)
    ...: %timeit cummedian_sorted(a)
    ...: 
1 loops, best of 3: 3.36 s per loop
1 loops, best of 3: 583 ms per loop
1 loops, best of 3: 521 ms per loop
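
The session above uses the Python 2 `print` statement and IPython's `%timeit`. As a rough sketch of how the same comparison might be rerun under Python 3 with the standard `timeit` module, the script below redefines compact versions of the loop and sort-based approaches (the NaN-fill variant is left out for brevity). The seed, the array size of 200 and the repeat count of 5 are arbitrary assumptions, and no timings are quoted because they depend on the machine.

```python
import timeit

import numpy as np


def cummedian_loopy(arr):
    # Baseline: one np.median call per prefix
    return [np.median(arr[:i]) for i in range(1, len(arr) + 1)]


def cummedian_sorted(a):
    # Compact variant of the tile / mask / sort approach
    n = a.size
    tiled = np.tile(a, n).reshape(-1, n)
    tiled[np.triu(np.ones((n, n), dtype=bool), 1)] = a.max() + 1
    tiled.sort(axis=1)
    rows = np.arange(n)
    # Average the two middle columns (identical for odd element counts)
    return (tiled[rows, rows // 2] + tiled[rows, (rows + 1) // 2]) / 2.0


rng = np.random.default_rng(0)      # arbitrary seed
a = rng.integers(0, 100, 200)       # arbitrary size

# Correctness check, mirroring the np.allclose calls in the session
agree = np.allclose(cummedian_loopy(a), cummedian_sorted(a))
print(agree)

# Average wall-clock time per call over a few repetitions
for fn in (cummedian_loopy, cummedian_sorted):
    seconds = timeit.timeit(lambda: fn(a), number=5) / 5
    print(f"{fn.__name__}: {seconds * 1e3:.3f} ms per call")
```

Both vectorized approaches build an n-by-n array, so memory grows quadratically with the sample size. That is worth keeping in mind when scaling well beyond the sizes timed here.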
