python: vectorized cumulative counting

Question

I have a numpy array and would like to count the occurrences of each value cumulatively. Each output element should be the number of times its value has already appeared earlier in the array.

in  = [0, 1, 0, 1, 2, 3, 0, 0, 2, 1, 1, 3, 3, 0, ...]
out = [0, 0, 1, 1, 0, 0, 2, 3, 1, 2, 3, 1, 2, 4, ...]
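For reference, the desired output can be pinned down with a plain, non-vectorized loop: out[i] is the number of earlier occurrences of in[i]. This is a minimal sketch meant only as a baseline to check vectorized versions against. The name cumcount_loop is an illustrative choice, not from the question.

```python
from collections import defaultdict

def cumcount_loop(values):
    # Hypothetical reference helper: out[i] = how many times values[i]
    # has already appeared before position i.
    seen = defaultdict(int)
    out = []
    for v in values:
        out.append(seen[v])  # occurrences so far, not counting this one
        seen[v] += 1
    return out

print(cumcount_loop([0, 1, 0, 1, 2, 3, 0, 0, 2, 1, 1, 3, 3, 0]))
```

Being a Python-level loop, this runs in O(n) time but is slow for large arrays, which is why a vectorized version is wanted.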

I'm wondering if it is best to create a (sparse) matrix with ones at col = i and row = in[i]

       1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1
       0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0
       0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0
       0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0

Then we could compute the cumsums along the rows and read off the value at each (row = in[i], col = i) location, i.e. where the cumsums increment, subtracting one so that the first occurrence of a value counts as 0.
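A dense version of this matrix idea is easy to sketch with NumPy. It is only an illustration of the approach and assumes a 1-D input; it needs O(k * n) memory for k distinct values, which is exactly the cost the sparse-matrix question is worried about. The name cumcount_onehot is hypothetical, and using np.unique to map arbitrary values to row indices is an addition not in the question.

```python
import numpy as np

def cumcount_onehot(a):
    # Hypothetical dense sketch of the "ones at row = in[i], col = i" idea.
    a = np.asarray(a)
    n = a.size
    cols = np.arange(n)
    # Map each distinct value to a row index 0..k-1 (values need not be 0..k-1)
    uniq, rows = np.unique(a, return_inverse=True)
    m = np.zeros((uniq.size, n), dtype=np.int64)
    m[rows, cols] = 1
    # Running count of each value along its row
    c = m.cumsum(axis=1)
    # Read each element's own running count; minus one so the first occurrence is 0
    return c[rows, cols] - 1

print(cumcount_onehot([0, 1, 0, 1, 2, 3, 0, 0, 2, 1, 1, 3, 3, 0]))
```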

However, if we cumsum a sparse matrix, doesn't it become dense? Is there an efficient way of doing this?

Recommended answer

Here's one approach based on sorting:

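import numpy as np

def cumcount(a):
    # Store length of array
    n = len(a)
    if n == 0:
        return np.empty(0, dtype=int)

    # Get sorted indices (used later on too) and store the sorted array.
    # The sort must be stable so equal values keep their original order;
    # the default quicksort does not guarantee this.
    sidx = a.argsort(kind='stable')
    b = a[sidx]

    # Mask of shifts/groups
    m = b[1:] != b[:-1]

    # Indices of those shifts (last position of every group except the final one)
    idx = np.flatnonzero(m)

    # ID array whose cumsum counts up within each group and resets to 0
    # at the start of every new group
    id_arr = np.ones(n, dtype=int)
    if idx.size:  # skip when all values are equal (no shifts)
        id_arr[idx[1:]+1] = -np.diff(idx)+1
        id_arr[idx[0]+1] = -idx[0]
    id_arr[0] = 0
    c = id_arr.cumsum()

    # Finally re-arrange those cumulative values back to original order
    out = np.empty(n, dtype=int)
    out[sidx] = c
    return out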

Sample run:

In [66]: a
Out[66]: array([0, 1, 0, 1, 2, 3, 0, 0, 2, 1, 1, 3, 3, 0])

In [67]: cumcount(a)
Out[67]: array([0, 0, 1, 1, 0, 0, 2, 3, 1, 2, 3, 1, 2, 4])
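A slightly more direct variant of the same sorting idea can be sketched as follows (this is an alternative formulation, not the answer's code). After a stable sort, each element's count is simply its offset from the start of its run of equal values; np.maximum.accumulate carries the most recent run start forward. The name cumcount_sorted is illustrative.

```python
import numpy as np

def cumcount_sorted(a):
    # Hypothetical variant: offset of each element within its run after a stable sort.
    a = np.asarray(a)
    n = a.size
    # Stable sort keeps equal values in their original relative order
    sidx = np.argsort(a, kind='stable')
    b = a[sidx]
    # True at the first element of each run of equal values
    start = np.ones(n, dtype=bool)
    start[1:] = b[1:] != b[:-1]
    # For every sorted position, the index where its run begins
    pos = np.arange(n)
    run_start = np.maximum.accumulate(np.where(start, pos, 0))
    # Scatter the within-run offsets back to the original order
    out = np.empty(n, dtype=np.int64)
    out[sidx] = pos - run_start
    return out

print(cumcount_sorted([0, 1, 0, 1, 2, 3, 0, 0, 2, 1, 1, 3, 3, 0]))
```

On the sample input it should give the same result as cumcount above. It avoids the group-length bookkeeping in id_arr and needs no special case when every value is equal.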
