Numpy sum elements in array based on its value

Problem description

I have an unsorted array of indexes:

i = np.array([1,5,2,6,4,3,6,7,4,3,2])

I also have an array of values of the same length:

v = np.array([2,5,2,3,4,1,2,1,6,4,2])

I have an array of zeros that will hold the desired values:

d = np.zeros(10)

Now I want to add each value of v to the element of d whose position is given by the corresponding entry of i.

In plain Python I would do it like this:

for index,value in enumerate(v):
    idx = i[index]
    d[idx] += v[index]

This is ugly and inefficient. How can I change it?

Recommended answer

We can use np.bincount, which is efficient for this kind of accumulative weighted counting. Since d starts out as zeros, the result can be assigned directly into it:

counts = np.bincount(i,v)
d[:counts.size] = counts
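
As a quick check, the sketch below runs the np.bincount approach on the arrays from the question and compares it with a loop equivalent to the plain Python one. The names d_loop and d_bincount are introduced here only for illustration.

```python
import numpy as np

i = np.array([1, 5, 2, 6, 4, 3, 6, 7, 4, 3, 2])
v = np.array([2, 5, 2, 3, 4, 1, 2, 1, 6, 4, 2])

# Reference result, equivalent to the plain Python loop in the question
d_loop = np.zeros(10)
for idx, value in zip(i, v):
    d_loop[idx] += value

# bincount sums the weights in v that share the same index in i;
# its output has length i.max() + 1, so it fills only the front of d
counts = np.bincount(i, v)
d_bincount = np.zeros(10)
d_bincount[:counts.size] = counts

print(d_bincount.tolist())  # [0.0, 2.0, 4.0, 5.0, 10.0, 5.0, 5.0, 1.0, 0.0, 0.0]
print(np.array_equal(d_loop, d_bincount))  # True
```

Note that the slice assignment overwrites the first counts.size entries of d, so this form is only equivalent to the loop when d is initially all zeros.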

Alternatively, use the minlength argument. This covers the generic case where d can be any array and we want to add into it:

d += np.bincount(i,v,minlength=d.size).astype(d.dtype, copy=False)
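
The one-liner above assumes every index in i is non-negative and smaller than d.size. Otherwise np.bincount either rejects the input or returns an array longer than d. A minimal sketch of a wrapper that makes this assumption explicit follows; the function name add_by_index and the choice of IndexError are hypothetical and not part of the original answer.

```python
import numpy as np

def add_by_index(d, i, v):
    """Add v into d in place at positions i, summing over repeated indices."""
    i = np.asarray(i)
    # np.bincount needs non-negative integers, and the indices must fit inside d
    if i.size and (i.min() < 0 or i.max() >= d.size):
        raise IndexError("indices must lie in [0, d.size)")
    d += np.bincount(i, v, minlength=d.size).astype(d.dtype, copy=False)
    return d

d = np.arange(10, dtype=float)  # d already holds values 0..9
i = np.array([1, 5, 2, 6, 4, 3, 6, 7, 4, 3, 2])
v = np.array([2, 5, 2, 3, 4, 1, 2, 1, 6, 4, 2])
add_by_index(d, i, v)
print(d.tolist())  # [0.0, 3.0, 6.0, 8.0, 14.0, 10.0, 11.0, 8.0, 8.0, 9.0]
```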

Runtime test

This section compares the np.add.at based approach listed in the other post with the np.bincount based approach listed earlier in this post.
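
The other post is not reproduced here. As a minimal sketch, assuming np.add.at is called the same way as in add_at_based in the session that follows, the example below also shows why plain fancy-index addition is not a substitute when i contains repeated indices. The names d_fancy and d_at are illustrative.

```python
import numpy as np

i = np.array([1, 5, 2, 6, 4, 3, 6, 7, 4, 3, 2])
v = np.array([2, 5, 2, 3, 4, 1, 2, 1, 6, 4, 2])

# Buffered fancy indexing: for a repeated index only one addition survives
d_fancy = np.zeros(10)
d_fancy[i] += v

# Unbuffered ufunc.at: every occurrence of a repeated index is accumulated
d_at = np.zeros(10)
np.add.at(d_at, i, v)

print(d_at.tolist())  # [0.0, 2.0, 4.0, 5.0, 10.0, 5.0, 5.0, 1.0, 0.0, 0.0]
print(np.array_equal(d_fancy, d_at))  # False
```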

In [61]: def bincount_based(d,i,v):
    ...:     counts = np.bincount(i,v)
    ...:     d[:counts.size] = counts
    ...: 
    ...: def add_at_based(d,i,v):
    ...:     np.add.at(d, i, v)
    ...:     

In [62]: # Inputs (random numbers)
    ...: N = 10000
    ...: i = np.random.randint(0,1000,(N))
    ...: v = np.random.randint(0,1000,(N))
    ...: 
    ...: # Setup output arrays for two approaches
    ...: M = 12000
    ...: d1 = np.zeros(M)
    ...: d2 = np.zeros(M)
    ...: 

In [63]: bincount_based(d1,i,v) # Run approaches
    ...: add_at_based(d2,i,v)
    ...: 

In [64]: np.allclose(d1,d2)  # Verify outputs
Out[64]: True

In [67]: # Setup output arrays for two approaches again for timing
    ...: M = 12000
    ...: d1 = np.zeros(M)
    ...: d2 = np.zeros(M)
    ...: 

In [68]: %timeit add_at_based(d2,i,v)
1000 loops, best of 3: 1.83 ms per loop

In [69]: %timeit bincount_based(d1,i,v)
10000 loops, best of 3: 52.7 µs per loop
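
The session above relies on IPython's %timeit magic. For readers outside IPython, here is a rough standalone sketch of the same comparison using the standard timeit module. The seed, the loop counts, and the use of np.random.default_rng are choices made here, not part of the original test, and the absolute timings will depend on hardware and NumPy version.

```python
import timeit
import numpy as np

def bincount_based(d, i, v):
    counts = np.bincount(i, v)
    d[:counts.size] = counts

def add_at_based(d, i, v):
    np.add.at(d, i, v)

rng = np.random.default_rng(0)  # fixed seed, chosen here for repeatability
N, M = 10000, 12000
i = rng.integers(0, 1000, N)
v = rng.integers(0, 1000, N)

# Verify that both approaches agree on fresh zero arrays
d1, d2 = np.zeros(M), np.zeros(M)
bincount_based(d1, i, v)
add_at_based(d2, i, v)
same = np.allclose(d1, d2)

# Best of 3 repeats, 100 calls each. add_at_based keeps accumulating
# into d2 across calls, which changes its contents but not the work done.
runs = 100
t_add_at = min(timeit.repeat(lambda: add_at_based(d2, i, v), number=runs, repeat=3)) / runs
t_bincount = min(timeit.repeat(lambda: bincount_based(d1, i, v), number=runs, repeat=3)) / runs

print(f"outputs match: {same}")
print(f"np.add.at:   {t_add_at * 1e6:.1f} µs per call")
print(f"np.bincount: {t_bincount * 1e6:.1f} µs per call")
```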
