numpy: ravel_multi_index increment gives different results from iterating over the indices in a loop

Question

I have an array of indices (possibly with duplicates), and for each of these indices I increment the corresponding entry of another 2D matrix by 1. There have been several suggestions, and one answer proposes using np.ravel_multi_index.

So I tried it out, but the two approaches don't seem to give me the same results. Any idea why?

raveled = np.ravel_multi_index(legit_indices.T, acc.shape)
counts = np.bincount(raveled)
acc = np.resize(counts, acc.shape)

acc2 = np.zeros(acc2.shape)
for i in legit_indices:
    acc2[i[0], i[1]] += 1

(Pdb) np.array_equal(acc, acc2)
False

(Pdb) acc[493][5]
135
(Pdb) acc2[493][5]
0.0

Recommended answer

There are a few problems with your current approach. Firstly, np.bincount(x) gives you the count for every non-negative integer value from 0 up to and including max(x), whether or not that value actually occurs in x:

print(np.bincount([1, 1, 3, 3, 3, 4]))
# [0 2 0 3 1]
# i.e. [count for 0, count for 1, count for 2, count for 3, count for 4]

Therefore, if not every location in acc.flat gets indexed, the length of np.bincount(raveled) will be greater than the number of unique indices. It will also be shorter than acc.size whenever the largest flat index is less than acc.size - 1. What you actually want is the counts only for those locations in acc.flat that are indexed at least once.

Secondly, what you want to do is assign the bin counts to the corresponding indices in acc.flat. Your call to np.resize instead repeats parts of your array of bin counts to make it the same size as acc.flat, then reshapes it to the same shape as acc. This will not result in the bin counts being assigned to the correct locations in acc!
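To make the np.resize problem concrete, here is a minimal sketch on made-up data (the shape and the raveled values are assumptions chosen for illustration, not taken from the question). It shows how a bincount result that is shorter than the target gets padded with repeated copies of itself, which puts counts in cells that were never indexed:

```python
import numpy as np

# Hypothetical 2x3 accumulator in which only flat positions 1 and 3 are hit
shape = (2, 3)
raveled = np.array([1, 1, 3])

# bincount stops at max(raveled), so it has 4 entries rather than 6
counts = np.bincount(raveled)

# np.resize fills the missing 2 entries by repeating counts from the start
resized = np.resize(counts, shape)

print(counts)   # [0 2 0 1]
print(resized)  # [[0 2 0]
                #  [1 0 2]]
```

The trailing 2 at position (1, 2) is a copy of the count for flat index 1, even though nothing indexed that cell. A spurious nonzero entry like acc[493][5] in the question is consistent with this kind of repetition.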

The way I would solve this problem is to use np.unique instead of np.bincount, and have it return both the unique indices and their corresponding counts. These can then be used to assign the correct counts to the correct unique locations within acc:

import numpy as np

# some example data
acc = np.zeros((4, 3))
legit_indices = np.array([[0, 1],
                          [0, 1],
                          [1, 2],
                          [1, 0],
                          [1, 0],
                          [1, 0]])

# convert the index array into a set of indices into acc.flat
flat_idx = np.ravel_multi_index(legit_indices.T, acc.shape)

# get the set of unique indices and their corresponding counts
uidx, ucounts = np.unique(flat_idx, return_counts=True)

# assign the count value to each unique index in acc.flat
acc.flat[uidx] = ucounts

# confirm that this matches the result of your for loop
acc2 = np.zeros_like(acc)
for ii, jj in legit_indices:
    acc2[ii, jj] += 1

assert np.array_equal(acc, acc2)
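For comparison, here is a sketch of two other ways to get the same result, reusing the example data from above. These are not part of the approach described in this answer: one keeps np.bincount but passes minlength so that there is exactly one bin per element of the target array, and the other uses np.add.at, which accumulates repeated indices instead of overwriting them.

```python
import numpy as np

shape = (4, 3)
legit_indices = np.array([[0, 1],
                          [0, 1],
                          [1, 2],
                          [1, 0],
                          [1, 0],
                          [1, 0]])

flat_idx = np.ravel_multi_index(legit_indices.T, shape)

# Option 1: pad bincount to one bin per element, then reshape (no np.resize)
acc_bincount = np.bincount(flat_idx, minlength=shape[0] * shape[1]).reshape(shape)

# Option 2: unbuffered in-place addition, which handles duplicate indices
acc_add_at = np.zeros(shape)
np.add.at(acc_add_at, (legit_indices[:, 0], legit_indices[:, 1]), 1)

# reference result from the plain Python loop
expected = np.zeros(shape)
for ii, jj in legit_indices:
    expected[ii, jj] += 1

assert np.array_equal(acc_bincount, expected)
assert np.array_equal(acc_add_at, expected)
```

Note that acc.flat[uidx] = ucounts assigns counts rather than adding them, so it relies on acc starting at zero. np.add.at adds to whatever is already in the array, which may be preferable when the accumulator holds earlier counts.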
