Handling of duplicate indices in NumPy assignments


Question

I am setting the values of multiple elements in a 2D array; however, my data sometimes contains multiple values for a given index.

It seems that the "later" value is always assigned (see the examples below), but is this behaviour guaranteed, or is there a chance I will get inconsistent results? How do I know that I can interpret "later" the way I would like in a vectorized assignment?

That is, in my first example will a definitely always contain 4, and in the second example could it ever print values[0]?

A very simple example:

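import numpy as np
a = np.zeros(1, dtype=int)  # target array; it was undefined in the original snippet
indices = np.zeros(5, dtype=int)  # np.int has been removed from NumPy; use the builtin int
a[indices] = np.arange(5)
a # array([4])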

Another example:

import numpy as np

grid = np.zeros((1000, 800))

# generate indices and values
xs = np.random.randint(0, grid.shape[0], 100)
ys = np.random.randint(0, grid.shape[1], 100)
values = np.random.rand(100)

# make sure we have a duplicate index
print(values[0], values[5])
xs[0] = xs[5]
ys[0] = ys[5]

grid[xs, ys] = values

print("output value is", grid[xs[0], ys[0]])
# always prints the value of values[5]

Answer

In NumPy 1.9 and later, the result will in general not be well defined.

The current implementation iterates over all (broadcast) fancy indexes, and the assignment array, at the same time using separate iterators, and these iterators all use C-order. In other words: currently, yes, you can rely on it. If you want more detail: mapping.c in NumPy, which handles these things, uses PyArray_ITER_NEXT, which is documented to iterate in C-order.
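The element-by-element behaviour described above can be modelled in pure Python. This is only a rough sketch of the semantics, not NumPy's actual C code: `assign_in_c_order` is a hypothetical helper, and it assumes the index arrays and the values broadcast to a common shape that is then walked in C order.

```python
import numpy as np

def assign_in_c_order(target, index_arrays, values):
    """Hypothetical helper: emulate target[index_arrays] = values one element at a time."""
    # Broadcast the index arrays and the values to one common shape.
    *idx, vals = np.broadcast_arrays(*index_arrays, values)
    # np.ndindex walks that shape in C order (last axis varies fastest).
    for pos in np.ndindex(vals.shape):
        target[tuple(i[pos] for i in idx)] = vals[pos]
    return target

a = np.zeros(1, dtype=int)
assign_in_c_order(a, (np.zeros(5, dtype=int),), np.arange(5))
print(a)  # [4]
```

Under this model each write to a repeated index overwrites the previous one, so the value that comes last in C order is the one that remains.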

For the future I would paint a different picture. I think it would be good to iterate over all indices and the assignment array together using the newer iterator. If that is done, the order could be left open so that the iterator can choose the fastest way. If the order is left to the iterator, it is hard to say what would happen, and you could no longer be certain that your example works (in the 1-d case you probably still could, but...).

So, as far as I can tell, it works currently, but as far as I know it is undocumented. If you think this behaviour should be ensured, you would need to lobby for it and, ideally, write some tests so that it can be guaranteed. I, at least, am tempted to say: if it makes things faster, there is no reason to ensure C-order, though of course there may be a good reason hidden somewhere...
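If "the last value wins" is what you need, one way to avoid depending on iteration order at all is to drop the earlier duplicates yourself before assigning. The following is a minimal sketch under that assumption; `assign_keep_last` is a hypothetical helper, not part of NumPy, and it relies on `np.unique(..., return_index=True)` reporting the first occurrence of each value.

```python
import numpy as np

def assign_keep_last(grid, xs, ys, values):
    """Hypothetical helper: assign values so the last duplicate explicitly wins."""
    flat = np.ravel_multi_index((xs, ys), grid.shape)
    # np.unique returns the first occurrence of each value; searching the
    # reversed array therefore finds the last occurrence in the original.
    _, first_in_reversed = np.unique(flat[::-1], return_index=True)
    keep = len(flat) - 1 - first_in_reversed
    # No index appears twice now, so iteration order no longer matters.
    grid[xs[keep], ys[keep]] = values[keep]
    return grid

grid = np.zeros((3, 4))
xs = np.array([0, 2, 0, 1])
ys = np.array([1, 3, 1, 0])  # (0, 1) appears at positions 0 and 2
values = np.array([10.0, 20.0, 30.0, 40.0])

assign_keep_last(grid, xs, ys, values)
print(grid[0, 1])  # 30.0
```

Because every target cell is written exactly once, the result is the same whatever order the assignment uses internally.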

The real question here is: why do you want that anyway? ;)
