Handling of duplicate indices in NumPy assignments

Question

I am setting the values of multiple elements in a 2D array, however my data sometimes contains multiple values for a given index.

It seems that the "later" value is always assigned (see examples below) but is this behaviour guaranteed or is there a chance I will get inconsistent results? How do I know that I can interpret "later" in the way that I would like in a vectorized assignment?

That is, in my first example, will a definitely always contain 4, and in the second example, could it ever print values[0]?

A very simple example:

import numpy as np
a = np.zeros(1, dtype=int)        # one-element target array
indices = np.zeros(5, dtype=int)  # five indices, all pointing at element 0
a[indices] = np.arange(5)
a # array([4])

Another example:

import numpy as np

grid = np.zeros((1000, 800))

# generate indices and values
xs = np.random.randint(0, grid.shape[0], 100)
ys = np.random.randint(0, grid.shape[1], 100)
values = np.random.rand(100)

# make sure we have a duplicate index
print(values[0], values[5])
xs[0] = xs[5]
ys[0] = ys[5]

grid[xs, ys] = values

print("output value is", grid[xs[0], ys[0]])
# in my runs this always prints the value of values[5]

Recommended answer

In NumPy 1.9 and later this will in general not be well defined.

The current implementation iterates over all (broadcasted) fancy indexes (and the assignment array) at the same time using separate iterators, and these iterators all use C-order. In other words, currently, yes, you can rely on it. In case you want more detail: if you look at mapping.c in NumPy, which handles these things, you will see that it uses PyArray_ITER_NEXT, which is documented to iterate in C-order.
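
To make the C-order argument concrete, here is a small sketch that compares the vectorized assignment with an explicit front-to-back loop over the same index arrays. The array sizes, the seed and the names loop_grid and fancy_grid are made up for illustration. The comparison only reports what your installed NumPy does; it does not prove a guarantee.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, only so the sketch is repeatable
shape = (10, 8)
xs = rng.integers(0, shape[0], 50)
ys = rng.integers(0, shape[1], 50)
values = rng.random(50)

# Force a duplicate index: positions 0 and 5 now point at the same cell.
xs[0], ys[0] = xs[5], ys[5]

# Explicit loop: walk the index arrays front to back, so a later value
# overwrites an earlier one written to the same cell.
loop_grid = np.zeros(shape)
for x, y, v in zip(xs, ys, values):
    loop_grid[x, y] = v

# Vectorized assignment: same inputs, but the order of writes is up to NumPy.
fancy_grid = np.zeros(shape)
fancy_grid[xs, ys] = values

# Reports whether the two agree on this NumPy build (undocumented behaviour).
print("same result:", np.array_equal(loop_grid, fancy_grid))
```

The loop is the reference for "later wins"; the vectorized line is the thing being checked against it.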

For the future I would paint the picture differently. I think it would be good to iterate over all indices and the assignment array together using the newer iterator. If this is done, the order could be left open so that the iterator can pick the fastest way. If you leave it open to the iterator, it is hard to say what would happen, and you can no longer be certain that your example works (in the 1-d case you probably still can, but...).

So, as far as I can tell, it works currently, but it is undocumented (for all I know). If you actually think this should be ensured, you would need to lobby for it and, ideally, write some tests to make sure it can be guaranteed. At least I am tempted to say: if it makes things faster, there is no reason to ensure C-order, but of course maybe there is a good reason hidden somewhere...
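
If you need "last value wins" without depending on undocumented iteration order, one option is to drop the earlier duplicates yourself before assigning. The sketch below is one way to do that, not something NumPy provides: assign_last_wins is a hypothetical helper name, and it assumes xs, ys and values are 1-d arrays of equal length.

```python
import numpy as np

def assign_last_wins(grid, xs, ys, values):
    """Write values into grid[xs, ys]; for repeated (x, y) pairs keep the last value."""
    xs, ys, values = np.asarray(xs), np.asarray(ys), np.asarray(values)
    # Collapse each (x, y) pair into a single flat index so pairs can be compared.
    flat = np.ravel_multi_index((xs, ys), grid.shape)
    # np.unique reports the FIRST occurrence of each value, so search the
    # reversed array: first occurrence there is the last occurrence originally.
    _, first_in_reversed = np.unique(flat[::-1], return_index=True)
    keep = len(flat) - 1 - first_in_reversed
    # Every target cell now appears exactly once, so write order is irrelevant.
    grid[xs[keep], ys[keep]] = values[keep]
    return grid

grid = np.zeros((4, 3))
xs = np.array([0, 1, 0, 2, 1])
ys = np.array([0, 2, 0, 1, 2])
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
assign_last_wins(grid, xs, ys, values)
print(grid[0, 0], grid[1, 2], grid[2, 1])
```

Because the indices are unique by the time the assignment runs, the result no longer depends on how NumPy orders the writes. The cost is an extra sort inside np.unique.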

The real question here is: Why do you want that anyway? ;)
