Numpy: Finding minimum and maximum values from associations through binning
Problem description
This is a question derived from this post. So, some of the introduction of the problem will be similar to that post.
Let's say result is a 2D array and values is a 1D array. values holds some values associated with each element in result. The mapping of an element in values to result is stored in x_mapping and y_mapping. A position in result can be associated with several values. Now, I have to find the minimum and maximum of the values grouped by position.
An example for better clarification.
min_result array:
[[0, 0],
[0, 0],
[0, 0],
[0, 0]]
max_result array:
[[0, 0],
[0, 0],
[0, 0],
[0, 0]]
values array:
[ 1., 2., 3., 4., 5., 6., 7., 8.]
Note: here the result arrays and values have the same number of elements, but that need not be the case. There is no relation between their sizes at all.
x_mapping and y_mapping hold the mappings from the 1D values to the 2D result arrays (both min and max). x_mapping, y_mapping and values will all have the same size.
x_mapping: [0, 1, 0, 0, 0, 0, 0, 0]
y_mapping: [0, 3, 2, 2, 0, 3, 2, 1]
Here, the 1st value (values[0]) and the 5th value (values[4]) both have x = 0 and y = 0 (x_mapping[0] and y_mapping[0], and likewise at index 4), and are hence associated with result[0, 0]. If we compute the minimum and maximum of this group, we get 1 and 5 respectively. So min_result[0, 0] will hold 1 and max_result[0, 0] will hold 5.
Note that if a position has no association at all, its default value in result is zero.
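The row index in this question is -y rather than y, which relies on Python's negative indexing: with 4 rows, y = 0 maps to row 0, y = 3 to row 1, y = 2 to row 2 and y = 1 to row 3. A minimal sketch (not from the original post) that resolves each value's target cell for the example data and groups the values per cell, assuming the (4, 2) shape used above:

```python
import numpy as np

x_mapping = np.array([0, 1, 0, 0, 0, 0, 0, 0])
y_mapping = np.array([0, 3, 2, 2, 0, 3, 2, 1])
values = np.array([1., 2., 3., 4., 5., 6., 7., 8.], dtype=np.float32)

n_rows, n_cols = 4, 2  # shape of min_result / max_result in the example

# -y used as a row index wraps around: -0 -> 0, -3 -> 1, -2 -> 2, -1 -> 3
rows = (-y_mapping) % n_rows
cols = x_mapping

# Collect the values associated with each (row, col) cell
groups = {}
for r, c, v in zip(rows.tolist(), cols.tolist(), values.tolist()):
    groups.setdefault((r, c), []).append(v)

for cell in sorted(groups):
    print(cell, groups[cell])
```

For the example data this prints the groups (0, 0) [1.0, 5.0], (1, 0) [6.0], (1, 1) [2.0], (2, 0) [3.0, 4.0, 7.0] and (3, 0) [8.0]; their minima and maxima are exactly the non-zero entries of the expected min_result and max_result, and cells such as (0, 1) never appear, so they keep the default of zero.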
import numpy as np

x_mapping = np.array([0, 1, 0, 0, 0, 0, 0, 0])
y_mapping = np.array([0, 3, 2, 2, 0, 3, 2, 1])
values = np.array([ 1., 2., 3., 4., 5., 6., 7., 8.], dtype=np.float32)
max_result = np.zeros([4, 2], dtype=np.float32)
min_result = np.zeros([4, 2], dtype=np.float32)
# seed each associated cell with one of its own values, so the zero default
# does not win the minimum comparison
min_result[-y_mapping, x_mapping] = values
for i in range(values.size):
    x = x_mapping[i]
    y = y_mapping[i]
    # maximum
    if values[i] > max_result[-y, x]:
        max_result[-y, x] = values[i]
    # minimum
    if values[i] < min_result[-y, x]:
        min_result[-y, x] = values[i]
min_result:
[[1., 0.],
 [6., 2.],
 [3., 0.],
 [8., 0.]]
max_result:
[[5., 0.],
 [6., 2.],
 [7., 0.],
 [8., 0.]]
Failed solutions
#1
min_result = np.zeros([4, 2], dtype=np.float32)
np.minimum.reduceat(values, [-y_mapping, x_mapping], out=min_result)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-17-126de899a90e> in <module>()
1 min_result = np.zeros([4, 2], dtype=np.float32)
----> 2 np.minimum.reduceat(values, [-y_mapping, x_mapping], out=min_result)
ValueError: object too deep for desired array
#2 (lidx as defined in #3)
min_result = np.zeros([4, 2], dtype=np.float32)
np.minimum.reduceat(values, lidx, out= min_result)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-24-07e8c75ccaa5> in <module>()
1 min_result = np.zeros([4, 2], dtype=np.float32)
----> 2 np.minimum.reduceat(values, lidx, out= min_result)
ValueError: operands could not be broadcast together with remapped shapes [original->remapped]: (4,2)->(4,) (8,)->() (8,)->(8,)
#3
lidx = ((-y_mapping) % 4) * 2 + x_mapping #from mentioned post
min_result = np.zeros([8], dtype=np.float32)
np.minimum.reduceat(values, lidx, out= min_result).reshape(4,2)
[[1., 4.],
[5., 5.],
[1., 3.],
[5., 7.]]
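Attempt #3 produces a wrong result because np.ufunc.reduceat does not group by label: its indices are slice boundaries into the input. For each i it reduces values[indices[i]:indices[i+1]] when indices[i] < indices[i+1], otherwise it just returns values[indices[i]], and the last index reduces up to the end of the array. A minimal sketch that reproduces attempt #3 with a hypothetical helper, reduceat_min_by_hand, written only to spell out that documented slicing rule:

```python
import numpy as np

values = np.array([1., 2., 3., 4., 5., 6., 7., 8.], dtype=np.float32)
lidx = np.array([0, 3, 4, 4, 0, 2, 4, 6])  # flat indices computed in attempt #3

def reduceat_min_by_hand(a, indices):
    """Hypothetical helper mimicking the np.minimum.reduceat slicing rule."""
    out = []
    for i, start in enumerate(indices):
        # the last index reduces up to the end of the array
        stop = indices[i + 1] if i + 1 < len(indices) else len(a)
        if start < stop:
            out.append(a[start:stop].min())
        else:
            out.append(a[start])  # non-increasing pair: single element
    return np.array(out, dtype=a.dtype)

print(np.minimum.reduceat(values, lidx))   # [1. 4. 5. 5. 1. 3. 5. 7.]
print(reduceat_min_by_hand(values, lidx))  # same values
```

Reshaped to (4, 2), this is exactly the wrong output above. For reduceat to compute per-cell minima, the data has to be sorted by cell first, so that each cell's values form one contiguous run and the indices mark where each run starts.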
Question
How can np.minimum.reduceat and np.maximum.reduceat be used to solve this problem? I'm looking for a solution that is optimised for runtime.
I'm using Numpy version 1.14.3 with Python 3.5.2
Recommended answer
Approach #1
Again, the most intuitive approach would be numpy.ufunc.at.
Since these reductions are performed in place against the values already in the output, we first need to initialize the associated cells with the maximum of values for the minimum reduction and with the minimum of values for the maximum reduction. Cells with no association keep their default of zero. Hence, the implementation would be:
min_result[-y_mapping, x_mapping] = values.max()
max_result[-y_mapping, x_mapping] = values.min()
# pass the index arrays as a tuple (row indices, column indices)
np.minimum.at(min_result, (-y_mapping, x_mapping), values)
np.maximum.at(max_result, (-y_mapping, x_mapping), values)
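A self-contained version of the ufunc.at approach, wrapped in a function so it can be checked against the expected output from the question. The function name min_max_ufunc_at is hypothetical; the indices are passed as a tuple of arrays, which is the documented form for multi-dimensional ufunc.at indexing. np.minimum.at and np.maximum.at are unbuffered, so repeated cells (such as (0, 0) and (2, 0) here) receive every value rather than only the last one.

```python
import numpy as np

def min_max_ufunc_at(values, x_mapping, y_mapping, shape):
    """Per-cell min/max via np.minimum.at / np.maximum.at.

    Cells with no associated value stay 0, as in the question.
    """
    min_result = np.zeros(shape, dtype=values.dtype)
    max_result = np.zeros(shape, dtype=values.dtype)
    idx = (-y_mapping, x_mapping)  # same -y row convention as the question

    # Seed associated cells with values that can never win the reduction
    min_result[idx] = values.max()
    max_result[idx] = values.min()

    # Unbuffered in-place reductions: every repeated index is applied
    np.minimum.at(min_result, idx, values)
    np.maximum.at(max_result, idx, values)
    return min_result, max_result

x_mapping = np.array([0, 1, 0, 0, 0, 0, 0, 0])
y_mapping = np.array([0, 3, 2, 2, 0, 3, 2, 1])
values = np.array([1., 2., 3., 4., 5., 6., 7., 8.], dtype=np.float32)

mn, mx = min_max_ufunc_at(values, x_mapping, y_mapping, (4, 2))
print(mn)
print(mx)
```

On the example data, mn and mx should match min_result and max_result from the question.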
Approach #2
To leverage np.ufunc.reduceat, we need to sort the data by target cell first, so that each cell's values form one contiguous run:
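m, n = max_result.shape
out_dtype = max_result.dtype
# flat index of the target cell for every value (rows use the -y convention)
lidx = ((-y_mapping) % m) * n + x_mapping
# sort by cell so each cell's values form one contiguous run
sidx = lidx.argsort()
idx = lidx[sidx]
val = values[sidx]
# start position of each run, and the cell that run belongs to
m_idx = np.flatnonzero(np.r_[True, idx[:-1] != idx[1:]])
unq_ids = idx[m_idx]
# outputs default to zero for cells with no association
max_result_out = np.zeros((m, n), dtype=out_dtype)
min_result_out = np.zeros((m, n), dtype=out_dtype)
max_result_out.flat[unq_ids] = np.maximum.reduceat(val, m_idx)
min_result_out.flat[unq_ids] = np.minimum.reduceat(val, m_idx)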
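A self-contained sketch of the sort + reduceat approach as a function (the names min_max_reduceat and min_max_loop are hypothetical), cross-checked on random data against a plain loop. The reference loop is a variant of the one in the question that seeds each cell with its first value; the question's loop starts max_result at zero, which would give wrong maxima for cells whose values are all negative.

```python
import numpy as np

def min_max_reduceat(values, x_mapping, y_mapping, shape):
    """Per-cell min/max via sorting + np.minimum/maximum.reduceat."""
    m, n = shape
    lidx = ((-y_mapping) % m) * n + x_mapping  # flat cell index, -y row convention
    sidx = lidx.argsort()
    idx = lidx[sidx]
    val = values[sidx]
    # Start of each run of equal cell indices in sorted order
    m_idx = np.flatnonzero(np.r_[True, idx[:-1] != idx[1:]])
    unq_ids = idx[m_idx]

    min_out = np.zeros(shape, dtype=values.dtype)
    max_out = np.zeros(shape, dtype=values.dtype)
    min_out.flat[unq_ids] = np.minimum.reduceat(val, m_idx)
    max_out.flat[unq_ids] = np.maximum.reduceat(val, m_idx)
    return min_out, max_out

def min_max_loop(values, x_mapping, y_mapping, shape):
    """Reference loop; cells without values stay 0."""
    min_out = np.zeros(shape, dtype=values.dtype)
    max_out = np.zeros(shape, dtype=values.dtype)
    seen = np.zeros(shape, dtype=bool)
    for v, x, y in zip(values, x_mapping, y_mapping):
        r = -y
        if not seen[r, x]:
            # first value for this cell seeds both min and max
            min_out[r, x] = max_out[r, x] = v
            seen[r, x] = True
        else:
            min_out[r, x] = min(min_out[r, x], v)
            max_out[r, x] = max(max_out[r, x], v)
    return min_out, max_out

rng = np.random.RandomState(0)
shape = (5, 3)
size = 50
values = rng.randn(size).astype(np.float32)  # includes negative values
x_mapping = rng.randint(0, shape[1], size)
y_mapping = rng.randint(0, shape[0], size)

mn_a, mx_a = min_max_reduceat(values, x_mapping, y_mapping, shape)
mn_b, mx_b = min_max_loop(values, x_mapping, y_mapping, shape)
print(np.array_equal(mn_a, mn_b), np.array_equal(mx_a, mx_b))  # True True
```

Sorting costs O(N log N), but after that both reductions are single vectorized passes over contiguous runs, which is why this layout suits reduceat; whether it beats ufunc.at on a given NumPy version is worth timing on your own data.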