Numpy arrays: row/column wise argmax with random ties

Question

Here is what I am trying to do with Numpy in Python 2.7. Suppose I have an array a defined as follows:

import numpy as np
a = np.array([[1,3,3],[4,5,6],[7,8,1]])

I can call a.argmax(0) or a.argmax(1) to get the column-wise or row-wise argmax, respectively:

a.argmax(0)
Out[329]: array([2, 2, 1], dtype=int64)
a.argmax(1)
Out[330]: array([1, 2, 1], dtype=int64)

However, when there is a tie, as in a's first row, I would like the argmax to be chosen randomly among the tied positions (by default, Numpy returns the first element whenever a tie occurs in argmax or argmin).
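The default behaviour described above can be seen directly on the sample array. This is only a small illustrative sketch; the names tie_columns and first_hit are made up for the example.

```python
import numpy as np

a = np.array([[1, 3, 3], [4, 5, 6], [7, 8, 1]])

# Row 0 holds its maximum (3) at two positions: columns 1 and 2.
tie_columns = np.flatnonzero(a[0] == a[0].max())

# argmax reports only the first of the tied positions, on every call.
first_hit = a.argmax(1)[0]

print(tie_columns, first_hit)
```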

Last year, someone asked a question about resolving Numpy argmax/argmin ties randomly: Select One Element in Each Row of a Numpy Array by Column Indices

However, that question was aimed at one-dimensional arrays, and the most-voted answer there works well for that case. A second answer also attempts to handle multidimensional arrays but doesn't work, i.e. it does not return, for each row/column, the index of the maximum value with ties resolved randomly.

What would be the most performant way to do that, since I am working with big arrays?

Recommended answer

Generic case solution to pick one per group

To solve the general case of picking one random number per group, where a list/array of numbers specifies the range for each pick, we would use a trick: create a uniform random array, add an integer group offset repeated according to the interval lengths, and then perform argsort. The implementation would look something like this -

def random_num_per_grp(L):
    # For each element in L pick a random number within range specified by it
    r1 = np.random.rand(np.sum(L)) + np.repeat(np.arange(len(L)),L)
    offset = np.r_[0,np.cumsum(L[:-1])]
    return r1.argsort()[offset] - offset

Sample case -

In [217]: L = [5,4,2]

In [218]: random_num_per_grp(L) # i.e. select one per [0-5,0-4,0-2]
Out[218]: array([2, 0, 1])

So, the output has the same number of elements as the input L: the first output element lies in [0,5), the second in [0,4), and so on.
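To make the trick easier to follow, here is a sketch that spells out the intermediate arrays for L = [5,4,2]. The variable names group_id, order and picks are introduced only for this illustration. Since np.random.rand draws from [0,1), adding the group number keeps each group's values inside its own unit interval, so sorting never mixes groups.

```python
import numpy as np

L = [5, 4, 2]

# Group label for every slot: [0 0 0 0 0 1 1 1 1 2 2]
group_id = np.repeat(np.arange(len(L)), L)

# Values of group i fall in [i, i+1), so groups stay contiguous after sorting.
r1 = np.random.rand(np.sum(L)) + group_id

# Position where each group starts in the flat layout: [0 5 9]
offset = np.r_[0, np.cumsum(L[:-1])]

# order[offset[i]] is the flat index of the smallest value in group i,
# which is a uniformly random member of that group.
order = r1.argsort()
picks = order[offset] - offset

print(offset, picks)
```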

To solve our case here, we would use a modified version (specifically, one that drops the offset subtraction at the end of the function), like so -

def random_num_per_grp_cumsumed(L):
    # For each element in L pick a random number within range specified by it
    # The final output would be a cumsumed one for use with indexing, etc.
    r1 = np.random.rand(np.sum(L)) + np.repeat(np.arange(len(L)),L)
    offset = np.r_[0,np.cumsum(L[:-1])]
    return r1.argsort()[offset] 

Approach #1

One solution could use it like so -

def argmax_per_row_randtie(a):
    max_mask = a==a.max(1,keepdims=1)
    m,n = a.shape
    all_argmax_idx = np.flatnonzero(max_mask)
    offset = np.arange(m)*n
    return all_argmax_idx[random_num_per_grp_cumsumed(max_mask.sum(1))] - offset
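As a rough illustration of what this function works with, the sketch below prints the intermediate arrays for the sample input, without the random pick. The names counts, row_start and candidates are introduced here for the example only. Each row's maxima form one group, and the random selector chooses one flat index from each group.

```python
import numpy as np

a = np.array([[1, 3, 3], [4, 5, 6], [7, 8, 1]])
m, n = a.shape

max_mask = a == a.max(1, keepdims=True)

# Flat (row-major) positions of every maximum: [1 2 5 7]
all_argmax_idx = np.flatnonzero(max_mask)

# Number of tied maxima in each row, i.e. the group sizes: [2 1 1]
counts = max_mask.sum(1)

# Flat index at which each row starts: [0 3 6]
row_start = np.arange(m) * n

# Candidate column indices per row, obtained by removing the row start.
candidates = np.split(all_argmax_idx - np.repeat(row_start, counts),
                      np.cumsum(counts)[:-1])

print([c.tolist() for c in candidates])
```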

Verification

Let's test on the given sample with a large number of runs and count the number of occurrences of each index for each row

In [235]: a
Out[235]: 
array([[1, 3, 3],
       [4, 5, 6],
       [7, 8, 1]])

In [225]: all_out = np.array([argmax_per_row_randtie(a) for i in range(10000)])

# The first element (row=0) should have similar probabilities for 1 and 2
In [236]: (all_out[:,0]==1).mean()
Out[236]: 0.504

In [237]: (all_out[:,0]==2).mean()
Out[237]: 0.496

# The second element (row=1) should only have 2
In [238]: (all_out[:,1]==2).mean()
Out[238]: 1.0

# The third element (row=2) should only have 1
In [239]: (all_out[:,2]==1).mean()
Out[239]: 1.0
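For cross-checking results like these, a different and simpler technique can serve as a reference. It is not the method from this answer: it gives every tied maximum an independent random key and takes the argmax of the keys. The function name argmax_randtie_noise and the rng parameter are hypothetical, and np.random.default_rng assumes NumPy 1.17 or later (so not the Python 2.7 setup of the question). It allocates a full random array, so it may well be slower on big arrays.

```python
import numpy as np

def argmax_randtie_noise(a, axis=1, rng=None):
    # Hypothetical reference implementation, not the answer's method.
    rng = np.random.default_rng() if rng is None else rng
    max_mask = a == a.max(axis=axis, keepdims=True)
    # Maxima get a random key in [0, 1); everything else gets -1 and cannot win.
    keys = np.where(max_mask, rng.random(a.shape), -1.0)
    return keys.argmax(axis=axis)

a = np.array([[1, 3, 3], [4, 5, 6], [7, 8, 1]])
rng = np.random.default_rng(0)
runs = np.array([argmax_randtie_noise(a, axis=1, rng=rng) for _ in range(2000)])

# Share of runs in which row 0 picked column 1; expected to be near 0.5.
share_col1 = (runs[:, 0] == 1).mean()
print(share_col1)
```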

Approach #2 : Use masking for performance

We could make use of masking and hence avoid that flatnonzero, with the intention of gaining performance, since working with boolean arrays is generally efficient. Also, we would generalize to cover both rows (axis=1) and columns (axis=0), giving us a modified version like so -

def argmax_randtie_masking_generic(a, axis=1): 
    max_mask = a==a.max(axis=axis,keepdims=True)
    m,n = a.shape
    L = max_mask.sum(axis=axis)
    set_mask = np.zeros(L.sum(), dtype=bool)
    select_idx = random_num_per_grp_cumsumed(L)
    set_mask[select_idx] = True
    if axis==0:
        max_mask.T[max_mask.T] = set_mask
    else:
        max_mask[max_mask] = set_mask
    return max_mask.argmax(axis=axis) 

Sample runs on axis=0 and axis=1 -

In [423]: a
Out[423]: 
array([[1, 3, 3],
       [4, 5, 6],
       [7, 8, 1]])
In [424]: argmax_randtie_masking_generic(a, axis=1)
Out[424]: array([1, 2, 1])

In [425]: argmax_randtie_masking_generic(a, axis=1)
Out[425]: array([2, 2, 1])

In [426]: a[1,1] = 8

In [427]: a
Out[427]: 
array([[1, 3, 3],
       [4, 8, 6],
       [7, 8, 1]])

In [428]: argmax_randtie_masking_generic(a, axis=0)
Out[428]: array([2, 1, 1])

In [429]: argmax_randtie_masking_generic(a, axis=0)
Out[429]: array([2, 1, 1])

In [430]: argmax_randtie_masking_generic(a, axis=0)
Out[430]: array([2, 2, 1])
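The same kind of frequency check used for Approach #1 can be repeated for the column-wise case. The sketch below copies the two functions from this answer so that it runs on its own; the seed, the run count of 2000 and the names runs, share_row1 and share_row2 are choices made for this example.

```python
import numpy as np

def random_num_per_grp_cumsumed(L):
    # One random flat index per group, left in cumulative (flat) form.
    r1 = np.random.rand(np.sum(L)) + np.repeat(np.arange(len(L)), L)
    offset = np.r_[0, np.cumsum(L[:-1])]
    return r1.argsort()[offset]

def argmax_randtie_masking_generic(a, axis=1):
    max_mask = a == a.max(axis=axis, keepdims=True)
    L = max_mask.sum(axis=axis)
    set_mask = np.zeros(L.sum(), dtype=bool)
    set_mask[random_num_per_grp_cumsumed(L)] = True
    # Keep exactly one True per row/column, then locate it with argmax.
    if axis == 0:
        max_mask.T[max_mask.T] = set_mask
    else:
        max_mask[max_mask] = set_mask
    return max_mask.argmax(axis=axis)

a = np.array([[1, 3, 3], [4, 8, 6], [7, 8, 1]])
np.random.seed(0)
runs = np.array([argmax_randtie_masking_generic(a, axis=0) for _ in range(2000)])

# Column 1 has its maximum (8) in rows 1 and 2, so both should appear
# with similar frequency; columns 0 and 2 have a single maximum each.
share_row1 = (runs[:, 1] == 1).mean()
share_row2 = (runs[:, 1] == 2).mean()
print(share_row1, share_row2)
```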
