Numpy random choice to produce a 2D-array with all unique values


Problem description

So I am wondering whether there's a more efficient way to generate a 2-D array with np.random.choice in which each row has unique values.

For example, for an array with shape (3,4), we expect an output of:

# Expected output given a shape (3,4)
array([[0, 1, 3, 2],
       [2, 3, 1, 0],
       [1, 3, 2, 0]])

This means each row must contain unique values, one per column: for a (3,4) array, every row of out should be a permutation of the integers 0 to 3.
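The requirement can be stated as a simple check: after sorting each row, it must equal 0, 1, ..., n-1. The following is a minimal sketch. The name rows_are_permutations is a hypothetical helper for illustration and is not part of NumPy.

```python
import numpy as np

def rows_are_permutations(arr):
    """Hypothetical helper: True if every row of arr is a permutation of 0..n-1."""
    arr = np.asarray(arr)
    n = arr.shape[1]
    # Sorting each row of a valid output must give exactly [0, 1, ..., n-1].
    return bool((np.sort(arr, axis=1) == np.arange(n)).all())

expected = np.array([[0, 1, 3, 2],
                     [2, 3, 1, 0],
                     [1, 3, 2, 0]])
print(rows_are_permutations(expected))         # True
print(rows_are_permutations([[0, 0, 1, 2]]))   # False: 0 repeats, 3 is missing
```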

I know I can get unique values by passing False to the replace argument, but only for one row at a time, not for the whole matrix. For instance, I can do this:

>>> np.random.choice(4, size=(1,4), replace=False)
array([[0,2,3,1]])

But when I try to do this:

>>> np.random.choice(4, size=(3,4), replace=False)

I get an error like this:

 File "<stdin>", line 1, in <module>
 File "mtrand.pyx", line 1150, in mtrand.RandomState.choice (numpy\random\mtrand\mtrand.c:18113)
 ValueError: Cannot take a larger sample than population when 'replace=False'

I assume this is because, with size=(3,4), it tries to draw 3 x 4 = 12 samples without replacement, but the population only has 4 values.
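That reading matches how size works. With replace=False, np.random.choice draws the total number of elements in size (here 3 * 4) as distinct values from the population. It does not draw them row by row. The following is a small sketch illustrating both cases. The exact wording of the error message may differ between NumPy versions, so the code only checks that a ValueError is raised.

```python
import numpy as np

population = 4

# size=(3, 4) requests 3 * 4 = 12 distinct values from a population of only 4,
# so NumPy refuses when replace=False.
try:
    np.random.choice(population, size=(3, 4), replace=False)
    raised = False
except ValueError:
    raised = True
print(raised)  # True

# A single row of 4 draws is fine: it is simply a permutation of 0..3.
row = np.random.choice(population, size=population, replace=False)
print(sorted(row.tolist()))  # [0, 1, 2, 3]
```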

I know that I can solve it by using a for-loop:

 >>> a = (np.random.choice(4,size=4,replace=False) for _ in range(3))
 >>> np.vstack(a)
 array([[3, 1, 2, 0],
        [1, 2, 0, 3],
        [2, 0, 3, 1]])

But is there a workaround that avoids for-loops entirely? (I'm assuming a loop could get slow once there are more than 1000 rows. Then again, I'm actually creating a generator in a, so I'm not sure whether that makes a difference.)
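On the generator question: a generator expression only defers the work. When np.vstack consumes it, np.random.choice is still called once per row, so the Python-level loop remains. NumPy 1.16 also deprecated passing a generator, rather than a list or tuple, to its stacking functions. A list comprehension is therefore the safer spelling of the same loop. The following is a sketch, and loop_rows is a hypothetical name.

```python
import numpy as np

def loop_rows(m, n):
    # Hypothetical helper: the same per-row loop as above, but building a list
    # so np.vstack receives a sequence. It still makes m separate choice calls.
    rows = [np.random.choice(n, size=n, replace=False) for _ in range(m)]
    return np.vstack(rows)

out = loop_rows(3, 4)
print(out.shape)  # (3, 4)
```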

Solution

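One trick I often use is to generate a random array and call argsort on it. Along the chosen axis (by default, each row), argsort returns the indices that would sort the values, and those indices are always a permutation of 0 to n-1. They therefore serve directly as the required unique numbers. Thus, we could do -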

import numpy as np

def random_choice_noreplace(m, n, axis=-1):
    # m, n are the number of rows, cols of output;
    # argsort of random floats yields a random permutation along `axis`
    return np.random.rand(m, n).argsort(axis=axis)
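The newer Generator API (NumPy 1.17+) supports the same argsort idea with a seedable generator, which makes results reproducible. The following is a sketch under that assumption. The function name random_choice_noreplace_rng and its seed parameter are illustrative and do not come from the answer.

```python
import numpy as np

def random_choice_noreplace_rng(m, n, axis=-1, seed=None):
    # Illustrative variant: same argsort trick, but drawing the floats from a
    # seedable Generator so the output can be reproduced.
    rng = np.random.default_rng(seed)
    return rng.random((m, n)).argsort(axis=axis)

out = random_choice_noreplace_rng(1000, 100, seed=0)
# argsort always returns a permutation of indices, so every sorted row is 0..99.
print((np.sort(out, axis=1) == np.arange(100)).all())  # True
# The same seed gives the same output.
print((out == random_choice_noreplace_rng(1000, 100, seed=0)).all())  # True
```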

Sample runs -

In [98]: random_choice_noreplace(3,7)
Out[98]: 
array([[0, 4, 3, 2, 6, 5, 1],
       [5, 1, 4, 6, 0, 2, 3],
       [6, 1, 0, 4, 5, 3, 2]])

In [99]: random_choice_noreplace(5,7, axis=0) # unique nums along cols
Out[99]: 
array([[0, 2, 4, 4, 1, 0, 2],
       [1, 4, 3, 2, 4, 1, 3],
       [3, 1, 1, 3, 2, 3, 0],
       [2, 3, 0, 0, 0, 2, 4],
       [4, 0, 2, 1, 3, 4, 1]])
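With axis=0, the argsort runs down each column. Each column (length m) is therefore a permutation of 0 to m-1, while rows may repeat values, as in the second sample run. The following quick check redefines the function from the answer so the snippet runs on its own.

```python
import numpy as np

def random_choice_noreplace(m, n, axis=-1):
    return np.random.rand(m, n).argsort(axis=axis)

out = random_choice_noreplace(5, 7, axis=0)
# Each column, sorted, should be exactly 0..4.
cols_ok = (np.sort(out, axis=0) == np.arange(5)[:, None]).all()
print(out.shape, cols_ok)  # (5, 7) True
```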

Runtime test -

# Original approach
def loopy_app(m,n):
    a = (np.random.choice(n,size=n,replace=False) for _ in range(m))
    return np.vstack(a)
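NumPy 1.20 offers another loop-free option, separate from the argsort trick in this answer. Generator.permuted shuffles each slice along a given axis independently. Tiling arange(n) into m rows and permuting along axis=1 gives the same kind of output. The following is a sketch assuming NumPy 1.20 or later. The name permuted_rows is hypothetical.

```python
import numpy as np

def permuted_rows(m, n, seed=None):
    # Hypothetical helper: start with every row equal to 0..n-1, then let
    # Generator.permuted shuffle each row independently (NumPy >= 1.20).
    rng = np.random.default_rng(seed)
    base = np.tile(np.arange(n), (m, 1))
    return rng.permuted(base, axis=1)

out = permuted_rows(3, 4, seed=1)
print(out.shape)  # (3, 4)
```

This approach shuffles integers directly instead of sorting random floats. Its speed relative to the argsort version will depend on the array shape and NumPy version, so it is worth timing both.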

Timings -

In [108]: %timeit loopy_app(1000,100)
10 loops, best of 3: 20.6 ms per loop

In [109]: %timeit random_choice_noreplace(1000,100)
100 loops, best of 3: 3.66 ms per loop
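%timeit is an IPython magic. To repeat the comparison in a plain Python script, the standard-library timeit module can be used instead. The following is a sketch. Its loop version builds a list rather than a generator for np.vstack. The printed numbers will vary by machine and NumPy version, and are not the figures above.

```python
import timeit
import numpy as np

def random_choice_noreplace(m, n, axis=-1):
    return np.random.rand(m, n).argsort(axis=axis)

def loopy_app(m, n):
    # Same loop as the original approach, with a list so np.vstack gets a sequence.
    return np.vstack([np.random.choice(n, size=n, replace=False) for _ in range(m)])

for name, fn in [("loopy_app", loopy_app),
                 ("random_choice_noreplace", random_choice_noreplace)]:
    # Best of 3 repeats of 10 calls each, reported as time per call in ms.
    best = min(timeit.repeat(lambda: fn(1000, 100), repeat=3, number=10)) / 10
    print(f"{name}: {best * 1e3:.2f} ms per call")
```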
