Better way to shuffle two numpy arrays in unison

Question

I have two numpy arrays of different shapes, but with the same length (leading dimension). I want to shuffle each of them, such that corresponding elements continue to correspond -- i.e. shuffle them in unison with respect to their leading indices.

This code works, and illustrates my goals:

import numpy

def shuffle_in_unison(a, b):
    assert len(a) == len(b)
    # Allocate output arrays; this is the copy the question wants to avoid.
    shuffled_a = numpy.empty(a.shape, dtype=a.dtype)
    shuffled_b = numpy.empty(b.shape, dtype=b.dtype)
    permutation = numpy.random.permutation(len(a))
    # Move each element of a and b to the same new position.
    for old_index, new_index in enumerate(permutation):
        shuffled_a[new_index] = a[old_index]
        shuffled_b[new_index] = b[old_index]
    return shuffled_a, shuffled_b

For example:

>>> a = numpy.asarray([[1, 1], [2, 2], [3, 3]])
>>> b = numpy.asarray([1, 2, 3])
>>> shuffle_in_unison(a, b)
(array([[2, 2],
       [1, 1],
       [3, 3]]), array([2, 1, 3]))

However, this feels clunky, inefficient, and slow, and it requires making a copy of the arrays -- I'd rather shuffle them in-place, since they'll be quite large.

Is there a better way to go about this? Faster execution and lower memory usage are my primary goals, but elegant code would be nice, too.

One other thought I had was this:

def shuffle_in_unison_scary(a, b):
    rng_state = numpy.random.get_state()
    numpy.random.shuffle(a)
    numpy.random.set_state(rng_state)
    numpy.random.shuffle(b)

This works...but it's a little scary, as I see little guarantee it'll continue to work -- it doesn't look like the sort of thing that's guaranteed to survive across numpy versions, for example.

Recommended answer

Your "scary" solution does not appear scary to me. Calling shuffle() for two sequences of the same length results in the same number of calls to the random number generator, and these are the only "random" elements in the shuffle algorithm. By resetting the state, you ensure that the calls to the random number generator will give the same results in the second call to shuffle(), so the whole algorithm will generate the same permutation.
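As a quick sanity check of that argument, here is a minimal sketch that runs the state-reset approach on two arrays of different shapes and confirms the rows still line up afterwards. The test arrays are made up for illustration; the function is the one from the question.

```python
import numpy

def shuffle_in_unison_scary(a, b):
    # Save the global RNG state, shuffle a, then rewind the RNG so that
    # shuffling b consumes exactly the same random numbers.
    rng_state = numpy.random.get_state()
    numpy.random.shuffle(a)
    numpy.random.set_state(rng_state)
    numpy.random.shuffle(b)

# Illustrative data: row i of b carries the same label as a[i].
a = numpy.arange(10)
b = numpy.column_stack([numpy.arange(10), numpy.arange(10)])

shuffle_in_unison_scary(a, b)

# If both shuffles applied the same permutation, the labels still match.
still_paired = bool((b[:, 0] == a).all())
print(still_paired)
```

This only checks the behaviour on the installed numpy version; it does not by itself prove anything about future releases.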

If you don't like this, a different solution would be to store your data in one array instead of two right from the beginning, and create two views into this single array simulating the two arrays you have now. You can use the single array for shuffling and the views for all other purposes.

Example: Let's assume the arrays a and b look like this:

a = numpy.array([[[  0.,   1.,   2.],
                  [  3.,   4.,   5.]],

                 [[  6.,   7.,   8.],
                  [  9.,  10.,  11.]],

                 [[ 12.,  13.,  14.],
                  [ 15.,  16.,  17.]]])

b = numpy.array([[ 0.,  1.],
                 [ 2.,  3.],
                 [ 4.,  5.]])

We can now construct a single array containing all the data:

c = numpy.c_[a.reshape(len(a), -1), b.reshape(len(b), -1)]
# array([[  0.,   1.,   2.,   3.,   4.,   5.,   0.,   1.],
#        [  6.,   7.,   8.,   9.,  10.,  11.,   2.,   3.],
#        [ 12.,  13.,  14.,  15.,  16.,  17.,   4.,   5.]])

Now we create views simulating the original a and b:

a2 = c[:, :a.size//len(a)].reshape(a.shape)
b2 = c[:, a.size//len(a):].reshape(b.shape)

The data of a2 and b2 is shared with c. To shuffle both arrays simultaneously, use numpy.random.shuffle(c).
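To see this in action, the following sketch rebuilds the example arrays, checks with numpy.shares_memory that a2 and b2 really are views into c, and verifies after the shuffle that row i of a2 still belongs with row i of b2. The index-recovery trick relies on the specific example values (a[i, 0, 0] is 6*i and b[i, 0] is 2*i) and is only there for the check.

```python
import numpy

# Same values as the example arrays a and b.
a = numpy.arange(18, dtype=float).reshape(3, 2, 3)
b = numpy.arange(6, dtype=float).reshape(3, 2)

c = numpy.c_[a.reshape(len(a), -1), b.reshape(len(b), -1)]
a2 = c[:, :a.size // len(a)].reshape(a.shape)
b2 = c[:, a.size // len(a):].reshape(b.shape)

# Both reshapes should be views, not copies.
views_ok = numpy.shares_memory(a2, c) and numpy.shares_memory(b2, c)

# Shuffling c in place permutes a2 and b2 together.
numpy.random.shuffle(c)

# Recover which original row ended up at each position.
orig_rows_a = (a2[:, 0, 0] // 6).astype(int)
orig_rows_b = (b2[:, 0] // 2).astype(int)
print(views_ok, orig_rows_a, orig_rows_b)
```

The two printed index arrays should be identical, whatever permutation was drawn.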

In production code, you would of course try to avoid creating the original a and b at all and right away create c, a2 and b2.
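One way this might look, as a rough sketch: allocate c first, carve out the two views, and fill the data through the views. The names n, a_shape, b_shape and a_width are hypothetical and the fill values are placeholders for however the real data is loaded.

```python
import numpy

n = 3                      # number of samples (hypothetical)
a_shape = (n, 2, 3)        # desired shape of the "a" view
b_shape = (n, 2)           # desired shape of the "b" view
a_width = 2 * 3            # flattened size of one "a" item
b_width = 2                # flattened size of one "b" item

# Allocate the combined storage once; no separate a and b are ever created.
c = numpy.empty((n, a_width + b_width))
a2 = c[:, :a_width].reshape(a_shape)
b2 = c[:, a_width:].reshape(b_shape)

# Fill the data through the views (placeholder values).
a2[...] = numpy.arange(18.0).reshape(a_shape)
b2[...] = numpy.arange(6.0).reshape(b_shape)

# One in-place shuffle keeps both views aligned.
numpy.random.shuffle(c)
```

Since everything lives in c, the only extra memory shuffle needs is a one-row temporary buffer.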

This solution could be adapted to the case that a and b have different dtypes.
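The answer does not say how. One possible adaptation, offered here as an assumption rather than the answerer's method, is a structured dtype with one field per array. Field access on a structured array returns a view, so shuffling the 1-D structured array moves both fields together. The field names "a" and "b" and the dtypes are chosen for illustration.

```python
import numpy

n = 3
# Hypothetical record layout: a float64 block of shape (2, 3) and an
# int64 block of shape (2,) per sample.
pair_dtype = numpy.dtype([("a", numpy.float64, (2, 3)),
                          ("b", numpy.int64, (2,))])

c = numpy.zeros(n, dtype=pair_dtype)
a2 = c["a"]   # view of shape (3, 2, 3), dtype float64
b2 = c["b"]   # view of shape (3, 2), dtype int64

# Fill through the views with placeholder values.
a2[...] = numpy.arange(18.0).reshape(n, 2, 3)
b2[...] = numpy.arange(6).reshape(n, 2)

# Shuffling the records permutes both fields in unison.
numpy.random.shuffle(c)
```

The trade-off is that each record interleaves the two fields in memory, so a2 and b2 are strided views rather than contiguous arrays.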
