Numpy: Sorting a multidimensional array by a multidimensional array

Question

Forgive me if this is redundant or super basic. I'm coming to Python/Numpy from R and having a hard time flipping things around in my head.

I have an n-dimensional array which I want to sort using another n-dimensional array of index values. I know I could wrap this in a loop, but it seems like there should be a really concise Numpyonic way of beating this into submission. Here's my example code to set up the problem where n=2:

from numpy import array, random, take

a1 = random.standard_normal(size=[2,5])
index = array([[0,1,2,4,3], [0,1,2,3,4]])

So now I have a 2 x 5 array of random numbers and a 2 x 5 index. I've read the help for take() about 10 times now but my brain is not grokking it, obviously.

I thought this might get me there:

take(a1, index)

array([[ 0.29589188, -0.71279375, -0.18154864, -1.12184984,  0.25698875],
       [ 0.29589188, -0.71279375, -0.18154864,  0.25698875, -1.12184984]])

but that's clearly reordering only the first element (I presume because of flattening).

Any tips on how I get from where I am to a solution that sorts element 0 of a1 by element 0 of the index ... element n?

Answer

I can't think of how to do this in N dimensions yet, but here is the 2D version:

>>> import numpy as np
>>> a = np.random.standard_normal(size=(2,5))
>>> a
array([[ 0.72322499, -0.05376714, -0.28316358,  1.43025844, -0.90814293],
       [ 0.7459107 ,  0.43020728,  0.05411805, -0.32813465,  2.38829386]])
>>> i = np.array([[0,1,2,4,3],[0,1,2,3,4]]) 
>>> a[np.arange(a.shape[0])[:,np.newaxis],i]
array([[ 0.72322499, -0.05376714, -0.28316358, -0.90814293,  1.43025844],
       [ 0.7459107 ,  0.43020728,  0.05411805, -0.32813465,  2.38829386]])
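
As a minimal sketch of why the 2D version works: np.arange(a.shape[0])[:,np.newaxis] has shape (2,1), so it broadcasts against the (2,5) index array and supplies the row number for every column index. The integer array and the names rows, result and looped below are illustrative assumptions, not part of the original answer.

```python
import numpy as np

# Integer data makes the reordering easy to read off.
a = np.arange(10).reshape(2, 5)
i = np.array([[0, 1, 2, 4, 3], [0, 1, 2, 3, 4]])

# Column vector of row numbers, shape (2, 1); it broadcasts against i (2, 5).
rows = np.arange(a.shape[0])[:, np.newaxis]
result = a[rows, i]

# The same thing written as an explicit loop over rows.
looped = np.array([a[r][i[r]] for r in range(a.shape[0])])

print(result)
print(np.array_equal(result, looped))
```

Only the last two entries of the first row swap places; the second row is unchanged because its index row is already 0..4.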

Here is the N-dimensional version:

>>> a[tuple(np.ogrid[tuple(slice(x) for x in a.shape)][:-1]) + (i,)]
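
The N-dimensional idea can be sketched as a small function. This is an illustration under stated assumptions rather than the answerer's exact code: the helper name reorder_last_axis and the test arrays are hypothetical, and the index is built as a tuple because newer NumPy releases expect a tuple (not a list) for multidimensional indexing. For comparison it also calls np.take_along_axis, which is available in NumPy 1.15 and later and is not mentioned in the original answer.

```python
import numpy as np


def reorder_last_axis(a, i):
    """Hypothetical helper: pick values along the last axis of a using i."""
    # Open (broadcastable) index grids for every axis of a.
    grids = np.ogrid[tuple(slice(n) for n in a.shape)]
    # Keep the grids for all leading axes, use i for the last axis.
    return a[tuple(grids[:-1]) + (i,)]


arr2 = np.arange(10).reshape(2, 5)
idx2 = np.array([[0, 1, 2, 4, 3], [0, 1, 2, 3, 4]])

arr3 = np.arange(24).reshape(2, 3, 4)
idx3 = np.tile(np.arange(4)[::-1], (2, 3, 1))  # reverse every last-axis row

out2 = reorder_last_axis(arr2, idx2)
out3 = reorder_last_axis(arr3, idx3)

# np.take_along_axis should agree with the grid-based indexing.
same2 = np.array_equal(out2, np.take_along_axis(arr2, idx2, axis=-1))
same3 = np.array_equal(out3, np.take_along_axis(arr3, idx3, axis=-1))
print(same2, same3)
```

If your NumPy is recent enough, np.take_along_axis(a, i, axis=-1) is probably the shortest way to express the same operation.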

Here's how it works:

Ok, let's start with a 3 dimensional array for illustration.

>>> import numpy as np
>>> a = np.arange(24).reshape((2,3,4))
>>> a
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])

You can access elements of this array by specifying the index along each axis as follows:

>>> a[0,1,2]
6

This is equivalent to a[0][1][2], which is how you would access the same element if we were dealing with a list instead of an array.

Numpy allows you to get even fancier when indexing arrays:

>>> a[[0,1],[1,1],[2,2]]
array([ 6, 18])
>>> a[[0,1],[1,2],[2,2]]
array([ 6, 22])

These examples would be equivalent to [a[0][1][2],a[1][1][2]] and [a[0][1][2],a[1][2][2]] if we were dealing with lists.

You can even leave out repeated indices and numpy will figure out what you want (it broadcasts the single index against the longer ones). For example, the above examples could be equivalently written:

>>> a[[0,1],1,2]
array([ 6, 18])
>>> a[[0,1],[1,2],2]
array([ 6, 22])
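
A small sketch of that shorthand, assuming the same 3 dimensional array a as above. The variable names are illustrative; np.broadcast is used only to show the common shape the three indices are stretched to.

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)

explicit = a[[0, 1], [1, 1], [2, 2]]   # every index written out
shorthand = a[[0, 1], 1, 2]            # the scalars are broadcast to length 2

# The common shape the three index arguments are broadcast to.
common_shape = np.broadcast(np.array([0, 1]), 1, 2).shape

print(explicit, shorthand, common_shape)
```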

The shape of the array (or list) you index with in each dimension only affects the shape of the returned array. In other words, numpy doesn't care what shape your index arrays have when it's pulling values, except that it will feed you back an array of that same shape. For example, with index arrays of shape (2,2):

>>> a[[[0,0],[0,0]],[[0,0],[0,0]],[[0,0],[0,0]]]
array([[0, 0],
       [0, 0]])

In this case, we're grabbing the same element, a[0,0,0], over and over again, but numpy is returning an array with the same shape as we passed in.
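
To make the point less degenerate, here is a sketch that pulls four different elements with (2,2) index arrays. The index values are arbitrary choices for illustration; each output position [r, c] holds a[i0[r, c], i1[r, c], i2[r, c]].

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)

# One index array per axis, all of shape (2, 2).
i0 = np.array([[0, 1], [1, 0]])
i1 = np.array([[0, 1], [2, 0]])
i2 = np.array([[3, 2], [1, 0]])

picked = a[i0, i1, i2]

print(picked)        # elements a[0,0,3], a[1,1,2], a[1,2,1], a[0,0,0]
print(picked.shape)  # same shape as the index arrays
```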

Ok, onto your problem. What you want is to index the array along the last axis with the numbers in your index array. So, for the example in your question you would like [[a[0,0],a[0,1],a[0,2],a[0,4],a[0,3]],[a[1,0],a[1,1],...

The fact that your index array is multidimensional, like I said earlier, doesn't tell numpy anything about where you want to pull these indices from; it just specifies the shape of the output array. So, in your example, you need to tell numpy that the first 5 values are to be pulled from a[0] and the latter 5 from a[1]. Easy!

>>> a1[[[0]*5,[1]*5],index]

It gets complicated in N dimensions, but let's do it for the 3 dimensional array a I defined way above. Suppose we have the following index array:

>>> i = np.array(list(range(4))[::-1]*6).reshape(a.shape)
>>> i
array([[[3, 2, 1, 0],
        [3, 2, 1, 0],
        [3, 2, 1, 0]],

       [[3, 2, 1, 0],
        [3, 2, 1, 0],
        [3, 2, 1, 0]]])

So, these values are all for indices along the last axis. We need to tell numpy what indices along the first and second axes these numbers are to be taken from; i.e. we need to tell numpy that the indices for the first axis are:

i1 = [[[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]],

      [[1, 1, 1, 1],
       [1, 1, 1, 1],
       [1, 1, 1, 1]]]

and that the indices for the second axis are:

i2 = [[[0, 0, 0, 0],
       [1, 1, 1, 1],
       [2, 2, 2, 2]],

      [[0, 0, 0, 0],
       [1, 1, 1, 1],
       [2, 2, 2, 2]]]

Then we can just do:

>>> a[i1,i2,i]
array([[[ 3,  2,  1,  0],
        [ 7,  6,  5,  4],
        [11, 10,  9,  8]],

       [[15, 14, 13, 12],
        [19, 18, 17, 16],
        [23, 22, 21, 20]]])

The handy numpy function which generates i1 and i2 is called np.mgrid. I use np.ogrid in my answer, which is equivalent in this case because of the broadcasting behaviour I talked about earlier.
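
A sketch of that equivalence, assuming the 3 dimensional a and the reversed index array i from above (i is rebuilt here with np.tile so the snippet is self-contained). np.mgrid gives dense index arrays like i1 and i2; np.ogrid gives thin ones that broadcast to the same thing.

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)
i = np.tile(np.arange(4)[::-1], (2, 3, 1))  # [3, 2, 1, 0] in every row

# Dense grids: each one has the full shape (2, 3, 4), like i1 and i2.
i1, i2, _ = np.mgrid[0:2, 0:3, 0:4]

# Open grids: shapes (2, 1, 1) and (1, 3, 1); broadcasting fills in the rest.
o1, o2, _ = np.ogrid[0:2, 0:3, 0:4]

dense = a[i1, i2, i]
sparse = a[o1, o2, i]

print(i1.shape, o1.shape, o2.shape)
print(np.array_equal(dense, sparse))
```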

Hope that helps!
