How to determine a numpy-array reshape strategy

Question

For a Python project I often find myself reshaping and rearranging n-dimensional numpy arrays. However, I have a hard time deciding how to approach the problem, visualizing the outcome of the reshaping methods, and knowing whether my solution is efficient.

At the moment, when confronted with such a problem, my strategy is to start ipython, load some sample data, and use trial and error until I find a combination of transpose(), reshape(), and swapaxes() calls that gives the desired result. It gets the job done, but without any real understanding of what is going on, and it often produces code that is hard to maintain.

So my question is about finding a strategy. How do you approach such a problem? How do you visualize an ndarray in your head when you have to reshape it into the desired format? How do you arrive at the right operations?

To make answering a bit more concrete, here is an example to play with:

Assume you want to reshape the following 3d array

array([[[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8]],

       [[ 9, 10, 11],
        [12, 13, 14],
        [15, 16, 17]],

       [[18, 19, 20],
        [21, 22, 23],
        [24, 25, 26]]])

into a 2d array where the first columns from the 3rd dimension are placed first, the second columns second, and so on.

The result should look like this:

array([[ 0,  9, 18,  3, 12, 21,  6, 15, 24],
       [ 1, 10, 19,  4, 13, 22,  7, 16, 25],
       [ 2, 11, 20,  5, 14, 23,  8, 17, 26]])

PS. Any reading material on the subject would also be great!

Recommended answer

I regularly play about with shapes in ipython. However, to make things clearer, I start with an array whose dimensions are all distinct.

import numpy as np
arr = np.arange(3*4*5).reshape(3,4,5)

That way, it's easier to identify how the axes get shifted, for example:

In [25]: arr.shape
Out[25]: (3, 4, 5)

In [26]: arr.T.shape
Out[26]: (5, 4, 3)

In [31]: arr.T.reshape(5,-1)
Out[31]: 
array([[ 0, 20, 40,  5, 25, 45, 10, 30, 50, 15, 35, 55],
       [ 1, 21, 41,  6, 26, 46, 11, 31, 51, 16, 36, 56],
       [ 2, 22, 42,  7, 27, 47, 12, 32, 52, 17, 37, 57],
       [ 3, 23, 43,  8, 28, 48, 13, 33, 53, 18, 38, 58],
       [ 4, 24, 44,  9, 29, 49, 14, 34, 54, 19, 39, 59]])

whereas a different transpose (one that does not switch the order of 3 and 4) gives

In [38]: np.transpose(arr,[2,0,1]).shape
Out[38]: (5, 3, 4)

In [39]: np.transpose(arr,[2,0,1]).reshape(5,-1)
Out[39]: 
array([[ 0,  5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55],
       [ 1,  6, 11, 16, 21, 26, 31, 36, 41, 46, 51, 56],
       [ 2,  7, 12, 17, 22, 27, 32, 37, 42, 47, 52, 57],
       [ 3,  8, 13, 18, 23, 28, 33, 38, 43, 48, 53, 58],
       [ 4,  9, 14, 19, 24, 29, 34, 39, 44, 49, 54, 59]])
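Applied to the 3x3x3 array from the question, the first pattern above (a full transpose followed by a reshape) appears to give the requested layout. This is a minimal sketch; the names `src`, `out`, and `expected` are my own and not from the original post:

```python
import numpy as np

src = np.arange(27).reshape(3, 3, 3)

# Reverse the axes so the old last axis comes first, then flatten the
# remaining two axes into a single row of 9 values.
out = src.T.reshape(3, -1)

# The target array as written in the question.
expected = np.array([[0, 9, 18, 3, 12, 21, 6, 15, 24],
                     [1, 10, 19, 4, 13, 22, 7, 16, 25],
                     [2, 11, 20, 5, 14, 23, 8, 17, 26]])

print(np.array_equal(out, expected))  # True
```

Because all three dimensions are equal here, the shape alone cannot tell you which transpose is right; only comparing values (or testing first on a (3, 4, 5) array) does.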

I also like to use 'oddly' shaped arrays like this when developing functions. That way, if I mess up a transpose or a broadcast, dimension errors will jump out at me. Experience tells me that once I get the dimensions right, the values will usually be correct too. Or at least the class of errors that affects values is quite different from the class that affects dimensions.
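To illustrate why distinct dimensions help, here is a small sketch with a deliberately buggy helper. The function `add_row_offsets` and its bug are hypothetical, invented only for this example: the offsets are meant for axis 1, but plain broadcasting lines them up with the last axis instead.

```python
import numpy as np

def add_row_offsets(arr, offsets):
    # Hypothetical helper. Intended: add offsets[j] along axis 1.
    # Bug: plain broadcasting aligns `offsets` with the LAST axis.
    return arr + offsets

cube = np.zeros((3, 3, 3))
odd = np.zeros((3, 4, 5))

# On the cube the bug goes unnoticed: shapes match, values are wrong.
cube_result = add_row_offsets(cube, np.arange(3))
print(cube_result.shape)  # (3, 3, 3)

# On the oddly shaped array the same bug raises immediately.
try:
    add_row_offsets(odd, np.arange(4))
    raised = False
except ValueError:
    raised = True
print(raised)  # True

# One possible fix: give the offsets an explicit trailing axis.
fixed = odd + np.arange(4)[:, None]
print(fixed.shape)  # (3, 4, 5)
```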

I also liberally sprinkle development code with statements like print(arr.shape), or even assertions such as assert x.shape == y.shape.
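A sketch of what such shape checks might look like in practice. The function `center_columns` is a made-up example, not something from numpy:

```python
import numpy as np

def center_columns(x):
    """Subtract each column's mean from a 2d array (illustrative helper)."""
    assert x.ndim == 2, x.shape
    means = x.mean(axis=0)
    # One mean per column; the message shows the offending shape on failure.
    assert means.shape == (x.shape[1],), means.shape
    result = x - means
    assert result.shape == x.shape, (result.shape, x.shape)
    return result

data = np.arange(12, dtype=float).reshape(3, 4)
centered = center_columns(data)
print(centered.shape)  # (3, 4)
```

Passing the shape as the assertion message means a failure reports the actual dimensions without needing a debugger.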

It also helps to label the dimensions:

M, N, L = 3, 4, 5
np.empty((M,N,L))

or, as with einsum:

np.einsum('ijk,kj->i', A, B) # if A is (M,N,L), B must be (L,N)
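A runnable version of the einsum line, with the labelled sizes made explicit. The contents of `A` and `B` are arbitrary sample data, and the broadcast-based cross-check is my addition:

```python
import numpy as np

M, N, L = 3, 4, 5
A = np.arange(M * N * L, dtype=float).reshape(M, N, L)  # axes i, j, k
B = np.arange(L * N, dtype=float).reshape(L, N)         # axes k, j

# Sum over j and k of A[i, j, k] * B[k, j], leaving only axis i.
via_einsum = np.einsum('ijk,kj->i', A, B)

# Same computation by hand: B.T has shape (N, L), which broadcasts
# against the trailing (N, L) axes of A.
via_broadcast = (A * B.T).sum(axis=(1, 2))

print(via_einsum.shape)                        # (3,)
print(np.allclose(via_einsum, via_broadcast))  # True
```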

https://stackoverflow.com/a/29903842/901925 is an attempt to understand and explain rollaxis.

Another strategy is to look at the Python code of numpy functions. Often they accept an axis argument, and it's instructive to see how they use it. Sometimes that particular axis is rotated to the front or to the end. Sometimes an nd array is reshaped into a 2d array, collapsing all axes except one into a single axis. Others achieve generality by constructing and manipulating an indexing tuple. More advanced functions play with the strides as well as the shape.
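The two simpler patterns can be sketched as follows. These helpers (`demean_along`, `first_along`) are my own illustrations of the idea, not copies of any numpy internals:

```python
import numpy as np

def demean_along(arr, axis):
    """Subtract the mean along `axis` by working on a 2d array."""
    moved = np.moveaxis(arr, axis, -1)         # chosen axis goes last
    flat = moved.reshape(-1, moved.shape[-1])  # collapse all other axes
    flat = flat - flat.mean(axis=1, keepdims=True)
    # Restore the original nd shape and axis position.
    return np.moveaxis(flat.reshape(moved.shape), -1, axis)

def first_along(arr, axis):
    """Take index 0 along `axis` by building an indexing tuple."""
    index = [slice(None)] * arr.ndim
    index[axis] = 0
    return arr[tuple(index)]

arr = np.arange(3 * 4 * 5, dtype=float).reshape(3, 4, 5)
print(demean_along(arr, 1).shape)  # (3, 4, 5)
print(first_along(arr, 1).shape)   # (3, 5)
```

Both helpers work for any number of dimensions because neither hard-codes the position of the axis it operates on.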

Whether a dimension should be first or last is usually an optimization issue, and may involve tradeoffs between ease of use (broadcasting, indexing) and speed. Just keep in mind that for "C" order, the last dimension forms contiguous blocks.
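A quick way to see the "C" order layout, and one of its practical consequences, is to inspect strides and flags. This is a small sketch using the same (3, 4, 5) sample array; the `item` variable is just shorthand for the element size in bytes:

```python
import numpy as np

arr = np.arange(3 * 4 * 5).reshape(3, 4, 5)
item = arr.itemsize

# In C order the last axis has the smallest stride: its elements are
# adjacent in memory.
print(arr.strides == (20 * item, 5 * item, item))  # True

print(arr.flags['C_CONTIGUOUS'])    # True
print(arr.T.flags['C_CONTIGUOUS'])  # False: the transpose is only a view

# Reshaping a contiguous array can return a view, but reshaping the
# transposed view has to copy the data into a new layout.
print(np.shares_memory(arr, arr.reshape(3, -1)))    # True
print(np.shares_memory(arr, arr.T.reshape(5, -1)))  # False
```

So a transpose followed by a reshape is convenient to write, but on large arrays the hidden copy may matter for speed and memory.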
