Difference between numpy.array shape (R, 1) and (R,)

Problem description

In numpy, some operations return an array of shape (R, 1) while others return (R,). This makes matrix multiplication more tedious, since an explicit reshape is required. For example, given a matrix M, suppose we want to compute numpy.dot(M[:,0], numpy.ones((1, R))), where R is the number of rows (of course, the same issue also occurs column-wise). We get a "matrices are not aligned" error, since M[:,0] has shape (R,) but numpy.ones((1, R)) has shape (1, R).

So my questions are:

  1. What's the difference between shape (R, 1) and (R,)? I know literally it's list of numbers and list of lists where all list contains only a number. I'm just wondering why numpy isn't designed to favor shape (R, 1) over (R,) for easier matrix multiplication.

  2. Are there better ways to write the above example, without explicitly reshaping like this: numpy.dot(M[:,0].reshape(R, 1), numpy.ones((1, R)))

Solution

1. The meaning of shapes in NumPy

You write, "I know literally it's list of numbers and list of lists where all list contains only a number" but that's a bit of an unhelpful way to think about it.

The best way to think about NumPy arrays is that they consist of two parts, a data buffer which is just a block of raw elements, and a view which describes how to interpret the data buffer.

For example, if we create an array of 12 integers:

>>> import numpy
>>> a = numpy.arange(12)
>>> a
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

Then a consists of a data buffer, arranged something like this:

┌────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┐
│  0 │  1 │  2 │  3 │  4 │  5 │  6 │  7 │  8 │  9 │ 10 │ 11 │
└────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┘

and a view which describes how to interpret the data:

>>> a.flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False
>>> a.dtype
dtype('int64')
>>> a.itemsize
8
>>> a.strides
(8,)
>>> a.shape
(12,)

Here the shape (12,) means the array is indexed by a single index which runs from 0 to 11. Conceptually, if we label this single index i, the array a looks like this:

i= 0    1    2    3    4    5    6    7    8    9   10   11
┌────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┐
│  0 │  1 │  2 │  3 │  4 │  5 │  6 │  7 │  8 │  9 │ 10 │ 11 │
└────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┘

If we reshape an array, this doesn't change the data buffer. Instead, it creates a new view that describes a different way to interpret the data. So after:

>>> b = a.reshape((3, 4))

the array b has the same data buffer as a, but now it is indexed by two indices which run from 0 to 2 and 0 to 3 respectively. If we label the two indices i and j, the array b looks like this:

i= 0    0    0    0    1    1    1    1    2    2    2    2
j= 0    1    2    3    0    1    2    3    0    1    2    3
┌────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┐
│  0 │  1 │  2 │  3 │  4 │  5 │  6 │  7 │  8 │  9 │ 10 │ 11 │
└────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┘

which means that:

>>> b[2,1]
9
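As a rough sketch of what "same buffer, different view" means in practice, the check below confirms that b reuses a's buffer and that b[i, j] sits at a byte offset of i * strides[0] + j * strides[1]. The helper name element_at is made up for this illustration, and the strides are written in terms of itemsize because the default integer size depends on the platform.

```python
import numpy as np

a = np.arange(12)
b = a.reshape((3, 4))

# b is a view: it owns no data of its own and shares a's buffer.
shares_buffer = np.shares_memory(a, b)
b_is_view_of_a = b.base is a

# One step along i skips 4 elements; one step along j skips 1 element.
expected_strides = (4 * a.itemsize, a.itemsize)


def element_at(buffer, view, i, j):
    """Hypothetical helper: find view[i, j] by hand from the strides."""
    byte_offset = i * view.strides[0] + j * view.strides[1]
    return buffer[byte_offset // view.itemsize]


print(b.strides == expected_strides)
print(element_at(a, b, 2, 1), b[2, 1])
```

Both values on the last line come from the same memory location, which is why writing through b would also be visible through a.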

You can see that the second index changes quickly and the first index changes slowly. If you prefer this to be the other way round, you can specify the order parameter:

>>> c = a.reshape((3, 4), order='F')

which results in an array indexed like this:

i= 0    1    2    0    1    2    0    1    2    0    1    2
j= 0    0    0    1    1    1    2    2    2    3    3    3
┌────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┐
│  0 │  1 │  2 │  3 │  4 │  5 │  6 │  7 │  8 │  9 │ 10 │ 11 │
└────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┘

which means that:

>>> c[2,1]
5
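The effect of the order parameter can be seen in the strides. The following is a minimal sketch rather than anything taken from the NumPy documentation; the variable names are chosen only for illustration.

```python
import numpy as np

a = np.arange(12)
b = a.reshape((3, 4))             # C order: the last index varies fastest
c = a.reshape((3, 4), order='F')  # Fortran order: the first index varies fastest

# Same buffer as a, but the roles of the two strides are swapped.
same_buffer = np.shares_memory(a, c)
c_strides = c.strides             # (itemsize, 3 * itemsize)

# Reading c back in Fortran order walks the buffer from start to end.
c_in_f_order = c.ravel(order='F')

print(b[2, 1], c[2, 1])
print(c)
```

Printing c shows the buffer laid out column by column, so its first row is 0, 3, 6, 9.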

It should now be clear what it means for an array to have a shape with one or more dimensions of size 1. After:

>>> d = a.reshape((12, 1))

the array d is indexed by two indices, the first of which runs from 0 to 11, and the second index is always 0:

i= 0    1    2    3    4    5    6    7    8    9   10   11
j= 0    0    0    0    0    0    0    0    0    0    0    0
┌────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┐
│  0 │  1 │  2 │  3 │  4 │  5 │  6 │  7 │  8 │  9 │ 10 │ 11 │
└────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┘

and so:

>>> d[10,0]
10
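To tie this back to the (R,) versus (R, 1) question, here is a small sketch of common ways to move between the two shapes without copying. The matrix M and its 4 by 3 size are assumptions made up for the example.

```python
import numpy as np

a = np.arange(12)
d = a.reshape((12, 1))

# Equivalent ways to add a trailing axis of length 1; each returns a view.
d_newaxis = a[:, np.newaxis]
d_inferred = a.reshape(-1, 1)     # -1 asks NumPy to infer that length

# Indexing with an integer drops the axis; slicing with a range keeps it.
M = np.arange(12).reshape((4, 3))
col_1d = M[:, 0]                  # shape (4,)
col_2d = M[:, 0:1]                # shape (4, 1)

print(a.shape, d.shape, col_1d.shape, col_2d.shape)
```

Slicing with 0:1 instead of 0 is often the least intrusive way to keep a column two-dimensional.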

A dimension of length 1 is "free" (in some sense), so there's nothing stopping you from going to town:

>>> e = a.reshape((1, 2, 1, 6, 1))

giving an array indexed like this:

i= 0    0    0    0    0    0    0    0    0    0    0    0
j= 0    0    0    0    0    0    1    1    1    1    1    1
k= 0    0    0    0    0    0    0    0    0    0    0    0
l= 0    1    2    3    4    5    0    1    2    3    4    5
m= 0    0    0    0    0    0    0    0    0    0    0    0
┌────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┐
│  0 │  1 │  2 │  3 │  4 │  5 │  6 │  7 │  8 │  9 │ 10 │ 11 │
└────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┘

and so:

>>> e[0,1,0,0,0]
6
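Going the other way, length-1 axes can be dropped again just as cheaply. A brief sketch using numpy.squeeze, with variable names chosen for illustration:

```python
import numpy as np

a = np.arange(12)
e = a.reshape((1, 2, 1, 6, 1))

# Only the axes of length 2 and 6 carry any information.
squeezed = np.squeeze(e)             # drops every length-1 axis
only_first = np.squeeze(e, axis=0)   # drops just the leading axis

print(squeezed.shape, only_first.shape)
print(e[0, 1, 0, 0, 0], squeezed[1, 0])
```

The two indexing expressions on the last line refer to the same element of the shared buffer.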

See the NumPy internals documentation for more details about how arrays are implemented.

2. What to do?

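Since numpy.reshape normally just creates a new view (it falls back to copying only when the existing memory layout cannot represent the requested shape), you shouldn't be scared about using it whenever necessary. It's the right tool to use when you want to index an array in a different way.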
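Whether a particular reshape produced a view or a copy can be checked directly. A minimal sketch using numpy.shares_memory, where the transposed array t is simply an example picked because flattening it cannot be expressed with strides over the original buffer:

```python
import numpy as np

a = np.arange(12)
b = a.reshape((3, 4))

# Contiguous input: flattening is just another view of the same buffer.
flat_view = b.reshape(12)
view_shares = np.shares_memory(b, flat_view)

# Transposed input: C-order flattening needs the elements in the order
# 0, 4, 8, 1, 5, 9, ..., which no single stride can produce, so NumPy copies.
t = b.T
flat_copy = t.reshape(12)
copy_shares = np.shares_memory(t, flat_copy)

print(view_shares, copy_shares)
```

The copy case is uncommon in code that sticks to contiguous arrays, but it is worth knowing about before writing through a reshaped array.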

However, in a long computation it's usually possible to arrange to construct arrays with the "right" shape in the first place, and so minimize the number of reshapes and transposes. But without seeing the actual context that led to the need for a reshape, it's hard to say what should be changed.

The example in your question is:

numpy.dot(M[:,0], numpy.ones((1, R)))

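but this is not realistic. With the reshape from your question, the product is an (R, R) array in which row i is filled with M[i,0]. If what you are really after is the sum of the column (its product with a 1-D vector of R ones), then first, this expression:

M[:,0].sum()

computes that sum more simply. Second, is there really something special about column 0? Perhaps what you actually need is: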

M.sum(axis=0)
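For reference, here is a hedged sketch of how the shapes in the question's example play out. The 4 by 3 matrix M is an assumption made up for the demonstration, and numpy.outer is offered as one possible alternative rather than as the approach the question had in mind.

```python
import numpy as np

R, C = 4, 3
M = np.arange(R * C, dtype=float).reshape((R, C))
ones_row = np.ones((1, R))

# (R,) against (1, R): the inner dimensions R and 1 do not match.
try:
    np.dot(M[:, 0], ones_row)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

# Keeping the column two-dimensional gives the (R, R) product with no reshape.
outer_via_slice = np.dot(M[:, 0:1], ones_row)
outer_via_outer = np.outer(M[:, 0], np.ones(R))

# Summing instead: a dot product with a 1-D vector of ones equals .sum().
column_sum = np.dot(M[:, 0], np.ones(R))
all_column_sums = np.dot(np.ones(R), M)

print(mismatch_raised, outer_via_slice.shape)
print(column_sum, M[:, 0].sum())
print(all_column_sums, M.sum(axis=0))
```

Which form is appropriate depends on whether an (R, R) outer product or a plain sum is actually wanted.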
