Understanding == applied to a NumPy array

Question

I'm new to Python, and I am learning TensorFlow. A tutorial that uses the notMNIST dataset gives example code to transform the labels matrix into a one-of-n encoded array.

The goal is to take an array of label integers 0...9 and return a matrix in which each integer has been transformed into a one-of-n encoded row, like this:

0 -> [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
1 -> [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
2 -> [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
...

The code they give to do this is:

# Map 0 to [1.0, 0.0, 0.0 ...], 1 to [0.0, 1.0, 0.0 ...]
labels = (np.arange(num_labels) == labels[:,None]).astype(np.float32)

However, I don't understand how this code does that at all. It looks like it just generates an array of the integers 0 to 9, compares that with the labels matrix, and converts the result to floats. How does an == operator result in a one-of-n encoded matrix?

Recommended answer

There are a few things going on here: numpy's vectorized operations, adding a singleton axis, and broadcasting.

First, let's see how the == does the magic.

Let's say we start with a simple label array. == behaves in a vectorized fashion, which means that we can compare the entire array with a scalar and get back an array holding the result of each elementwise comparison. For example:

>>> import numpy as np
>>> labels = np.array([1,2,0,0,2])
>>> labels == 0
array([False, False,  True,  True, False], dtype=bool)
>>> (labels == 0).astype(np.float32)
array([ 0.,  0.,  1.,  1.,  0.], dtype=float32)

First we get a boolean array, and then we coerce it to floats: False==0 in Python, and True==1. So we wind up with an array which is 0 where labels isn't equal to 0 and 1 where it is.
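
As a minimal sketch of that coercion step (the variable names `mask` and `as_float` are only illustrative and do not come from the tutorial), the dtypes before and after the cast can be inspected directly:

```python
import numpy as np

labels = np.array([1, 2, 0, 0, 2])

# Elementwise comparison with a scalar gives a boolean array.
mask = labels == 0

# Casting booleans to float32 maps False -> 0.0 and True -> 1.0.
as_float = mask.astype(np.float32)

print(mask.dtype, as_float.dtype)
print(as_float)
```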

But there's nothing special about comparing to 0. We could compare to 1 or 2 or 3 instead and get similar results:

>>> (labels == 2).astype(np.float32)
array([ 0.,  1.,  0.,  0.,  1.], dtype=float32)

In fact, we could loop over every possible label and generate such an array for each one. We could use a list comprehension:

>>> np.array([(labels == i).astype(np.float32) for i in np.arange(3)])
array([[ 0.,  0.,  1.,  1.,  0.],
       [ 1.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  1.]], dtype=float32)

But this doesn't really take advantage of numpy. What we want is to have each possible label compared with each element, in other words, to compare

>>> np.arange(3)
array([0, 1, 2])

with

>>> labels
array([1, 2, 0, 0, 2])

And here's where the magic of numpy broadcasting comes in. Right now, labels is a 1-dimensional object of shape (5,). If we make it a 2-dimensional object of shape (5,1), then the operation will "broadcast" over the last axis: numpy aligns shapes from the right, so the range of shape (3,) is treated as (1,3), and every axis of length 1 is stretched to match the other operand. We get an output of shape (5,3) with the results of comparing each entry in the range with each element of labels.

First we can add an "extra" axis to labels using None (or np.newaxis), changing its shape:

>>> labels[:,None]
array([[1],
       [2],
       [0],
       [0],
       [2]])
>>> labels[:,None].shape
(5, 1)
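
The shape bookkeeping can be checked without running the comparison itself. The sketch below assumes the same five-element labels array as above and uses `np.broadcast` only to ask numpy what shape the two operands would broadcast to; the names `candidates` and `column` are illustrative.

```python
import numpy as np

labels = np.array([1, 2, 0, 0, 2])
candidates = np.arange(3)      # shape (3,)
column = labels[:, None]       # shape (5, 1), same as labels[:, np.newaxis]

# np.broadcast reports the shape the operands broadcast to,
# without computing any elementwise result.
result_shape = np.broadcast(candidates, column).shape

print(labels.shape, column.shape, candidates.shape, result_shape)
```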

And then we can make the comparison (this is the transpose of the arrangement we were looking at earlier, but that doesn't really matter).

>>> np.arange(3) == labels[:,None]
array([[False,  True, False],
       [False, False,  True],
       [ True, False, False],
       [ True, False, False],
       [False, False,  True]], dtype=bool)
>>> (np.arange(3) == labels[:,None]).astype(np.float32)
array([[ 0.,  1.,  0.],
       [ 0.,  0.,  1.],
       [ 1.,  0.,  0.],
       [ 1.,  0.,  0.],
       [ 0.,  0.,  1.]], dtype=float32)
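
Putting the steps together, here is a small sketch that wraps the tutorial's one-liner in a function. The name `one_hot` is a hypothetical helper, not something from the tutorial or from numpy. The cross-check against rows of an identity matrix is a different technique, included here only to confirm the result.

```python
import numpy as np

def one_hot(labels, num_labels):
    """Hypothetical helper: one-of-n encode a 1-D array of integer labels."""
    labels = np.asarray(labels)
    # (num_labels,) compared with (len(labels), 1) broadcasts to
    # (len(labels), num_labels); the cast turns True/False into 1.0/0.0.
    return (np.arange(num_labels) == labels[:, None]).astype(np.float32)

labels = np.array([1, 2, 0, 0, 2])
encoded = one_hot(labels, 3)

# Alternative technique for comparison: pick rows of an identity matrix.
reference = np.eye(3, dtype=np.float32)[labels]

print(encoded)
print(np.array_equal(encoded, reference))
```

Each row of the result contains exactly one 1.0, located in the column whose index equals that row's label.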

Broadcasting in numpy is very powerful, and well worth reading up on.
