Vectorize Sequences explanation
Question
While studying Deep Learning with Python, I can't understand the following simple piece of code, which encodes integer sequences into a binary matrix.
import numpy as np
from keras.datasets import imdb

def vectorize_sequences(sequences, dimension=10000):
    # Create an all-zero matrix of shape (len(sequences), dimension)
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1.  # set specific indices of results[i] to 1s
    return results

(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)
x_train = vectorize_sequences(train_data)
And the output of x_train is something like:
x_train[0]
array([ 0.,  1.,  1., ...,  0.,  0.,  0.])
Can someone shed some light on why there are 0.'s in the x_train array, when only 1.'s are assigned in each iteration of i? I mean, shouldn't they all be 1's?
Recommended answer
The for loop here does not touch every element of the matrix. As you can see, it enumerates the sequences, so it loops over only one dimension (the rows). Let's take a simple example:
t = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
r = np.zeros((len(t), 10))
Output:
array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])
Then we modify the elements the same way your code does:
for i, s in enumerate(t):
    r[i, s] = 1.
Output:
array([[0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 1., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 1., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]])
You can see that the for loop modified only len(t) elements, namely the ones at index [i, s] (in this case (0, 1), (1, 2), (2, 3), and so on). Every other element keeps the 0. that np.zeros put there, which is why x_train contains both 0.'s and 1.'s.
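In the original function each sequence is a list of word indices rather than a single number, so results[i, sequence] = 1. sets several columns of row i at once. Here is a minimal sketch of that, assuming a made-up toy_sequences list (not IMDB data) and a reduced dimension of 10 so the whole matrix can be printed:

```python
import numpy as np

def vectorize_sequences(sequences, dimension=10):
    # Start from an all-zero matrix: one row per sequence, one column per index
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        # Indexing with a list sets every listed column of row i to 1
        results[i, sequence] = 1.
    return results

# Hypothetical toy data for illustration only
toy_sequences = [[1, 2, 5], [0, 2], [9, 9, 3]]
toy_matrix = vectorize_sequences(toy_sequences)
print(toy_matrix)
# [[0. 1. 1. 0. 0. 1. 0. 0. 0. 0.]
#  [1. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
#  [0. 0. 0. 1. 0. 0. 0. 0. 0. 1.]]
```

Only the columns named in each sequence become 1. (a repeated index such as 9 still gives a single 1.), and all the other entries stay at the 0. they were initialized with.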