numpy : calculate the derivative of the softmax function
Question
I am trying to understand backpropagation in a simple 3-layer neural network with MNIST.
There is the input layer with weights and a bias. The labels are MNIST, so it is a 10-class vector.
The second layer is a linear transform. The third layer is the softmax activation, which gives the output as probabilities.
Backpropagation calculates the derivative at each step and calls this the gradient.
Each earlier layer multiplies the global (upstream) gradient by its own local gradient (sketched in code below). I am having trouble calculating the local gradient of the softmax.
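In code form, that chain-rule step looks roughly like this (the names and numbers here are purely illustrative, not from the question):

import numpy as np

# Illustrative chain-rule step: the upstream gradient dL/dy is multiplied by
# the layer's local Jacobian dy/dx to get the gradient w.r.t. the layer's input.
upstream_grad = np.array([0.1, -0.2, 0.05])    # dL/dy for a 3-unit layer
local_jacobian = 2.0 * np.eye(3)               # dy/dx for the toy layer y = 2 * x
grad_input = local_jacobian.T @ upstream_grad  # dL/dx by the chain rule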
Several resources online go through the explanation of the softmax and its derivative, and even give code samples of the softmax itself:
import numpy as np

def softmax(x):
    """Compute the softmax of vector x."""
    exps = np.exp(x)  # note: np.exp(x - np.max(x)) is more numerically stable
    return exps / np.sum(exps)
The derivative is explained with respect to when i = j and when i != j: writing s = softmax(x), ds_i/dx_j equals s_i * (1 - s_i) when i = j and -s_i * s_j otherwise. This is a simple code snippet I've come up with, and I was hoping to verify my understanding:
def softmax(self, x):
    """Compute the softmax of vector x."""
    exps = np.exp(x)
    return exps / np.sum(exps)

def forward(self):
    # self.input is a vector of length 10
    # and is the output of
    # (w * x) + b
    self.value = self.softmax(self.input)

def backward(self):
    for i in range(len(self.value)):
        for j in range(len(self.input)):
            if i == j:
                self.gradient[i] = self.value[i] * (1 - self.input[i])
            else:
                self.gradient[i] = -self.value[i] * self.input[j]
Then self.gradient is the local gradient, which is a vector. Is this correct? Is there a better way to write this?
Answer
I am assuming you have a 3-layer NN where W1, b1 are associated with the linear transformation from the input layer to the hidden layer, and W2, b2 are associated with the linear transformation from the hidden layer to the output layer. Z1 and Z2 are the input vectors to the hidden layer and the output layer, and a1 and a2 represent the outputs of the hidden layer and the output layer respectively; a2 is your predicted output. delta3 and delta2 are the (backpropagated) errors, and from them you get the gradients of the loss function with respect to the model parameters.
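As a minimal sketch of that procedure in code, assuming a tanh hidden activation and a cross-entropy loss over one-hot labels (neither of which is stated in the question), the forward and backward passes would look like:

import numpy as np

def forward_backward(x, y, W1, b1, W2, b2):
    # Forward pass
    Z1 = W1 @ x + b1                    # input to the hidden layer
    a1 = np.tanh(Z1)                    # hidden layer output (tanh assumed)
    Z2 = W2 @ a1 + b2                   # input to the output layer
    exps = np.exp(Z2 - np.max(Z2))
    a2 = exps / np.sum(exps)            # predicted output (softmax)

    # Backward pass, with y a one-hot label vector
    delta3 = a2 - y                     # output-layer error for softmax + cross-entropy
    dW2 = np.outer(delta3, a1)          # gradient of the loss w.r.t. W2
    db2 = delta3                        # gradient of the loss w.r.t. b2
    delta2 = (W2.T @ delta3) * (1 - a1 ** 2)  # hidden-layer error (tanh derivative)
    dW1 = np.outer(delta2, x)           # gradient of the loss w.r.t. W1
    db1 = delta2                        # gradient of the loss w.r.t. b1
    return a2, dW1, db1, dW2, db2

Note that with this pairing of softmax and cross-entropy, the full softmax Jacobian never has to be formed: the output-layer error collapses to a2 - y.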
This is the general scenario for a 3-layer NN (an input layer, only one hidden layer, and one output layer). You can follow the procedure described above to compute the gradients, which should be easy to compute! Since another answer to this post already pointed out the problem in your code, I am not repeating the same here.
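For reference, here is a minimal sketch of the softmax local gradient on its own. Since every output s_i depends on every input x_j, the local gradient is a Jacobian matrix rather than a vector, and both cases use the softmax output s, not the raw input:

import numpy as np

def softmax_jacobian(s):
    """Jacobian of the softmax given its output vector s.

    J[i, j] = s[i] * (1 - s[i])  if i == j
            = -s[i] * s[j]       otherwise
    which is exactly np.diag(s) - np.outer(s, s).
    """
    return np.diag(s) - np.outer(s, s)

The upstream gradient is then combined with this local gradient by a matrix-vector product, e.g. grad_input = softmax_jacobian(s).T @ grad_output (this Jacobian is symmetric, so the transpose is optional).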