numpy : calculate the derivative of the softmax function


Problem description

I am trying to understand backpropagation in a simple 3-layer neural network trained on MNIST.

There is the input layer with weights and a bias. The labels are MNIST digits, so each label is a 10-class vector.

The second layer is a linear transform. The third layer is a softmax activation that turns the output into probabilities.

Backpropagation calculates the derivative at each step and calls this the gradient.

Each previous layer chains the global (upstream) gradient with its own local gradient. I am having trouble calculating the local gradient of the softmax.

Several resources online go through the explanation of the softmax and its derivative, and some even give code samples of the softmax itself:

import numpy as np

def softmax(x):
    """Compute the softmax of vector x."""
    exps = np.exp(x)
    return exps / np.sum(exps)
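
As an aside (not part of the quoted resources): np.exp can overflow for large inputs, so a common, numerically safer variant subtracts the maximum before exponentiating; shifting the input by a constant does not change the softmax output.

def softmax_stable(x):
    """Numerically stable softmax: subtracting max(x) avoids overflow in np.exp."""
    exps = np.exp(x - np.max(x))
    return exps / np.sum(exps)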

The derivative is explained for the cases i = j and i != j. This is a simple code snippet I've come up with, and I was hoping to verify my understanding:

def softmax(self, x):
    """Compute the softmax of vector x."""
    exps = np.exp(x)
    return exps / np.sum(exps)

def forward(self):
    # self.input is a vector of length 10
    # and is the output of 
    # (w * x) + b
    self.value = self.softmax(self.input)

def backward(self):
    for i in range(len(self.value)):
        for j in range(len(self.input)):
            if i == j:
                self.gradient[i] = self.value[i] * (1 - self.input[i])
            else:
                self.gradient[i] = -self.value[i] * self.input[j]

Then self.gradient is the local gradient, which is a vector. Is this correct? Is there a better way to write this?

Recommended answer

I am assuming you have a 3-layer NN where W1, b1 are associated with the linear transformation from the input layer to the hidden layer, and W2, b2 are associated with the linear transformation from the hidden layer to the output layer. Z1 and Z2 are the input vectors to the hidden layer and the output layer. a1 and a2 represent the outputs of the hidden layer and the output layer; a2 is your predicted output. delta3 and delta2 are the (backpropagated) errors, and from them you get the gradients of the loss function with respect to the model parameters.
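
A minimal sketch of what those gradient computations typically look like with the names above, assuming a cross-entropy loss on the softmax output and a sigmoid hidden activation (both are assumptions; the answer does not state which loss or hidden nonlinearity is used):

import numpy as np

def backward(x, y, W2, a1, a2):
    """Gradients for a single example in a 3-layer NN, assuming
    cross-entropy loss with softmax output and a sigmoid hidden activation.
    x is the input vector and y is the one-hot label vector."""
    delta3 = a2 - y                              # output-layer error: cross-entropy + softmax
    dW2 = np.outer(a1, delta3)                   # dL/dW2
    db2 = delta3                                 # dL/db2
    delta2 = np.dot(W2, delta3) * a1 * (1 - a1)  # hidden-layer error, sigmoid derivative a1*(1-a1)
    dW1 = np.outer(x, delta2)                    # dL/dW1
    db1 = delta2                                 # dL/db1
    return dW1, db1, dW2, db2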

This is the general scenario for a 3-layer NN (an input layer, a single hidden layer, and an output layer). You can follow the procedure described above to compute the gradients, which should be straightforward. Since another answer to this post has already pointed out the problem in your code, I am not repeating it here.
