numpy : calculate the derivative of the softmax function


Problem description

I am trying to understand backpropagation in a simple 3-layer neural network with MNIST.

There is the input layer with weights and a bias. The labels are MNIST, so it's a 10-class vector.

The second layer is a linear transform. The third layer is the softmax activation, which turns the output into probabilities.

Backpropagation calculates the derivative at each step and calls this the gradient.

Each earlier layer combines the global (upstream) gradient with its own local gradient. I am having trouble calculating the local gradient of the softmax.

Several resources online go through the explanation of the softmax and its derivative, and even give code samples of the softmax itself:

import numpy as np

def softmax(x):
    """Compute the softmax of vector x."""
    exps = np.exp(x)
    return exps / np.sum(exps)
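
(Not in the original post, but worth noting: np.exp can overflow for large inputs, so a common variant shifts the input by its maximum first. The function name below is just illustrative.)

import numpy as np

def stable_softmax(x):
    """Softmax of vector x, shifted by the max so np.exp cannot overflow."""
    shifted = x - np.max(x)   # subtracting a constant does not change the result
    exps = np.exp(shifted)
    return exps / np.sum(exps)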

The derivative is explained in terms of the cases i = j and i != j. This is a simple code snippet I've come up with, and I was hoping to verify my understanding:

def softmax(self, x):
    """Compute the softmax of vector x."""
    exps = np.exp(x)
    return exps / np.sum(exps)

def forward(self):
    # self.input is a vector of length 10
    # and is the output of 
    # (w * x) + b
    self.value = self.softmax(self.input)

def backward(self):
    for i in range(len(self.value)):
        for j in range(len(self.input)):
            if i == j:
                self.gradient[i] = self.value[i] * (1 - self.input[i])
            else:
                self.gradient[i] = -self.value[i] * self.input[j]

Then self.gradient is the local gradient, which is a vector. Is this correct? Is there a better way to write this?

Recommended answer

I am assuming you have a 3-layer NN where W1 and b1 are associated with the linear transformation from the input layer to the hidden layer, and W2 and b2 are associated with the linear transformation from the hidden layer to the output layer. Z1 and Z2 are the input vectors to the hidden layer and the output layer, and a1 and a2 are their outputs; a2 is your predicted output. delta3 and delta2 are the (backpropagated) errors, from which you get the gradients of the loss function with respect to the model parameters.
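
The original answer walks through this with equations; below is a minimal numpy sketch of that procedure, assuming a sigmoid hidden layer, a softmax output layer, and a cross-entropy loss. The activation and loss choices, the shapes, and the variable initialization here are illustrative assumptions, not taken from the original post.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative shapes: 784-dim input, 64 hidden units, 10 classes.
x = rng.standard_normal(784)
y = np.eye(10)[3]                          # one-hot label
W1 = rng.standard_normal((64, 784)) * 0.01
b1 = np.zeros(64)
W2 = rng.standard_normal((10, 64)) * 0.01
b2 = np.zeros(10)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    exps = np.exp(z - np.max(z))           # shift by the max for numerical stability
    return exps / np.sum(exps)

# Forward pass
Z1 = W1 @ x + b1                           # input to the hidden layer
a1 = sigmoid(Z1)                           # hidden layer output
Z2 = W2 @ a1 + b2                          # input to the output layer
a2 = softmax(Z2)                           # predicted probabilities

# Backward pass (softmax combined with cross-entropy gives a simple error term)
delta3 = a2 - y                            # error at the output layer
dW2 = np.outer(delta3, a1)                 # gradient of the loss w.r.t. W2
db2 = delta3                               # gradient of the loss w.r.t. b2
delta2 = (W2.T @ delta3) * a1 * (1 - a1)   # error backpropagated to the hidden layer
dW1 = np.outer(delta2, x)                  # gradient of the loss w.r.t. W1
db1 = delta2                               # gradient of the loss w.r.t. b1

Note that the convenient delta3 = a2 - y form only holds when the softmax is paired with a cross-entropy loss; with a different loss you would need the full softmax Jacobian shown further below.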

This is the general scenario for a 3-layer NN (input layer, only one hidden layer, one output layer). You can follow the procedure described above to compute the gradients, which should be easy to compute! Since another answer to this post has already pointed out the problem in your code, I am not repeating it here.
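
As an extra illustration (not part of the original answer): the local gradient of the softmax taken on its own is a Jacobian matrix rather than a vector, with entries J[i, j] = s[i] * (delta_ij - s[j]), where s is the softmax output. A vectorized sketch, with illustrative function names:

import numpy as np

def softmax(x):
    """Compute the softmax of vector x (shifted by the max for stability)."""
    exps = np.exp(x - np.max(x))
    return exps / np.sum(exps)

def softmax_jacobian(x):
    """Jacobian of the softmax: J[i, j] = s[i] * (1 if i == j else 0) - s[i] * s[j]."""
    s = softmax(x)
    return np.diag(s) - np.outer(s, s)

To chain it with an upstream gradient g in the backward pass, compute softmax_jacobian(x) @ g (the Jacobian is symmetric, so no transpose is needed).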
