How to implement the Softmax derivative independently from any loss function?


Problem description

For a neural network library I implemented some activation functions and loss functions and their derivatives. They can be combined arbitrarily, and the derivative at the output layer simply becomes the product of the loss derivative and the activation derivative.

However, I failed to implement the derivative of the Softmax activation function independently of any loss function. Due to the normalization, i.e. the denominator in the equation, changing a single input activation changes all output activations, not just one.
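
For example, perturbing a single logit shifts every output. A quick standalone check (the stabilized softmax helper here is only an illustration, not the class shown below):

import numpy as np

def softmax(x):
    exps = np.exp(x - x.max())  # subtract the max for numerical stability
    return exps / exps.sum()

x = np.array([1.0, 2.0, 3.0])
print(softmax(x))                              # [0.09003057 0.24472847 0.66524096]
print(softmax(x + np.array([0.1, 0.0, 0.0])))  # all three entries change, not just the first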

Here is my Softmax implementation, whose derivative fails gradient checking by about 1%. How can I implement the Softmax derivative so that it can be combined with any loss function?

import numpy as np


class Softmax:

    def compute(self, incoming):
        # softmax over the incoming activations
        exps = np.exp(incoming)
        return exps / exps.sum()

    def delta(self, incoming, outgoing):
        # attempted element-wise derivative; this is what fails gradient checking
        exps = np.exp(incoming)
        others = exps.sum() - exps
        return 1 / (2 + exps / others + others / exps)


# SquaredError and incoming are defined elsewhere in the library
activation = Softmax()
cost = SquaredError()

outgoing = activation.compute(incoming)
delta_output_layer = activation.delta(incoming, outgoing) * cost.delta(outgoing)

Recommended answer

Mathematically, the derivative of Softmax σ(j) with respect to the logit z_i (for example, w_i·x) is

∂σ(j)/∂z_i = σ(j) (δ_ij − σ(i))

where δ_ij is the Kronecker delta.
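
Expanding the Kronecker delta gives the two cases that the iterative implementation below fills in:

∂σ(j)/∂z_i = σ(i) (1 − σ(i))    if i = j
∂σ(j)/∂z_i = −σ(j) σ(i)         if i ≠ j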

If you implement it iteratively:

def softmax_grad(s):
    # input s is the softmax value of the original input x; its shape is (n,)
    # e.g. s = np.array([0.3, 0.7]) for x = np.array([0, 1])

    # initialise an n x n Jacobian matrix (every entry is overwritten below)
    jacobian_m = np.diag(s)

    for i in range(len(jacobian_m)):
        for j in range(len(jacobian_m)):
            if i == j:
                jacobian_m[i, j] = s[i] * (1 - s[i])
            else:
                jacobian_m[i, j] = -s[i] * s[j]
    return jacobian_m

Test:

In [95]: x
Out[95]: array([1, 2])

In [96]: softmax(x)
Out[96]: array([ 0.26894142,  0.73105858])

In [97]: softmax_grad(softmax(x))
Out[97]: 
array([[ 0.19661193, -0.19661193],
       [-0.19661193,  0.19661193]])
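
As a sanity check, this analytic Jacobian can be compared against central differences of the softmax itself. A minimal sketch, reusing numpy and the softmax and softmax_grad defined above; the numerical_jacobian helper and the eps value are illustrative additions:

def numerical_jacobian(f, x, eps=1e-6):
    # J[i, j] = d f(x)[i] / d x[j], estimated with central differences
    x = np.asarray(x, dtype=float)
    n = x.size
    jac = np.zeros((n, n))
    for j in range(n):
        step = np.zeros(n)
        step[j] = eps
        jac[:, j] = (f(x + step) - f(x - step)) / (2 * eps)
    return jac

print(np.allclose(numerical_jacobian(softmax, np.array([1.0, 2.0])),
                  softmax_grad(softmax(np.array([1, 2])))))   # True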

If you implement it in a vectorized version:

soft_max = softmax(x)

def softmax_grad(softmax):
    # reshape the softmax output to a column vector so np.dot gives a matrix product
    s = softmax.reshape(-1, 1)
    return np.diagflat(s) - np.dot(s, s.T)

softmax_grad(soft_max)

# array([[ 0.19661193, -0.19661193],
#        [-0.19661193,  0.19661193]])
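
To actually combine this with an arbitrary loss, which was the original goal, the output-layer delta is the product of this Jacobian with the gradient of the loss with respect to the softmax output (the chain rule). A minimal sketch; the loss_grad values below are purely illustrative stand-ins for whatever the loss function returns:

loss_grad = np.array([0.1, -0.1])                        # hypothetical dL/ds from any loss function
delta_output_layer = softmax_grad(soft_max) @ loss_grad  # chain rule: dL/dz = J · dL/ds (J is symmetric here)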
