Softmax derivative in NumPy approaches 0 (implementation)


Problem description

I'm trying to implement the softmax function for a neural network written in Numpy. Let h be the softmax value of a given signal i.

I've struggled to implement the softmax activation function's partial derivative.

I'm currently stuck at an issue where all the partial derivatives approach 0 as the training progresses. I've cross-referenced my math with this excellent answer, but my math does not seem to work out.

import numpy as np
def softmax_function( signal, derivative=False ):
    # Calculate activation signal
    e_x = np.exp( signal )
    signal = e_x / np.sum( e_x, axis = 1, keepdims = True )

    if derivative:
        # Return the partial derivation of the activation function
        return np.multiply( signal, 1 - signal ) + sum(
            # handle the off-diagonal values
            - signal * np.roll( signal, i, axis = 1 )
            for i in range( 1, signal.shape[1] )
        )
    else:
        # Return the activation signal
        return signal
#end activation function

The signal parameter contains the input signal sent into the activation function and has the shape (n_samples, n_features).

# sample signal (3 samples, 3 features)
signal = [[0.3394572666491664, 0.3089068053925853, 0.3516359279582483], [0.33932706934615525, 0.3094755563319447, 0.3511973743219001], [0.3394407172182317, 0.30889042266755573, 0.35166886011421256]]
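
For reference, a quick forward-pass call on this sample could look like the following (my own illustrative snippet, not part of the original question):

# Illustrative only: run the forward pass of softmax_function on the sample signal
import numpy as np
signal = np.array( signal )
output = softmax_function( signal )
print( output.shape )            # (3, 3)
print( output.sum( axis = 1 ) )  # each row sums to ~1.0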

The following code snippet is a fully working activation function and is only included as a reference and proof (mostly for myself) that the conceptual idea actually works.

from scipy.special import expit
import numpy as np
def sigmoid_function( signal, derivative=False ):
    # Prevent overflow.
    signal = np.clip( signal, -500, 500 )

    # Calculate activation signal
    signal = expit( signal )

    if derivative:
        # Return the partial derivation of the activation function
        return np.multiply(signal, 1 - signal)
    else:
        # Return the activation signal
        return signal
#end activation function
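
A quick way to double-check that claim (my own snippet, not from the original post) is to compare the returned derivative against a finite-difference estimate:

# Illustrative numerical check of the sigmoid derivative (not from the original post)
x = np.array( [[-2.0, 0.0, 3.0]] )
eps = 1e-6
numeric  = ( sigmoid_function( x + eps ) - sigmoid_function( x - eps ) ) / ( 2 * eps )
analytic = sigmoid_function( x, derivative=True )
print( np.allclose( numeric, analytic, atol=1e-8 ) )   # True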

Edit

  • Intuitively, the problem persists even in a simple single-layer network. The softmax (and its derivative) is applied in the final layer.
  • Recommended answer

    This is an answer on how to calculate the derivative of the softmax function in a more vectorized NumPy fashion. However, the fact that the partial derivatives approach zero might not be a math issue; it may just be a problem of the learning rate or the known dying weight issue with complex deep neural networks. Layers like ReLU help prevent the latter issue.

    First, I've used the following signal (just duplicating your last entry) to make it 4 samples x 3 features, so it is easier to see what is going on with the dimensions.

    >>> import numpy as np
    >>> signal = np.array([[0.3394572666491664, 0.3089068053925853, 0.3516359279582483], [0.33932706934615525, 0.3094755563319447, 0.3511973743219001], [0.3394407172182317, 0.30889042266755573, 0.35166886011421256], [0.3394407172182317, 0.30889042266755573, 0.35166886011421256]])
    >>> signal.shape
    (4, 3)
    

    Next, you want to compute the Jacobian matrix of your softmax function. According to the cited page, it is defined as -hi * hj for the off-diagonal entries (the majority of the matrix for n_features > 2), so let's start there. In NumPy, you can efficiently calculate that Jacobian matrix using broadcasting:

    >>> J = - signal[..., None] * signal[:, None, :]
    >>> J.shape
    (4, 3, 3)
    

    The first signal[..., None] (equivalent to signal[:, :, None]) reshapes the signal to (4, 3, 1) while the second signal[:, None, :] reshapes the signal to (4, 1, 3). Then, the * just multiplies both matrices element-wise. Numpy's internal broadcasting repeats both matrices to form the n_features x n_features matrix for every sample.
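
    As a quick toy illustration of that broadcasting (my own example, not from the original answer), each per-sample slice of such a product is the outer product of that sample with itself:

    >>> a = np.arange(6.).reshape(2, 3)     # pretend: 2 samples, 3 features
    >>> (a[..., None] * a[:, None, :]).shape
    (2, 3, 3)
    >>> a[0, :, None] * a[0, None, :]       # outer product of the first sample
    array([[0., 0., 0.],
           [0., 1., 2.],
           [0., 2., 4.]])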

    Then, we need to fix the diagonal elements:

    >>> iy, ix = np.diag_indices_from(J[0])
    >>> J[:, iy, ix] = signal * (1. - signal)
    

    The above lines extract the diagonal indices for an n_features x n_features matrix. It is equivalent to doing iy = np.arange(n_features); ix = np.arange(n_features). Then, the diagonal entries are replaced with your definition hi * (1 - hi).
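
    A minimal look at what np.diag_indices_from returns (again just a toy illustration of my own):

    >>> np.diag_indices_from( np.zeros((3, 3)) )
    (array([0, 1, 2]), array([0, 1, 2]))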

    Last, according to the linked source, you need to sum across rows for each of the samples. That can be done as:

    >>> J = J.sum(axis=1)
    >>> J.shape
    (4, 3)
    

    Find the summarized version below:

    if derivative:
        J = - signal[..., None] * signal[:, None, :] # off-diagonal Jacobian
        iy, ix = np.diag_indices_from(J[0])
        J[:, iy, ix] = signal * (1. - signal) # diagonal
        return J.sum(axis=1) # sum across-rows for each sample
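
    Putting that snippet back into your original function, a complete drop-in replacement could look like this (my own assembly of the pieces above, assuming signal has shape (n_samples, n_features)):

    import numpy as np

    def softmax_function( signal, derivative=False ):
        # Calculate activation signal
        e_x = np.exp( signal )
        signal = e_x / np.sum( e_x, axis = 1, keepdims = True )

        if derivative:
            # Build the full per-sample Jacobian, then sum across rows
            J = - signal[..., None] * signal[:, None, :]  # off-diagonal entries
            iy, ix = np.diag_indices_from( J[0] )
            J[:, iy, ix] = signal * (1. - signal)         # diagonal entries
            return J.sum( axis = 1 )
        else:
            # Return the activation signal
            return signal
    #end activation function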
    


    Comparison of the derivatives:

    >>> signal = [[0.3394572666491664, 0.3089068053925853, 0.3516359279582483], [0.33932706934615525, 0.3094755563319447, 0.3511973743219001], [0.3394407172182317, 0.30889042266755573, 0.35166886011421256], [0.3394407172182317, 0.30889042266755573, 0.35166886011421256]]
    >>> e_x = np.exp( signal )
    >>> signal = e_x / np.sum( e_x, axis = 1, keepdims = True )
    

    Yours:

    >>> np.multiply( signal, 1 - signal ) + sum(
            # handle the off-diagonal values
            - signal * np.roll( signal, i, axis = 1 )
            for i in range(1, signal.shape[1] )
        )
    array([[  2.77555756e-17,  -2.77555756e-17,   0.00000000e+00],
           [ -2.77555756e-17,  -2.77555756e-17,  -2.77555756e-17],
           [  2.77555756e-17,   0.00000000e+00,   2.77555756e-17],
           [  2.77555756e-17,   0.00000000e+00,   2.77555756e-17]])
    

    Mine:

    >>> J = - signal[..., None] * signal[:, None, :]
    >>> iy, ix = np.diag_indices_from(J[0])
    >>> J[:, iy, ix] = signal * (1. - signal)
    >>> J.sum(axis=1)
    array([[  4.16333634e-17,  -1.38777878e-17,   0.00000000e+00],
           [ -2.77555756e-17,  -2.77555756e-17,  -2.77555756e-17],
           [  2.77555756e-17,   1.38777878e-17,   2.77555756e-17],
           [  2.77555756e-17,   1.38777878e-17,   2.77555756e-17]])
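
    Both variants agree up to floating-point noise. If you want to sanity-check the full per-sample Jacobian itself (before the row sum), a small finite-difference test can be used; this is my own verification sketch, not part of the original answer:

    # Illustrative check (not from the original answer): compare the analytic
    # per-sample Jacobian J[i, j] = h[i] * (delta_ij - h[j]) against a numerical
    # Jacobian of the softmax forward pass for a single sample.
    def softmax_1d( z ):
        e_z = np.exp( z - np.max( z ) )   # shift for numerical stability
        return e_z / e_z.sum()

    z = np.array( [0.34, 0.31, 0.35] )    # one arbitrary sample
    h = softmax_1d( z )

    J_analytic = np.diag( h ) - np.outer( h, h )

    eps = 1e-6
    J_numeric = np.zeros( (z.size, z.size) )
    for j in range( z.size ):
        dz = np.zeros_like( z )
        dz[j] = eps
        J_numeric[:, j] = ( softmax_1d( z + dz ) - softmax_1d( z - dz ) ) / (2 * eps)

    print( np.allclose( J_analytic, J_numeric, atol=1e-8 ) )   # True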
    
