Backpropagation for rectified linear unit activation with cross entropy error

Problem description

I'm trying to implement gradient calculation for neural networks using backpropagation. I cannot get it to work with cross entropy error and rectified linear unit (ReLU) as activation.

I managed to get my implementation working for squared error with sigmoid, tanh and ReLU activation functions. The gradient for cross entropy (CE) error with sigmoid activation is also computed correctly. However, when I change the activation to ReLU, it fails. (I'm skipping tanh for CE since it returns values in the (-1,1) range.)

Is it because of the behavior of the log function at values close to 0 (which ReLU returns roughly 50% of the time for normalized inputs)? I tried to mitigate that problem with:

log(max(y,eps))

but it only helped to bring the error and gradients back to real numbers - they still differ from the numerical gradient.

I verify the results using a numerical gradient:

num_grad = (f(W+epsilon) - f(W-epsilon)) / (2*epsilon)
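
Concretely, a per-weight check along these lines might look like the following sketch (the check_gradient name and the choice of epsilon are illustrative, and backprop is the function listed below; looping over every weight is slow but fine for small checks):

function num_grad = check_gradient(W, X, Y)
% Element-wise central-difference gradient check (sketch).
epsilon = 1e-4;
num_grad = zeros(size(W));
for i = 1:numel(W)
    Wp = W; Wp(i) = Wp(i) + epsilon;   % perturb the i-th weight up
    Wm = W; Wm(i) = Wm(i) - epsilon;   % perturb the i-th weight down
    num_grad(i) = (backprop(Wp, X, Y) - backprop(Wm, X, Y)) / (2*epsilon);
end
end

Comparing max(abs(df - num_grad)) against a small tolerance then shows whether the analytic gradient matches.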

The following MATLAB code presents a simplified and condensed version of the backpropagation implementation used in my experiments:

function [f, df] = backprop(W, X, Y)
% W - weights
% X - input values
% Y - target values

act_type='relu';    % possible values: sigmoid / tanh / relu
error_type = 'CE';  % possible values: SE / CE

N=size(X,1); n_inp=size(X,2); n_hid=100; n_out=size(Y,2);
% unpack the flat parameter vector into per-layer weight matrices (incl. bias column)
w1=reshape(W(1:n_hid*(n_inp+1)),n_hid,n_inp+1);
w2=reshape(W(n_hid*(n_inp+1)+1:end),n_out,n_hid+1);

% feedforward
X=[X ones(N,1)];
z2=X*w1'; a2=act(z2,act_type); a2=[a2 ones(N,1)];
z3=a2*w2'; y=act(z3,act_type);

if strcmp(error_type, 'CE')   % cross entropy error - logistic cost function
    f=-sum(sum( Y.*log(max(y,eps))+(1-Y).*log(max(1-y,eps)) ));
else % squared error
    f=0.5*sum(sum((y-Y).^2));
end

% backprop
if strcmp(error_type, 'CE')   % cross entropy error
    d3=y-Y;
else % squared error
    d3=(y-Y).*dact(z3,act_type);
end

df2=d3'*a2;                              % gradient w.r.t. w2
d2=d3*w2(:,1:end-1).*dact(z2,act_type);  % hidden-layer delta (bias column of w2 dropped)
df1=d2'*X;                               % gradient w.r.t. w1

df=[df1(:);df2(:)];

end

function f=act(z,type) % activation function
switch type
    case 'sigmoid'
        f=1./(1+exp(-z));
    case 'tanh'
        f=tanh(z);
    case 'relu'
        f=max(0,z);
end
end

function df=dact(z,type) % derivative of activation function
switch type
    case 'sigmoid'
        df=act(z,type).*(1-act(z,type));
    case 'tanh'
        df=1-act(z,type).^2;
    case 'relu'
        df=double(z>0);
end
end


Edit

After another round of experiments, I found out that using a softmax for the last layer:

y=bsxfun(@rdivide, exp(z3), sum(exp(z3),2));

and the softmax cost function:

f=-sum(sum(Y.*log(y)));

makes the implementation work for all activation functions, including ReLU.
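
For clarity, the full output-layer change amounts to the sketch below (against the backprop function above; the hidden layer and its ReLU activation stay as they are, and the d3 = y-Y line is the same CE shortcut already used in the code):

% feedforward, output layer only (hidden layer unchanged)
z3 = a2*w2';
y  = bsxfun(@rdivide, exp(z3), sum(exp(z3),2));   % softmax over output units
% softmax / multiclass cross entropy cost
f  = -sum(sum(Y.*log(y)));
% backprop: for softmax with this cost, the output delta simplifies to
d3 = y - Y;

(Subtracting max(z3,[],2) from z3 before exponentiating is a common numerical-stability tweak, not shown here.)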

This leads me to the conclusion that it is the logistic cost function (binary classifier) that does not work with ReLU:

f=-sum(sum( Y.*log(max(y,eps))+(1-Y).*log(max(1-y,eps)) ));

However, I still cannot figure out where the problem lies.

Answer

Every squashing function (sigmoid, tanh, and softmax in the output layer) implies a different cost function. It then makes sense that a ReLU in the output layer does not match the cross entropy cost function. I would try a simple squared error cost function to test a ReLU output layer.

The true power of ReLU lies in the hidden layers of a deep net, since it does not suffer from the vanishing gradient problem.
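
A quick way to test that, reusing the code from the question, could look like this (illustrative only: the toy sizes are made up, act_type/error_type are assumed to be set to 'relu'/'SE' inside backprop, and check_gradient is the helper sketched earlier):

% Toy-sized gradient check for a ReLU output layer with squared error.
X = rand(20,10); Y = rand(20,3);              % illustrative data only
W = 0.1*randn(100*(10+1) + 3*(100+1), 1);     % matches n_hid=100 in backprop
[f, df]  = backprop(W, X, Y);                 % analytic gradient
num_grad = check_gradient(W, X, Y);           % numerical gradient
max(abs(df - num_grad))                       % should be tiny if the gradients agree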
