Backpropagation for rectified linear unit activation with cross entropy error

Problem description

I'm trying to implement gradient calculation for neural networks using backpropagation. I cannot get it to work with cross entropy error and rectified linear unit (ReLU) as activation.

I managed to get my implementation working for squared error with sigmoid, tanh and ReLU activation functions. The gradient for cross entropy (CE) error with sigmoid activation is also computed correctly. However, when I change the activation to ReLU, it fails. (I'm skipping tanh for CE since it returns values in the (-1,1) range.)

Is it because of the behavior of the log function at values close to 0 (which ReLU returns roughly 50% of the time for normalized inputs)? I tried to mitigate that problem with:

log(max(y,eps))

but it only helped to bring the error and gradients back to real numbers - they still differ from the numerical gradient.

I verify the results using a numerical gradient:

num_grad = (f(W+epsilon) - f(W-epsilon)) / (2*epsilon)
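
Concretely, a per-weight check along these lines might look like the following sketch (the check_gradient name and the choice of epsilon are illustrative, and backprop is the function listed below; looping over every weight is slow but fine for small checks):

function num_grad = check_gradient(W, X, Y)
% Element-wise central-difference gradient check (sketch).
epsilon = 1e-4;
num_grad = zeros(size(W));
for i = 1:numel(W)
    Wp = W; Wp(i) = Wp(i) + epsilon;   % perturb the i-th weight up
    Wm = W; Wm(i) = Wm(i) - epsilon;   % perturb the i-th weight down
    num_grad(i) = (backprop(Wp, X, Y) - backprop(Wm, X, Y)) / (2*epsilon);
end
end

Comparing max(abs(df - num_grad)) against a small tolerance then shows whether the analytic gradient matches.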

The following MATLAB code presents a simplified and condensed version of the backpropagation implementation used in my experiments:

function [f, df] = backprop(W, X, Y)
% W - weights
% X - input values
% Y - target values

act_type='relu';    % possible values: sigmoid / tanh / relu
error_type = 'CE';  % possible values: SE / CE

N=size(X,1); n_inp=size(X,2); n_hid=100; n_out=size(Y,2);
% unpack the flat parameter vector into per-layer weight matrices (incl. bias column)
w1=reshape(W(1:n_hid*(n_inp+1)),n_hid,n_inp+1);
w2=reshape(W(n_hid*(n_inp+1)+1:end),n_out,n_hid+1);

% feedforward
X=[X ones(N,1)];
z2=X*w1'; a2=act(z2,act_type); a2=[a2 ones(N,1)];
z3=a2*w2'; y=act(z3,act_type);

if strcmp(error_type, 'CE')   % cross entropy error - logistic cost function
    f=-sum(sum( Y.*log(max(y,eps))+(1-Y).*log(max(1-y,eps)) ));
else % squared error
    f=0.5*sum(sum((y-Y).^2));
end

% backprop
if strcmp(error_type, 'CE')   % cross entropy error
    d3=y-Y;
else % squared error
    d3=(y-Y).*dact(z3,act_type);
end

df2=d3'*a2;                              % gradient w.r.t. w2
d2=d3*w2(:,1:end-1).*dact(z2,act_type);  % hidden-layer delta (bias column of w2 dropped)
df1=d2'*X;                               % gradient w.r.t. w1

df=[df1(:);df2(:)];

end

function f=act(z,type) % activation function
switch type
    case 'sigmoid'
        f=1./(1+exp(-z));
    case 'tanh'
        f=tanh(z);
    case 'relu'
        f=max(0,z);
end
end

function df=dact(z,type) % derivative of activation function
switch type
    case 'sigmoid'
        df=act(z,type).*(1-act(z,type));
    case 'tanh'
        df=1-act(z,type).^2;
    case 'relu'
        df=double(z>0);
end
end


Edit

After another round of experiments, I found out that using a softmax for the last layer:

y=bsxfun(@rdivide, exp(z3), sum(exp(z3),2));

and the softmax cost function:

f=-sum(sum(Y.*log(y)));

makes the implementation work for all activation functions, including ReLU.
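
For clarity, the full output-layer change amounts to the sketch below (against the backprop function above; the hidden layer and its ReLU activation stay as they are, and the d3 = y-Y line is the same CE shortcut already used in the code):

% feedforward, output layer only (hidden layer unchanged)
z3 = a2*w2';
y  = bsxfun(@rdivide, exp(z3), sum(exp(z3),2));   % softmax over output units
% softmax / multiclass cross entropy cost
f  = -sum(sum(Y.*log(y)));
% backprop: for softmax with this cost, the output delta simplifies to
d3 = y - Y;

(Subtracting max(z3,[],2) from z3 before exponentiating is a common numerical-stability tweak, not shown here.)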

This leads me to the conclusion that it is the logistic cost function (binary classifier) that does not work with ReLU:

f=-sum(sum( Y.*log(max(y,eps))+(1-Y).*log(max(1-y,eps)) ));

However, I still cannot figure out where the problem lies.

Answer

Every squashing function (sigmoid, tanh, and softmax in the output layer) implies a different cost function. It then makes sense that a ReLU in the output layer does not match the cross entropy cost function. I would try a simple squared error cost function to test a ReLU output layer.

The true power of ReLU lies in the hidden layers of a deep net, since it does not suffer from the vanishing gradient problem.
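
A quick way to test that, reusing the code from the question, could look like this (illustrative only: the toy sizes are made up, act_type/error_type are assumed to be set to 'relu'/'SE' inside backprop, and check_gradient is the helper sketched earlier):

% Toy-sized gradient check for a ReLU output layer with squared error.
X = rand(20,10); Y = rand(20,3);              % illustrative data only
W = 0.1*randn(100*(10+1) + 3*(100+1), 1);     % matches n_hid=100 in backprop
[f, df]  = backprop(W, X, Y);                 % analytic gradient
num_grad = check_gradient(W, X, Y);           % numerical gradient
max(abs(df - num_grad))                       % should be tiny if the gradients agree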
