Neural Networks: Sigmoid Activation Function for continuous output variable


Problem Description


Okay, so I am in the middle of Andrew Ng's machine learning course on Coursera and would like to adapt the neural network that was completed as part of assignment 4.

In particular, the neural network which I had completed correctly as part of the assignment was as follows:

  • Sigmoid activation function: g(z) = 1/(1+e^(-z))
  • 10 output units, each of which could take the value 0 or 1
  • 1 hidden layer
  • Back-propagation method used to minimize cost function
  • Cost function (the regularized cross-entropy cost used in the assignment):

    J(Theta) = -(1/m) * sum_{i=1..m} sum_{k=1..K} [ y_k(i)*log(h(x(i))_k) + (1 - y_k(i))*log(1 - h(x(i))_k) ]
               + (lambda/(2m)) * sum_{l=1..L-1} sum_{i=1..s_l} sum_{j=1..s_(l+1)} (Theta_ji(l))^2

where L = number of layers, s_l = number of units in layer l, m = number of training examples, K = number of output units

Now I want to adjust the exercise so that there is one continuous output unit that takes any value in [0,1], and I am trying to work out what needs to change. So far I have:

  • Replaced the data with my own, i.e., such that the output is a continuous variable between 0 and 1
  • Updated references to the number of output units
  • Updated the cost function in the back-propagation algorithm to J = 1/(2m) * sum( (a_3 - y).^2 ) (plus the same regularization term as before), where a_3 is the value of the output unit determined from forward propagation.

I am certain that something else must change, because the gradient-checking method shows that the gradient determined by back-propagation and the one from the numerical approximation no longer match. I did not change the sigmoid gradient; it is left at f(z)*(1-f(z)), where f(z) is the sigmoid function 1/(1+e^(-z)). Nor did I update the numerical approximation of the derivative; it is simply (J(theta+e) - J(theta-e))/(2e).
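For reference, a minimal sketch of that two-sided numerical check (theta here stands for the unrolled parameter vector, and costFunc is a hypothetical handle that returns J for a given parameter vector):

e = 1e-4;
numgrad = zeros(size(theta));
for i = 1:numel(theta)
    perturb = zeros(size(theta));
    perturb(i) = e;
    % two-sided difference: (J(theta+e) - J(theta-e)) / (2e)
    numgrad(i) = (costFunc(theta + perturb) - costFunc(theta - perturb)) / (2*e);
end
% numgrad is then compared element-wise with the gradient from back-propagation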

Can anyone advise of what other steps would be required?

Coded in MATLAB as follows:

% FORWARD PROPAGATION
% input layer
a1 = [ones(m,1),X];
% hidden layer
z2 = a1*Theta1';
a2 = sigmoid(z2);
a2 = [ones(m,1),a2];
% output layer
z3 = a2*Theta2';
a3 = sigmoid(z3);

% BACKWARD PROPAGATION
delta3 = a3 - y;
delta2 = delta3*Theta2(:,2:end).*sigmoidGradient(z2);
Theta1_grad = (delta2'*a1)/m;
Theta2_grad = (delta3'*a2)/m;

% COST FUNCTION
J = 1/(2 * m) * sum( (a3-y).^2 );

% Implement regularization with the cost function and gradients.
Theta1_grad(:,2:end) = Theta1_grad(:,2:end) + Theta1(:,2:end)*lambda/m;
Theta2_grad(:,2:end) = Theta2_grad(:,2:end) + Theta2(:,2:end)*lambda/m;
J = J + lambda/(2*m)*( sum(sum(Theta1(:,2:end).^2)) + sum(sum(Theta2(:,2:end).^2)));

I have since realised that this question is similar to one asked by @Mikhail Erofeev on StackOverflow; however, in this case I want the continuous variable to be between 0 and 1 and therefore use a sigmoid function.

Solution

First, your cost function should be:

J = 1/m * sum( (a3-y).^2 );

I think your Theta2_grad = (delta3'*a2)/m; is expected to match the numerical approximation once delta3 is changed to delta3 = 1/2 * (a3 - y);.

Check this slide for more details.
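Applied to the snippet in the question, the two suggested changes would look roughly like this (a minimal sketch, reusing the variable names a3, y, delta3, delta2, Theta2, z2, a1, a2 and m from the question's code):

% cost without the 1/(2m) scaling, as suggested above
J = 1/m * sum( (a3-y).^2 );

% scaled output-layer error term, as suggested above
delta3 = 1/2 * (a3 - y);
delta2 = delta3*Theta2(:,2:end).*sigmoidGradient(z2);
Theta1_grad = (delta2'*a1)/m;
Theta2_grad = (delta3'*a2)/m;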

EDIT: In case there is some minor discrepancy between our code, I have pasted my code below for your reference. The code has already been checked against the numerical approximation function checkNNGradients(lambda); the Relative Difference is less than 1e-4 (though it does not meet the 1e-11 requirement suggested by Dr. Andrew Ng).

function [J grad] = nnCostFunctionRegression(nn_params, ...
                                   input_layer_size, ...
                                   hidden_layer_size, ...
                                   num_labels, ...
                                   X, y, lambda)

Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
                 hidden_layer_size, (input_layer_size + 1));

Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
                 num_labels, (hidden_layer_size + 1));

m = size(X, 1);   
J = 0;
Theta1_grad = zeros(size(Theta1));
Theta2_grad = zeros(size(Theta2));


% Forward propagation (note: z1 below actually holds the hidden-layer activations)
X = [ones(m, 1) X];   
z1 = sigmoid(X * Theta1');
zs = z1;
z1 = [ones(m, 1) z1];
z2 = z1 * Theta2';
ht = sigmoid(z2);


% Recode y into one-of-num_labels vectors (kept from the classification
% assignment; this step assumes integer labels in y)
y_recode = zeros(length(y),num_labels);
for i=1:length(y)
    y_recode(i,y(i))=1;
end    
y = y_recode;


% Regularized squared-error cost and back-propagated error terms
regularization=lambda/2/m*(sum(sum(Theta1(:,2:end).^2))+sum(sum(Theta2(:,2:end).^2)));
J=1/(m)*sum(sum((ht - y).^2))+regularization;
delta_3 = 1/2*(ht - y);
delta_2 = delta_3 * Theta2(:,2:end) .* sigmoidGradient(X * Theta1');

delta_cap2 = delta_3' * z1; 
delta_cap1 = delta_2' * X;

% Regularized gradients; the regularization added to the bias column is removed below
Theta1_grad = ((1/m) * delta_cap1)+ ((lambda/m) * (Theta1));
Theta2_grad = ((1/m) * delta_cap2)+ ((lambda/m) * (Theta2));

Theta1_grad(:,1) = Theta1_grad(:,1)-((lambda/m) * (Theta1(:,1)));
Theta2_grad(:,1) = Theta2_grad(:,1)-((lambda/m) * (Theta2(:,1)));


grad = [Theta1_grad(:) ; Theta2_grad(:)];

end
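For completeness, a hypothetical call to the function above could look as follows. The layer sizes and data are made up for illustration, sigmoid and sigmoidGradient are assumed to be the helper functions from the course exercise, and the labels are integers because of the recoding step inside the function:

input_layer_size  = 3;
hidden_layer_size = 5;
num_labels        = 2;
m = 10;
X = rand(m, input_layer_size);
y = randi(num_labels, m, 1);    % integer labels in 1..num_labels
lambda = 1;

% unroll randomly initialised weights into a single parameter vector
Theta1 = rand(hidden_layer_size, input_layer_size + 1) * 0.24 - 0.12;
Theta2 = rand(num_labels, hidden_layer_size + 1) * 0.24 - 0.12;
nn_params = [Theta1(:) ; Theta2(:)];

[J, grad] = nnCostFunctionRegression(nn_params, input_layer_size, ...
                                     hidden_layer_size, num_labels, X, y, lambda);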
