Neural Network not fitting XOR


Problem description

I created an Octave script for training a neural network with one hidden layer using backpropagation, but it cannot seem to fit the XOR function.

  • x Input 4x2 matrix [0 0; 0 1; 1 0; 1 1]
  • y Output 4x1 matrix [0; 1; 1; 0]
  • theta Hidden / output layer weights
  • z Weighted sums
  • a Activation function applied to weighted sums
  • m Sample count (4 here)

My weights are initialised as follows:

epsilon_init = 0.12;
theta1 = rand(hiddenCount, inputCount + 1) * 2 * epsilon_init * epsilon_init;
theta2 = rand(outputCount, hiddenCount + 1) * 2 * epsilon_init * epsilon_init;

Forward propagation

a1 = x;                                    % input layer activations
a1_with_bias = [ones(m, 1) a1];            % prepend the bias column
z2 = a1_with_bias * theta1';               % hidden layer weighted sums
a2 = sigmoid(z2);                          % hidden layer activations
a2_with_bias = [ones(size(a2, 1), 1) a2];  % prepend the bias column
z3 = a2_with_bias * theta2';               % output layer weighted sums
a3 = sigmoid(z3);                          % network output

Then I compute the logistic cost function

j = -sum((y .* log(a3) + (1 - y) .* log(1 - a3))(:)) / m;  % cross-entropy cost averaged over the m samples

Backpropagation

delta2 = (a3 - y);                                            % output layer error
gradient2 = delta2' * a2_with_bias / m;                       % gradient for theta2

delta1 = (delta2 * theta2(:, 2:end)) .* sigmoidGradient(z2);  % hidden layer error (bias weights excluded)
gradient1 = delta1' * a1_with_bias / m;                       % gradient for theta1
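
The snippets call sigmoid and sigmoidGradient without showing them. A standard definition of both (an assumption about what the script uses, since the post does not include them) would be:

function g = sigmoid(z)
  % element-wise logistic function
  g = 1 ./ (1 + exp(-z));
end

function g = sigmoidGradient(z)
  % derivative of the logistic function, element-wise
  g = sigmoid(z) .* (1 - sigmoid(z));
end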

The gradients were verified to be correct using gradient checking.

I then use these gradients to find the optimal values for theta using gradient descent, though using Octave's fminunc function yields the same results. The cost function converges to ln(2) (or 0.5 for a squared errors cost function) because the network outputs 0.5 for all four inputs no matter how many hidden units I use.
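
As a sanity check on that claim, plugging a constant output of 0.5 into the same cost function does reproduce ln(2) ≈ 0.6931 (a small standalone snippet, not part of the original script):

y = [0; 1; 1; 0];
m = 4;
a3 = 0.5 * ones(m, 1);    % the network output stuck at 0.5 for every sample
j = -sum((y .* log(a3) + (1 - y) .* log(1 - a3))(:)) / m
% prints j = 0.6931, i.e. log(2)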

Does anyone know where my mistake is?

Answer

Start with a larger range when initialising weights, including negative values. It is difficult for your code to "cross-over" between positive and negative weights, and you probably meant to put * 2 * epsilon_init - epsilon_init; when instead you put * 2 * epsilon_init * epsilon_init;. Fixing that may well fix your code.
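
For reference, a minimal sketch of that corrected initialisation, which draws the weights uniformly from [-epsilon_init, epsilon_init] instead of from a tiny all-positive range:

epsilon_init = 0.12;
% rand returns values in [0, 1], so this maps them to [-epsilon_init, epsilon_init]
theta1 = rand(hiddenCount, inputCount + 1) * 2 * epsilon_init - epsilon_init;
theta2 = rand(outputCount, hiddenCount + 1) * 2 * epsilon_init - epsilon_init;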

As a rule of thumb, I would do something like this:

theta1 = ( 0.5 * sqrt ( 6 / ( inputCount + hiddenCount) ) * 
    randn( hiddenCount, inputCount + 1 ) );
theta2 = ( 0.5 * sqrt ( 6 / ( hiddenCount + outputCount ) ) * 
    randn( outputCount, hiddenCount + 1 ) );

The multiplier is just some advice I picked up on a course; I think it is backed by a research paper that compared a few different approaches.

In addition, you may need a lot of iterations to learn XOR if you run basic gradient descent. I suggest running for at least 10,000 iterations before declaring that learning isn't working. The fminunc function should do better than that.
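
A minimal sketch of such a training loop, reusing the forward and backward passes from the question; the learning rate alpha is an assumed value, not something given in the post:

alpha = 1.0;    % learning rate (assumed; tune as needed)
for iter = 1:10000
  % forward pass (as in the question)
  a1_with_bias = [ones(m, 1) x];
  z2 = a1_with_bias * theta1';
  a2 = sigmoid(z2);
  a2_with_bias = [ones(size(a2, 1), 1) a2];
  z3 = a2_with_bias * theta2';
  a3 = sigmoid(z3);

  % backpropagation (as in the question)
  delta2 = (a3 - y);
  gradient2 = delta2' * a2_with_bias / m;
  delta1 = (delta2 * theta2(:, 2:end)) .* sigmoidGradient(z2);
  gradient1 = delta1' * a1_with_bias / m;

  % plain gradient descent update
  theta1 = theta1 - alpha * gradient1;
  theta2 = theta2 - alpha * gradient2;
end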

I ran your code with 2 hidden neurons, basic gradient descent and the above initialisations, and it learned XOR correctly. I also tried adding momentum terms, and learning was faster and more reliable, so I suggest you take a look at that next.
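
A sketch of what a momentum term looks like in that setting: keep a velocity matrix per weight matrix and fold the previous update into the new one. The coefficients alpha and mu are assumed values, not ones given in the answer:

alpha = 1.0;    % learning rate (assumed)
mu = 0.9;       % momentum coefficient (assumed)
velocity1 = zeros(size(theta1));
velocity2 = zeros(size(theta2));
for iter = 1:10000
  % ... compute gradient1 and gradient2 exactly as in the loop above ...
  velocity1 = mu * velocity1 - alpha * gradient1;
  velocity2 = mu * velocity2 - alpha * gradient2;
  theta1 = theta1 + velocity1;
  theta2 = theta2 + velocity2;
end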
