XOR Neural Network (FF) converges to 0.5

Question

I've created a program that allows me to create flexible neural networks of any size/length; however, I'm testing it using the simple structure of an XOR setup (feed forward, sigmoid activation, back propagation, no batching).

The following is a completely new approach to my original question, which did not provide enough information.

My weights start out between -2.5 and 2.5, and I fixed a problem in my code where I had forgotten some negatives. Now it either converges to 0 for all cases or to 1 for all cases, instead of to 0.5.

Everything works exactly the way that I THINK it should; however, it is converging toward 0.5 instead of oscillating between outputs of 0 and 1. I've completely gone through and hand-calculated an entire pass of feeding forward / calculating delta errors / back prop. / etc., and it matched what I got from the program. I have also tried optimizing it by changing the learning rate/momentum, as well as increasing the complexity of the network (more neurons/layers).

Because of this, I assume that either one of my equations is wrong, or I have some other misunderstanding of my neural network. The following is the logic, with equations, that I follow for each step:

I have an input layer with two inputs and a bias, a hidden layer with 2 neurons and a bias, and an output layer with 1 neuron.

  1. Take the input from each of the two input neurons and the bias neuron, then multiply them by their respective weights, and then add them together as the input for each of the two neurons in the hidden layer.
  2. Take the input of each hidden neuron, pass it through the Sigmoid activation function (Reference 1) and use that as the neuron's output.
  3. Take the outputs of each neuron in the hidden layer (1 for the bias), multiply them by their respective weights, and add those values to the output neuron's input.
  4. Pass the output neuron's input through the Sigmoid activation function, and use that as the output for the whole network (steps 1-4 are sketched in code after this list).
  5. Calculate the Delta Error(Reference 2) for the output neuron
  6. Calculate the Delta Error(Reference 3) for each of the 2 hidden neurons
  7. Calculate the Gradient(Reference 4) for each weight (starting from the end and working back)
  8. Calculate the Delta Weight(Reference 5) for each weight, and add that to its value.
  9. Start the process over by changing the inputs and expected output (Reference 6).
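
For concreteness, steps 1-4 for the 2-2-1 network described above look roughly like this in Java. This is only a sketch of the procedure as stated; the class, method, and array names are made up for illustration and are not the actual program.

// Hypothetical sketch of steps 1-4 (the forward pass); names are illustrative only.
public class XorForwardSketch {

    static double sigmoid(double x) { // Reference 1
        return 1.0 / (1.0 + Math.exp(-x));
    }

    // inputs: the two input values; every bias neuron is assumed to output 1.
    // weightsInputHidden[h] = {weight from input 0, weight from input 1, weight from bias}
    // weightsHiddenOutput   = {weight from hidden 0, weight from hidden 1, weight from bias}
    static double forward(double[] inputs,
                          double[][] weightsInputHidden,
                          double[] weightsHiddenOutput) {
        double[] hiddenOutputs = new double[2];
        for (int h = 0; h < 2; h++) {
            // Step 1: weighted sum of the two inputs plus the bias
            double net = inputs[0] * weightsInputHidden[h][0]
                       + inputs[1] * weightsInputHidden[h][1]
                       + 1.0 * weightsInputHidden[h][2];
            // Step 2: pass through the sigmoid to get the hidden neuron's output
            hiddenOutputs[h] = sigmoid(net);
        }
        // Step 3: weighted sum of the hidden outputs plus the hidden-layer bias
        double netOut = hiddenOutputs[0] * weightsHiddenOutput[0]
                      + hiddenOutputs[1] * weightsHiddenOutput[1]
                      + 1.0 * weightsHiddenOutput[2];
        // Step 4: the sigmoid of that sum is the network's output
        return sigmoid(netOut);
    }
}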

Here are the specifics of those references to equations/processes (this is probably where my problem is!); references 2-5 are sketched in code after the list:

  1. x is the input of the neuron: (1/(1 + Math.pow(Math.E, (-1 * x))))
  2. -1*(actualOutput - expectedOutput)*(Sigmoid(x) * (1 - Sigmoid(x))) // same Sigmoid used in reference 1
  3. SigmoidDerivative(Neuron.input)*(The sum of (Neuron.Weights * the deltaError of the neuron they connect to))
  4. ParentNeuron.output * NeuronItConnectsTo.deltaError
  5. learningRate*(weight.gradient) + momentum*(Previous Delta Weight)
  6. I have an ArrayList with the values 0,1,1,0 in it, in that order. It takes the first pair (0,1) and then expects a 1. The second time through, it takes the second pair (1,1) and expects a 0. It just keeps iterating through the list for each new set. Perhaps training it in this systematic way causes the problem?
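
Spelled out as code, references 2-5 look roughly like the following Java sketch. The method and parameter names are made up for illustration and are not the actual program; each body is a direct transcription of the formula above.

// Hypothetical transcription of references 2-5; names are illustrative only.
public class DeltaSketch {

    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    static double sigmoidDerivative(double x) { // Sigmoid(x) * (1 - Sigmoid(x))
        return sigmoid(x) * (1 - sigmoid(x));
    }

    // Reference 2: delta error of the output neuron (x taken to be that neuron's raw input)
    static double outputDelta(double actualOutput, double expectedOutput, double x) {
        return -1 * (actualOutput - expectedOutput) * sigmoidDerivative(x);
    }

    // Reference 3: delta error of a hidden neuron (x taken to be its raw input)
    static double hiddenDelta(double x, double[] outgoingWeights, double[] downstreamDeltas) {
        double sum = 0;
        for (int i = 0; i < outgoingWeights.length; i++) {
            sum += outgoingWeights[i] * downstreamDeltas[i];
        }
        return sigmoidDerivative(x) * sum;
    }

    // Reference 4: gradient for one weight
    static double gradient(double parentOutput, double childDelta) {
        return parentOutput * childDelta;
    }

    // Reference 5: delta weight, using momentum
    static double deltaWeight(double learningRate, double grad,
                              double momentum, double previousDeltaWeight) {
        return learningRate * grad + momentum * previousDeltaWeight;
    }
}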

Like I said before, the reason I don't think it's a code problem is because it matched exactly what I had calculated with paper and pencil (which wouldn't have happened if there were a coding error).

Also, when I first initialize the weights, I give them a random double value between 0 and 1. This article suggested that that may cause a problem:

If I can be more specific or you want other code let me know, thanks!

Answer

This is wrong:

SigmoidDerivative(Neuron.input)*(The sum of (Neuron.Weights * the deltaError of the neuron they connect to))

First is the sigmoid activation (g); second is the derivative of the sigmoid activation:

private double g(double z) {   // sigmoid activation
    return 1 / (1 + Math.pow(2.71828, -z));
}

private double gD(double gZ) { // derivative of the sigmoid, given the activation gZ = g(z)
    return gZ * (1 - gZ);
}
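
In other words, gD expects the value that has already been passed through the sigmoid. Under that convention, reference 3 is computed from the neuron's output rather than its raw input; a minimal sketch (the names here are made up for illustration):

// Hypothetical sketch of reference 3 with the g/gD convention above: the derivative
// is computed from the neuron's OUTPUT (its activation), not its raw input.
static double hiddenDelta(double hiddenOutput,        // the neuron's activation, g(z)
                          double[] outgoingWeights,   // weights to the next layer
                          double[] downstreamDeltas) { // delta errors of the next layer
    double sum = 0;
    for (int i = 0; i < outgoingWeights.length; i++) {
        sum += outgoingWeights[i] * downstreamDeltas[i];
    }
    return hiddenOutput * (1 - hiddenOutput) * sum;   // gD(g(z)) * sum
}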

Unrelated note: your notation of (-1*x) is really strange; just use -x.

Your implementation, judging from how you phrase the steps of your ANN, seems poor. Try to focus on implementing Forward/BackPropagation and then an UpdateWeights method, and on creating a matrix class.

This is my Java implementation; it's very simple and somewhat rough. I use a Matrix class to make the math behind it appear very simple in code.
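
As a rough illustration of that structure, a hypothetical skeleton (not the actual implementation) might look like this in Java; the Matrix type here is only a placeholder:

// Hypothetical skeleton of the structure suggested above; the Matrix type and all
// members are placeholders, not the author's actual code.
class Matrix { /* would hold a 2-D double array plus multiply/transpose/elementwise ops */ }

class NeuralNetSkeleton {
    Matrix[] weights;     // one weight matrix per layer transition
    Matrix[] activations; // one activation vector per layer (bias entry included)
    Matrix[] deltas;      // one delta-error vector per layer

    void forwardPropagation(double[] input)   { /* weighted sums + sigmoid, layer by layer */ }
    void backwardPropagation(double[] target) { /* output delta, then hidden deltas */ }
    void updateWeights()                      { /* apply gradients and momentum */ }
}

Each training example then just becomes forwardPropagation(x), backwardPropagation(y), updateWeights(), which keeps the math localized and easy to check against hand calculations.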

If you can code in C++ you can overload operators, which enables even easier writing of comprehensible code.

https://github.com/josephjaspers/ArtificalNetwork/blob/master/src/artificalnetwork/ArtificalNetwork.java

Here are the algorithms (C++):

All of this code can be found on my GitHub (the neural nets are simple and functional). Each layer includes the bias nodes, which is why there are offsets.

void NeuralNet::forwardPropagation(std::vector<double> data) {
    setBiasPropogation(); // sets all the bias nodes' activations to 1
    a(0).set(1, Matrix(data)); // set(1, ...) offsets by 1 for the bias unit (A = X)

    for (int i = 1; i < layers; ++i) {
        z(i).set(1, w(i - 1) * a(i - 1)); // again, set(1, ...) offsets for the bias unit
        a(i) = g(z(i)); // g(z) is the sigmoid function
    }
}
void NeuralNet::setBiasPropogation() {
    for (int i = 0; i < activation.size(); ++i) {
        a(i).set(0, 0, 1);
    }
}

outLayer:     D = A - Y   (Y is the expected output data)
hiddenLayers: d^l = (w^l(T) * d^(l+1)) *: gD(a^l)

d = delta error vector

W = weights matrix (length = connections, width = features)

a = activation matrix

gD = derivative function

^l = IS NOT POWER OF (this just means at layer l)

• = dot product

*: = multiply (multiply each element "through")

cpy(n) returns a copy of the matrix offset by n (ignores the first n rows)

void NeuralNet::backwardPropagation(std::vector<double> output) {
    d(layers - 1) = a(layers - 1) - Matrix(output); // output layer: D = A - Y
    for (int i = layers - 2; i > -1; --i) {
        d(i) = (w(i).T() * d(i + 1).cpy(1)).x(gD(a(i))); // hidden layers: (w^T * d^(l+1)) *: gD(a^l)
    }
}

Explaining this code may be confusing without images, so I'm sending this link, which I think is a good source; it also contains an explanation of BackPropagation that may be better than my own: http://galaxy.agh.edu.pl/~vlsi/AI/backp_t_en/backprop.html

void NeuralNet::updateWeights() {
    // the operator () (int l, int w) returns a double reference at that position in the matrix
    // the operator [] (int n) returns the nth double (reference) in the matrix (useful for vectors)
    for (int l = 0; l < layers - 1; ++l) {
        for (int i = 1; i < d(l + 1).length(); ++i) {
            for (int j = 0; j < a(l).length(); ++j) {
                w(l)(i - 1, j) -= (d(l + 1)[i] * a(l)[j]) * learningRate + m(l)(i - 1, j);
                m(l)(i - 1, j) = (d(l + 1)[i] * a(l)[j]) * learningRate * momentumRate;
            }
        }
    }
}

