XOR with Neural Networks (Matlab)


Problem Description

So, I'm hoping this is a real dumb thing I'm doing, and there's an easy answer. I'm trying to train a 2x3x1 neural network to do the XOR problem. It wasn't working, so I decided to dig in to see what was happening. Finally, I decided to assign the weights myself. This was the weight vector I came up with:

theta1 = [11 0 -5; 0 12 -7;18 17 -20];
theta2 = [14 13 -28 -6];

(In Matlab notation.) I deliberately tried to make no two weights the same (barring the zeros).

And my code, really simple in Matlab, is:

function layer2 = xornn(iters)
    if nargin < 1
        iters = 50;
    end
    function s = sigmoid(X)
        s = 1.0 ./ (1.0 + exp(-X));
    end
    T = [0 1 1 0];                      % XOR targets
    X = [0 0 1 1; 0 1 0 1; 1 1 1 1];    % inputs; the third row is the bias
    theta1 = [11 0 -5; 0 12 -7; 18 17 -20];
    theta2 = [14 13 -28 -6];
    for i = 1:iters
        % forward pass (append a bias row to the hidden layer)
        layer1 = [sigmoid(theta1 * X); 1 1 1 1];
        layer2 = sigmoid(theta2 * layer1)
        % backpropagate the errors
        delta2 = T - layer2;
        delta1 = layer1 .* (1-layer1) .* (theta2' * delta2);
        % remove the bias from delta1. There's no real point in a delta on the bias.
        delta1 = delta1(1:3,:);
        theta2d = delta2 * layer1';
        theta1d = delta1 * X';
        theta1 = theta1 - 0.1 * theta1d;
        theta2 = theta2 - 0.1 * theta2d;
    end
end

I believe that's right. I tested various parameters (of the thetas) with the finite differences method to see if they were right, and they seemed to be.
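(For anyone who wants to reproduce that check: here's a minimal finite-difference sketch, assuming the cross-entropy cost derived in the answer below; the post doesn't show its own checking code.)

sigmoid = @(z) 1.0 ./ (1.0 + exp(-z));
T = [0 1 1 0];
X = [0 0 1 1; 0 1 0 1; 1 1 1 1];
theta1 = [11 0 -5; 0 12 -7; 18 17 -20];
theta2 = [14 13 -28 -6];
layer1 = [sigmoid(theta1 * X); 1 1 1 1];
% cross-entropy cost as a function of theta2 (theta1 held fixed)
cost = @(t2) -sum(T .* log(sigmoid(t2 * layer1)) + ...
                  (1 - T) .* log(1 - sigmoid(t2 * layer1)));
% perturb one element of theta2 in both directions
k = 1; ep = 1e-5;
e = zeros(size(theta2)); e(k) = ep;
numgrad = (cost(theta2 + e) - cost(theta2 - e)) / (2 * ep);
% analytic gradient of the same cost: dC/dtheta2 = (y - T) * layer1'
analytic = (sigmoid(theta2 * layer1) - T) * layer1';
[numgrad, analytic(k)]   % the two should agree to several decimal places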

But, when I run it, it eventually just all boils down to returning all zeros. If I do xornn(1) (for 1 iteration) I get

0.0027    0.9966    0.9904    0.0008

But, if I do xornn(35)

0.0026    0.9949    0.9572    0.0007

(It's started a descent in the wrong direction) and by the time I get to xornn(45) I get

0.0018    0.0975    0.0000    0.0003

If I run it for 10,000 iterations, it just returns all 0's.
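(As a sanity check I've added here, not in the original post: a bare forward pass with the hand-assigned weights shows they already solve XOR before any training, which is what makes the divergence so strange.)

sigmoid = @(z) 1.0 ./ (1.0 + exp(-z));
X = [0 0 1 1; 0 1 0 1; 1 1 1 1];
theta1 = [11 0 -5; 0 12 -7; 18 17 -20];
theta2 = [14 13 -28 -6];
layer1 = [sigmoid(theta1 * X); 1 1 1 1];
layer2 = sigmoid(theta2 * layer1)
% prints roughly 0.0027 0.9966 0.9904 0.0008, the same values xornn(1)
% displays, since the display happens before the first weight update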

What is going on? Must I add regularization? I would have thought such a simple network wouldn't need it. But, regardless, why does it move away from an obvious good solution that I have hand-fed it?

Thanks!

Answer

AAARRGGHHH! The solution was simply a matter of changing

theta1 = theta1 - 0.1 * theta1d;
theta2 = theta2 - 0.1 * theta2d;

to

theta1 = theta1 + 0.1 * theta1d;
theta2 = theta2 + 0.1 * theta2d;

Sigh.

Now, though, I need to figure out how I was computing the negative derivative when what I thought I was computing was the ... Never mind. I'll post this here anyway, just in case it helps someone else.

So, z is the sum of inputs to the sigmoid, and y is the output of the sigmoid.

C = -(T * Log[y] + (1-T) * Log[1-y])

dC/dy = -((T/y) - (1-T)/(1-y))
      = -((T(1-y)-y(1-T))/(y(1-y)))
      = -((T-Ty-y+Ty)/(y(1-y)))
      = -((T-y)/(y(1-y)))
      = ((y-T)/(y(1-y))) # This is the source of all my woes.
dy/dz = y(1-y)
dC/dz = ((y-T)/(y(1-y))) * y(1-y)
      = (y-T)
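
(A quick numerical spot check of that last line, my addition rather than the post's:)

% confirm dC/dz = y - T at an arbitrary point, here with T = 1, z = 0.3
T = 1; z = 0.3; ep = 1e-6;
sigmoid = @(z) 1.0 ./ (1.0 + exp(-z));
C = @(z) -(T * log(sigmoid(z)) + (1 - T) * log(1 - sigmoid(z)));
numgrad = (C(z + ep) - C(z - ep)) / (2 * ep);
[numgrad, sigmoid(z) - T]   % both come out to about -0.4256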

So, the problem is that I was accidentally computing T-y, because I forgot about the negative sign in front of the cost function. Then I was subtracting what I thought was the gradient, but was in fact the negative gradient. And there. That's the problem.
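
(Put differently, and this is my summary rather than the post's: either sign convention works, as long as the update direction matches the delta. A self-contained sketch with the post's variables:)

sigmoid = @(z) 1.0 ./ (1.0 + exp(-z));
T = [0 1 1 0];
X = [0 0 1 1; 0 1 0 1; 1 1 1 1];
theta1 = [11 0 -5; 0 12 -7; 18 17 -20];
theta2 = [14 13 -28 -6];
layer1 = [sigmoid(theta1 * X); 1 1 1 1];
layer2 = sigmoid(theta2 * layer1);
% Option A (what the corrected code does): delta2 = T - y is the
% NEGATIVE gradient of C with respect to z, so it must be ADDED
theta2a = theta2 + 0.1 * ((T - layer2) * layer1');
% Option B: use the true gradient, y - T, and SUBTRACT it
theta2b = theta2 - 0.1 * ((layer2 - T) * layer1');
isequal(theta2a, theta2b)   % identical updates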

Once I do that:

function layer2 = xornn(iters)
    if nargin < 1
        iters = 50;
    end
    function s = sigmoid(X)
        s = 1.0 ./ (1.0 + exp(-X));
    end
    T = [0 1 1 0];
    X = [0 0 1 1; 0 1 0 1; 1 1 1 1];
    theta1 = [11 0 -5; 0 12 -7; 18 17 -20];
    theta2 = [14 13 -28 -6];
    for i = 1:iters
        layer1 = [sigmoid(theta1 * X); 1 1 1 1];
        layer2 = sigmoid(theta2 * layer1)
        delta2 = T - layer2;
        delta1 = layer1 .* (1-layer1) .* (theta2' * delta2);
        % remove the bias from delta1. There's no real point in a delta on the bias.
        delta1 = delta1(1:3,:);
        theta2d = delta2 * layer1';
        theta1d = delta1 * X';
        theta1 = theta1 + 0.1 * theta1d;   % sign flipped: the deltas are the
        theta2 = theta2 + 0.1 * theta2d;   % negative gradient, so add them
    end
end

xornn(50) returns 0.0028 0.9972 0.9948 0.0009, and xornn(10000) returns 0.0016 0.9989 0.9993 0.0005.

Phew! Maybe this will help someone else in debugging their version.
