XOR with Neural Networks (Matlab)


Question

So, I'm hoping this is a really dumb thing I'm doing, and there's an easy answer. I'm trying to train a 2x3x1 neural network to do the XOR problem. It wasn't working, so I decided to dig in to see what was happening. Finally, I decided to assign the weights myself. This was the weight vector I came up with:

theta1 = [11 0 -5; 0 12 -7;18 17 -20];
theta2 = [14 13 -28 -6];

(In Matlab notation.) I deliberately tried to make no two weights the same (barring the zeros).
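
As a sanity check (this snippet is my addition, not part of the original post), a single forward pass with those hand-picked weights reproduces the near-perfect first-iteration output quoted further down:

sigmoid = @(z) 1.0 ./ (1.0 + exp(-z));
X = [0 0 1 1; 0 1 0 1; 1 1 1 1];           % third row is the bias input
theta1 = [11 0 -5; 0 12 -7; 18 17 -20];
theta2 = [14 13 -28 -6];
layer1 = [sigmoid(theta1 * X); 1 1 1 1];    % hidden activations plus bias row
layer2 = sigmoid(theta2 * layer1)           % approx. 0.0027 0.9966 0.9904 0.0008

Roughly speaking, the three hidden units compute x1, x2, and AND(x1, x2), and the output layer combines them into XOR.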

And my code, really simple, in Matlab is:

function layer2 = xornn(iters)
    if nargin < 1
        iters = 50;
    end
    function s = sigmoid(X)
        s = 1.0 ./ (1.0 + exp(-X));
    end
    T = [0 1 1 0];                        % XOR targets
    X = [0 0 1 1; 0 1 0 1; 1 1 1 1];      % inputs; the third row is the bias
    theta1 = [11 0 -5; 0 12 -7; 18 17 -20];
    theta2 = [14 13 -28 -6];
    for i = 1:iters
        % Forward pass (a bias row of ones is appended to the hidden layer).
        layer1 = [sigmoid(theta1 * X); 1 1 1 1];
        layer2 = sigmoid(theta2 * layer1)  % no semicolon, so it prints each pass
        % Backward pass.
        delta2 = T - layer2;
        delta1 = layer1 .* (1-layer1) .* (theta2' * delta2);
        % Remove the bias from delta1. There's no real point in a delta on the bias.
        delta1 = delta1(1:3,:);
        theta2d = delta2 * layer1';
        theta1d = delta1 * X';
        theta1 = theta1 - 0.1 * theta1d;
        theta2 = theta2 - 0.1 * theta2d;
    end
end

I believe that's right. I tested various parameters (of the thetas) with the finite-differences method to see if they were right, and they seemed to be.
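
For reference, here is a minimal sketch of that kind of finite-difference check, applied to theta2 (my own reconstruction; the question doesn't show the actual test code). Comparing theta2d against the signed numeric gradient of the cross-entropy cost is what makes a sign error visible:

sigmoid = @(z) 1.0 ./ (1.0 + exp(-z));
T = [0 1 1 0];
X = [0 0 1 1; 0 1 0 1; 1 1 1 1];
theta1 = [11 0 -5; 0 12 -7; 18 17 -20];
theta2 = [14 13 -28 -6];
layer1 = [sigmoid(theta1 * X); 1 1 1 1];
% Cross-entropy cost as a function of the output weights alone:
cost = @(th2) -sum(T .* log(sigmoid(th2 * layer1)) ...
            + (1-T) .* log(1 - sigmoid(th2 * layer1)));
% The quantity the training loop uses as its update direction:
delta2  = T - sigmoid(theta2 * layer1);
theta2d = delta2 * layer1';
% Centered finite differences, one component at a time:
h = 1e-5;
numgrad = zeros(size(theta2));
for j = 1:numel(theta2)
    e = zeros(size(theta2)); e(j) = h;
    numgrad(j) = (cost(theta2 + e) - cost(theta2 - e)) / (2*h);
end
disp([numgrad; theta2d])   % rows match in magnitude but with opposite signs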

But when I run it, it eventually just all boils down to returning all zeros. If I do xornn(1) (for 1 iteration), I get

0.0027    0.9966    0.9904    0.0008

But if I do xornn(35), I get

0.0026    0.9949    0.9572    0.0007

(It has started descending in the wrong direction.) And by the time I get to xornn(45), I get

0.0018    0.0975    0.0000    0.0003

If I run it for 10,000 iterations, it just returns all 0's.

What is going on? Must I add regularization? I would have thought such a simple network wouldn't need it. But regardless, why does it move away from an obviously good solution that I hand-fed it?

Thanks!

Answer

AAARRGGHHH! The solution was simply a matter of changing

theta1 = theta1 - 0.1 * theta1d;
theta2 = theta2 - 0.1 * theta2d;

to

theta1 = theta1 + 0.1 * theta1d;
theta2 = theta2 + 0.1 * theta2d;

Sigh.

Now, though, I need to figure out how I was somehow computing the negative derivative when what I thought I was computing was the ... Never mind. I'll post it here anyway, just in case it helps someone else.

So, z is the sum of inputs to the sigmoid, and y is the output of the sigmoid.

C = -(T * log(y) + (1-T) * log(1-y))

dC/dy = -((T/y) - (1-T)/(1-y))
      = -((T(1-y)-y(1-T))/(y(1-y)))
      = -((T-Ty-y+Ty)/(y(1-y)))
      = -((T-y)/(y(1-y)))
      = ((y-T)/(y(1-y))) # This is the source of all my woes.
dy/dz = y(1-y)
dC/dz = ((y-T)/(y(1-y))) * y(1-y)
      = (y-T)
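
A quick numeric confirmation of that last line (my own check, not part of the original answer):

sigmoid = @(z) 1 ./ (1 + exp(-z));
T = 1; z = 0.3; h = 1e-6;
C = @(z) -(T*log(sigmoid(z)) + (1-T)*log(1 - sigmoid(z)));
numeric  = (C(z+h) - C(z-h)) / (2*h);   % centered difference
analytic = sigmoid(z) - T;              % y - T
fprintf('%.6f vs %.6f\n', numeric, analytic)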

So the problem is that I was accidentally computing T-y, because I forgot about the negative sign in front of the cost function. Then I was subtracting what I thought was the gradient, but which was in fact the negative gradient. And there, that's the problem.

Once I did that:

function layer2 = xornn(iters)
    if nargin < 1
        iters = 50;
    end
    function s = sigmoid(X)
        s = 1.0 ./ (1.0 + exp(-X));
    end
    T = [0 1 1 0];
    X = [0 0 1 1; 0 1 0 1; 1 1 1 1];
    theta1 = [11 0 -5; 0 12 -7; 18 17 -20];
    theta2 = [14 13 -28 -6];
    for i = 1:iters
        layer1 = [sigmoid(theta1 * X); 1 1 1 1];
        layer2 = sigmoid(theta2 * layer1)
        delta2 = T - layer2;
        delta1 = layer1 .* (1-layer1) .* (theta2' * delta2);
        % Remove the bias from delta1. There's no real point in a delta on the bias.
        delta1 = delta1(1:3,:);
        theta2d = delta2 * layer1';
        theta1d = delta1 * X';
        % + rather than -: with delta2 = T - layer2, theta1d and theta2d are
        % the negative gradient, so adding them descends the cost.
        theta1 = theta1 + 0.1 * theta1d;
        theta2 = theta2 + 0.1 * theta2d;
    end
end

xornn(50) returns 0.0028 0.9972 0.9948 0.0009, and xornn(10000) returns 0.0016 0.9989 0.9993 0.0005.
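
As a side note (my own observation, not from the original answer): an equivalent fix is to give delta2 the conventional sign and keep the subtraction in the update. Relative to the original listing, only these lines would change:

delta2 = layer2 - T;               % dC/dz directly, i.e. y - T
% ... delta1, theta1d, theta2d computed exactly as before ...
theta1 = theta1 - 0.1 * theta1d;   % now a genuine gradient-descent step
theta2 = theta2 - 0.1 * theta2d;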

Phew! Maybe this will help someone else in debugging their version.
