Neural Network with backpropagation not converging

Problem description

Basically I'm trying to implement backpropagation in a network. I know the backpropagation algorithm is hard-coded, but I'm trying to make it functional first.

It works for one set of inputs and outputs, but with more than one training set the network converges on one solution while the other output converges on 0.5.

I.e. the output for one trial is: [0.9969527919933012, 0.003043774988797313]

[0.5000438200377985, 0.49995612243030635]

Network.java

private ArrayList<ArrayList<ArrayList<Double>>> weights;
private ArrayList<ArrayList<Double>> nodes;

private final double LEARNING_RATE = -0.25;
private final double DEFAULT_NODE_VALUE = 0.0;

private double momentum = 1.0;

public Network() {
    weights = new ArrayList<ArrayList<ArrayList<Double>>>();
    nodes = new ArrayList<ArrayList<Double>>();
}

/**
 * This method is used to add a layer with {@code n} nodes to the network.
 * @param n number of nodes for the layer
 */
public void addLayer(int n) {
    nodes.add(new ArrayList<Double>());
    for (int i = 0;i < n;i++)
        nodes.get(nodes.size()-1).add(DEFAULT_NODE_VALUE);
}

/**
 * This method generates the weights used to link layers together.
 */
public void createWeights() {
    // there are only weights between layers, so we have one less weight layer than node layer
    for (int i = 0;i < nodes.size()-1;i++) {
        weights.add(new ArrayList<ArrayList<Double>>());

        // for each node above the weight
        for (int j = 0;j < nodes.get(i).size();j++) {
            weights.get(i).add(new ArrayList<Double>());

            // for each node below the weight
            for (int k = 0;k < nodes.get(i+1).size();k++)
                weights.get(i).get(j).add(Math.random()*2-1);
        }
    }
}

/**
 * Uses the derivative of the sigmoid function to change weights in the network
 * @param out   The desired output pattern for the network
 */
private void propogateBackward(double[] out) {
    /*
     * Error calculation using the squared-error formula and the sigmoid derivative
     *
     * Output node : dk = Ok(1-Ok)(Ok-Tk)
     * Hidden node : dj = Oj(1-Oj) * sum over k in K of (dk*Wjk)
     *
     * k is an output node
     * j is a hidden node
     *
     * dW = LEARNING_RATE * d * (output of the previous layer, not weighted)
     * W  = W + dW
     */

    // update the last layer of weights first because it is a special case

    double dkW = 0;

    for (int i = 0;i < nodes.get(nodes.size()-1).size();i++) {

        double outputK = nodes.get(nodes.size()-1).get(i);
        double deltaK = outputK*(1-outputK)*(outputK-out[i]);

        for (int j = 0;j < nodes.get(nodes.size()-2).size();j++) {
            weights.get(1).get(j).set(i, weights.get(1).get(j).get(i) + LEARNING_RATE*deltaK*nodes.get(nodes.size()-2).get(j) );
            dkW += deltaK*weights.get(1).get(j).get(i);
        }
    }

    for (int i = 0;i < nodes.get(nodes.size()-2).size();i++) {

        //Hidden Node : dj = Oj(1-Oj)SummationkEK(dkWjk)
        double outputJ = nodes.get(1).get(i);
        double deltaJ = outputJ*(1-outputJ)*dkW*LEARNING_RATE;

        for (int j = 0;j < nodes.get(0).size();j++) {
            weights.get(0).get(j).set(i, weights.get(0).get(j).get(i) + deltaJ*nodes.get(0).get(j) );
        }


    }

}

/**
 * Propagates an array of input values through the network
 * @param in    an array of inputs
 */
private void propogateForward(double[] in) {
    // pass the weights to the input layer
    for (int i = 0;i < in.length;i++)
        nodes.get(0).set(i, in[i]);

    // propagate through the rest of the network
    // for each layer after the first layer
    for (int i = 1;i < nodes.size();i++)

        // for each node in the layer
        for (int j = 0;j < nodes.get(i).size();j++) {

            // for each node in the previous layer
            for (int k = 0;k < nodes.get(i-1).size();k++)

                // add to the node the weighted output from k to j
                nodes.get(i).set(j, nodes.get(i).get(j)+weightedNode(i-1, k, j));

            // once the node has received all of its inputs we can apply the activation function
            nodes.get(i).set(j, activation(nodes.get(i).get(j)));

        }   
}

/**
 * This method returns the activation value of an input
 * @param   in the total input of a node
 * @return  the sigmoid function at the input
 */
private double activation(double in) {
    return 1/(1+Math.pow(Math.E,-in));
}

/**
 * Weighted output for a node.
 * @param layer the layer which the transmitting node is on
 * @param node  the index of the transmitting node
 * @param nextNode  the index of the receiving node
 * @return  the output of the transmitting node times the weight between the two nodes
 */
private double weightedNode(int layer, int node, int nextNode) {
    return nodes.get(layer).get(node)*weights.get(layer).get(node).get(nextNode);
}

/**
 * This method resets all of the nodes to their default value
 */
private void resetNodes() {
    for (int i = 0;i < nodes.size();i++)
        for (int j = 0;j < nodes.get(i).size();j++)
            nodes.get(i).set(j, DEFAULT_NODE_VALUE);
}

/**
 * Teach the network correct responses for certain input values.
 * @param in    an array of input values
 * @param out   an array of desired output values
 * @param n     number of iterations to perform
 */
public void train(double[] in, double[] out, int n) {
    for (int i = 0;i < n;i++) {
        propogateForward(in);
        propogateBackward(out);
        resetNodes();
    }
}

public void getResult(double[] in) {
    propogateForward(in);
    System.out.println(nodes.get(2));
    resetNodes();
}

SnapSolve.java

public SnapSolve() {

    Network net = new Network();
    net.addLayer(2);
    net.addLayer(4);
    net.addLayer(2);
    net.createWeights();

    double[] l = {0, 1};
    double[] p = {1, 0};

    double[] n = {1, 0};
    double[] r = {0, 1};

    for(int i = 0;i < 100000;i++) {
        net.train(l, p, 1);
        net.train(n, r, 1);
    }

    net.getResult(l);
    net.getResult(n);

}

public static void main(String[] args) {
    new SnapSolve();
}

Answer

Suggestions

• The initial weights you're using in your network are pretty large. Typically you want to initialize weights in a sigmoid-activation neural network proportionally to the inverse of the square root of the fan-in of the unit. So, for units in layer i of the network, choose initial weights between positive and negative n^{-1/2}, where n is the number of units in layer i-1. (See http://www.willamette.edu/~gorr/classes/cs449/precond.html for more information; a small sketch of this initialization appears after this list.)

• The learning rate parameter that you seem to be using is also fairly large, which can cause your network to "bounce around" during training. I'd experiment with different values for this, on a log scale: 0.2, 0.1, 0.05, 0.02, 0.01, 0.005, ... until you find one that appears to work better.

• You're really only training on two examples (though the network you're using should be able to model these two points easily). You can increase the diversity of your training dataset by adding noise to the existing inputs and expecting the network to produce the correct output. I've found that this helps sometimes when using a squared-error loss (like you're using) and trying to learn a binary boolean operator like XOR, since there are very few input-output pairs in the true function domain to train with.
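As a rough sketch of the fan-in rule from the first bullet (not the poster's code; WeightInitSketch, initWeights, and layerSizes are made-up names for illustration), each weight feeding a unit is drawn uniformly from plus or minus n^{-1/2}, where n is the size of the layer below it:

WeightInitSketch.java

import java.util.Random;

public class WeightInitSketch {

    /**
     * Builds weights[i][j][k] (layer i, source node j, destination node k),
     * drawing each value uniformly from [-1/sqrt(n), +1/sqrt(n)],
     * where n is the fan-in of the destination unit (the size of layer i).
     */
    static double[][][] initWeights(int[] layerSizes, Random rand) {
        double[][][] weights = new double[layerSizes.length - 1][][];
        for (int i = 0; i < layerSizes.length - 1; i++) {
            double range = 1.0 / Math.sqrt(layerSizes[i]); // n^{-1/2}
            weights[i] = new double[layerSizes[i]][layerSizes[i + 1]];
            for (int j = 0; j < layerSizes[i]; j++)
                for (int k = 0; k < layerSizes[i + 1]; k++)
                    weights[i][j][k] = (rand.nextDouble() * 2 - 1) * range;
        }
        return weights;
    }

    public static void main(String[] args) {
        // A 2-4-2 network like the one built in SnapSolve: first-layer weights
        // land in [-1/sqrt(2), 1/sqrt(2)], second-layer weights in [-0.5, 0.5].
        double[][][] w = initWeights(new int[] {2, 4, 2}, new Random());
        System.out.println(w[0][0][0] + " " + w[1][0][0]);
    }
}

Applied to the existing createWeights loop, the same idea would roughly amount to dividing the Math.random()*2-1 draw by Math.sqrt(nodes.get(i).size()), since weights.get(i) feeds the units of layer i+1. Keeping the initial weights this small keeps the sigmoid units near their linear region, where their derivative (and hence the gradient) is largest.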

Also, I'd like to make a general suggestion that might help in your approach to problems like this: add a little bit of code that will allow you to monitor the current error of the network when given a known input-output pair (or entire "validation" dataset).

If you can monitor the error of the network during training, it will help you see more clearly when the network is converging -- the error should decrease steadily as you train the network. If it bounces all around, you'll know that you're either using too large a learning rate or need to otherwise adapt your training dataset. If the error increases, something is wrong with your gradient computations.
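A minimal sketch of that kind of check (again not from the question's code; ErrorMonitorSketch, meanSquaredError, and the predict stand-in are made-up names): it computes the mean squared error over a set of known input-output pairs, which you would print every so many training epochs and watch for a steady decrease.

ErrorMonitorSketch.java

import java.util.function.Function;

public class ErrorMonitorSketch {

    /** Mean squared error of the network's outputs against the target outputs. */
    static double meanSquaredError(double[][] inputs, double[][] targets,
                                   Function<double[], double[]> predict) {
        double sum = 0.0;
        int count = 0;
        for (int p = 0; p < inputs.length; p++) {
            double[] out = predict.apply(inputs[p]);
            for (int k = 0; k < out.length; k++) {
                double diff = out[k] - targets[p][k];
                sum += diff * diff;
                count++;
            }
        }
        return sum / count;
    }

    public static void main(String[] args) {
        // The two training pairs from SnapSolve.
        double[][] inputs  = { {0, 1}, {1, 0} };
        double[][] targets = { {1, 0}, {0, 1} };

        // Stand-in for the real network; in practice this would call
        // something like propogateForward and read the output layer.
        Function<double[], double[]> predict = in -> new double[] {0.5, 0.5};

        // Call this every few hundred epochs during training and print the result.
        System.out.println("MSE = " + meanSquaredError(inputs, targets, predict));
    }
}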
