Neural Network Becomes Unruly with Large Layers

This article looks at what can cause a neural network to become unruly as its layers grow larger and how to deal with it; the question and accepted answer below should be a useful reference for anyone facing the same problem.

Problem Description

This is a higher-level question about the performance of a neural network. The issue I'm having is that with larger numbers of neurons per layer, the network has frequent rounds of complete stupidity. These failures are not consistent; the probability of overall success versus failure seems to be about 50/50 once the layers grow larger than 60 neurons (always 3 layers).

I tested this by teaching the same function to networks with input and hidden layer sizes from 10 to 200. The success rate is either 0-1% or 90+%, with nothing in between. To help visualize this, I graphed it: "failures" is the total count of incorrect responses on 200 data sets after 5k training iterations.

I think it's also important to note that the layer sizes at which the network succeeds or fails change with each run of the experiment. The only possible culprit I've come up with is local minima (but don't let this influence your answers; I'm new to this, and my initial attempts to minimize the chance of hitting local minima seem to have no effect).

So, the ultimate question is, what could cause this behavior? Why is this thing so wildly inconsistent?

The Python code is on Github, and the code that generated this graph is the testHugeNetwork method in test.py (line 172). If any specific parts of the network algorithm would be helpful, I'm glad to post relevant snippets.

Solution

My guess is that your network is oscillating heavily across a jagged error surface. Trying a lower learning rate might help. But first of all, there are a few things you can do to better understand what your network is doing:

  • plot the output error over training epochs. This will show you when in the training process things go wrong.
  • have a graphical representation (an image) of your weight matrices and of your outputs. That makes it much easier to spot irregularities (a minimal plotting sketch of both diagnostics follows this list).
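
As a minimal sketch of both diagnostics, assuming you collect the summed output error once per epoch and can get at the weight matrices as NumPy arrays (errors and weights below are placeholder names, not identifiers from the asker's repository):

```python
import numpy as np
import matplotlib.pyplot as plt

errors = []  # append the summed output error here once per epoch inside your training loop

# 1) Error curve: a smooth decay suggests learning; wild zig-zags suggest oscillation.
plt.figure()
plt.plot(errors)
plt.xlabel("epoch")
plt.ylabel("total output error")
plt.title("Training error over epochs")

# 2) Weight matrices as images: saturated or exploding weights show up as
#    extreme, uniform patches of colour.
weights = [np.random.randn(60, 60), np.random.randn(60, 1)]  # replace with your actual layer matrices
fig, axes = plt.subplots(1, len(weights))
for ax, W in zip(np.atleast_1d(axes), weights):
    im = ax.imshow(W, cmap="coolwarm", aspect="auto")
    fig.colorbar(im, ax=ax)
plt.show()
```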

A major problem with ANN training is saturation of the sigmoid function. Towards the asymptotes of both the logistic function and tanh, the derivative is close to 0; numerically it may even be exactly zero. As a result, the network learns only very slowly or not at all. This problem occurs when the inputs to the sigmoid are too large; here's what you can do about it:

  • initialize your weights in proportion to the number of inputs a neuron receives. The standard literature suggests drawing them from a distribution with mean 0 and standard deviation 1/sqrt(m), where m is the number of input connections.
  • scale your teacher signals (target outputs) so that they lie where the network can learn the most, i.e. where the activation function is steepest: at the maximum of the first derivative. For tanh you can alternatively scale the function to f(x) = 1.7159 * tanh(2/3 * x) and keep the teachers in [-1, 1]. However, don't forget to adjust the derivative to f'(x) = 2/3 * 1.7159 * (1 - tanh^2(2/3 * x)). (A quick saturation demo and a sketch of both remedies follow this list.)
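
To make the saturation problem concrete, here is a small standalone demonstration (plain NumPy, not taken from the asker's code) of how the logistic derivative vanishes for large inputs:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# sigma'(x) = sigma(x) * (1 - sigma(x)); for large |x| it underflows to exactly 0.0,
# so the backpropagated gradient through that unit is zero and learning stalls.
for x in (0.0, 2.0, 5.0, 10.0, 40.0):
    s = sigmoid(x)
    print(f"x = {x:5.1f}   sigma(x) = {s:.12f}   sigma'(x) = {s * (1 - s):.3e}")
# x = 0.0 gives sigma'(x) = 2.5e-01, while x = 40.0 already gives sigma'(x) = 0.0e+00
```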
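
And a rough sketch of the two remedies above, assuming a simple fully connected layer stored as a NumPy matrix (none of these names come from the asker's repository):

```python
import numpy as np

rng = np.random.default_rng(0)

def init_weights(m, n):
    """Weights for a layer with m inputs and n outputs: mean 0, standard deviation 1/sqrt(m)."""
    return rng.normal(loc=0.0, scale=1.0 / np.sqrt(m), size=(m, n))

# LeCun-style scaled tanh: chosen so that f(+-1) = +-1, which lets you keep the
# teacher values at -1 and 1 without driving the unit into its saturated tails.
def scaled_tanh(x):
    return 1.7159 * np.tanh(2.0 / 3.0 * x)

def scaled_tanh_prime(x):
    return 2.0 / 3.0 * 1.7159 * (1.0 - np.tanh(2.0 / 3.0 * x) ** 2)

W = init_weights(60, 60)   # e.g. a hidden layer of 60 neurons fed by 60 inputs
print(W.std())             # should land close to 1/sqrt(60), i.e. roughly 0.13
print(scaled_tanh(1.0))    # roughly 1.0, so a teacher value of 1 sits in the useful range
print(scaled_tanh_prime(1.0))
```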

Let me know if you need additional clarification.

