Neural Network Always Produces Same/Similar Outputs for Any Input

Problem Description

I have a problem where I am trying to create a neural network for Tic-Tac-Toe. However, for some reason, training the neural network causes it to produce nearly the same output for any given input.

I did take a look at Artificial neural networks benchmark, but my network implementation is built for neurons with the same activation function for each neuron, i.e. no constant neurons.

To make sure the problem wasn't just due to my choice of training set (1218 board states and moves generated by a genetic algorithm), I tried to train the network to reproduce XOR. The logistic activation function was used. Instead of using the derivative, I multiplied the error by output*(1-output) as some sources suggested that this was equivalent to using the derivative. I can put the Haskell source on HPaste, but it's a little embarrassing to look at. The network has 3 layers: the first layer has 2 inputs and 4 outputs, the second has 4 inputs and 1 output, and the third has 1 output. Increasing to 4 neurons in the second layer didn't help, and neither did increasing to 8 outputs in the first layer.
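
As an aside, the equivalence mentioned above does hold for the logistic function: its derivative can be written entirely in terms of its output, sigma'(x) = sigma(x)(1 - sigma(x)), so multiplying the error by output*(1-output) really is using the derivative. A minimal Python check (standalone illustration, not taken from the Haskell code in question):

```python
import math

def logistic(x):
    """Standard logistic (sigmoid) activation."""
    return 1.0 / (1.0 + math.exp(-x))

def logistic_deriv_from_output(output):
    """Derivative of the logistic expressed via its own output:
    sigma'(x) = sigma(x) * (1 - sigma(x))."""
    return output * (1.0 - output)

# Quick numerical check against a central finite difference (eps is arbitrary).
x, eps = 0.3, 1e-6
numeric = (logistic(x + eps) - logistic(x - eps)) / (2 * eps)
analytic = logistic_deriv_from_output(logistic(x))
print(abs(numeric - analytic) < 1e-6)  # True
```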

I then calculated errors, network output, bias updates, and the weight updates by hand based on http://hebb.mit.edu/courses/9.641/2002/lectures/lecture04.pdf to make sure there wasn't an error in those parts of the code (there wasn't, but I will probably do it again just to make sure). Because I am using batch training, I did not multiply by x in equation (4) there. I am adding the weight change, though http://www.faqs.org/faqs/ai-faq/neural-nets/part2/section-2.html suggests subtracting it instead.
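
On the add-versus-subtract question, the sign depends on how the per-neuron error term is defined: with delta = (target - output) * f'(net), the weight change is added; with delta taken as the gradient of the squared error, it is subtracted. A rough batch-update sketch under the first convention (hypothetical names, single neuron, no bias, and keeping the input x in the per-weight term, which the standard derivation requires even for batch training):

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def batch_update(weights, samples, learning_rate=0.1):
    """One batch step for a single logistic neuron (no bias, as in the
    question).  samples is a list of (inputs, target) pairs."""
    grads = [0.0] * len(weights)
    for inputs, target in samples:
        out = logistic(sum(w * x for w, x in zip(weights, inputs)))
        # delta = (target - out) * derivative; with this sign the change is ADDED.
        delta = (target - out) * out * (1.0 - out)
        for i, x in enumerate(inputs):
            grads[i] += delta * x          # per-weight term keeps the input x
    return [w + learning_rate * g for w, g in zip(weights, grads)]
```

Flipping both the sign of delta and the sign of the update gives the "subtract" form from the FAQ; the two forms are equivalent, but mixing the conventions reverses the direction of learning.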

The problem persisted, even in this simplified network. For example, these are the results after 500 epochs of batch training and of incremental training.

Input    |Target|Output (Batch)      |Output(Incremental)
[1.0,1.0]|[0.0] |[0.5003781562785173]|[0.5009731800870864]
[1.0,0.0]|[1.0] |[0.5003740346965251]|[0.5006347214672715]
[0.0,1.0]|[1.0] |[0.5003734471544522]|[0.500589332376345]
[0.0,0.0]|[0.0] |[0.5003674110937019]|[0.500095157458231]

Subtracting instead of adding produces the same problem, except everything is 0.99 something instead of 0.50 something. 5000 epochs produces the same result, except the batch-trained network returns exactly 0.5 for each case. (Heck, even 10,000 epochs didn't work for batch training.)

Is there anything in general that could produce this behavior?

Also, I looked at the intermediate errors for incremental training, and although the inputs to the hidden/input layers varied, the error for the output neuron was always +/-0.12. For batch training, the errors were increasing, but extremely slowly, and the errors were all extremely small (on the order of 10^-7). Different initial random weights and biases made no difference, either.

Note that this is a school project, so hints/guides would be more helpful. Although reinventing the wheel and making my own network (in a language I don't know well!) was a horrible idea, I felt it would be more appropriate for a school project (so I know what's going on...in theory, at least. There doesn't seem to be a computer science teacher at my school).

Edit: Two layers, an input layer of 2 inputs to 8 outputs and an output layer of 8 inputs to 1 output, produces much the same results: 0.5 +/- 0.2 (or so) for each training case. I'm also playing around with pyBrain, seeing if any network structure there will work.

Edit 2: I am using a learning rate of 0.1. Sorry for forgetting about that.

Edit 3: Pybrain's "trainUntilConvergence" doesn't get me a fully trained network, either, but 20000 epochs does, with 16 neurons in the hidden layer. 10000 epochs and 4 neurons, not so much, but close. So, in Haskell, with the input layer having 2 inputs & 2 outputs, hidden layer with 2 inputs and 8 outputs, and output layer with 8 inputs and 1 output...I get the same problem with 10000 epochs. And with 20000 epochs.
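
For reference, a bare-bones pyBrain XOR setup along the lines described in this edit (16 hidden units, plain backpropagation, 20000 epochs); the exact calls here are a guess at a typical script, not the poster's actual one:

```python
from pybrain.tools.shortcuts import buildNetwork
from pybrain.datasets import SupervisedDataSet
from pybrain.supervised.trainers import BackpropTrainer

net = buildNetwork(2, 16, 1, bias=True)   # 2 inputs, 16 sigmoid hidden units, 1 output

ds = SupervisedDataSet(2, 1)
for inputs, target in [((0, 0), (0,)), ((0, 1), (1,)),
                       ((1, 0), (1,)), ((1, 1), (0,))]:
    ds.addSample(inputs, target)

trainer = BackpropTrainer(net, ds, learningrate=0.1)
trainer.trainEpochs(20000)                # roughly the epoch count mentioned above

for inputs in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(inputs, net.activate(inputs))
```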

Edit 4: I ran the network by hand again based on the MIT PDF above, and the values match, so the code should be correct unless I am misunderstanding those equations.
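
If it helps with the hand verification, a forward pass for the 2-4-1 layout is only a few lines; the weights below are arbitrary placeholders (the poster's actual values are in the linked code), so this is just a scaffold for plugging in numbers:

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weight_rows):
    """One fully connected logistic layer: one row of weights per output
    neuron, no bias term (matching the 'no constant neurons' setup)."""
    return [logistic(sum(w * x for w, x in zip(row, inputs))) for row in weight_rows]

hidden_weights = [[0.5, -0.3], [0.1, 0.8], [-0.7, 0.2], [0.4, 0.4]]   # 2 -> 4
output_weights = [[0.3, -0.5, 0.6, 0.1]]                              # 4 -> 1

hidden = layer([1.0, 0.0], hidden_weights)
print("hidden:", hidden)
print("output:", layer(hidden, output_weights))
```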

Some of my source code is at http://hpaste.org/42453/neural_network__not_working; I'm working on cleaning up my code and putting it into a Github (rather than a private Bitbucket) repository.

All of the relevant source code is now at https://github.com/l33tnerd/hsann.

Recommended Answer

I haven't tested it with the XOR problem in the question, but for my original dataset based on Tic-Tac-Toe, I believe that I have gotten the network to train somewhat (I only ran 1000 epochs, which wasn't enough): the quickpropagation network can win/tie over half of its games; backpropagation can get about 41%. The problems came down to implementation errors (small ones) and not understanding the difference between the error derivative (which is per-weight) and the error for each neuron, which I did not pick up on in my research. @darkcanuck's answer about training the bias similarly to a weight would probably have helped, though I didn't implement it. I also rewrote my code in Python so that I could more easily hack with it. Therefore, although I haven't gotten the network to match the minimax algorithm's efficiency, I believe that I have managed to solve the problem.
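
To make the distinction mentioned above concrete, here is a rough single-neuron sketch (not the actual hsann code) of the per-neuron error term versus the per-weight error derivative, with the bias trained like an ordinary weight as @darkcanuck suggested:

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def neuron_update(weights, bias, inputs, target, lr=0.1):
    """Update one logistic neuron.  'delta' is the per-neuron error term;
    'delta * x' is the per-weight error derivative; the bias is treated as
    a weight whose input is always 1."""
    out = logistic(sum(w * x for w, x in zip(weights, inputs)) + bias)
    delta = (target - out) * out * (1.0 - out)          # per-neuron error
    new_weights = [w + lr * delta * x for w, x in zip(weights, inputs)]
    new_bias = bias + lr * delta * 1.0                  # bias trained like a weight
    return new_weights, new_bias

print(neuron_update([0.2, -0.4], 0.0, [1.0, 0.0], 1.0))
```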
