Neural Network Always Produces Same/Similar Outputs for Any Input


Problem Description

I have a problem where I am trying to create a neural network for Tic-Tac-Toe. However, for some reason, training the neural network causes it to produce nearly the same output for any given input.

I did take a look at Artificial neural networks benchmark, but my network implementation is built around neurons that all use the same activation function, i.e. no constant neurons.

To make sure the problem wasn't just due to my choice of training set (1218 board states and moves generated by a genetic algorithm), I tried to train the network to reproduce XOR. The logistic activation function was used. Instead of using the derivative, I multiplied the error by output*(1-output) as some sources suggested that this was equivalent to using the derivative. I can put the Haskell source on HPaste, but it's a little embarrassing to look at. The network has 3 layers: the first layer has 2 inputs and 4 outputs, the second has 4 inputs and 1 output, and the third has 1 output. Increasing to 4 neurons in the second layer didn't help, and neither did increasing to 8 outputs in the first layer.
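As a side note on the output*(1-output) factor: that expression is just the derivative of the logistic function written in terms of its own output, so the two approaches really are equivalent. A minimal Python sketch (not the original Haskell code) illustrating the identity:

import math

def logistic(x):
    """Standard logistic activation: sigma(x) = 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))

def logistic_deriv_from_output(output):
    """Derivative of the logistic expressed via its output: sigma'(x) = sigma(x) * (1 - sigma(x))."""
    return output * (1.0 - output)

# Quick check that the two forms agree (finite difference vs. the identity).
x, eps = 0.3, 1e-6
numeric = (logistic(x + eps) - logistic(x - eps)) / (2 * eps)
analytic = logistic_deriv_from_output(logistic(x))
print(numeric, analytic)  # should agree to roughly six decimal places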

I then calculated errors, network output, bias updates, and the weight updates by hand based on http://hebb.mit.edu/courses/9.641/2002/lectures/lecture04.pdf to make sure there wasn't an error in those parts of the code (there wasn't, but I will probably do it again just to make sure). Because I am using batch training, I did not multiply by x in equation (4) there. I am adding the weight change, though http://www.faqs.org/faqs/ai-faq/neural-nets/part2/section-2.html suggests subtracting it instead.
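For reference, here is a hedged Python sketch (not the author's Haskell) of the usual batch update for a single logistic output neuron: the per-weight gradient is the neuron's delta multiplied by that weight's input, accumulated over the batch, and the result is subtracted from the weight, following the sign convention in the FAQ linked above. The learning rate value of 0.1 matches the one mentioned later in the question; everything else is illustrative.

import math

def forward(x, weights, bias):
    # Weighted sum of inputs followed by the logistic activation.
    s = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-s))

def batch_update(samples, weights, bias, eta=0.1):
    # samples: list of (input_vector, target) pairs for one batch.
    grad_w = [0.0] * len(weights)
    grad_b = 0.0
    for x, t in samples:
        out = forward(x, weights, bias)
        delta = (out - t) * out * (1.0 - out)   # per-neuron error term
        for i, xi in enumerate(x):
            grad_w[i] += delta * xi             # per-weight derivative: delta * that weight's input
        grad_b += delta                         # bias behaves like a weight whose input is 1
    new_weights = [w - eta * g for w, g in zip(weights, grad_w)]  # subtract, i.e. descend
    new_bias = bias - eta * grad_b
    return new_weights, new_bias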

The problem persisted, even in this simplified network. For example, these are the results after 500 epochs of batch training and of incremental training.

Input     | Target | Output (Batch)       | Output (Incremental)
[1.0,1.0] | [0.0]  | [0.5003781562785173] | [0.5009731800870864]
[1.0,0.0] | [1.0]  | [0.5003740346965251] | [0.5006347214672715]
[0.0,1.0] | [1.0]  | [0.5003734471544522] | [0.500589332376345]
[0.0,0.0] | [0.0]  | [0.5003674110937019] | [0.500095157458231]

Subtracting instead of adding produces the same problem, except everything is 0.99 something instead of 0.50 something. 5000 epochs produces the same result, except the batch-trained network returns exactly 0.5 for each case. (Heck, even 10,000 epochs didn't work for batch training.)

Is there anything in general that could produce this behavior?

Also, I looked at the intermediate errors for incremental training, and although the inputs of the hidden/input layers varied, the error for the output neuron was always +/-0.12. For batch training, the errors were increasing, but extremely slowly, and the errors were all extremely small (on the order of 10^-7). Different initial random weights and biases made no difference, either.

Note that this is a school project, so hints/guides would be more helpful. Although reinventing the wheel and making my own network (in a language I don't know well!) was a horrible idea, I felt it would be more appropriate for a school project (so I know what's going on...in theory, at least. There doesn't seem to be a computer science teacher at my school).

EDIT: Two layers, an input layer with 2 inputs and 8 outputs, and an output layer with 8 inputs and 1 output, produce much the same results: 0.5 +/- 0.2 (or so) for each training case. I'm also playing around with pyBrain, seeing if any network structure there will work.

Edit 2: I am using a learning rate of 0.1. Sorry for forgetting about that.

Edit 3: Pybrain's "trainUntilConvergence" doesn't get me a fully trained network, either, but 20000 epochs does, with 16 neurons in the hidden layer. 10000 epochs and 4 neurons, not so much, but close. So, in Haskell, with the input layer having 2 inputs & 2 outputs, hidden layer with 2 inputs and 8 outputs, and output layer with 8 inputs and 1 output...I get the same problem with 10000 epochs. And with 20000 epochs.
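For anyone trying to reproduce the pyBrain experiment, here is a rough sketch of the setup described above. It is not code from the original project; the layer sizes follow the 16-hidden-neuron case mentioned in this edit, and the specific layer class and keyword arguments are my assumptions about the pyBrain API.

from pybrain.datasets import SupervisedDataSet
from pybrain.tools.shortcuts import buildNetwork
from pybrain.supervised.trainers import BackpropTrainer
from pybrain.structure import SigmoidLayer

# XOR training set: 2 inputs, 1 target output.
ds = SupervisedDataSet(2, 1)
for inp, target in [((0, 0), (0,)), ((0, 1), (1,)), ((1, 0), (1,)), ((1, 1), (0,))]:
    ds.addSample(inp, target)

# 2-16-1 network with sigmoid outputs.
net = buildNetwork(2, 16, 1, outclass=SigmoidLayer, bias=True)
trainer = BackpropTrainer(net, ds, learningrate=0.1)
trainer.trainEpochs(20000)

for inp in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(inp, net.activate(inp))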

Edit 4: I ran the network by hand again based on the MIT PDF above, and the values match, so the code should be correct unless I am misunderstanding those equations.

Some of my source code is at http://hpaste.org/42453/neural_network__not_working; I'm working on cleaning my code somewhat and putting it in a Github (rather than a private Bitbucket) repository.

All of the relevant source code is now at https://github.com/l33tnerd/hsann.

Solution

I haven't tested it with the XOR problem in the question, but for my original dataset based on Tic-Tac-Toe, I believe that I have gotten the network to train somewhat (I only ran 1000 epochs, which wasn't enough): the quickpropagation network can win/tie over half of its games; backpropagation can get about 41%. The problems came down to implementation errors (small ones) and not understanding the difference between the error derivative (which is per-weight) and the error for each neuron, which I did not pick up on in my research. @darkcanuck's answer about training the bias similarly to a weight would probably have helped, though I didn't implement it. I also rewrote my code in Python so that I could more easily hack with it. Therefore, although I haven't gotten the network to match the minimax algorithm's efficiency, I believe that I have managed to solve the problem.
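To make the two fixes named above concrete (the bias trained exactly like a weight, and per-weight error derivatives rather than a single per-neuron error), here is a small, hedged NumPy sketch of batch backpropagation on XOR. It is not the author's Haskell or Python code, and the 2-8-1 layout, initialization, and learning rate are illustrative choices, not values from the project.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# XOR data: rows are samples.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

# 2-8-1 network; biases are separate parameters updated exactly like weights.
W1 = rng.uniform(-1, 1, (2, 8))
b1 = rng.uniform(-1, 1, (1, 8))
W2 = rng.uniform(-1, 1, (8, 1))
b2 = rng.uniform(-1, 1, (1, 1))

eta = 0.5  # learning rate (assumed here; the question used 0.1)

for epoch in range(10000):
    # Forward pass over the whole batch.
    H = sigmoid(X @ W1 + b1)          # hidden activations, shape (4, 8)
    Y = sigmoid(H @ W2 + b2)          # outputs, shape (4, 1)

    # Backward pass: delta is the per-neuron error term, but each weight's
    # gradient is delta multiplied by that weight's input, summed over the batch.
    delta_out = (Y - T) * Y * (1 - Y)
    grad_W2 = H.T @ delta_out
    grad_b2 = delta_out.sum(axis=0, keepdims=True)   # bias input is 1

    delta_hid = (delta_out @ W2.T) * H * (1 - H)
    grad_W1 = X.T @ delta_hid
    grad_b1 = delta_hid.sum(axis=0, keepdims=True)

    # Gradient descent: subtract the accumulated batch gradients.
    W2 -= eta * grad_W2; b2 -= eta * grad_b2
    W1 -= eta * grad_W1; b1 -= eta * grad_b1

print(np.round(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2), 3))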
