How do neural networks use genetic algorithms and backpropagation to play games?


Question

I came across this interesting video on YouTube about genetic algorithms.

As you can see in the video, the bots learn to fight.
I have been studying neural networks for a while and want to start learning genetic algorithms. This video somehow combines both.

How do you combine genetic algorithms and neural networks to do this?
Also, how do you know the error in this case, the one you would back-propagate to update the weights and train the net? How do you think the program in the video calculates its fitness function? I guess mutation is definitely happening in the program in the video, but what about crossover?

Thanks!

Recommended answer

This is a reinforcement learning problem in which the outputs of the neural network are the keyboard keys to press, chosen so as to maximize the score given by the fitness function. Starting from an initial neural network architecture, a genetic algorithm (GA) iteratively searches for a better architecture, one that achieves a higher fitness. The GA generates different architectures by breeding a population of them, uses each one for the task (playing the game), and selects those that yield the highest scores according to the fitness function. In the next generation, the GA breeds the best candidates (the parents, in GA terminology) and repeats the process to produce a new population of architectures. Of course, breeding includes mutation too.
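The breed, evaluate, and select cycle described above can be sketched roughly as follows. This is only a minimal illustration under several assumptions, not the program from the video: the genome here is the weight vector of a small fixed network rather than a whole architecture, the "game" is a made-up task (move along a line toward a target), and all names (`act`, `fitness`, `crossover`, `mutate`, `next_generation`) and parameter values are hypothetical. Note that no error signal is back-propagated in this sketch; the fitness score alone drives selection.

```python
import math
import random

N_IN, N_HID = 2, 4                      # assumed sizes for the toy network
GENOME_LEN = N_IN * N_HID + N_HID       # hidden-layer weights + output weights


def act(genome, inputs):
    """Forward pass of a tiny fixed network; returns a move in [-1, 1]."""
    hidden = [
        math.tanh(sum(genome[h * N_IN + i] * inputs[i] for i in range(N_IN)))
        for h in range(N_HID)
    ]
    out_w = genome[N_IN * N_HID:]
    return math.tanh(sum(w * h for w, h in zip(out_w, hidden)))


def fitness(genome, targets=(-0.8, -0.3, 0.4, 0.9), steps=20):
    """Play the toy game; higher is better, 0 means every target is reached."""
    total = 0.0
    for target in targets:
        pos = 0.0
        for _ in range(steps):
            pos += 0.1 * act(genome, [pos, target])
        total -= abs(target - pos)      # penalize the final distance to the target
    return total


def crossover(a, b, rng):
    """Uniform crossover: each gene is copied from one of the two parents."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def mutate(genome, rng, rate=0.2, scale=0.5):
    """Add Gaussian noise to a random subset of genes."""
    return [g + rng.gauss(0, scale) if rng.random() < rate else g for g in genome]


def next_generation(population, rng, n_parents=5):
    """Select the fittest genomes as parents and breed a new population."""
    ranked = sorted(population, key=fitness, reverse=True)
    parents = ranked[:n_parents]
    children = [
        mutate(crossover(*rng.sample(parents, 2), rng), rng)
        for _ in range(len(population) - n_parents)
    ]
    return parents + children           # elitism: the parents survive unchanged


rng = random.Random(0)
population = [[rng.uniform(-1, 1) for _ in range(GENOME_LEN)] for _ in range(30)]
best_before = max(fitness(g) for g in population)
for _ in range(20):
    population = next_generation(population, rng)
best_after = max(fitness(g) for g in population)
```

Because the parents are carried over unchanged, the best score in this sketch can never get worse from one generation to the next. Evolving the architecture itself, as in the NEAT paper listed under the sources, would require a richer genome than a flat weight list.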

This process continues until a termination criterion is met: either the fitness function reaches a specific value or a set number of generations has been produced. Note that genetic algorithms are very computationally intensive and have therefore largely fallen out of favor for large-scale problems. Naturally, once an architecture has been generated, it is trained using backpropagation or any other applicable optimization technique, including GAs.
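The two termination criteria can be made concrete with a small loop. The sketch below is an assumption-laden toy, not the method of any particular program: `toy_fitness` stands in for a game score, `evolve` and its parameters are hypothetical names, and the GA is mutation-only to keep the stopping logic easy to see.

```python
import random


def toy_fitness(genome):
    """Stand-in for a game score: peaks at 0 when every gene is 0."""
    return -sum(g * g for g in genome)


def evolve(fitness, target_fitness, max_generations,
           pop_size=20, genome_len=5, seed=0):
    """Run a mutation-only GA until one of two termination criteria is met."""
    rng = random.Random(seed)
    population = [
        [rng.uniform(-1, 1) for _ in range(genome_len)] for _ in range(pop_size)
    ]
    for generation in range(max_generations):
        population.sort(key=fitness, reverse=True)
        best = population[0]
        if fitness(best) >= target_fitness:   # criterion 1: score is good enough
            return best, generation
        survivors = population[: pop_size // 4]
        population = survivors + [
            [g + rng.gauss(0, 0.1) for g in rng.choice(survivors)]
            for _ in range(pop_size - len(survivors))
        ]
    # criterion 2: the generation budget is exhausted
    return max(population, key=fitness), max_generations


best, generations_used = evolve(toy_fitness, target_fitness=-0.01,
                                max_generations=200)
```

Every generation re-evaluates the whole population, which is where the computational cost mentioned above comes from: when the fitness function means playing a full game, the work grows with population size times number of generations.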

For instance, this video shows how genetic algorithms help select the "best" architecture for playing Mario, and it does so very well! Note, however, that an architecture the GA selects because it plays one level of Mario very well will not necessarily do well on the next levels, as shown in another video. In my opinion, this is because both genetic algorithms and backpropagation tend to converge to local optima. So there is still a long way to go ...

Sources

  • Genetic Algorithms
  • Fitness function
  • The paper Evolving Neural Networks through Augmenting Topologies
