Generalizing Q-learning to work with a continuous *action* space


Question

I'm trying to get an agent to learn the mouse movements necessary to best perform some task in a reinforcement learning setting (i.e. the reward signal is the only feedback for learning).

I'm hoping to use the Q-learning technique, but while I've found a way to extend this method to continuous state spaces, I can't seem to figure out how to accommodate a problem with a continuous action space.

I could just force all mouse movement to be of a certain magnitude and in only a certain number of different directions, but any reasonable way of making the actions discrete would yield a huge action space. Since standard Q-learning requires the agent to evaluate all possible actions, such an approximation doesn't solve the problem in any practical sense.
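To make the blow-up concrete, here is a tabular sketch; the direction, magnitude, and state counts are invented purely for illustration. The `max` in the update is exactly the per-step scan over every discrete action that becomes impractical as the discretization gets finer:

```python
import numpy as np

# Hypothetical discretization of mouse moves: one (direction, magnitude) pair
# per action. The counts below are made up, just to show how fast it grows.
N_DIRECTIONS = 36                        # one action per 10 degrees (assumption)
N_MAGNITUDES = 20                        # 20 step sizes in pixels (assumption)
N_ACTIONS = N_DIRECTIONS * N_MAGNITUDES  # already 720 discrete actions
N_STATES = 10_000                        # placeholder discretized state space

Q = np.zeros((N_STATES, N_ACTIONS))

def q_update(s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step: the max must scan every discrete action."""
    td_target = r + gamma * Q[s_next].max()   # cost grows with N_ACTIONS
    Q[s, a] += alpha * (td_target - Q[s, a])
```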

Answer

The common way of dealing with this problem is with actor-critic methods, which extend naturally to continuous action spaces. Basic Q-learning can diverge when combined with function approximation; if you still want to use it, you can try combining it with a self-organizing map, as done in "Applications of the self-organising map to reinforcement learning". That paper also contains some further references you might find useful.
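To give a concrete sense of how actor-critic methods handle a continuous action, below is a minimal one-step actor-critic sketch with a Gaussian policy over a 2-D mouse displacement. The toy environment, feature choices, and hyperparameters are all assumptions made only so the example is self-contained and runnable; it is not the method from the cited paper, and a real system would typically use neural-network approximators (e.g. DDPG- or SAC-style algorithms).

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Toy stand-in environment (an assumption, just to make the sketch run) ---
# State: 2-D offset from the cursor to a target; action: 2-D mouse displacement;
# reward: negative remaining distance after the move.
def env_reset():
    return rng.uniform(-1.0, 1.0, size=2)

def env_step(state, action):
    next_state = np.clip(state - action, -2.0, 2.0)   # keep the toy state bounded
    reward = -float(np.linalg.norm(next_state))
    done = np.linalg.norm(next_state) < 0.05
    return next_state, reward, done

def features(state):
    # Hand-picked features so linear approximators can fit this toy task.
    return np.concatenate([state, [np.linalg.norm(state), 1.0]])

# --- One-step actor-critic with a Gaussian policy over continuous actions ---
PHI_DIM, ACTION_DIM = 4, 2
SIGMA = 0.3                    # fixed exploration std-dev (a simplification)
GAMMA = 0.95
ALPHA_ACTOR, ALPHA_CRITIC = 0.005, 0.02

W_actor = np.zeros((ACTION_DIM, PHI_DIM))   # policy mean: mu(s) = W_actor @ phi(s)
w_critic = np.zeros(PHI_DIM)                # value estimate: V(s) = w_critic @ phi(s)

for episode in range(2000):
    state = env_reset()
    for t in range(50):
        phi = features(state)
        mu = W_actor @ phi
        action = mu + SIGMA * rng.standard_normal(ACTION_DIM)   # a ~ N(mu, sigma^2 I)

        next_state, reward, done = env_step(state, action)
        v = w_critic @ phi
        v_next = 0.0 if done else w_critic @ features(next_state)
        td_error = reward + GAMMA * v_next - v

        # Critic: move V(s) toward the one-step TD target.
        w_critic += ALPHA_CRITIC * td_error * phi
        # Actor: policy-gradient step using the TD error as the advantage;
        # grad of log N(a; mu, sigma^2) w.r.t. W_actor is ((a - mu)/sigma^2) outer phi(s).
        W_actor += ALPHA_ACTOR * td_error * np.outer((action - mu) / SIGMA**2, phi)

        state = next_state
        if done:
            break
```

The key point of the sketch: the actor outputs a real-valued action directly (here, the mean of a Gaussian), so no maximization over an enumerated action set is ever needed; the critic only has to evaluate the single action that was actually taken.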
