How can I apply reinforcement learning to continuous action spaces?


Problem description


I'm trying to get an agent to learn the mouse movements necessary to best perform some task in a reinforcement learning setting (i.e. the reward signal is the only feedback for learning).


I'm hoping to use the Q-learning technique, but while I've found a way to extend this method to continuous state spaces, I can't seem to figure out how to accommodate a problem with a continuous action space.


I could just force all mouse movement to be of a certain magnitude and in only a certain number of different directions, but any reasonable way of making the actions discrete would yield a huge action space. Since standard Q-learning requires the agent to evaluate all possible actions, such an approximation doesn't solve the problem in any practical sense.
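To see why discretizing mouse movement scales badly, here is a rough back-of-the-envelope calculation (the specific numbers are illustrative assumptions, not part of the original question):

    # Hypothetical discretization of mouse moves (numbers are assumptions).
    directions = 16          # allowed movement directions
    magnitudes = 10          # allowed step sizes per direction
    coarse_actions = directions * magnitudes
    print(coarse_actions)    # 160 actions -- already large for tabular Q-learning

    # Pixel-precise moves on a 1920x1080 screen, if every (dx, dy) is its own action:
    pixel_actions = 1920 * 1080
    print(pixel_actions)     # 2,073,600 actions to evaluate at every step

Even the coarse grid loses most of the precision of real mouse control, while the pixel-precise version is far too large for a method that must compare every action at every step.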

Recommended answer


The common way of dealing with this problem is with actor-critic methods, which extend naturally to continuous action spaces. Basic Q-learning can diverge when combined with function approximation; however, if you still want to use it, you can try combining it with a self-organizing map, as done in "Applications of the self-organising map to reinforcement learning". That paper also contains some further references you may find useful.
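As a concrete illustration of the actor-critic idea, below is a minimal sketch of a one-step actor-critic with a Gaussian policy over a continuous 2-D action. The toy cursor task, linear/quadratic features, and hyperparameters are my own assumptions for illustration; they are not from the answer or the referenced paper.

    # Minimal one-step actor-critic sketch for a continuous 2-D action
    # (hypothetical toy task: move the cursor toward a target point).
    import numpy as np

    rng = np.random.default_rng(0)

    SIGMA = 0.2          # fixed std of the Gaussian policy (exploration noise)
    ALPHA_ACTOR = 1e-3   # actor step size
    ALPHA_CRITIC = 1e-2  # critic step size
    GAMMA = 0.95         # discount factor

    def features(s):
        # simple quadratic critic features; enough to represent a -||s||^2-shaped value
        return np.array([s[0] * s[0], s[1] * s[1], 1.0])

    W = np.zeros((2, 2))   # actor: mean action = W @ s
    w = np.zeros(3)        # critic: V(s) = w @ features(s)

    for episode in range(2000):
        s = rng.uniform(-1.0, 1.0, size=2)             # random initial offset from the target
        for t in range(20):
            mu = W @ s
            a = mu + SIGMA * rng.standard_normal(2)    # sample a continuous 2-D action
            s_next = s + a                             # cursor moves by the chosen displacement
            r = -np.dot(s_next, s_next)                # reward: negative squared distance to target
            done = np.dot(s_next, s_next) < 1e-3

            # TD error doubles as the advantage estimate
            v_next = 0.0 if done else w @ features(s_next)
            delta = r + GAMMA * v_next - w @ features(s)

            # critic: semi-gradient TD(0) update
            w += ALPHA_CRITIC * delta * features(s)

            # actor: policy-gradient step; grad of log N(a; mu, sigma^2 I) w.r.t. mu is (a - mu) / sigma^2
            grad_log_pi = np.outer((a - mu) / SIGMA**2, s)
            W += ALPHA_ACTOR * delta * grad_log_pi

            s = s_next
            if done:
                break

The key point is that the actor outputs the parameters of a continuous distribution (here, the mean of a Gaussian), so no enumeration of discrete actions is ever needed; the critic's TD error serves as the learning signal for both updates.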

