Implementations of Hierarchical Reinforcement Learning


Question

Can anyone recommend a reinforcement learning library or framework that can handle large state spaces by abstracting them?

I'm attempting to implement the intelligence for a small agent in a game world. The agent is represented by a small two-wheeled robot that can move forward and backward and turn left and right. It has a pair of sensors for detecting a boundary on the ground, a pair of ultrasonic sensors for detecting objects far away, and a pair of bump sensors for detecting contact with an object or opponent. It can also do some simple dead reckoning to estimate its position in the world, using its starting position as a reference. So all the state features available to it are:

edge_detected=0|1
edge_left=0|1
edge_right=0|1
edge_both=0|1
sonar_detected=0|1
sonar_left=0|1
sonar_left_dist=near|far|very_far
sonar_right=0|1
sonar_right_dist=near|far|very_far
sonar_both=0|1
contact_detected=0|1
contact_left=0|1
contact_right=0|1
contact_both=0|1
estimated_distance_from_edge_in_front=near|far|very_far
estimated_distance_from_edge_in_back=near|far|very_far
estimated_distance_from_edge_to_left=near|far|very_far
estimated_distance_from_edge_to_right=near|far|very_far
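
For concreteness, one full observation under this scheme could be bundled as a plain Python dict (a hypothetical encoding that just restates the feature list above; the particular values are arbitrary):

state = {
    "edge_detected": 1, "edge_left": 1, "edge_right": 0, "edge_both": 0,
    "sonar_detected": 1, "sonar_left": 1, "sonar_left_dist": "far",
    "sonar_right": 0, "sonar_right_dist": "very_far", "sonar_both": 0,
    "contact_detected": 0, "contact_left": 0, "contact_right": 0, "contact_both": 0,
    "estimated_distance_from_edge_in_front": "near",
    "estimated_distance_from_edge_in_back": "far",
    "estimated_distance_from_edge_to_left": "far",
    "estimated_distance_from_edge_to_right": "very_far",
}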

The goal is to identify the state where the reward signal is received and learn a policy that acquires that reward as quickly as possible. In a traditional Markov model, this state space, represented discretely, would have 2,985,984 possible values, far too many to explore one by one using something like Q-learning or SARSA.
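
As a quick check, that count follows directly from the feature list: 12 binary flags and 6 three-valued features.

n_binary, n_ternary = 12, 6   # 12 0|1 flags, 6 near|far|very_far features
print(2 ** n_binary * 3 ** n_ternary)  # -> 2985984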

Can anyone recommend a reinforcement learning library appropriate for this domain (preferably with Python bindings), or an unimplemented algorithm that I could potentially implement myself?

Answer

Your actual state is the robot's position and orientation in the world. Using these sensor readings is an approximation, since it is likely to render many states indistinguishable.

Now, if you go down this road, you could use linear function approximation. Then this is just 24 binary features (the 12 0|1 flags, plus 2 bits for each of the 6 near|far|very_far features). That is such a small number that you could even use all pairwise products of features for learning. Farther down this road is online discovery of feature dependencies (see Alborz Geramifard's paper, for example), which is directly related to your interest in hierarchical learning.
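
Here is a minimal sketch of what that could look like, assuming the hypothetical state dict shown earlier and an assumed four-action set; the 2-bits-per-distance ("thermometer") encoding and the epsilon-greedy Q-learning update are standard choices, not something prescribed by the answer:

import random

# Assumed action set for the two-wheeled robot (not specified in the post).
ACTIONS = ["forward", "backward", "turn_left", "turn_right"]

BINARY_KEYS = [
    "edge_detected", "edge_left", "edge_right", "edge_both",
    "sonar_detected", "sonar_left", "sonar_right", "sonar_both",
    "contact_detected", "contact_left", "contact_right", "contact_both",
]
TERNARY_KEYS = [
    "sonar_left_dist", "sonar_right_dist",
    "estimated_distance_from_edge_in_front",
    "estimated_distance_from_edge_in_back",
    "estimated_distance_from_edge_to_left",
    "estimated_distance_from_edge_to_right",
]

def featurize(state):
    # 12 raw flags, plus 2 "thermometer" bits per three-valued feature
    # (at least far?  very far?) -- the 24 binary features mentioned above.
    phi = [float(state[k]) for k in BINARY_KEYS]
    for k in TERNARY_KEYS:
        phi.append(1.0 if state[k] in ("far", "very_far") else 0.0)
        phi.append(1.0 if state[k] == "very_far" else 0.0)
    return phi  # length 24

N_FEATURES = len(BINARY_KEYS) + 2 * len(TERNARY_KEYS)  # 24

# One weight vector per action; Q(s, a) is the dot product w_a . phi(s).
weights = {a: [0.0] * N_FEATURES for a in ACTIONS}

def q_value(phi, action):
    return sum(w * x for w, x in zip(weights[action], phi))

def choose_action(phi, epsilon=0.1):
    # epsilon-greedy exploration
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q_value(phi, a))

def q_update(phi, action, reward, phi_next, alpha=0.05, gamma=0.95):
    # One Q-learning step with linear function approximation:
    # w_a += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)) * phi(s)
    td_error = (reward
                + gamma * max(q_value(phi_next, a) for a in ACTIONS)
                - q_value(phi, action))
    w = weights[action]
    for i, x in enumerate(phi):
        w[i] += alpha * td_error * x

Taking all pairwise products of features, as suggested above, would grow the feature vector from 24 to 24 + (24 * 23) / 2 = 300 entries, which is still easily manageable for linear methods.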

An alternative is to use a conventional algorithm to track the robot's position and use the position as input to RL.
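
A rough sketch of that alternative, assuming a differential-drive dead-reckoning model (WHEEL_BASE, the cell size, and the heading resolution are made-up constants for illustration); the pose update is standard dead reckoning, and the discretizer turns the continuous pose into a small index usable with an ordinary Q-table:

import math

WHEEL_BASE = 0.10  # assumed distance between the two wheels, in meters

def update_pose(x, y, theta, d_left, d_right):
    # Standard dead-reckoning update from per-wheel distances traveled,
    # using the average heading over the step (small-motion approximation).
    d_center = (d_left + d_right) / 2.0
    d_theta = (d_right - d_left) / WHEEL_BASE
    x += d_center * math.cos(theta + d_theta / 2.0)
    y += d_center * math.sin(theta + d_theta / 2.0)
    return x, y, theta + d_theta

def discretize_pose(x, y, theta, cell=0.05, n_headings=8):
    # Bucket the continuous pose into a coarse grid cell plus one of
    # n_headings heading sectors -- small enough for a tabular method.
    heading = int((theta % (2 * math.pi)) / (2 * math.pi) * n_headings) % n_headings
    return (int(x // cell), int(y // cell), heading)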

