为什么要限制连续动作? [英] Why should continuous actions be clamped?

查看:101
本文介绍了为什么要限制连续动作?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在深度强化学习中,使用连续的动作空间,为什么似乎通常的做法是在特工执行之前紧紧抓住动作?

In Deep Reinforcement Learning, using continuous action spaces, why does it seem to be common practice to clamp the action right before the agent's execution?

示例:

OpenAI健身山地车 https://github.com/openai/健身房/blob/master/gym/envs/classic_control/continuous_mountain_car.py#L57

OpenAI Gym Mountain Car https://github.com/openai/gym/blob/master/gym/envs/classic_control/continuous_mountain_car.py#L57

Unity 3DBall

Unity 3DBall https://github.com/Unity-Technologies/ml-agents/blob/master/unity-environment/Assets/ML-Agents/Examples/3DBall/Scripts/Ball3DAgent.cs#L29

信息不会丢失吗?就像模型为速度(运动)输出+10,然后将其固定为+1一样,动作本身的行为也很离散(就其执行而言).对于细粒度的运动,将输出乘以0.1之类的值更有意义吗?

Isn't information lost doing so? Like if the model outputs +10 for velocity (moving), which is then clamped to +1, the action itself behaves rather discrete (concerning its mere execution). For a fine grained movement, wouldn't it make more sense to multiply the output by something like 0.1?

推荐答案

这可能只是为了对代理可以执行的操作施加约束.代理人可能想采取行动,将速度提高1,000,000.但是,如果代理人是具有弱发动机的自动驾驶汽车,最多只能以1个单位加速,那么我们就不在乎该代理人是否假设会想以更多单位进行加速.汽车的引擎功能有限.

This is probably simply done to enforce constraints on what the agent can do. Maybe the agent would like to put out an action that increases velocity by 1,000,000. But if the agent is a self-driving car with a weak engine that can at most accelerate by 1 unit, we don't care if the agent would hypothetically like to accelerate by more units. The car's engine has limited capabilities.

这篇关于为什么要限制连续动作?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆