Why should continuous actions be clamped?

Views: 101 Published: 2020/7/24 9:56:29 Tags: deep-learning reinforcement-learning continuous ml-agent


Problem description

In deep reinforcement learning with continuous action spaces, why does it seem to be common practice to clamp the action right before the agent executes it?

Examples:

OpenAI Gym Mountain Car https://github.com/openai/gym/blob/master/gym/envs/classic_control/continuous_mountain_car.py#L57

Unity 3DBall https://github.com/Unity-Technologies/ml-agents/blob/master/unity-environment/Assets/ML-Agents/Examples/3DBall/Scripts/Ball3DAgent.cs#L29

Isn't information lost by doing so? For example, if the model outputs +10 for velocity (movement) and that value is then clamped to +1, the action behaves rather discretely as far as its execution is concerned. For fine-grained movement, wouldn't it make more sense to multiply the output by something like 0.1?
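
To make the two options in the question concrete, here is a minimal sketch comparing clamping with multiplying by 0.1. The raw outputs, the bounds of -1 and +1, and the helper names `clamp` and `scale` are assumptions chosen for illustration; they are not taken from the linked Gym or ML-Agents code.

```python
import numpy as np

# Hypothetical raw policy outputs, made up for illustration only.
raw_actions = np.array([-10.0, -0.5, 0.0, 0.5, 3.0, 10.0])

# Assumed valid action range of the environment.
LOW, HIGH = -1.0, 1.0


def clamp(action, low=LOW, high=HIGH):
    """Clip each component into [low, high]; in-range values pass through unchanged."""
    return np.clip(action, low, high)


def scale(action, factor=0.1):
    """Multiply by a constant factor, as suggested in the question."""
    return action * factor


clamped = clamp(raw_actions)
scaled = scale(raw_actions)

for raw, c, s in zip(raw_actions, clamped, scaled):
    print(f"raw={raw:+6.1f}  clamped={c:+5.2f}  scaled={s:+5.2f}")
```

In this sketch, clamping leaves values that are already inside the range untouched and only saturates the out-of-range ones (+3 and +10 both become +1), which is the loss of information the question describes. Scaling keeps +3 and +10 distinct, but it also shrinks an in-range output such as 0.5 to 0.05 and does not by itself guarantee the bound: a raw output of 20 would still map to 2.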
