Reinforcement Learning With Variable Actions


Question

All the reinforcement learning algorithms I've read about are applied to a single agent that has a fixed number of actions. Are there any reinforcement learning algorithms for making a decision while taking into account a variable number of actions? For example, how would you apply an RL algorithm in a computer game where a player controls N soldiers, and each soldier has a varying number of actions depending on its condition? You can't formulate a fixed number of actions for a global decision maker (i.e. "the general"), because the available actions change continually as soldiers are created and killed. And you can't formulate a fixed number of actions at the soldier level, since a soldier's actions depend on its immediate environment. If a soldier sees no opponents, it might only be able to walk, whereas if it sees 10 opponents, it has 10 new possible actions: attacking 1 of the 10 opponents.

Recommended answer

What you describe is nothing unusual. Reinforcement learning is a way of finding the value function of a Markov decision process (MDP). In an MDP, every state has its own set of actions, so the number of available actions may differ from state to state. To proceed with a reinforcement learning application, you have to clearly define what the states, actions, and rewards are in your problem.
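
To make the point about per-state action sets concrete, here is a minimal sketch of tabular Q-learning in which the set of legal actions A(s) depends on the state. Everything in it is an assumption made for illustration and not part of the answer: the state is just the number of opponents a single soldier can see, and `available_actions`, `step`, the rewards, and the hyperparameters are all hypothetical. It is not a recipe for the multi-soldier game itself.

```python
import random
from collections import defaultdict

def available_actions(state):
    """Hypothetical action set A(s): state = number of visible opponents."""
    return ["walk"] + [f"attack_{i}" for i in range(state)]

def step(state, action, rng):
    """Toy dynamics (assumed): attacking removes one opponent and pays +1;
    walking costs 0.1 and reveals between 0 and 3 opponents at random."""
    if action == "walk":
        return rng.randint(0, 3), -0.1
    return state - 1, 1.0

def greedy_action(Q, state):
    # The max is taken only over the actions that are legal in this state.
    return max(available_actions(state), key=lambda a: Q[(state, a)])

def q_learning(steps=20000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)  # keyed by (state, action); unseen pairs start at 0
    state = 0
    for _ in range(steps):
        actions = available_actions(state)
        # Epsilon-greedy choice restricted to the current state's actions.
        if rng.random() < epsilon:
            action = rng.choice(actions)
        else:
            action = greedy_action(Q, state)
        next_state, reward = step(state, action, rng)
        # The bootstrap target also ranges only over A(next_state).
        best_next = max(Q[(next_state, a)] for a in available_actions(next_state))
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        state = next_state
    return Q

if __name__ == "__main__":
    Q = q_learning()
    for s in range(4):
        print(s, available_actions(s), "->", greedy_action(Q, s))
```

Storing Q-values in a dictionary keyed by `(state, action)` means no fixed-size action vector is ever needed: both the action choice and the update target simply enumerate whatever actions the current state offers.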
