The huge amount of states in Q-learning calculation
Problem Description
I implemented a 3x3 OX game with Q-learning (it works perfectly in AI vs. AI and AI vs. Human play), but I can't go one step further to a 4x4 OX game, since it eats up all my PC's memory and crashes.
Here is my current problem: Access violation in huge array?
In my understanding, a 3x3 OX game has a total of 3 (empty, white, black) ^ 9 = 19,683 possible states (the same pattern at a different angle still counts as a separate state).
For a 4x4 OX game, the total number of states will be 3^16 = 43,046,721.
For a regular Go game on a 15x15 board, the total number of states will be 3^225 ≈ 2.5 × 10^107.
Q1. I want to know whether my calculation is correct. (For a 4x4 OX game, do I need a 3^16-entry array?)
Q2. Since I need to calculate a Q-value for each (state, action) pair, I need an array this large. Is that expected? Is there any way to avoid it?
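One common way to sidestep allocating the full dense table (an assumption on my part, not something the answer below requires) is to store Q-values lazily in a hash map keyed by (state, action): in practice only a tiny fraction of the 3^16 states is ever visited, so only those entries consume memory. A minimal sketch:

```python
from collections import defaultdict

# Sparse, lazily-populated Q-table: unseen (state, action) pairs default
# to 0.0 and cost no memory until they are touched.
Q = defaultdict(float)

def td_update(state, action, reward, next_state, next_actions,
              alpha=0.1, gamma=0.9):
    # Standard one-step Q-learning update, operating on the sparse table.
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    Q[(state, action)] += alpha * (reward + gamma * best_next
                                   - Q[(state, action)])

# A state can be any hashable encoding of the board, e.g. a tuple of cells.
board = (0,) * 16                 # empty 4x4 board
td_update(board, 5, 1.0, board, range(16))
```

This keeps memory proportional to the number of visited states rather than the full state space, though for genuinely huge spaces (like Go) function approximation, as in the answer below, is the usual escape.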
Recommended Answer
If you would rather not reinvent the wheel, here is what has been done to solve this problem:
The model is a convolutional neural network, trained with a variant of Q-learning, whose input is raw pixels and whose output is a value function estimating future rewards. We apply our method to seven Atari 2600 games from the Arcade Learning Environment, with no adjustment of the architecture or learning algorithm.
https://arxiv.org/pdf/1312.5602v1.pdf
We could represent our Q-function with a neural network that takes the state (four game screens) and action as input and outputs the corresponding Q-value. Alternatively, we could take only game screens as input and output the Q-value for each possible action. This approach has the advantage that, if we want to perform a Q-value update or pick the action with the highest Q-value, we only have to do one forward pass through the network and have all Q-values for all actions immediately available.
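The second architecture described above can be sketched with a linear model standing in for the paper's convolutional network (the names and sizes here are illustrative assumptions, not from the paper):

```python
import numpy as np

# State in, one Q-value per action out, in a single forward pass.
# A linear approximator stands in for a real neural network.
n_features, n_actions = 16, 16    # e.g. a flattened 4x4 board, one move per cell
W = np.zeros((n_actions, n_features))

def q_values(state):
    # One forward pass yields Q-values for ALL actions at once.
    return W @ state

def td_update(state, action, reward, next_state, done, alpha=0.01, gamma=0.9):
    # Semi-gradient Q-learning step on the approximator's weights.
    target = reward + (0.0 if done else gamma * q_values(next_state).max())
    td_error = target - q_values(state)[action]
    W[action] += alpha * td_error * state

state = np.ones(n_features)
td_update(state, action=3, reward=1.0, next_state=state, done=True)
```

The point of the sketch is the memory trade-off: instead of one table entry per (state, action) pair, the approximator stores a fixed number of weights and generalizes across states.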
https://neuro.cs.ut.ee/demystifying-deep-reinforcement-learning/