Needed advice for an enjoyable AI in Buraco card game


Problem Description


I'm trying to build an effective AI for the Buraco card game (2 and 4 players).

I want to avoid the heuristic approach: I'm not an expert at the game, and the last games I developed that way gave me only mediocre results.

I know the Monte Carlo tree search algorithm; I've used it for a checkers game with decent results, but I'm really confused by the recent success of other machine learning options.

For example, I found this answer on Stack Overflow that really puzzles me; it says: "So again: build a bot which can play against itself. One common basis is a function Q(S,a) which assigns to any game state and possible action of the player a value -- this is called Q-learning. And this function is often implemented as a neural network ... although I would think it does not need to be that sophisticated here."

I'm very new to machine learning (this should be reinforcement learning, right?) and I only know a little about Q-learning, but it sounds like a great idea: I take my bot, make it play against itself, and then it learns from its results... The problem is that I have no idea how to start! (Nor do I know whether this approach would be good or not.)

Could you help me get pointed in the right direction? Is the Q-learning strategy a good one for my domain? Is Monte Carlo still the best option for me? Would it work well in a 4-player game like Buraco (2 opponents and 1 teammate)? Is there any other method that I'm overlooking?

PS: My goal is to develop an enjoyable AI for a casual application; I could even consider letting the AI cheat, for example by looking at the players' hands or at the deck. Even with this, ehm, permission, I don't think I would be able to build a good heuristic :/

Thank you guys for your help!

Solution

EDIT: Thinking again about your knowledge of MCTS and your implementation of checkers, I think my answer is formulated too pedagogically -- you probably know much of this already if you know MCTS. Nevertheless, I hope it helps. Feel free to ask specific questions in the comments.


I was the one who suggested you open a new question, so I feel I should at least give an answer here. However, right at the beginning, I want to clarify what should be expected here. What you are asking for is quite complex and requires solid practical experience to realize. I have had the occasion to implement optimal strategies for quite a few games, and I would still consider this a heavy challenge. So I think the goal here is to get an overview -- not a proper solution (which I couldn't give anyhow, as I don't know the game of Buraco).

As stated in the other answer, the theory of reinforcement learning provides a solid foundation. A good introduction is the book by Sutton and Barto of the same title. It is really readable, and I'd suggest working through the first five chapters or so (these cover dynamic programming and Monte Carlo methods). Two points on which this book is not sufficient: (i) it does not give explicit treatment of two-player games, and (ii) it relies mainly on tabular representations of the value functions (no neural networks and the like are involved).

An elementary part of the implementation is the state S of the game. This state should capture the complete current status of the game as compactly and non-redundantly as possible. Further, for each state S, you need to be able to assign the available actions A (like taking a card). Moreover, depending on the method you want to apply, it helps to know the probability distribution p(S'|S,A), which gives the probability of ending up in state S' when you are in state S and perform action A. Finally, you need to specify the reward r(S'|S,A) received when you are in state S, perform action A, and end up in S' (for two-player zero-sum games the reward can be chosen quite easily: if S' is a state where you win, you get +1 as reward, if you lose -1, otherwise 0 -- this, however, does not hold for games with more players or for non-zero-sum games).
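
To make the S / A / r(S'|S,A) vocabulary concrete, here is a minimal sketch of what such an interface could look like. Everything here is a hypothetical placeholder: the GameState and Action classes, the field names, and the toy "deck empty" / "fewest cards" rules do not model the real rules of Buraco.

```python
# Hypothetical sketch of the state/action/reward interface described above.
from dataclasses import dataclass
from typing import List, Optional, Tuple

Card = Tuple[str, int]  # e.g. ("hearts", 7)

@dataclass(frozen=True)
class Action:
    kind: str                    # e.g. "draw_from_deck", "pick_discard_pile"
    card: Optional[Card] = None  # card involved in the action, if any

@dataclass
class GameState:
    hands: List[Tuple[Card, ...]]   # one hand per player (the full state knows them all)
    deck: Tuple[Card, ...]          # remaining draw pile, in order
    discard_pile: Tuple[Card, ...]
    current_player: int

    def available_actions(self) -> List[Action]:
        """The actions A that are legal in this state S (placeholder logic)."""
        actions = [Action("draw_from_deck")]
        if self.discard_pile:
            actions.append(Action("pick_discard_pile"))
        return actions

def is_terminal(state: GameState) -> bool:
    """Placeholder terminal test: the game ends when the draw pile is empty."""
    return len(state.deck) == 0

def winner(state: GameState) -> int:
    """Placeholder: the player holding the fewest cards wins (not real Buraco scoring)."""
    return min(range(len(state.hands)), key=lambda p: len(state.hands[p]))

def reward(next_state: GameState, player: int) -> float:
    """r(S'|S,A) in the two-player zero-sum formulation: +1 win, -1 loss, 0 otherwise."""
    if not is_terminal(next_state):
        return 0.0
    return 1.0 if winner(next_state) == player else -1.0
```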

From a given state S, you can further derive the reduced states S_P1, S_P2, etc. that the individual players see. Those states capture the information available to a specific player (which is of course less than the complete state -- a player does not know the opponent's cards, for example, nor the order of the cards in the deck). These reduced states provide the basis on which the players make their decisions. So the goal for player 1 is to obtain a function Q_1(S_P1, A) which tells him: when I am in state S_P1, the best action to take is A (this is the idea of Q-learning). The same holds for the other players. These functions have to be trained so that optimal results come out.
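
As an illustration, a tabular Q_1(S_P1, A) for one player could be sketched as below, reusing the hypothetical GameState and Action types from the previous sketch. The reduced-state encoding (just the player's own hand plus the top discard) is a deliberate simplification; a real encoding would carry more of the observable information.

```python
# Hypothetical sketch of a tabular Q function over a player's reduced state.
from collections import defaultdict
import random

def reduced_state(state: GameState, player: int):
    """S_P: only what this player can observe -- own hand and the top discard."""
    top_discard = state.discard_pile[-1] if state.discard_pile else None
    return (tuple(sorted(state.hands[player])), top_discard)

# Tabular Q function: maps (reduced_state, action) -> estimated value, default 0.0
Q = defaultdict(float)

def choose_action(state: GameState, player: int, epsilon: float = 0.1) -> Action:
    """Epsilon-greedy policy based on the player's reduced state."""
    s_p = reduced_state(state, player)
    actions = state.available_actions()
    if random.random() < epsilon:
        return random.choice(actions)                # explore
    return max(actions, key=lambda a: Q[(s_p, a)])   # exploit the current estimates
```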

The way to do so is through the central equation of reinforcement learning, the Bellman equation. There are several solution methods, such as value iteration, Monte Carlo (basically the method you referenced in your link), temporal-difference methods, and so on. Some of these methods can be interpreted as your digital players repeatedly playing against each other and, in the process, hopefully getting better. However, this is not guaranteed; it is easy to get stuck in local minima, as in any optimization problem. At best, you already have a good digital player at hand (from somewhere) and you train only one player to beat him -- in this way you reduce the two-player problem to a single-player problem, which is much easier.
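
For concreteness, the standard tabular Q-learning update that follows from the Bellman equation looks like the sketch below, using the Q table from the sketch above. The surrounding self-play loop (dealing cards, applying actions, detecting the end of a hand) is assumed and not shown here.

```python
# Tabular Q-learning update: Q(S,A) <- Q(S,A) + alpha * [r + gamma * max_a' Q(S',a') - Q(S,A)]
ALPHA = 0.1   # learning rate
GAMMA = 1.0   # discount factor (episodic card game, so no discounting)

def q_update(s_p, action, r, next_s_p, next_actions):
    """One Q-learning step for the transition (s_p, action) -> (r, next_s_p)."""
    best_next = max((Q[(next_s_p, a)] for a in next_actions), default=0.0)
    Q[(s_p, action)] += ALPHA * (r + GAMMA * best_next - Q[(s_p, action)])
```

In a self-play setup, each player would apply this update to its own table (or the players could share one table if their roles are symmetric), which is one way to read the "bots playing against each other" idea from the quoted answer.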

I'll make a cut here, because this stuff is really better grasped from the book, and because I know this answer wouldn't help you practically even if it were ten pages long. One last suggestion on how to proceed: (i) get the foundations from the book, (ii) implement some of the toy problems in it, (iii) extend the material to two-player games -- see also the minimax theorem for this, (iv) solve some easier games (like tic-tac-toe; even this takes a while), (v) get acquainted with neural networks and all that, and (vi) tackle Buraco -- a tough program, however.
