Modelling card game for machine learning


Problem description


I'm looking for some help modelling this machine learning problem.

A hand consists of three rows (containing 3, 5, and 5 cards respectively). Your goal is to build a hand that scores the most points. You receive the cards in intervals called streets, five cards in the first street, and three in the next four streets (you must discard one of the cards in the final four streets). Cards can't be moved once you place them. More details on scoring.
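The rules above suggest a simple game-state representation. Here is a minimal sketch of such a structure; the class, method names, and card strings are illustrative assumptions, not part of the question:

```python
from dataclasses import dataclass, field

# Row capacities as described above: 3 cards in the top row, 5 in each of
# the middle and bottom rows. (Hypothetical names for illustration.)
ROW_CAPACITY = (3, 5, 5)

@dataclass
class Hand:
    rows: list = field(default_factory=lambda: [[], [], []])

    def legal_rows(self):
        """Indices of rows that still have room for another card."""
        return [i for i, row in enumerate(self.rows)
                if len(row) < ROW_CAPACITY[i]]

    def place(self, card, row):
        """Place a card in a row; cards cannot be moved afterwards."""
        if row not in self.legal_rows():
            raise ValueError("row is full")
        self.rows[row].append(card)
```

A placement decision then reduces to choosing a row index from `legal_rows()` for each incoming card.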

My goal is to build a system that, given a set of streets, plays the hand similarly to our best players. It seems pretty clear that I'll need to build a neural network for each street, using features based on the existing hand and the set of cards in the street. I've got plenty of data (streets, placements, and final scores), but I'm a little unsure how to model the problem given that the possible outputs are unique to the set of cards (although there are fewer than 3^5 placements in the first street, and 3^3 after). I've previously only dealt with classification problems with fixed categories.
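One way to read the placement counts above: each first-street card goes to one of 3 rows, giving at most 3^5 assignments, with row capacities ruling some out. A small counting sketch (the capacity tuple and function name are assumptions for illustration):

```python
from itertools import product

# Row capacities: the top row holds only 3 cards, so assignments that send
# 4 or 5 of the first-street cards to row 0 are illegal.
ROW_CAPACITY = (3, 5, 5)

def legal_first_street_placements():
    """All row assignments for the 5 first-street cards that respect capacities."""
    placements = []
    for assignment in product(range(3), repeat=5):
        if all(assignment.count(r) <= ROW_CAPACITY[r] for r in range(3)):
            placements.append(assignment)
    return placements
```

Enumerations like this can also serve as the fixed action space for a per-street classifier, which is one way around the "unique outputs" issue.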

Does anyone have an example of a similar problem or suggestions how to prepare the training data when you have unique outputs?

Solution

A vague question gives a vague answer (which is my excuse for being too lazy to code ;-).

You wrote that you have a lot of data, and it seems you want to map the game onto experience gained with supervised learning. But that is not the way game optimization works. One usually does not perform supervised learning, but rather reinforcement learning. The differences are subtle, but reinforcement learning (with Markov decision processes as its theoretical basis) offers more of a local view -- like optimizing the decision given a specific state. Supervised learning instead corresponds to optimizing several decisions at once.

Another show stopper for the usual supervised learning approach is that even if you have a lot of data, it will almost surely be too little. And it will not offer the "required paths".

The usual approach, at least since Tesauro's backgammon player, is rather: set up the basic rules of the game, possibly introduce human knowledge as heuristics, and then let the program play against itself as often as possible -- this is how Google DeepMind built a master Go player, for example. See also this interesting video.

In your case, the task should in principle not be that hard, as there is a comparatively small number of game states and, importantly, any issues involving psychology, like bluffing, consistent play, and so on, are completely absent.

So again: build a bot which can play against itself. One common basis is a function Q(S,a) which assigns a value to any game state and possible action of the player -- this is called Q-learning. And this function is often implemented as a neural network ... although I would think it does not need to be that sophisticated here.
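The Q(S,a) idea can be sketched in a few lines of tabular Q-learning. This is a generic illustration, not the answerer's implementation; the hyperparameters, the state/action encoding, and the surrounding environment are all assumptions:

```python
import random
from collections import defaultdict

ALPHA = 0.1    # learning rate (assumed value)
GAMMA = 0.95   # discount factor (assumed value)
EPSILON = 0.1  # exploration rate (assumed value)

# Q[(state, action)] -> estimated value; unseen pairs default to 0.0.
Q = defaultdict(float)

def choose_action(state, actions):
    """Epsilon-greedy policy over the current Q estimates."""
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state, next_actions):
    """One Q-learning step:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```

During self-play, each placement would call `choose_action` and, once the final score is known, propagate it back through `update`; a neural network would simply replace the table when the state space grows too large.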

I'll stay that vague for now. But I would be glad to assist you further if necessary.

