Using minimax search for card games with imperfect information


Question

I want to use minimax search (with alpha-beta pruning), or rather negamax search, to make a computer program play a card game.

The card game is actually played by 4 players. So in order to be able to use minimax at all, I simplify the game to "me" against the "others". After each "move", you can objectively read the evaluation of the current state from the game itself: once all 4 players have placed a card, the player with the highest card wins all four cards, and the cards' values count towards the score.

Since you don't know exactly how the remaining cards are distributed among the other 3 players, I thought you have to simulate all possible distributions ("worlds") of the cards that are not yours. You hold 12 cards; the other 3 players hold 36 cards in total.
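To get a feel for how many such "worlds" there are, here is a minimal counting sketch. It assumes the 36 unseen cards are split evenly, 12 per opponent, which the question implies but does not state; `WorldCount`, `choose` and `countWorlds` are hypothetical names, not part of the program above.

```java
import java.math.BigInteger;

// Hypothetical helper for illustration only; not part of the original program.
final class WorldCount {
    // Binomial coefficient C(n, k), computed exactly.
    static BigInteger choose(int n, int k) {
        BigInteger result = BigInteger.ONE;
        for (int i = 1; i <= k; i++) {
            // After step i the value equals C(n - k + i, i), so the division is always exact.
            result = result.multiply(BigInteger.valueOf(n - k + i)).divide(BigInteger.valueOf(i));
        }
        return result;
    }

    // Number of ways to deal `unseen` cards into `players` hands of equal size
    // (assumption: every opponent holds the same number of cards).
    static BigInteger countWorlds(int unseen, int players) {
        int handSize = unseen / players;
        BigInteger worlds = BigInteger.ONE;
        int remaining = unseen;
        for (int p = 0; p < players; p++) {
            worlds = worlds.multiply(choose(remaining, handSize));
            remaining -= handSize;
        }
        return worlds;
    }

    public static void main(String[] args) {
        // C(36,12) * C(24,12) * C(12,12)
        System.out.println(countWorlds(36, 3));
    }
}
```

Under that assumption the count is on the order of 10^15 at the start of a game, so enumerating every world is not practical.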

So my approach is the following algorithm, where player is a number between 1 and 3 identifying one of the three computer players that the program might need to find a move for, and -player stands for the opponents, namely the other three players taken together.

private Card computerPickCard(GameState state, ArrayList<Card> cards) {
    int bestScore = Integer.MIN_VALUE;
    Card bestMove = null;
    int player = state.getPlayersTurn();
    for (Card card : cards) {
        if (state.moveIsLegal(card)) { // if you are allowed to place this card
            GameState futureState = state.testMove(card); // a move is the placing of a card (which returns a new game state)
            // the child node is scored from the opponents' point of view, so the result is negated;
            // -Integer.MAX_VALUE is the lower bound because -Integer.MIN_VALUE overflows
            int score = -negamaxSearch(-player, futureState, 1, -Integer.MAX_VALUE, Integer.MAX_VALUE);
            if (score > bestScore) {
                bestScore = score;
                bestMove = card;
            }
        }
    }
    return bestMove; // the card to place (null if no card is legal)
}

private int negamaxSearch(int player, GameState state, int depthLeft, int alpha, int beta) {
    ArrayList<Card> cards;
    if (player >= 1 && player <= 3) {
        cards = state.getCards(player);
    }
    else {
        // pool the hands of everyone except -player; a fresh list is used
        // so that the lists owned by the game state are never modified
        cards = new ArrayList<Card>();
        for (int p = 0; p <= 3; p++) {
            if (p != -player) {
                cards.addAll(state.getCards(p));
            }
        }
    }
    if (depthLeft <= 0 || state.isEnd()) { // end of recursion as the game is finished or max depth is reached
        if (player >= 1 && player <= 3) {
            return state.getCurrentPoints(player); // player's points as a positive value (for self)
        }
        else {
            return -state.getCurrentPoints(-player); // player's points as a negative value (for others)
        }
    }
    else {
        int score;
        int nCards = cards.size();
        if (player > 0) { // make one move (it's player's turn)
            for (int i = 0; i < nCards; i++) {
                GameState futureState = state.testMove(cards.get(i));
                if (futureState != null) { // if move is valid
                    // negamax: the child's score is from the other side's view, so negate it
                    score = -negamaxSearch(-player, futureState, depthLeft-1, -beta, -alpha);
                    if (score >= beta) {
                        return score;
                    }
                    if (score > alpha) {
                        alpha = score; // alpha acts like max
                    }
                }
            }
            return alpha;
        }
        else { // make three moves (it's the others' turn)
            for (int i = 0; i < nCards; i++) {
                GameState futureState = state.testMove(cards.get(i));
                if (futureState != null) { // if move is valid
                    for (int k = 0; k < nCards; k++) {
                        if (k != i) {
                            GameState futureStateLevel2 = futureState.testMove(cards.get(k));
                            if (futureStateLevel2 != null) { // if move is valid
                                for (int m = 0; m < nCards; m++) {
                                    if (m != i && m != k) {
                                        GameState futureStateLevel3 = futureStateLevel2.testMove(cards.get(m));
                                        if (futureStateLevel3 != null) { // if move is valid
                                            // negamax: the child's score is from the other side's view, so negate it
                                            score = -negamaxSearch(-player, futureStateLevel3, depthLeft-1, -beta, -alpha);
                                            if (score >= beta) {
                                                return score;
                                            }
                                            if (score > alpha) {
                                                alpha = score; // alpha acts like max
                                            }
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
            return alpha;
        }
    }
}

This seems to work fine, but even at a depth of 1 (depthLeft=1), the program already needs to calculate about 50,000 moves (placed cards) on average. This is too much, of course!
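A rough way to see where a number of that size comes from: the opponents' branch tries every ordered triple of distinct cards from the pooled hand. The sketch below counts those triples, assuming the pool still holds all 36 unseen cards and ignoring the legality checks done by testMove, so it is an upper bound for a single opponents' node rather than an exact figure. `TripleCount` is a hypothetical name.

```java
// Hypothetical helper for illustration only; not part of the original program.
final class TripleCount {
    // Counts the (i, k, m) index triples visited by three nested loops over n cards
    // when repeated indices are skipped, mirroring the loop structure of the search.
    static long orderedTriples(int n) {
        long count = 0;
        for (int i = 0; i < n; i++) {
            for (int k = 0; k < n; k++) {
                if (k == i) continue;
                for (int m = 0; m < n; m++) {
                    if (m != i && m != k) count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(orderedTriples(36)); // 36 * 35 * 34 = 42840
    }
}
```

That is the same order of magnitude as the reported 50,000, which suggests the cost is dominated by the pooled three-card enumeration rather than by the search depth.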

So my questions are:

  1. Is the implementation correct at all? Can you simulate a game like this, especially given the imperfect information?
  2. How can the algorithm be improved in terms of speed and workload?
  3. Can I, for example, reduce the set of possible moves to a random subset of 50% to improve speed while still getting good results?
  4. I found that the UCT algorithm might be a good solution. Do you know this algorithm? Can you help me implement it?

Recommended answer

Minimax search as you've implemented it is the wrong approach for games where there is this much uncertainty. Since you don't know how the cards are distributed among the other players, your search will spend an exponential amount of time exploring games that could not happen given the actual distribution of the cards.

I think a better approach would be to start with good rules for play when you have little or no information about the other players' hands. Things like:

  1. If you play first in a round, play your lowest card, since you have little chance of winning the round.
  2. If you play last in a round, play the lowest card that will win the round. If you can't win the round, play your lowest card.
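The two rules above could be sketched roughly as follows. This is only an illustration under several assumptions: cards are reduced to integer ranks, the highest rank wins a round, suits and ties are ignored, and players in the middle seats (for which no rule is given) simply play their lowest card. `RulePlayer` and `pickCard` are hypothetical names.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical rule-based player for illustration only.
final class RulePlayer {
    // hand: ranks the player holds (must not be empty)
    // table: ranks already played in this round, in order of play
    // playersPerRound: 4 for the game described in the question
    static int pickCard(List<Integer> hand, List<Integer> table, int playersPerRound) {
        int lowest = Collections.min(hand);
        boolean playsLast = table.size() == playersPerRound - 1;
        if (!playsLast) {
            // Rule 1 covers the first seat; reusing it for the middle seats is an assumption.
            return lowest;
        }
        // Rule 2: the lowest card that still beats everything on the table.
        int toBeat = Collections.max(table);
        Integer cheapestWinner = null;
        for (int rank : hand) {
            if (rank > toBeat && (cheapestWinner == null || rank < cheapestWinner)) {
                cheapestWinner = rank;
            }
        }
        // If no card can win, fall back to the lowest card.
        return cheapestWinner != null ? cheapestWinner : lowest;
    }
}
```

For example, with the hand 3, 7, 10 and the cards 5, 6, 2 already on the table, the last player would play the 7; against 5, 11, 2 the same player would give up the round and play the 3.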

Have your program initially not bother with search and just play by these rules, and have it assume that all the other players use these heuristics as well. As the program observes which cards the first and last players of each round play, it can build up a table of information about the cards each player likely holds. For example, a 9 would have won this round, but player 3 didn't play one, so he must not have any cards 9 or higher. As information is gathered about each player's hand, the search space will eventually be constrained to the point where a minimax search of the possible games could produce useful information about the next card to play.
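One possible shape for such a table is sketched below. It assumes integer ranks with no suits, and it relies on the assumption stated above that every opponent follows the heuristic of winning when playing last if they can. `HandKnowledge`, `observeLastPlay` and `mayHold` are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical knowledge table for illustration only.
final class HandKnowledge {
    // For each player, the highest rank that player could still be holding.
    private final int[] maxPossibleRank;

    HandKnowledge(int players, int highestRank) {
        maxPossibleRank = new int[players];
        Arrays.fill(maxPossibleRank, highestRank);
    }

    // Record what the last player of a round did.
    // toBeat: the highest rank on the table before this player moved
    // played: the rank this player actually played
    void observeLastPlay(int player, int toBeat, int played) {
        if (played <= toBeat) {
            // The player did not win although a higher card would have won, so under
            // the assumed heuristic they hold nothing above `toBeat`.
            maxPossibleRank[player] = Math.min(maxPossibleRank[player], toBeat);
        }
    }

    // Whether `player` could still hold a card of the given rank.
    boolean mayHold(int player, int rank) {
        return rank <= maxPossibleRank[player];
    }
}
```

In the example from the answer, a 9 would have won, so `toBeat` is 8; after `observeLastPlay(3, 8, 5)`, `mayHold(3, 9)` is false. A search that only deals cards consistent with this table explores far fewer worlds than one that pools all unseen cards.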
