Tic-Tac-Toe minimax算法不适用于4x4电路板 [英] Tic-Tac-Toe minimax algorithm doesn't work with 4x4 board

查看:84
本文介绍了Tic-Tac-Toe minimax算法不适用于4x4电路板的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

所以我过去3星期一直在从事这个项目.我设法使minimax函数可以在3x3板上早期使用,但是当我尝试在4x4板上使用minimax函数时就出现了问题,即Java堆空间错误.从那时起,借助Alpha beta修剪,我设法减少了aprox在minimax函数中所需的minimax调用次数.从59000到16000到11000,然后最终到8000个呼叫(这是假设已填充一个插槽的电路板的初始minimax呼叫).但是现在的问题是,该方法只能在4x4游戏中运行.它只是不断地不断地自我称呼,没有错误,没有结果,什么都没有.从理论上讲,我的功能应该适用于任意大小的板,唯一的问题是内存.现在,由于我大大降低了函数的内存消耗,所以我希望它能起作用.好吧,它适用于3x3.但是,它不适用于4x4. 该功能的简要说明: 该函数返回一个大小为2的数组,其中包含所有可能的下一个动作中最有利的下一个动作以及预期从该动作中获得的得分.计分系统很简单. O胜利+ 10,X胜利-10,平局0.该函数当然是递归的.在其中,您会找到某些快捷方式,这些快捷方式减少了对自身的必要调用次数.例如,如果轮到X且返回的分数是-10(这是X的最佳分数),则退出循环,即停止观察此状态下的其他潜在移动.这是State类的代码:

So I've been working on this project for the past 3 weeks now. I managed to get the minimax function to work early on for a 3x3 board, however problems started arising when I tried using it for a 4x4 board, namely Java heap space errors. Since then, with the help of Alpha beta pruning, I've managed to bring down the number of required minimax calls within the minimax function from aprox. 59000 to 16000 to 11000 and then finally to 8000 calls(This is assuming an initial minimax call for a board with one slot already filled). The problem now however, is that the method just keeps running for 4x4 games. It simply keeps calling itself non stop, no errors, no result, nothing. Theoretically, the way I see it, my function should work for arbitrary board sizes, the only problem was memory. Now, since I've reduced the memory greed of my function greatly, I'd expected it to work. Well, it does for the 3x3. However, it doesn't for the 4x4. A brief explanation of what the function does: The function returns an array of size 2 containing the most favorable next move from amongst all the possible next moves along with the score expected to be achieved from that move. The scoring system is simple. +10 for an O win, -10 for an X win and 0 for a draw. The function is of course recursive. Within it you will find certain shortcuts that reduce the number of required calls to itself. For example if it's X's turn and the returned score is -10(which is the best possible score for X) then exit the loop, i.e. stop observing other potential moves from this state. Here's the code for class State:

private String [] state;    //Actual content of the board
private String turn;        //Whose turn it is
private static int n;       //Size of the board

public State(int n) {
    state = new String[n*n];
    for(int i = 0; i < state.length; i++) {
        state[i] = "-";
    }
    State.n = n;
}


public int[] newminimax47(int z) {
    int bestScore = (turn == "O") ? +9 : -9;    //X is minimizer, O is maximizer
    int bestPos = -1;
    int currentScore;
    int lastAdded = z;
    if(isGameOver() != "Not Gameover") {
        bestScore= score();
    }
    else {
        int i = 0;
        for(int x:getAvailableMoves()) {
            if(turn == "X") {   //X is minimizer
                setX(x);
                currentScore = newminimax47(x)[0];
                if(i == 0) {
                    bestScore = currentScore;
                    bestPos = x;
                    if(bestScore == -10)
                        break;
                }
                else if(currentScore < bestScore) {
                    bestScore = currentScore;
                    bestPos = x;
                    if(bestScore == -10)
                        break;
                }
            }
            else if(turn == "O") {  //O is maximizer
                setO(x);
                currentScore = newminimax47(x)[0];
                if(i == 0) {
                    bestScore = currentScore;
                    bestPos = x;
                    if(bestScore == 10)
                        break;
                }

                else if(currentScore > bestScore) {
                    bestScore = currentScore;
                    bestPos = x;
                    if(bestScore == 10)
                        break;
                }
            }
            i++;
        }
    }
    revert(lastAdded);
    return new int [] {bestScore, bestPos};
}

newminimax47()使用的互补函数:

Complementary functions used by newminimax47():

isGameOver():

isGameOver():

public String isGameOver() {
    if(n == 3) {
        //Rows 1 to 3
        if((state[0] != "-") && (state[0] == state[1]) && (state[1] == state[2]))
            return (state[0] == "X") ? "X Won" : "O Won";
        else if((state[3] != "-") && (state[3] == state[4]) && (state[4] == state[5]))
            return (state[3] == "X") ? "X Won" : "O Won";
        else if((state[6] != "-") && (state[6] == state[7]) && (state[7] == state[8]))
            return (state[6] == "X") ? "X Won" : "O Won";

        //Columns 1 to 3
        else if((state[0] != "-")&&(state[0] == state[3]) && (state[3] == state[6]))
            return (state[0] == "X") ? "X Won" : "O Won";
        else if((state[1] != "-") && (state[1] == state[4]) && (state[4] == state[7]))
            return (state[1] == "X") ? "X Won" : "O Won";
        else if((state[2] != "-") && (state[2] == state[5]) && (state[5] == state[8]))
            return (state[2] == "X") ? "X Won" : "O Won";

        //Diagonals
        else if((state[0] != "-") && (state[0]==state[4]) && (state[4] == state[8]))
            return (state[0] == "X") ? "X Won" : "O Won";
        else if((state[6] != "-") && (state[6] == state[4]) && (state[4] == state[2]))
            return (state[6] == "X") ? "X Won" : "O Won";

        //Checking if draw
        else if((state[0] != "-") && (state[1]!="-") && (state[2] != "-") && (state[3]!="-") &&
                (state[4] != "-") && (state[5] != "-") && (state[6] != "-") && (state[7] != "-") &&
                (state[8] != "-"))
            return "Draw";
        else
            return "Not Gameover";
    }
    else {
        //Rows 1 to 4
        if((state[0] != "-") && (state[0] == state[1]) && (state[1] == state[2]) && (state[2] == state[3]))
            return (state[0] == "X") ? "X Won" : "O Won";
        else if((state[4] != "-") && (state[4] == state[5]) && (state[5]==state[6]) && (state[6] == state[7]))
            return (state[4] == "X") ? "X Won" : "O Won";
        else if((state[8] != "-") && (state[8] == state[9]) && (state[9]==state[10]) && (state[10] == state[11]))
            return (state[8] == "X") ? "X Won" : "O Won";
        else if((state[12] != "-") && (state[12] == state[13]) &&(state[13] == state[14]) && (state[14] == state[15]))
            return (state[12] == "X") ? "X Won" : "O Won";

        //Columns 1 to 4
        else if((state[0] != "-") && (state[0] == state[4]) && (state[4] == state[8]) && (state[8] == state[12]))
            return (state[0] == "X") ? "X Won" : "O Won";
        else if((state[1] != "-") && (state[1] == state[5]) && (state[5] == state[9]) && (state[9] == state[13]))
            return (state[1] == "X") ? "X Won" : "O Won";
        else if((state[2] != "-") && (state[2] == state[6]) && (state[6] == state[10]) && (state[10] == state[14]))
            return (state[2] == "X") ? "X Won" : "O Won";
        else if((state[3] != "-") && (state[3] == state[7]) && (state[7] == state[11]) && (state[11] == state[15]))
            return (state[3] == "X") ? "X Won" : "O Won";

        //Diagonale
        else if((state[0] != "-") && (state[0] == state[5]) && (state[5] == state[10]) && (state[10] == state[15]))
            return (state[0] == "X") ? "X Won" : "O Won";
        else if((state[12] != "-") && (state[12] == state[9]) &&  (state[9] == state[6]) && (state[6] == state[3]))
            return (state[0] == "X") ? "X Won" : "O Won";

        //Pruefe ob Gleichstand
        else if((state[0] != "-") && (state[1] != "-") && (state[2] != "-") && (state[3]!="-") &&
                (state[4] != "-") && (state[5] != "-") && (state[6] != "-") && (state[7] != "-") &&
                (state[8] != "-") && (state[9] != "-") && (state[10] != "-") && (state[11] != "-") &&
                (state[12] != "-") && (state[13] != "-") && (state[14] != "-") && (state[15] != "-")) 
            return "Draw";
        else
            return "Not Gameover";
    }   
}

请原谅isGameOver()方法的直率,它只是检查棋盘的状态(即赢,平局,游戏未结束)

Please excuse the bluntness of the isGameOver() method, it merely checks the state of the board( i.e. Win, Draw, Game not Over)

getAvailableMoves()方法:

The getAvailableMoves() method:

public int[] getAvailableMoves() {
    int count = 0;
    int i = 0;
    for(int j = 0; j < state.length; j++) {
        if(state[j] == "-")
            count++;
    }
    int [] availableSlots = new int[count];
    for(int j = 0; j < state.length; j++){
        if(state[j] == "-")
            availableSlots[i++] = j;        
    }
    return availableSlots;
}

此方法仅返回一个int数组,该数组包含所有可用的下一步动作(关于当前状态),如果没有可用的动作或游戏结束,则返回空数组.

This method merely returns an int array with all available next moves(regarding the current state) or returns empty array if no moves are available or if game is over.

score()方法:

The score() method:

public int score() {
    if(isGameOver() == "X Won")
        return -10;
    else if(isGameOver() == "O Won")
        return +10;
    else 
        return 0;
}

setO(),setX()和revert():

setO(), setX() and revert():

public void setX(int i) {
    state[i] = "X";
    turn = "O";
    lastAdded = i;
}
public void setO(int i) {
    state[i] = "O";
    turn = "X";
    lastAdded = i;
}
public void revert(int i) {
    state[i] = "-";
    if(turn == "X")
        turn = "O";
    else
        turn = "X";
}

对于3x3游戏,我的主要方法如下:

My main method looks like this for a 3x3 game:

public static void main(String args[]) {
    State s = new State(3);
    int [] ScoreAndRecommendedMove = new int[2];
    s.setX(8);
    ScoreAndRecommendedMove = s.newminimax47(8);
    System.out.println("Score: "+ScoreAndRecommendedMove[0]+" Position: "+ ScoreAndRecommendedMove[1]);
}

在此游戏中,X以位置8的移动开始了游戏.在这种情况下,方法将返回

In this game, X has started the game with a move at position 8. The method in this case will return

Score: 0 Position: 4  

意味着O的最有前途的举动是在第4位,在最坏的情况下将得到0分(即平局).

Meaning that O's most promising move is at position 4 and in the worst case will yield a score of 0(i.e a draw).

以下图像旨在让您了解newminimax47()的工作方式.在这种情况下,我们的起始状态(板)被赋予数字1.注意:这些数字表示在创建所考虑的状态时的优先级. 1在2之前创建,2在3之前创建,3在4之前创建,依此类推.

The following image is meant to give an idea of how newminimax47() works. In this case our starting state(board) is given the number 1. Note: The numbers indicate the precedence in creation of the regarded states. 1 was created before 2, 2 was created before 3, 3 was created before 4 and so on.

在这种情况下,最终返回到状态1的得分和排名将是

In this scenario, the score and position eventually returned to state 1 will be

Score: 0 Position: 6

来自状态8.

注意:您看到的代码只是实际State类的片段.这些代码片段本身应该可以使您重新创建newminimax47函数,并且可以正常使用(至少3x3).您可能会发现的任何bug都不是真正的bug,它们根本没有包含在我在此处复制的代码段中,并且没有它们的代码也可以正常工作.例如,setO和setX函数中的lastAdded变量未包含在此处的代码段中,但我只是意识到您不需要它即可使用minimax函数,因此您可以将其注释掉.

Note: The code you see is just snippets of the actual State class. These snippets on their own should allow you to recreate, and play around with, the newminimax47 function without problems(at least for 3x3). Any bugs you may find are not really bugs, they were simply not included in the snippets I copied here and the code should work without them. The lastAdded variable in the setO and setX functions for example is not included in the snippets here but I just realized you don't need it to be able to work with the minimax function, so you can just comment that out.

推荐答案

我玩弄了您的代码,还有很多话要说

I played around with your code and there is quite a bit to say

错误

首先有一个错误.我认为您的代码实际上不适用于3x3电路板.问题在于您revert将位置添加到板上的位置.您只需在newminimax47方法的末尾执行一次此操作,即使您在for循环内将添加的内容移至电路板上也是如此.这意味着调用该方法不仅会计算某些内容,还会更改板的状态,其余代码希望它不会.

first of all there is a bug. I don't think your code actually works for a 3x3 board. The problem is the location you revert the move you add to the board. You do this at the end of the newminimax47 method exactly once, even though in the method you add moves to the board inside a for loop. This means that calling the method does not only compute something but also changes the board state, and the rest of the code expects it not to.

因此,请立即删除revert所在的位置并尽快还原:

So remove the revert where it is now and in revert as soon as you can:

setX(x);                                                                                                                                                                                                                                             
currentScore = newminimax47(x)[0];                           
revert(x);       

这也意味着您不需要lastAdded变量.

this also implies you don't need the lastAdded variable.

播放

如果您实际使用自己的算法,则更容易看到正在发生的事情.在State类中添加方法

It's a lot easier to see what is happening if you actually play against your own algorithm. Add a method to your State class

public void dump() {                                                        
    for (int y = 0; y < n; y++) {                                           
        for (int x = 0; x < n; x++) {                                       
            System.out.print(state[y * n + x]);                             
        }                                                                   
        System.out.println();                                               
    }                                                                       
}

在您体内主要是

public void play() {                                                        
    State s=new State(3);                                                   
    Scanner in = new Scanner (System.in);                                   
    while (s.isGameOver().equals("Not Gameover")) {                         
        int[] options = s.getAvailableMoves();                              
        s.dump();                                                           
        System.out.println ("Your options are " + Arrays.toString(options));
        int move = in.nextInt();                                            
        s.setX(move);                                                       
        int [] ScoreAndRecommendedMove=new int[2];                          
        ScoreAndRecommendedMove=s.newminimax47(0);                           
        System.out.println("Score: "+ScoreAndRecommendedMove[0]+" Position: "+ ScoreAndRecommendedMove[1]);
        s.setO(ScoreAndRecommendedMove[1]);                                 
    }                                                                       
    s.dump();                                                               
}

,您实际上可以与之对抗.在3x3板上,这对我来说效果很好.不幸的是,我估计计算4x4的第一步需要花费我的计算机大约48小时.

and you can actually play against it. On a 3x3 board this works just fine for me. Unfortunately I estimated computing the first move on a 4x4 takes my computer approximately 48 hours.

数据类型

您选择的数据类型通常有点...奇怪.如果您想记住一个字符,请使用char而不是String.如果要返回是/否决定,请尝试使用boolean.程序的某些部分也可以用更少的代码来代替.但这不是您的问题,所以继续...

Your choice of data types is often a bit... strange. If you want to remember a single character, use char instead of String. If you want to return a yes/no decision, try use boolean. There are also some parts of the program that could be replaces by less code doing the same. But that wasn't your question, so on to...

算法

好,那么解决这个问题的minimax有什么问题? 假设前四个动作是X5,O8,X6 O7.另一种可能性是使用X5,O7,X6,O8开始游戏.另一个是X6,O7,X5,O8.最后是X6,O8,X5,O7.

Ok, so what's wrong with minimax to solve this problem? Suppose the first four moves are X5, O8, X6 O7. Another possibility is to start the game with X5, O7, X6, O8. Yet another one is X6, O7, X5, O8. And finally there's X6, O8, X5, O7.

游戏的前四个动作的所有四种可能性导致完全相同的游戏状态.但是minimax不会认识到它们是相同的(基本上没有并行分支的内存),因此它将计算所有这四个分支.而且,如果您进行更深入的搜索,则计算每个板状态的次数将迅速增加.

All four of those possibilities for the first four moves of the game lead to the exact same gamestate. But minimax will not recognise they're the same (basically there is no memory of parallel branches) so it will compute them all four. And the number of times each board state is computed will increase rapidly if you search deeper.

可能的游戏数量大大超过可能的棋盘状态数量.要估算游戏的数量:首先有16种可能的举动,然后有15种,然后有14种,13种……等等.粗略估算为16 !,尽管minimax不必全部计算,因为其中许多将在第16步之前完成.

The number of possible games vastly outnumbers the number of possible board states. To estimate the number of games: at first there are 16 possible moves, then 15, then 14, 13, ... and so on. A rough estimation is 16!, although minimax won't have to compute all of them, because many of them will have finished before the 16th move.

对游戏状态数量的估计是:棋盘上的每个方块都可以是空的,也可以是X或O.所以这是3 ^ 16个棋盘.实际上并不是所有的板都是有效的板,因为板上的X数量最多可以比O的数量多一个,但仍然接近3 ^ 16.

An estimation for the number of game states is: every square on the board can be either empty, or an X, or an O. So that's 3^16 boards. Not all of them are actually valid boards, because the number of Xs on the board can be at most one more then the number of Os, but still it's close to 3^16.

16!可能的游戏比3 ^ 16个可能的棋盘状态多大约一百万倍.这意味着我们大约每个董事会要计算500万次,而不是一次.

16! possible games is about half a million times more then 3^16 possible board states. Which means we're approximately computing every board half a milion times in stead of just once.

解决方案是开始记住您计算出的每个游戏状态.每次调用递归函数时,首先检查您是否已经知道答案,如果知道,则返回旧答案.这是一种称为记忆化的技术.

The solution is to start remembering every gamestate you compute. Every time the recursive function is called first check if you already know the answer, and if so just return the old answer. It's a technique called memoization.

记忆力

我将描述如何在使用已经选择的数据结构时添加备忘录(即使我不同意).要进行记录,您需要一个可以快速添加和快速查找的集合.列表(例如ArrayList)对我们没有任何帮助.添加值的速度很快,但是在长列表中进行查找的速度非常慢.有一些选项,但最容易使用的是HashMap.为了使用HashMap,您需要创建一些代表您的状态的东西,并且可以将其用作键.最简单的方法是制作一个String,其中带有代表您的电路板的所有X/O/-符号.

I'll describe how to add memoization while using the data structures you already choose (even though I don't agree with them). To do memoization you need a collection on which you can do quick adds and quick lookups. A list (for example ArrayList) won't do us any good. It's fast to add values but to do a lookup is very slow in long lists. There are some options but the easiest one to use is HashMap. In order to use HashMap you need to create something that represents your state and you can use as a key. The most straight-forward is to just make a String with all the X/O/- symbols in it that represent your board.

所以添加

Map<String,int[]> oldAnswers = new HashMap<String,int[]>();                                                                  

到您的State对象.

然后在newminimax47方法的开头创建代表State的String并检查我们是否已经知道答案:

Then at the start of your newminimax47 method create the String that represents the State and check if we already know the answer:

    String stateString = "";                                                
    for (String field : state) stateString += field;                        
    int[] oldAnswer = oldAnswers.get(stateString);                          
    if (oldAnswer != null) return oldAnswer;

最后,当您计算新答案时,newminimax47的结尾不仅应返回它,而且还应将其存储在地图中:

Finally when you compute a new answer the end of newminimax47 you should not only return it, but also store it in the map:

    int[] answer = {bestScore, bestPos};                                    
    oldAnswers.put (stateString, answer);                                   
    return answer;

有了备忘录,我就可以根据您的代码玩4x4游戏.第一步仍然很慢(20秒),但是此后一切都经过了计算并且非常快.如果您想进一步加快速度,可以查看 alpha beta修剪.但是这种改进不会像回忆那样多.另一种选择是使用更有效的数据类型.它不会降低算法的理论顺序,但仍然可以轻松地使其速度提高5倍.

With memoization in place I was able to play a 4x4 game against your code. The first move is still slow (20 seconds) but after that everything is computed and it's very fast. If you want to speed it up further, you could look into alpha beta pruning. But the improvement won't be near as much as memoization. Another option is using more efficient data types. It won't reduce the theoretical order of your algorithm, but could still easily make it 5 times faster.

这篇关于Tic-Tac-Toe minimax算法不适用于4x4电路板的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆