Tic Tac Toe和Minimax-在微控制器上创建不完美的AI [英] Tic Tac Toe and Minimax - Creating an imperfect AI on a microcontroller

查看:83
本文介绍了Tic Tac Toe和Minimax-在微控制器上创建不完美的AI的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我已经在微控制器上创建了一个Tic-Tac-Toe游戏,其中包括一个完美的AI(完美的意思是它不会丢失).我没有为此使用minimax算法,只是一个带有所有可能和最佳移动的小状态机. 我现在的问题是我想实施不同的困难(简单,中等和困难).到目前为止,人工智能将是困难的. 因此,我考虑了最佳方法,最终想使用minimax算法,但是它会计算所有游戏位置的所有得分,因此有时我也可以选择次佳得分最好的.由于我无法始终在微控制器本身上进行所有这些计算,因此我想创建一个可以在计算机上运行的小程序,该程序可以为我提供所有可能的板状态阵列(相对于对称性,这样可以最大程度地减少存储量)使用)及其相应的分数. 为此,我首先尝试实现关于depth的minimax算法本身,以便正确计算每个状态的scores.然后应该把我所有的最佳动作(现在)还给我.但是,它似乎不能很好地工作.我尝试用一​​些printf行对其进行调试.这是到目前为止minimax函数以及我的主要函数的代码:

I have created a Tic-Tac-Toe game on a microcontroller, including a perfect AI (perfect meaning that it doesn't lose). I did not use a minimax algorithm for that, just a little state machine with all possible and optimal moves. My problem now is that I wanted to implement different difficulties (Easy, Medium and Hard). The AI so far would be the hard one. So I've thought about how to do this the best way and ended up wanting to use the minimax algorithm but in a way that it calculates all the scores for all game positions so that I can also sometimes pick the second best score instead of the best. Since I can't always do all of these calculations on the microcontroller itself, I wanted to create a little program that I can run on my computer which gives me arrays of all possible board states (with respect to symmetry, ect to minimize the storage used) and their according scores. For this I firstly tried to implement the minimax algorithm itself, regarding the depth in order to properly calculate the scores of each state. It was then supposed to give me back all of the optimal moves (for now) in an array. However, it does not seem to work that well. I have tried to debug it with some printf lines. Here is the code so far of both the minimax function as well as my main function:

    static int minimax(int *board, int depth)
{
    int score;
    int move = -1;
    int scores[9];
    int nextDepth;

    printf("\n----- Called Minimax, Depth: %i -----\n\n", depth);

    if(depth%2 ==1){
        player = -1;
    } else {
        player = 1;
    }

    printf("Player: %i\n---\n", player);

    if(isWin(board) != 0){
        score = (10-depth)*winningPlayer;

        printf("Player %i won on depth %i\n", winningPlayer, depth);
        printf("Resulting score: (10-%i)*%i = %i\nScore returned to depth %i\n---\n", depth, winningPlayer, score, depth-1);

        return score;
    }

    score = -20;
    nextDepth = depth+1;

    printf("Next depth is %i\n---\n", nextDepth);

    int i;
    for(i=0; i<9; i++){
        if(board[i] == 0) {

            if(nextDepth%2 ==0) {
                player = -1;
            } else {
                player = 1;
            }

            printf("Found vacant space at position %i\n", i);
            printf("Set value of position %i to %i\n---\n", i, player);

            board[i] = player;
            int thisScore = minimax(board, nextDepth);

            printf("Value of the move at position %i on next depth %i is %i\n---\n", i, nextDepth, thisScore);

            scores[i] = thisScore;
            if(thisScore > score){

                printf("New score value is greater than the old one: %i < %i\n---\n", thisScore, score);

                score = thisScore;
                move = i;
                g_moves[nextDepth-1] = move;

                printf("Score was set to %i\n", thisScore);
                printf("Remembered move %i\n---\n", move);

            }
            board[i] = 0;

            printf("Value of position %i was reset to 0 on next depth %i\n---\n", i, nextDepth);

        }
    }

    if(move == -1) {

        printf("Game ended in a draw.\n Returned score: 0\n---\n");

        return 0;
    }

    printf("Move at position %i was selected on next depth %i\n", move, nextDepth);
    printf("Returning score of %i to depth %i\n---\n", score, depth);


    return score;
}

main是:

int main(int argc, char **argv)
{   
    memcpy(board, initBoard, sizeof(board));
    int score = 0;
    int depth = getDepth(board);
    score = minimax(board, depth);
    printf("\n--- finished ---\n\n");


    printf("Moves with the highest score: ");
    int i;
    for(i=0; i<9; i++){
        printf("%i | ", g_moves[i]);
    }
    printf("\n");

    printf("The score is %i\n", score);

    printf("The best next board is: \n|----|----|----|\n");

    for(i=0; i<3; i++){
        printf("| %-2i ", board[i]);
    }
    printf("|\n|----|----|----|\n");
    for(i=3; i<6; i++){
        printf("| %-2i ", board[i]);
    }
    printf("|\n|----|----|----|\n");
    for(i=6; i<9; i++){
        printf("| %-2i ", board[i]);
    }
    printf("|\n|----|----|----|\n");

    return 0;
}

此外,我还有一些变量:

Furthermore, i have some variables:

//1  = Beginning Player
//-1 = second Player
static int player;
static int winningPlayer = 0;
static int g_moves[9];

/* 0 1 2
 * 3 4 5
 * 6 7 8
 */
int initBoard[9] = {
    0, 0, 0,
    0, 0, 0,
    0, 0, 0,
};

int board[9];

还有我的获奖功能:

int isWin(int *board)
{
    unsigned winningBoards[8][3] = {
        {board[0], board[1], board[2],},
        {board[3], board[4], board[5],},
        {board[6], board[7], board[8],},
        {board[0], board[3], board[6],},
        {board[1], board[4], board[7],},
        {board[2], board[5], board[8],},
        {board[0], board[4], board[8],},
        {board[2], board[4], board[6],},
    };

    int i;
    for(i=0; i<8; i++){
        if( (winningBoards[i][0] != 0) &&
            (winningBoards[i][0] == winningBoards[i][1]) &&
            (winningBoards[i][0] == winningBoards[i][2])){
                winningPlayer = winningBoards[i][0];
                return winningPlayer;
            }
    }
    return 0;
}

由于某种原因,最后一次最小极大值从depth 7逐步返回到depth 1时,它将覆盖全为0的数组g_moves,从而在打印输出中产生以下几行(仅最后70行):

For some reason, the last time the minimax returns from depth 7 step-by-step to depth 1, it overwrites my array g_moves with all 0s thus resulting in the following lines in my printed output (only the last 70 lines):

...
----- Called Minimax, Depth: 7 -----

Player: -1                                                                                                                                                                                                                                                                     
---                                                                                                                                                                                                                                                                            
Player 1 won on depth 7                                                                                                                                                                                                                                                        
Resulting score: (10-7)*1 = 3                                                                                                                                                                                                                                                  
Score returned to depth 6                                                                                                                                                                                                                                                      
---                                                                                                                                                                                                                                                                            
Value of the move at position 2 on next depth 7 is 3                                                                                                                                                                                                                           
---                                                                                                                                                                                                                                                                            
Value of position 2 was reset to 0 on next depth 7                                                                                                                                                                                                                             
---                                                                                                                                                                                                                                                                            
Move at position 0 was selected on next depth 7                                                                                                                                                                                                                                
Returning score of 3 to depth 6                                                                                                                                                                                                                                                
---                                                                                                                                                                                                                                                                            
Value of the move at position 3 on next depth 6 is 3                                                                                                                                                                                                                           
---                                                                                                                                                                                                                                                                            
Value of position 3 was reset to 0 on next depth 6                                                                                                                                                                                                                             
---                                                                                                                                                                                                                                                                            
Move at position 0 was selected on next depth 6                                                                                                                                                                                                                                
Returning score of 3 to depth 5                                                                                                                                                                                                                                                
---                                                                                                                                                                                                                                                                            
Value of the move at position 4 on next depth 5 is 3                                                                                                                                                                                                                           
---                                                                                                                                                                                                                                                                            
Value of position 4 was reset to 0 on next depth 5                                                                                                                                                                                                                             
---                                                                                                                                                                                                                                                                            
Move at position 0 was selected on next depth 5                                                                                                                                                                                                                                
Returning score of 3 to depth 4                                                                                                                                                                                                                                                
---                                                                                                                                                                                                                                                                            
Value of the move at position 5 on next depth 4 is 3                                                                                                                                                                                                                           
---                                                                                                                                                                                                                                                                            
Value of position 5 was reset to 0 on next depth 4                                                                                                                                                                                                                             
---                                                                                                                                                                                                                                                                            
Move at position 0 was selected on next depth 4                                                                                                                                                                                                                                
Returning score of 3 to depth 3                                                                                                                                                                                                                                                
---                                                                                                                                                                                                                                                                            
Value of the move at position 6 on next depth 3 is 3                                                                                                                                                                                                                           
---                                                                                                                                                                                                                                                                            
Value of position 6 was reset to 0 on next depth 3                                                                                                                                                                                                                             
---                                                                                                                                                                                                                                                                            
Move at position 0 was selected on next depth 3                                                                                                                                                                                                                                
Returning score of 5 to depth 2                                                                                                                                                                                                                                                
---                                                                                                                                                                                                                                                                            
Value of the move at position 7 on next depth 2 is 5                                                                                                                                                                                                                           
---                                                                                                                                                                                                                                                                            
Value of position 7 was reset to 0 on next depth 2                                                                                                                                                                                                                             
---                                                                                                                                                                                                                                                                            
Move at position 0 was selected on next depth 2                                                                                                                                                                                                                                
Returning score of 5 to depth 1                                                                                                                                                                                                                                                
---                                                                                                                                                                                                                                                                            
Value of the move at position 8 on next depth 1 is 5                                                                                                                                                                                                                           
---                                                                                                                                                                                                                                                                            
Value of position 8 was reset to 0 on next depth 1                                                                                                                                                                                                                             
---                                                                                                                                                                                                                                                                            
Move at position 0 was selected on next depth 1                                                                                                                                                                                                                                
Returning score of 5 to depth 0
---

--- finished ---

Moves with the highest score: 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 
The score is 5
The best next board is: 
|----|----|----|
| 0  | 0  | 0  |
|----|----|----|
| 0  | 0  | 0  |
|----|----|----|
| 0  | 0  | 0  |
|----|----|----|

如果您需要任何其他信息来帮助我,我会很高兴将它们提供给您.

If you need any other information in order to help me, I'll be glad to give them to you if I have them myself.

谢谢.

因此,我重写了minimax函数,现在它使用控制台(在相应文件夹中的cmd:./NAME_OF_FILE> DEST_NAME.txt)在.txt文件上打印所有可能的电路板状态.代码如下:

So I rewrote my minimax function so it now prints all the possible board states on a .txt file using the console (cmd: ./NAME_OF_FILE > DEST_NAME.txt in the according folder). The code is the following:

int minimax(int *board, int depth)
{
    g_node++;
    int player;
    int move = -1;
    int score = -20;
    int thisScore = -20;
    int i;

    if(isWin(board) != 0){
        printf("\nNode: %i\n", g_node);
        printf("Board state:");
        for(i=0;i<9;i++) {
            if((i%3) == 0)
                printf("\n");
            printf("%2i ", board[i]);
        }
        printf("\n");
        printf("has a score of %i\n", (10-depth)*winningPlayer);
        return (10-depth)*winningPlayer;
    }


    if(depth%2 ==1){
            player = -1;
        } else {
            player = 1;
        }
    for(i=0; i<9; i++){
        if(board[i] == 0){
            board[i] = player;
            thisScore = minimax(board, depth+1);
            if(thisScore > score){
                score = thisScore;
                move = i;
            }
            board[i] = 0;
        }
    }

    printf("\nNode: %i\n", g_node);
    printf("Board state:");
    for(i=0;i<9;i++) {
        if((i%3) == 0)
            printf("\n");
        printf("%2i ", board[i]);
    }
    printf("\n");

    if(move == -1){
        printf("has a score of 0\n");
        return 0;

    }
    printf("has a score of %i\n", score);
    return score;
}

下一步是在相应的位置以每次移动的最大score来打印电路板.

My next step is to print out the board with the max score of each move at the according position.

Example:
10  8 10
 8  7  8
10  8 10
for the empty board at the beginning.

现在,我添加了另一个名为printScoredBoards的函数,该函数基本上应该可以完成我在上一次编辑中所描述的功能,但是它存在问题. 因为如果您的对手玩得很愚蠢,在第5步之后总是有可能获胜,并且由于minimax尝试了所有可能性,包括那些可能性,因此,使用以下代码,我得到了空局的全部15分的得分板.

EDIT 2: I now added another function called printScoredBoards which basically is supposed to do what I discribed above in my last edit, however there is a problem to it. Since it is always possible to win after the 5th move if your opponent plays dumb enough and since the minimax tries out all possibilities, including those, with the following code I get a scored board of all 15s for the empty board.

void printScoredBoards(int *board, int depth)
{
    int player;
    int scoredBoard[9] = {0,0,0,0,0,0,0,0,0,};
    int i;
    if(isWin(board) == 0){
        if(depth%2 ==1){
            player = -1;
        } else {
            player = 1;
        }

        for(i=0; i<9; i++){
            if(board[i] == 0){
                board[i] = player;
                scoredBoard[i] = minimax(board, depth+1)+10;
                printScoredBoards(board, depth+1);
                board[i] = 0;
            }
        }
        printf("Scored board:");
        dumpTable(scoredBoard);
        printf("\n");
    }
}

这是尽管角点应该更有价值,而中心是最不有价值的.有人碰巧知道解决方法吗?

This is although the corners should be worth more and the center is the least valuable. Does anyone happen to know a work-around for this?

编辑:我建立了一个新的minimax算法,并将其发布在另一篇文章中.您可以在右侧的链接"部分中找到该帖子,也可以在此处. 现在我要做的就是在微控制器代码中实现它,并创建一个函数,该函数可以从所有计分的动作中选择最佳/次佳的动作,并在有多个相同记分的动作时将其随机化. 因此,该帖子可以关闭.

I've set up a new minimax algorithm and posted it in another post. You can find the post in the "Linked" section on the right or here. Now all I'm doing is implementing it in the microcontroller code and creating a function which can pick the best/second best move out of all scored moves as well as randomize it if there are multiple moves with the same score. This post can thereby be closed.

推荐答案

我认为,尝试通过全面深度分析获得第二好的方法是不可行的.不要通过限制最小最大值的深度来探索整棵树(前进2个可以获胜,但AI仍然很强),或者只是对真正不完美的AI进行随机移动.

I think trying to get the second best move with a full depth analysis is overdoing it. Don't explore the whole tree by limiting the depth of your minmax (2 move ahead permits to win but the AI is still strong), or just a use a random move for a really imperfect AI.

这篇关于Tic Tac Toe和Minimax-在微控制器上创建不完美的AI的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆