井字游戏的遗传算法 [英] A Genetic Algorithm for Tic-Tac-Toe

查看:76
本文介绍了井字游戏的遗传算法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

因此,我被分配了使用遗传算法编写5x5x5井字游戏播放器的问题.我的方法是从3x3开始,开始工作,然后扩展到5x5,再扩展到5x5x5.

它的工作方式是这样的:

  • 模拟一整堆游戏,并且在每个游戏的每个回合中,在相应的表(X表或作为c ++ stdlib映射实现的O表)中进行查找以进行响应.如果板子不在那,请将板子添加到表中.否则,请做出随机响应.

  • 完成桌子后,我会初始化一堆玩家(每个人都有局面表的副本,并通过随机响应进行初始化),然后让他们彼此对抗.

  • 利用他们的获胜/失利来评估健康状况,我保留了一定比例的最佳水平,然后他们继续前进.漂洗并重复X代,然后应该出现一个最佳播放器.

对于3x3而言,打折的板是其他板的反射/旋转,以及举棋不定或制胜"的板,则我遇到的板总数为53或38,取决于您是第一名还是第二名.极好的!一个小时之内就产生了一个最佳球员.太酷了!

对于5x5使用相同的策略,我知道桌子的大小会增加,但是并没有意识到它会急剧增加.即使打折轮换/反射和强制性移动,我的表仍然有约360万个条目,而且看不到尽头.

好的,这显然行不通,我需要一个新计划.如果我不列举所有董事会,而只是一些董事会,那该怎么办?好吧,这似乎也不起作用,因为如果每个玩家只有一小部分可能会看到的棋盘,那么他们将做出很多随机动作,显然是朝着相反的最优方向前进./p>

解决这个问题的现实方法是什么?我会在使用板卡功能时被卡住吗?目标是硬编码尽可能少的游戏功能.

我一直在做研究,但是我读到的所有内容都会导致最小/最大,其中A-B修剪是唯一可行的选择.我当然可以那样做,但是GA真的很酷,我目前的方法在这里只是超出了现实.

EDIT 问题已基本解决:

使用相似性功能,将空地的汉明距离,可能的获胜条件以及其他一些措施相结合,将赌桌的赔率降至非常容易管理的2500种可能性,而std::map只需几分之一秒即可处理完.

解决方案

我对GA的知识非常有限,但是在建模板配置中,您不是问错了问题吗?您的任务不是要列举所有可能的获胜配置,而是要找到导致获胜配置的一系列举动.也许您应该关注的人群不是一组板子,而是一组移动顺序.

编辑:我并不是想从一个特定的板开始,而是从一个空的板开始.很明显,在3x3板上,以(1,1)开头的序列最适合X.重要的是,最终板的中间没有X,而是X放在中间第一.如果X有一个或多个最佳的第一步,也许X也有最佳的第二,第三或第四步?经过几轮适应性测试和重组后,我们是否会发现X的第二步通常是相同的,还是一小部分值?那第三步呢?

这不是极小值,因为您不是根据董事会以前的状态一次寻找最佳动作,而是同时寻找所有最佳动作,希望能汇聚到获胜策略.

我知道这并不能解决您的问题,但是,如果要制定一种制胜战略的想法,那么您自然会想看一下一系列行动而不是董事会状态.

So I was assigned the problem of writing a 5x5x5 tic-tac-toe player using a genetic algorithm. My approach was to start off with 3x3, get that working, and then extend to 5x5, and then to 5x5x5.

The way it works is this:

  • Simulate a whole bunch of games, and during each turn of each game, lookup in a corresponding table (X table or O table implemented as a c++ stdlib maps) for a response. If the board was not there, add the board to the table. Otherwise, make a random response.

  • After I have complete tables, I initialize a bunch of players (each with a copy of the board table, initialized with random responses), and let them play against each other.

  • Using their wins/losses to evaluate fitness, I keep a certain % of the best, and they move on. Rinse and repeat for X generations, and an optimal player should emerge.

For 3x3, discounting boards that were reflections/rotations of other boards, and boards where the move is either 'take the win' or 'block the win', the total number of boards I would encounter were either 53 or 38, depending on whether you go first or second. Fantastic! An optimal player was generated in under an hour. Very cool!

Using the same strategy for 5x5, I knew the size of the table would increase, but did not realize it would increase so drastically. Even discounting rotations/reflections and mandatory moves, my table is ~3.6 million entries, with no end in sight.

Okay, so that's clearly not going to work, I need a new plan. What if I don't enumerate all the boards, but just some boards. Well, it seems like this won't work either, because if each player has just a fraction of possible boards they might see, then they are going to be making a lot of random moves, clearly steering in the opposite direction of optimality.

What is a realistic way of going about this? Am I going to be stuck using board features? The goal is to hard-code as little game functionality as possible.

I've been doing research, but everything I read leads to min/max with A-B pruning as the only viable option. I can certainly do it that way, but the GA is really cool, my current method is just exceeding reality a bit here.

EDIT Problem has been pretty much solved:

Using a similarity function that combines hamming distance of open spaces, the possible win conditions, and a few other measures has brought the table down to a very manageable 2500 possibilities, which a std::map handles in a fraction of a second.

解决方案

My knowledge of GA is pretty limited, but in modeling board configurations, aren't you asking the wrong question? Your task isn't to enumerate all the possible winning configurations -- what you're trying to do is to find a sequence of moves that leads to a winning configuration. Maybe the population you should be looking at isn't a set of boards, but a set of move sequences.

Edit: I wasn't thinking so much of starting from a particular board as starting from an empty board. It's obvious on a 3x3 board that move sequences starting with (1,1) work out best for X. The important thing isn't that the final board has an X in the middle, it's that the X was placed in the middle first. If there's one or more best first moves for X, maybe there's also a best second, third, or fourth move for X, too? After several rounds of fitness testing and recombining, will we find that X's second move is usually the same, or is one of a small set of values? And what about the third move?

This isn't minimax because you're not looking for the best moves one at a time based on the previous state of the board, you're looking for all the best moves at the same time, hoping to converge on a winning strategy.

I know this doesn't solve your problem, but if the idea is to evolve a winning strategy then it seems natural that you'd want to look at sequences of moves rather than board states.

这篇关于井字游戏的遗传算法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆