尝试使用延续传递样式以通过minimax算法避免堆栈溢出 [英] Attempting to use continuation passing style to avoid stack overflow with minimax algorithm

查看:99
本文介绍了尝试使用延续传递样式以通过minimax算法避免堆栈溢出的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我的目标摘要:弄清楚如何使用连续传递样式来避免使用算法时的堆栈溢出,我认为这种算法无法进行尾递归.或者,找到一种使函数尾部递归的方法.

详细信息: 我是F#(通常是函数式编程)的新手,我正尝试通过alpha-beta修剪实现minimax算法.这是一种算法,用于确定两人游戏的最佳移动方式.可以在以下位置找到该算法的伪代码: https://en.wikipedia. org/wiki/Alpha%E2%80%93beta_pruning

Details: I am new to F# (and functional programming in general) and I am attempting to implement the minimax algorithm with alpha-beta pruning. This is an algorithm used to determine the best possible move for a two-player game. The pseudocode for the algorithm can be found here: https://en.wikipedia.org/wiki/Alpha%E2%80%93beta_pruning

我发现这是有助于理解算法流程的资源:

This is a resource I've found useful for understanding the flow of the algorithm: http://inst.eecs.berkeley.edu/~cs61b/fa14/ta-materials/apps/ab_tree_practice/

我的实现方式与众不同之处在于,我正在开发的游戏的玩家并不一定总是轮换.因此,我从函数中删除了其中一个参数.我的实现如下:

One difference in my implementation is that the players for the game I'm working on do not always alternate. For this reason, I've removed one of the parameters from the function. My implementation is below:

let rec minimax node depth alpha beta =
    if depth = 0 || nodeIsTerminal node then
        heuristicValue node
    else
        match node.PlayerToMakeNextChoice with
        | PlayerOneMakesNextChoice ->
            takeMax (getChildren node) depth alpha beta    
        | PlayerTwoMakesNextChoice ->
            takeMin (getChildren node) depth alpha beta
and takeMax children depth alpha beta =      
    match children with
    | [] -> alpha
    | firstChild :: remainingChildren ->
        let newAlpha = [alpha; minimax firstChild (depth - 1) alpha beta] |> List.max

        if beta < newAlpha then newAlpha
        else takeMax remainingChildren depth newAlpha beta
and takeMin children depth alpha beta =      
    match children with
    | [] -> beta
    | firstChild :: remainingChildren ->
        let newBeta = [beta; minimax firstChild (depth - 1) alpha beta] |> List.min

        if newBeta < alpha then newBeta
        else takeMax remainingChildren depth alpha newBeta

我遇到的问题是,虽然takeMaxtakeMin是尾递归的,但是这些方法在分配newAlphanewBeta时调用minimax,因此当我以较大的深度调用minimax.我进行了一些研究,发现在无法使函数尾部递归的情况下,使用延续传递样式是使用堆而不是堆栈的一种潜在方法(并且我相信,经过许多小时的尝试,这不能实现).虽然我可以理解一些非常基本的示例,但是我在理解如何将这种概念应用于这种情况时遇到了麻烦.如果有人可以帮助我完成我,我将不胜感激.

The problem I'm having is that while takeMax and takeMin are tail-recursive, these methods call minimax when assigning newAlpha and newBeta so it is still likely to produce a stack overflow when I call minimax with a large depth. I have done some research and found that using continuation-passing style is a potential way to use the heap rather than the stack when the function cannot be made tail-recursive (and I believe after many hours of trying that this cannot). While I can understand some very basic examples, I'm having trouble understanding how I would apply the concept to this situation; I'd be very grateful if someone could help walk me through it.

我对解决方案的最佳理解

let minimax node depth alpha beta =
    let rec recurse node depth alpha beta k =
        if depth = 0 || nodeIsTerminal node then
            k (heuristicValue node)
        else
            match node.PlayerToMakeNextChoice with
            | PlayerOneMakesNextChoice ->
                takeMax (getChildren node) depth alpha beta k
            | PlayerTwoMakesNextChoice ->
                takeMin (getChildren node) depth alpha beta k
    and takeMax children depth alpha beta k =      
        match children with
        | [] -> k alpha
        | firstChild :: remainingChildren ->
            let continuation = fun minimaxResult ->
                let newAlpha = [alpha; minimaxResult] |> List.max

                if beta < newAlpha then k newAlpha
                else takeMax remainingChildren depth newAlpha beta k

            recurse firstChild (depth - 1) alpha beta continuation
    and takeMin children depth alpha beta k =      
        match children with
        | [] -> k beta
        | firstChild :: remainingChildren ->
            let continuation = fun minimaxResult ->
                let newBeta = [beta; minimaxResult] |> List.min

                if newBeta < alpha then k newBeta
                else takeMax remainingChildren depth alpha newBeta k

            recurse firstChild (depth - 1) alpha beta continuation
    recurse node depth alpha beta id

推荐答案

正如您在基本示例"中毫无疑问地看到的那样,总体思路是采用一个额外的参数("continuation",通常表示为k) ,它是一个函数,每次您返回一个值时,都应将该值传递给延续.因此,例如,以这种方式修改minimax,我们将得到:

As you have undoubtedly seen in the "basic examples", the general idea is to take one extra parameter (the "continuation", usually denoted k), which is a function, and every time you would return a value, pass that value to the continuation instead. So, for example, to modify minimax this way, we'd get:

let rec minimax node depth alpha beta k =
    if depth = 0 || nodeIsTerminal node then
        k (heuristicValue node)
    else
        match node.PlayerToMakeNextChoice with
        | PlayerOneMakesNextChoice ->
            k (takeMax (getChildren node) depth alpha beta)
        | PlayerTwoMakesNextChoice ->
            k (takeMin (getChildren node) depth alpha beta)

然后可以这么说,呼叫站点需要由内而外",而不是像这样:

And then the call site needs to be "turned inside out", so to speak, so instead of something like this:

let a = minimax ...
let b = f a
let c = g b
c

我们会这样写:

minimax ... (fun a ->
   let b = f a
   let c = g b
   c
)

看到了吗? a曾经是minimax的返回值,但是现在a是传递给minimax的延续的参数.运行时机制是,一旦minimax完成运行,它将调用此延续,并且其结果值将在此处显示为参数a.

See? a used to be a return value of minimax, but now a is the parameter of the continuation that is passed to minimax. The runtime mechanics is that, once minimax has finished running, it would call this continuation, and its result value would show up there as parameter a.

因此,要将其应用于您的真实代码,我们会得到:

So, to apply this to your real code, we'd get this:

| firstChild :: remainingChildren ->
    minimax firstChild (depth - 1) alpha beta (fun minimaxResult ->
        let newAlpha = [alpha; minimaxResult] |> List.max

        if beta < newAlpha then newAlpha
        else takeMax remainingChildren depth newAlpha beta
    )

好的,这很好,但这只是工作的一半:我们已经在CPS中重写了minimax,但是takeMintakeMax仍然是递归的.不好.

Ok, this is all well and good, but this is only half the job: we have rewritten minimax in CPS, but takeMin and takeMax are still recursive. Not good.

所以我们先做takeMax.相同的想法:添加一个额外的参数k,每次我们返回"值时,将其传递给k:

So let's do takeMax first. Same idea: add an extra parameter k and every time we would "return" a value, pass it to k instead:

and takeMax children depth alpha beta k =      
    match children with
    | [] -> k alpha
    | firstChild :: remainingChildren ->
        minimax firstChild (depth - 1) alpha beta (fun minimaxResult ->
            let newAlpha = [alpha; minimaxResult] |> List.max

            if beta < newAlpha then k newAlpha
            else takeMax remainingChildren depth newAlpha beta k
        )

现在,当然,我必须相应地修改呼叫站点:

And now, of course, I have to modify the call site correspondingly:

let minimax ... k =
    ...
    match node.PlayerToMakeNextChoice with
    | PlayerOneMakesNextChoice ->
        takeMax (getChildren node) depth alpha beta k

等等,刚才发生了什么?我只是说过,每次返回值时,我都应该将其传递给k,但是在这里我没有这样做.相反,我将k本身传递给takeMax.嗯?

Wait, what just happened? I just said that every time I return a value I should pass it to k instead, but here I'm not doing it. Instead, I'm passing k itself to takeMax. Huh?

好吧,而不是传回传给k"的规则只是该方法的第一部分.第二部分是-在每个递归调用中,将k向下传递".这样,原始的顶级k将沿着整个递归调用链传播,并最终由决定停止递归的任何函数调用.

Well, the rule that "instead of returning pass to k" is only the first part of the approach. The second part is - "on every recursive call, pass k down the chain". This way, the original top-level k would travel down the whole recursive chain of calls, and be ultimately called by whatever function decides to stop the recursion.

请记住,尽管CPS有助于解决堆栈溢出问题,但通常不会使您摆脱内存限制.所有这些中间值都不再放在堆栈中,但它们必须放在 somewhere 处.在这种情况下,每次构造lambda fun minimaxResult -> ...时,这都是堆分配.因此,所有中间值都放在堆上.

Keep in mind that, although CPS helps with stack overflow, it does not free you from memory limits in general. All those intermediate values don't go on the stack anymore, but they have to go somewhere. In this case, every time we construct that lambda fun minimaxResult -> ..., that's a heap allocation. So all your intermediate values go on the heap.

但是有一个很好的对称性:如果该算法确实是尾部递归的,那么您就可以在不分配任何中间lambda的情况下将原始的顶级连续性沿调用链向下传递,因此您不需要任何中间的lambda.堆内存.

There is a nice symmetry though: if the algorithm was truly tail-recursive, you'd be able to pass the original top-level continuation down the call chain without allocating any intermediate lambdas, and so you wouldn't require any heap memory.

这篇关于尝试使用延续传递样式以通过minimax算法避免堆栈溢出的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆