为什么这个 F# 序列函数不是尾递归的? [英] Why is this F# sequence function not tail recursive?

查看:21
本文介绍了为什么这个 F# 序列函数不是尾递归的?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

披露:这出现在 FsCheck 中,这是我维护的 F# 随机测试框架.我有一个解决方案,但我不喜欢它.而且,我不明白这个问题——它只是被规避了.

Disclosure: this came up in FsCheck, an F# random testing framework I maintain. I have a solution, but I do not like it. Moreover, I do not understand the problem - it was merely circumvented.

(monadic,如果我们要使用大词)序列的一个相当标准的实现是:

A fairly standard implementation of (monadic, if we're going to use big words) sequence is:

let sequence l = 
    let k m m' = gen { let! x = m
                       let! xs = m'
                       return (x::xs) }
    List.foldBack k l (gen { return [] })

其中 gen 可以被选择的计算构建器替换.不幸的是,该实现会消耗堆栈空间,因此如果列表足够长,最终堆栈会溢出.问题是:为什么?我知道原则上 foldBack 不是尾递归,但 F# 团队的聪明兔子在 foldBack 实现中规避了这一点.计算构建器的实现有问题吗?

Where gen can be replaced by a computation builder of choice. Unfortunately, that implementation consumes stack space, and so eventually stack overflows if the list is long enough.The question is: why? I know in principle foldBack is not tail recursive, but the clever bunnies of the F# team have circumvented that in the foldBack implementation. Is there a problem in the computation builder implementation?

如果我将实现更改为以下内容,则一切正常:

If I change the implementation to the below, everything is fine:

let sequence l =
    let rec go gs acc size r0 = 
        match gs with
        | [] -> List.rev acc
        | (Gen g)::gs' ->
            let r1,r2 = split r0
            let y = g size r1
            go gs' (y::acc) size r2
    Gen(fun n r -> go l [] n r)

为了完整性,可以找到 Gen 类型和计算构建器 在 FsCheck 源代码中

For completeness, the Gen type and computation builder can be found in the FsCheck source

推荐答案

基于 Tomas 的回答,让我们定义两个模块:

Building on Tomas's answer, let's define two modules:

module Kurt = 
    type Gen<'a> = Gen of (int -> 'a)

    let unit x = Gen (fun _ -> x)

    let bind k (Gen m) =     
        Gen (fun n ->       
            let (Gen m') = k (m n)       
            m' n)

    type GenBuilder() =
        member x.Return(v) = unit v
        member x.Bind(v,f) = bind f v

    let gen = GenBuilder()


module Tomas =
    type Gen<'a> = Gen of (int -> ('a -> unit) -> unit)

    let unit x = Gen (fun _ f -> f x)

    let bind k (Gen m) =     
        Gen (fun n f ->       
            m n (fun r ->         
                let (Gen m') = k r        
                m' n f))

    type GenBuilder() =
        member x.Return v = unit v
        member x.Bind(v,f) = bind f v

    let gen = GenBuilder()

为了简化一些事情,让我们将原始序列函数重写为

To simplify things a bit, let's rewrite your original sequence function as

let rec sequence = function
| [] -> gen { return [] }
| m::ms -> gen {
    let! x = m
    let! xs = sequence ms
    return x::xs }

现在,序列[for i in 1 .. 100000 ->无论 sequence 是根据 Kurt.gen 还是 Tomas.gen 定义的,单元 i] 都会运行到完成.问题不在于 sequence 在使用您的定义时导致堆栈溢出,而是从 sequence 调用返回的函数导致堆栈溢出 it 被调用.

Now, sequence [for i in 1 .. 100000 -> unit i] will run to completion regardless of whether sequence is defined in terms of Kurt.gen or Tomas.gen. The issue is not that sequence causes a stack overflow when using your definitions, it's that the function returned from the call to sequence causes a stack overflow when it is called.

要了解为什么会这样,让我们​​从底层的 monadic 操作的角度扩展 sequence 的定义:

To see why this is so, let's expand the definition of sequence in terms of the underlying monadic operations:

let rec sequence = function
| [] -> unit []
| m::ms ->
    bind (fun x -> bind (fun xs -> unit (x::xs)) (sequence ms)) m

内联 Kurt.unitKurt.bind 值并疯狂地简化,我们得到

Inlining the Kurt.unit and Kurt.bind values and simplifying like crazy, we get

let rec sequence = function
| [] -> Kurt.Gen(fun _ -> [])
| (Kurt.Gen m)::ms ->
    Kurt.Gen(fun n ->
            let (Kurt.Gen ms') = sequence ms
            (m n)::(ms' n))

现在希望清楚为什么调用 let (Kurt.Gen f) = sequence [for i in 1 .. 1000000 ->unit i] in f 0 溢出堆栈:f 需要对序列和结果函数的评估进行非尾递归调用,因此每个递归调用将有一个堆栈帧.

Now it's hopefully clear why calling let (Kurt.Gen f) = sequence [for i in 1 .. 1000000 -> unit i] in f 0 overflows the stack: f requires a non-tail-recursive call to sequence and evaluation of the resulting function, so there will be one stack frame for each recursive call.

Tomas.unitTomas.bind 内联到sequence 的定义中,我们得到以下简化版本:

Inlining Tomas.unit and Tomas.bind into the definition of sequence instead, we get the following simplified version:

let rec sequence = function
| [] -> Tomas.Gen (fun _ f -> f [])
| (Tomas.Gen m)::ms ->
    Tomas.Gen(fun n f ->  
        m n (fun r ->
            let (Tomas.Gen ms') = sequence ms
            ms' n (fun rs ->  f (r::rs))))

关于这个变体的推理很棘手.您可以凭经验验证它不会为某些任意大的输入炸毁堆栈(如 Tomas 在他的回答中所示),并且您可以逐步完成评估以说服自己这一事实.但是,堆栈消耗取决于传入列表中的 Gen 实例,并且对于本身不是尾递归的输入, 可能会破坏堆栈:

Reasoning about this variant is tricky. You can empirically verify that it won't blow the stack for some arbitrarily large inputs (as Tomas shows in his answer), and you can step through the evaluation to convince yourself of this fact. However, the stack consumption depends on the Gen instances in the list that's passed in, and it is possible to blow the stack for inputs that aren't themselves tail recursive:

// ok
let (Tomas.Gen f) = sequence [for i in 1 .. 1000000 -> unit i]
f 0 (fun list -> printfn "%i" list.Length)

// not ok...
let (Tomas.Gen f) = sequence [for i in 1 .. 1000000 -> Gen(fun _ f -> f i; printfn "%i" i)]
f 0 (fun list -> printfn "%i" list.Length)

这篇关于为什么这个 F# 序列函数不是尾递归的?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆