避免堆栈溢出(使用 F# 无限序列序列) [英] Avoiding stack overflow (with F# infinite sequences of sequences)

查看:25
本文介绍了避免堆栈溢出(使用 F# 无限序列序列)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有我为 f# 中的 morris seq 编写的学习代码",它遭受堆栈溢出,我不知道如何避免.morris"返回无限序列的see and say"序列(即,{{1}, {1,1}, {2,1}, {1,2,1,1}, {1,1,1,2,2,1}, {3,1,2,2,1,1},...}).

I have this "learning code" I wrote for the morris seq in f# that suffers from stack overflow that I don't know how to avoid. "morris" returns an infinite sequence of "see and say" sequences (i.e., {{1}, {1,1}, {2,1}, {1,2,1,1}, {1,1,1,2,2,1}, {3,1,2,2,1,1},...}).

    let printList l =
        Seq.iter (fun n -> printf "%i" n) l
        printfn ""

    let rec morris s = 
        let next str = seq {
            let cnt = ref 1  // Stack overflow is below when enumerating
            for cur in [|0|] |> Seq.append str |> Seq.windowed 2 do
                if cur.[0] <> cur.[1] then
                    yield!( [!cnt ; cur.[0]] )
                    cnt := 0
                incr cnt
        }
        seq {
        yield s
        yield! morris (next s) // tail recursion, no stack overflow
        }

    // "main"
    // Print the nth iteration
    let _ =  [1] |> morris |> Seq.nth 3125 |> printList 

您可以使用 Seq.nth 选择第 n 次迭代,但您只能在遇到堆栈溢出之前进行到此为止.我拥有的一个递归是尾递归,它本质上构建了一组链接的枚举器.这不是问题所在.这是在第 4000 个序列上调用枚举"的时候.请注意,在 F# 1.9.6.16 中,之前的版本超过 14000).这是因为链接序列的解析方式.序列是惰性的,因此递归"是惰性的.也就是说,seq n 调用 seq n-1,后者调用 seq n-2 等等来获取第一项(第一个 # 是最坏的情况).

You can pick off the nth iteration using Seq.nth but you can only get so far before you hit a stack overflow. The one bit of recursion I have is tail recursion and it in essence builds a linked set of enumerators. That's not where the problem is. It's when "enum" is called on the say the 4000th sequence. Note that's with F# 1.9.6.16, the previous version topped out above 14000). It's because the way the linked sequences are resolved. The sequences are lazy and so the "recursion" is lazy. That is, seq n calls seq n-1 which calls seq n-2 and so forth to get the first item (the very first # is the worst case).

我明白[|0|] |>Seq.append str |>Seq.windowed 2 使我的问题变得更糟,如果我消除它,我可以生成的 # 的三倍.实际上,代码运行良好.morris 的第 3125 次迭代的长度将超过 10^359 个字符.

I understand that [|0|] |> Seq.append str |> Seq.windowed 2, is making my problem worse and I could triple the # I could generate if I eliminated that. Practically speaking the code works well enough. The 3125th iteration of morris would be over 10^359 characters in length.

我真正想解决的问题是如何保留惰性 eval 并根据我可以选择的迭代的堆栈大小没有限制.我正在寻找合适的 F# 习惯用法来根据内存大小进行限制.

The problem I'm really trying to solve is how to retain the lazy eval and have a no limit based on stack size for the iteration I can pick off. I'm looking for the proper F# idiom to make the limit based on memory size.

10 年 10 月更新

在学好一点 F#,一点点 Haskell 之后,思考 &研究这个问题一年多了,我终于可以回答我自己的问题了.但与困难问题一样,问题始于错误的问题.问题不在于序列的序列——这真的是因为递归定义的序列.我的函数式编程技能现在好了一点,所以更容易看到下面的版本发生了什么,它仍然会出现 stackoverflow

After learning F# a bit better, a tiny bit of Haskell, thinking & investigating this problem for over year, I finally can answer my own question. But as always with difficult problems, the problem starts with it being the wrong question. The problem isn't sequences of sequences - it's really because of a recursively defined sequence. My functional programming skills are a little better now and so it's easier to see what's going on with the version below, which still gets a stackoverflow

let next str = 
    Seq.append str [0]
    |> Seq.pairwise
    |> Seq.scan (fun (n,_) (c,v) ->
            if (c = v) then (n+1,Seq.empty)
            else (1,Seq.ofList [n;c]) ) (1,Seq.empty)
    |> Seq.collect snd

let morris = Seq.unfold(fun sq -> Some(sq,next sq))

这基本上创建了一个非常长的 Seq 处理函数调用链来生成序列.F#自带的Seq模块是不使用栈就不能跟链的.它对追加和递归定义的序列进行了优化,但该优化仅在递归实现追加时才有效.

That basicially creates a really long chain of Seq processing function calls to generate the sequnces. The Seq module that comes with F# is what can't follow the chain without using the stack. There's an optimization it uses for append and recursively defined sequences, but that optimization only works if the recursion is implementing an append.

这样就行了

let rec ints n = seq { yield n; yield! ints (n+1) }
printf "%A" (ints 0 |> Seq.nth 100000);;

这个会得到一个stackoverflow.

And this one will get a stackoverflow.

let rec ints n = seq { yield n; yield! (ints (n+1)|> Seq.map id) }
printf "%A" (ints 0 |> Seq.nth 100000);;

为了证明 F# 库是问题所在,我编写了自己的 Seq 模块,该模块使用延续实现了附加、成对、扫描和收集,现在我可以毫无问题地开始生成和打印 50,000 个 seq(它永远不会完成)因为它的长度超过 10^5697 位).

To prove the F# libary was the issue, I wrote my own Seq module that implemented append, pairwise, scan and collect using continutions and now I can begin generating and printing out the 50,000 seq without a problem (it'll never finish since it's over 10^5697 digits long).

一些附加说明:

  • 延续是我正在寻找的习惯用法,但在这种情况下,它们必须进入 F# 库,而不是我的代码.我从 Tomas Petricek 的 Real-World Functional Programming 一书中了解了 F# 中的延续.
  • 我接受的懒惰列表答案包含另一个习语;懒惰的评价.在我重写的库中,我还必须利用惰性类型来避免堆栈溢出.
  • 懒惰的列表版本有点靠运气(可能是设计使然,但这超出了我目前的判断能力) - 它在构建和迭代时使用的活动模式匹配导致列表在所需的递归之前计算值很深,所以它很懒惰,但又不那么懒惰,它需要延续以避免 stackoverflow.例如,当第二个序列需要第一个序列中的一个数字时,它已经被计算出来了.换句话说,LL 版本对于序列生成并不是严格的 JIT 惰性,只是列表管理.
  • Continuations were the idiom I was looking for, but in this case, they had to go into the F# library, not my code. I learned about continuations in F# from Tomas Petricek's Real-World Functional Programming book.
  • The lazy list answer that I accepted held the other idiom; lazy evaluation. In my rewritten library, I also had to leverage the lazy type to avoid stackoverflow.
  • The lazy list version sorta of works by luck (maybe by design but that's beyond my current ability to determine) - the active-pattern matching it uses while it's constructing and iterating causes the lists to calculate values before the required recursion gets too deep, so it's lazy, but not so lazy it needs continuations to avoid stackoverflow. For example, by the time the 2nd sequence needs a digit from the 1st sequence, it's already been calculated. In other words, the LL version is not strictly JIT lazy for sequence generation, only list management.

推荐答案

你一定要看看

http://research.microsoft.com/en-us/um/cambridge/projects/fsharp/manual/FSharp.PowerPack/Microsoft.FSharp.Collections.LazyList.html

但我稍后会尝试发布更全面的答案.

but I will try to post a more comprehensive answer later.

更新

好的,解决方案如下.它将 Morris 序列表示为 int 的 LazyLists 的 LazyList,因为我认为您希望它在双向"上都是惰性的.

Ok, a solution is below. It represents the Morris sequence as a LazyList of LazyLists of int, since I presume you want it to be lazy in 'both directions'.

F# LazyList(在 FSharp.PowerPack.dll 中)具有三个有用的属性:

The F# LazyList (in the FSharp.PowerPack.dll) has three useful properties:

  • 它是惰性的(第 n 个元素的评估在第一次被要求之前不会发生)
  • 它不会重新计算(在同一对象实例上重新计算第 n 个元素不会重新计算它 - 它在第一次计算后缓存每个元素)
  • 您可以忘记"前缀(当您拖尾"到列表中时,不再引用的前缀可用于垃圾收集)

第一个属性与 seq (IEnumerable) 相同,但其他两个属性是 LazyList 独有的,对于计算问题非常有用,例如本问题中提出的问题.

The first property is common with seq (IEnumerable), but the other two are unique to LazyList and very useful for computational problems such as the one posed in this question.

废话少说,上代码:

// print a lazy list up to some max depth
let rec PrintList n ll =
    match n with
    | 0 -> printfn ""
    | _ -> match ll with
           | LazyList.Nil -> printfn ""
           | LazyList.Cons(x,xs) ->
               printf "%d" x
               PrintList (n-1) xs

// NextMorris : LazyList<int> -> LazyList<int>
let rec NextMorris (LazyList.Cons(cur,rest)) = 
    let count = ref 1
    let ll = ref rest
    while LazyList.nonempty !ll && (LazyList.hd !ll) = cur do
        ll := LazyList.tl !ll
        incr count
    LazyList.cons !count
        (LazyList.consf cur (fun() ->
            if LazyList.nonempty !ll then
                NextMorris !ll
            else
                LazyList.empty()))

// Morris : LazyList<int> -> LazyList<LazyList<int>>
let Morris s =
    let rec MakeMorris ll =
        LazyList.consf ll (fun () ->
            let next = NextMorris ll
            MakeMorris next
        )
    MakeMorris s

// "main"
// Print the nth iteration, up to a certain depth
[1] |> LazyList.of_list |> Morris |> Seq.nth 3125 |> PrintList 10
[1] |> LazyList.of_list |> Morris |> Seq.nth 3126 |> PrintList 10
[1] |> LazyList.of_list |> Morris |> Seq.nth 100000 |> PrintList 35
[1] |> LazyList.of_list |> Morris |> Seq.nth 100001 |> PrintList 35

更新2

如果你只是想数数,那也没关系:

If you just want to count, that's fine too:

let LLLength ll =
    let rec Loop ll acc =
        match ll with
        | LazyList.Cons(_,rest) -> Loop rest (acc+1N)
        | _ -> acc
    Loop ll 0N

let Main() =
    // don't do line below, it leaks
    //let hundredth = [1] |> LazyList.of_list |> Morris |> Seq.nth 100
    // if we only want to count length, make sure we throw away the only
    // copy as we traverse it to count
    [1] |> LazyList.of_list |> Morris |> Seq.nth 100
        |> LLLength |> printfn "%A" 
Main()    

内存使用量保持不变(我的机器上低于 16M)...还没有完成运行,但我计算了第 55 个长度,即使在我的慢速机器上也是如此,所以我认为这应该可以正常工作.另请注意,我使用了 'bignum's 作为长度,因为我认为这会溢出一个 'int'.

The memory usage stays flat (under 16M on my box)... hasn't finished running yet, but I computed the 55th length fast, even on my slow box, so I think this should work just fine. Note also that I used 'bignum's for the length, since I think this will overflow an 'int'.

这篇关于避免堆栈溢出(使用 F# 无限序列序列)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆