Haskell 中的这个列表排列实现究竟做了什么? [英] What does this list permutations implementation in Haskell exactly do?

查看:32
本文介绍了Haskell 中的这个列表排列实现究竟做了什么?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在研究 Data.List 模块中的代码,但无法完全理解这种排列的实现:

I am studying the code in the Data.List module and can't exactly wrap my head around this implementation of permutations:

permutations            :: [a] -> [[a]]
permutations xs0        =  xs0 : perms xs0 []
  where
    perms []     _  = []
    perms (t:ts) is = foldr interleave (perms ts (t:is)) (permutations is)
      where interleave    xs     r = let (_,zs) = interleave' id xs r in zs
            interleave' _ []     r = (ts, r)
            interleave' f (y:ys) r = let (us,zs) = interleave' (f . (y:)) ys r
                                     in  (y:us, f (t:y:us) : zs)

有人能详细解释一下这些嵌套函数是如何相互连接/工作的吗?

Can somebody explain in detail how these nested functions connect/work with each other?

推荐答案

抱歉回复晚了,写的时间比预期的要长一些.

Sorry about the late answer, it took a bit longer to write down than expected.

因此,首先要在像这样的列表函数中最大化惰性有两个目标:

So, first of all to maximize lazyness in a list function like this there are two goals:

  • 在检查输入列表的下一个元素之前生成尽可能多的答案
  • 答案本身必须是懒惰的,因此也必须如此.

现在考虑 permutation 函数.这里最大的懒惰意味着:

Now consider the permutation function. Here maximal lazyness means:

  • 在只检查输入的 n 个元素后,我们应该确定至少有 n! 个排列
  • 对于这些 n! 排列中的每一个,前 n 个元素应该只依赖于输入的前 n 个元素.莉>
  • We should determine that there are at least n! permutations after inspecting just n elements of input
  • For each of these n! permutations, the first n elements should depend only on the first n elements of the input.

第一个条件可以形式化为

The first condition could be formalized as

length (take (factorial n) $ permutations ([1..n] ++ undefined))) `seq` () == ()

David Benbennick 将第二个条件形式化为

David Benbennick formalized the second condition as

map (take n) (take (factorial n) $ permutations [1..]) == permutations [1..n] 

结合起来,我们有

map (take n) (take (factorial n) $ permutations ([1..n] ++ undefined)) == permutations [1..n] 

让我们从一些简单的案例开始.第一个置换[1..].我们必须有

Let's start with some simple cases. First permutation [1..]. We must have

permutations [1..] = [1,???] : ???

我们必须有两个元素

permutations [1..] = [1,2,???] : [2,1,???] : ???

注意前两个元素的顺序没有选择,我们不能把[2,1,...]放在最前面,因为我们已经决定第一个排列必须从 1 开始.现在应该很清楚 permutations xs 的第一个元素必须等于 xs 本身.

Note that there is no choice about the order of the first two elements, we can't put [2,1,...] first, since we already decided that the first permutation must start with 1. It should be clear by now that the first element of permutations xs must be equal to xs itself.

现在开始实施.

首先,有两种不同的方法可以对列表进行所有排列:

First of all, there are two different ways to make all permutations of a list:

  1. 选择风格:不断从列表中选择元素,直到没有元素为止

  1. Selection style: keep picking elements from the list until there are none left

permutations []  = [[]]
permutations xxs = [(y:ys) | (y,xs) <- picks xxs, ys <- permutations xs]
  where
    picks (x:xs) = (x,xs) : [(y,x:ys) | (y,ys) <- picks xs]

  • 插入样式:在所有可能的位置插入或交错每个元素

  • Insertion style: insert or interleave each element in all possible places

    permutations []     = [[]]
    permutations (x:xs) = [y | p <- permutations xs, y <- interleave p]
      where
        interleave []     = [[x]]
        interleave (y:ys) = (x:y:ys) : map (y:) (interleave ys)
    

  • 请注意,这些都不是最大的懒惰.第一种情况,这个函数做的第一件事是从整个列表中挑选第一个元素,这一点都不懒惰.在第二种情况下,我们需要尾部的排列才能进行任何排列.

    Note that neither of these is maximally lazy. The first case, the first thing this function does is pick the first element from the entire list, which is not lazy at all. In the second case we need the permutations of the tail before we can make any permutation.

    首先,请注意 interleave 可以变得更懒惰.interleave yss 列表的第一个元素是 [x] 如果 yss=[](x:y:ys) 如果 yss=y:ys.但是这两个和x:yss是一样的,所以我们可以写

    To start, note that interleave can be made more lazy. The first element of interleave yss list is [x] if yss=[] or (x:y:ys) if yss=y:ys. But both of these are the same as x:yss, so we can write

    interleave yss = (x:yss) : interleave' yss
    interleave' [] = []
    interleave' (y:ys) = map (y:) (interleave ys)
    

    Data.List 中的实现延续了这个想法,但使用了更多的技巧.

    The implementation in Data.List continues on this idea, but uses a few more tricks.

    通过 邮件列表讨论.我们从 David Benbennick 的版本开始,它与我上面写的版本相同(没有惰性交错).我们已经知道 permutations xs 的第一个元素应该是 xs 本身.所以,让我们把它放在

    It is perhaps easiest to go through the mailing list discussion. We start with David Benbennick's version, which is the same as the one I wrote above (without the lazy interleave). We already know that the first elment of permutations xs should be xs itself. So, let's put that in

    permutations xxs     = xxs : permutations' xxs
    permutations' []     = []
    permutations' (x:xs) = tail $ concatMap interleave $ permutations xs
      where interleave = ..
    

    tail 的调用当然不是很好.但是如果我们内联 permutationsinterleave 的定义,我们会得到

    The call to tail is of course not very nice. But if we inline the definitions of permutations and interleave we get

    permutations' (x:xs)
      = tail $ concatMap interleave $ permutations xs
      = tail $ interleave xs ++ concatMap interleave (permutations' xs)
      = tail $ (x:xs) : interleave' xs ++ concatMap interleave (permutations' xs)
      = interleave' xs ++ concatMap interleave (permutations' xs)
    

    现在我们有

    permutations xxs     = xxs : permutations' xxs
    permutations' []     = []
    permutations' (x:xs) = interleave' xs ++ concatMap interleave (permutations' xs)
      where
       interleave yss = (x:yss) : interleave' yss
       interleave' [] = []
       interleave' (y:ys) = map (y:) (interleave ys)
    

    下一步是优化.一个重要的目标是消除交错中的 (++) 调用.这并不容易,因为最后一行,map (y:) (interleave ys).我们不能立即使用将尾部作为参数传递的 foldr/ShowS 技巧.出路是摆脱地图.如果我们传递一个参数 f 作为必须在最后映射到结果的函数,我们得到

    The next step is optimization. An important target would be to eliminate the (++) calls in interleave. This is not so easy, because of the last line, map (y:) (interleave ys). We can't immediately use the foldr/ShowS trick of passing the tail as a parameter. The way out is to get rid of the map. If we pass a parameter f as the function that has to be mapped over the result at the end, we get

    permutations' (x:xs) = interleave' id xs ++ concatMap (interleave id) (permutations' xs)
      where
       interleave f yss = f (x:yss) : interleave' f yss
       interleave' f [] = []
       interleave' f (y:ys) = interleave (f . (y:)) ys
    

    现在我们可以传入尾部了,

    Now we can pass in the tail,

    permutations' (x:xs) = interleave' id xs $ foldr (interleave id) [] (permutations' xs)
      where
       interleave  f yss    r = f (x:yss) : interleave' f yss r
       interleave' f []     r = r
       interleave' f (y:ys) r = interleave (f . (y:)) ys r
    

    这开始看起来像 Data.List 中的那个,但现在还不一样.特别是,它不像它可能的那样懒惰.让我们试一试:

    This is starting to look like the one in Data.List, but it is not the same yet. In particular, it is not as lazy as it could be. Let's try it out:

    *Main> let n = 4
    *Main> map (take n) (take (factorial n) $ permutations ([1..n] ++ undefined))
    [[1,2,3,4],[2,1,3,4],[2,3,1,4],[2,3,4,1]*** Exception: Prelude.undefined
    

    呃哦,只有第一个 n 元素是正确的,而不是第一个 factorial n.原因是我们仍然尝试将第一个元素(上例中的 1)放置在所有可能的位置,然后再尝试其他任何操作.

    Uh oh, only the first n elements are correct, not the first factorial n. The reason is that we still try to place the first element (the 1 in the above example) in all possible locations before trying anything else.

    Yitzchak Gale 想出了一个解决方案.考虑了将输入分成初始部分、中间元素和尾部的所有方法:

    Yitzchak Gale came up with a solution. Considered all ways to split the input into an initial part, a middle element, and a tail:

    [1..n] == []    ++ 1 : [2..n]
           == [1]   ++ 2 : [3..n]
           == [1,2] ++ 3 : [4..n]
    

    如果您之前没有见过生成这些的技巧,您可以使用 zip (inits xs) (tails xs) 来完成.现在 [1..n] 的排列是

    If you haven't seen the trick to generate these before before, you can do this with zip (inits xs) (tails xs). Now the permutations of [1..n] will be

    • [] ++ 1 : [2..n] 又名.[1..n], 或
    • 2 插入(交错)到 [1] 的排列中,然后是 [3..n].但是没有在 [1] 的末尾插入 2,因为我们已经在前面的项目符号中找到了那个结果.
    • 3 交织成 [1,2] 的排列(不是最后),后面跟着 [4..n].
    • [] ++ 1 : [2..n] aka. [1..n], or
    • 2 inserted (interleaved) somewhere into a permutation of [1], followed by [3..n]. But not 2 inserted at the end of [1], since we already go that result in the previous bullet point.
    • 3 interleaved into a permutation of [1,2] (not at the end), followed by [4..n].
    • etc.

    你可以看到这是最大的懒惰,因为在我们甚至考虑用 3 做一些事情之前,我们已经给出了所有以 [1,2] 的一些排列开头的排列代码>.Yitzchak 给出的代码是

    You can see that this is maximally lazy, since before we even consider doing something with 3, we have given all permutations that start with some permutation of [1,2]. The code that Yitzchak gave was

    permutations xs = xs : concat (zipWith newPerms (init $ tail $ tails xs)
                                                    (init $ tail $ inits xs))
      where
        newPerms (t:ts) = map (++ts) . concatMap (interleave t) . permutations3
        interleave t [y]        = [[t, y]]
        interleave t ys@(y:ys') = (t:ys) : map (y:) (interleave t ys') 
    

    注意对 permutations3 的递归调用,它可以是一个不必最大程度地懒惰的变体.

    Note the recursive call to permutations3, which can be a variant that doesn't have to be maximally lazy.

    如您所见,这比我们之前的优化稍差.但是我们可以应用一些相同的技巧.

    As you can see this is a bit less optimized than what we had before. But we can apply some of the same tricks.

    第一步是去掉inittail.我们来看看zip (init $ tail $ tails xs) (init $ tail $ inits xs) 究竟是什么

    The first step is to get rid of init and tail. Let's look at what zip (init $ tail $ tails xs) (init $ tail $ inits xs) actually is

    *Main> let xs = [1..5] in zip (init $ tail $ tails xs) (init $ tail $ inits xs)
    [([2,3,4,5],[1]),([3,4,5],[1,2]),([4,5],[1,2,3]),([5],[1,2,3,4])]
    

    init 去掉了组合 ([],[1..n]),而 tail 去掉了组合组合 ([1..n],[]).我们不想要前者,因为那会使 newPerms 中的模式匹配失败.后者将失败interleave.两者都很容易修复:只需为 newPerms []interleave t [] 添加一个 case.

    The init gets rid of the combination ([],[1..n]), while the tail gets rid of the combination ([1..n],[]). We don't want the former, because that would fail the pattern match in newPerms. The latter would fail interleave. Both are easy to fix: just add a case for newPerms [] and for interleave t [].

    permutations xs = xs : concat (zipWith newPerms (tails xs) (inits xs))
      where
        newPerms [] is = []
        newPerms (t:ts) is = map (++ts) (concatMap (interleave t) (permutations is))
        interleave t []         = []
        interleave t ys@(y:ys') = (t:ys) : map (y:) (interleave t ys') 
    

    现在我们可以尝试内联tailsinits.他们的定义是

    Now we can try to inline tails and inits. Their definition is

    tails xxs = xxs : case xxs of
      []     -> []
      (_:xs) -> tails xs
    
    inits xxs = [] : case xxs of
      []     -> []
      (x:xs) -> map (x:) (inits xs)
    

    问题在于 inits 不是尾递归的.但是因为无论如何我们都要对inits进行排列,所以我们不关心元素的顺序.所以我们可以使用一个累加参数,

    The problem is that inits is not tail recursive. But since we are going to take a permutation of the inits anyway, we don't care about the order of the elements. So we can use an accumulating parameter,

    inits' = inits'' []
      where
      inits'' is xxs = is : case xxs of
        []     -> []
        (x:xs) -> inits'' (x:is) xs
    

    现在我们让 newPerms 成为 xxs 和这个累加参数的函数,而不是 tails xxsinits xxs>.

    Now we make newPerms a function of xxs and this accumulating parameter, instead of tails xxs and inits xxs.

    permutations xs = xs : concat (newPerms' xs [])
      where
        newPerms' xxs is =
          newPerms xxs is :
          case xxs of
            []     -> []
            (x:xs) -> newPerms' xs (x:is)
        newPerms [] is = []
        newPerms (t:ts) is = map (++ts) (concatMap (interleave t) (permutations3 is))
    

    内联 newPermsnewPerms' 然后给出

    inlining newPerms into newPerms' then gives

    permutations xs = xs : concat (newPerms' xs [])
      where
        newPerms' []     is = [] : []
        newPerms' (t:ts) is =
          map (++ts) (concatMap (interleave t) (permutations is)) :
          newPerms' ts (t:is)
    

    内联和展开concat,并将最终的map(++ts)移入interleave

    inlining and unfolding concat, and moving the final map (++ts) into interleave,

    permutations xs = xs : newPerms' xs []
      where
        newPerms' []     is = []
        newPerms' (t:ts) is =
            concatMap interleave (permutations is) ++
            newPerms' ts (t:is)
          where
          interleave []     = []
          interleave (y:ys) = (t:y:ys++ts) : map (y:) (interleave ys) 
    

    最后,我们可以重新应用foldr技巧来摆脱(++):

    Then finally, we can reapply the foldr trick to get rid of the (++):

    permutations xs = xs : newPerms' xs []
      where
        newPerms' []     is = []
        newPerms' (t:ts) is =
            foldr (interleave id) (newPerms' ts (t:is)) (permutations is)
          where
          interleave f []     r = r
          interleave f (y:ys) r = f (t:y:ys++ts) : interleave (f . (y:)) ys r
    

    等等,我说去掉(++).我们去掉了其中之一,但没有去掉 interleave 中的那个.为此,我们可以看到我们总是将 yys 的一些尾部连接到 ts.所以,我们可以将计算(ys++ts)连同interleave的递归展开,得到函数interleave' f ys r返回元组 (ys++ts, interleave f ys r).这给

    Wait, I said get rid of the (++). We got rid of one of them, but not the one in interleave. For that, we can see that we are always concatenating some tail of yys to ts. So, we can unfold the calculating (ys++ts) along with the recursion of interleave, and have the function interleave' f ys r return the tuple (ys++ts, interleave f ys r). This gives

    permutations xs = xs : newPerms' xs []
      where
        newPerms' []     is = []
        newPerms' (t:ts) is =
            foldr interleave (newPerms' ts (t:is)) (permutations is)
          where
          interleave ys r = let (_,zs) = interleave' id ys r in zs
          interleave' f []     r = (ts,r)
          interleave' f (y:ys) r = 
            let (us,zs) = interleave' (f . (y:)) ys r
            in  (y:us, f (t:y:us) : zs)
    

    就这样,Data.List.permutations 尽享其最大程度的惰性优化功能.

    And there you have it, Data.List.permutations in all its maximally lazy optimized glory.

    Twan 写的很棒!我(@Yitz)只会添加一些参考:

    Great write-up by Twan! I (@Yitz) will just add a few references:

    • Twan 开发此算法的原始电子邮件线程,上面由 Twan 链接,读起来很有趣.

    • The original email thread where Twan developed this algorithm, linked above by Twan, is fascinating reading.

    Knuth 在卷中对满足这些标准的所有可能算法进行分类.4 法斯.2 秒.7.2.1.2.

    Knuth classifies all possible algorithms that satisfy these criteria in Vol. 4 Fasc. 2 Sec. 7.2.1.2.

    Twan 的 permutations3 本质上与 Knuth 的算法 P"相同.据 Knuth 所知,该算法最早由 1600 年代的英国教堂钟声者发布.

    Twan's permutations3 is essentially the same as Knuth's "Algorithm P". As far as Knuth knows, that algorithm was first published by English church bell ringers in the 1600's.

    这篇关于Haskell 中的这个列表排列实现究竟做了什么?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

    查看全文
    登录 关闭
    扫码关注1秒登录
    发送“验证码”获取 | 15天全站免登陆