Constructing efficient monad instances on `Set` (and other containers with constraints) using the continuation monad


Problem description


Set, similarly to [], has perfectly well-defined monadic operations. The problem is that they require the values to satisfy an Ord constraint, so it's impossible to define return and >>= without any constraints. The same problem applies to many other data structures that require some kind of constraint on the possible values.

The standard trick (suggested to me in a haskell-cafe post) is to wrap Set into the continuation monad. ContT doesn't care whether the underlying type functor has any constraints. The constraints are only needed when wrapping/unwrapping Sets into/from continuations:

import Control.Monad.Cont
import Data.Foldable (foldrM)
import Data.Set

setReturn :: a -> Set a
setReturn = singleton

setBind :: (Ord b) => Set a -> (a -> Set b) -> Set b
setBind set f = foldl' (\s -> union s . f) empty set

type SetM r a = ContT r Set a

fromSet :: (Ord r) => Set a -> SetM r a
fromSet = ContT . setBind

toSet :: SetM r r -> Set r
toSet c = runContT c setReturn
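As a quick sanity check (a sketch that is not in the original question, repeating its definitions so the snippet is self-contained): because the Monad instance comes from ContT rather than from Set, SetM computations compose with ordinary do-notation:

```haskell
import Control.Monad.Cont
import Data.Set (Set, fromList, singleton, union, empty)
import qualified Data.Set as Set

setReturn :: a -> Set a
setReturn = singleton

setBind :: Ord b => Set a -> (a -> Set b) -> Set b
setBind set f = Set.foldl' (\s x -> union s (f x)) empty set

type SetM r a = ContT r Set a

fromSet :: Ord r => Set a -> SetM r a
fromSet = ContT . setBind

toSet :: SetM r r -> Set r
toSet c = runContT c setReturn

-- do-notation works because ContT supplies return and (>>=);
-- `example` is a hypothetical name for illustration only
example :: Set Int
example = toSet $ do
  x <- fromSet (fromList [1, 2])
  y <- fromSet (fromList [x, x * 10])
  return (y + 1)
-- example == fromList [2,3,11,21]
```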

This works as needed. For example, we can simulate a non-deterministic function that either increases its argument by 1 or leaves it intact:

step :: (Ord r) => Int -> SetM r Int
step i = fromSet $ fromList [i, i + 1]

-- repeated application of step:
stepN :: Int -> Int -> Set Int
stepN times start = toSet $ foldrM ($) start (replicate times step)

Indeed, stepN 5 0 yields fromList [0,1,2,3,4,5]. If we used [] monad instead, we would get

[0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4,1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5]

instead.
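For comparison, the []-monad version mentioned above can be sketched directly (stepL and stepNL are hypothetical names, not part of the original code):

```haskell
import Data.Foldable (foldrM)

-- []-monad analogue of step: lists keep duplicates instead of deduplicating
stepL :: Int -> [Int]
stepL i = [i, i + 1]

-- same pipeline as stepN, but in the list monad
stepNL :: Int -> Int -> [Int]
stepNL times start = foldrM ($) start (replicate times stepL)
-- stepNL 5 0 yields the 32-element list shown above
```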


The problem is efficiency. If we call stepN 20 0 the output takes a few seconds and stepN 30 0 doesn't finish within a reasonable amount of time. It turns out that all Set.union operations are performed at the end, instead of performing them after each monadic computation. The result is that exponentially many Sets are constructed and unioned only at the end, which is unacceptable for most tasks.

Is there any way around it, to make this construction efficient? I tried but without success.

(I even suspect that there could be some kinds of theoretical limits following from Curry-Howard isomorphism and Glivenko's theorem. Glivenko's theorem says that for any propositional tautology φ the formula ¬¬φ can be proved in intuitionistic logic. However, I suspect that the length of the proof (in normal form) can be exponentially long. So, perhaps, there could be cases when wrapping a computation into the continuation monad will make it exponentially longer?)

Solution

Monads are one particular way of structuring and sequencing computations. The bind of a monad cannot magically restructure your computation so as to happen in a more efficient way. There are two problems with the way you structure your computation.

  1. When evaluating stepN 20 0, the result of step 0 will be computed 20 times. This is because each step of the computation produces 0 as one alternative, which is then fed to the next step, which also produces 0 as alternative, and so on...

    Perhaps a bit of memoization here can help.

  2. A much bigger problem is the effect of ContT on the structure of your computation. With a bit of equational reasoning, expanding out the result of replicate 20 step, the definition of foldrM and simplifying as many times as necessary, we can see that stepN 20 0 is equivalent to:

    ((...((return 0 >>= step) >>= step) >>= step) ... >>= step)
    

    All parentheses of this expression associate to the left. That's great, because it means that the RHS of each occurrence of (>>=) is an elementary computation, namely step, rather than a composed one. However, zooming in on the definition of (>>=) for ContT,

    m >>= k = ContT $ \c -> runContT m (\a -> runContT (k a) c)
    

    we see that when evaluating a chain of (>>=) associating to the left, each bind will push a new computation onto the current continuation c. To illustrate what is going on, we can use again a bit of equational reasoning, expanding out this definition for (>>=) and the definition for runContT, and simplifying, yielding:

    setReturn 0 `setBind`
        (\x1 -> step x1 `setBind`
            (\x2 -> step x2 `setBind` (\x3 -> ...)...)
    

    Now, for each occurrence of setBind, let's ask ourselves what the RHS argument is. For the leftmost occurrence, the RHS argument is the whole rest of the computation after setReturn 0. For the second occurrence, it's everything after step x1, etc. Let's zoom in to the definition of setBind:

    setBind set f = foldl' (\s -> union s . f) empty set
    

    Here f represents all the rest of the computation, everything on the right hand side of an occurrence of setBind. That means that at each step, we are capturing the rest of the computation as f, and applying f as many times as there are elements in set. The computations are not elementary as before, but rather composed, and these computations will be duplicated many times.
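To make the equational reasoning concrete, the right-associated chain can be written out by hand for n = 2 (a self-contained sketch reusing the question's setReturn/setBind; stepSet is a hypothetical pure analogue of step):

```haskell
import Data.Set (Set, fromList, singleton, union, empty)
import qualified Data.Set as Set

setReturn :: a -> Set a
setReturn = singleton

setBind :: Ord b => Set a -> (a -> Set b) -> Set b
setBind set f = Set.foldl' (\s x -> union s (f x)) empty set

stepSet :: Int -> Set Int
stepSet i = fromList [i, i + 1]

-- the right-associated chain that ContT produces, for n = 2:
-- each lambda is the "rest of the computation" captured as f
expanded2 :: Set Int
expanded2 =
  setReturn 0 `setBind` \x1 ->
    stepSet x1 `setBind` \x2 ->
      stepSet x2
-- expanded2 == fromList [0,1,2]
```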

The crux of the problem is that the ContT monad transformer transforms the initial structure of the computation, which you meant as a left-associative chain of setBinds, into a computation with a different structure, i.e. a right-associative chain. This is, after all, perfectly fine, because one of the monad laws says that for every m, f and g we have

(m >>= f) >>= g = m >>= (\x -> f x >>= g)

However, the monad laws do not require that the complexity remain the same on each side of a law's equation. And indeed, in this case, the left-associative way of structuring this computation is a lot more efficient. The left-associative chain of setBinds evaluates in no time, because only elementary subcomputations are duplicated.
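This can be checked directly: a left-associated chain of setBinds, bypassing ContT entirely, handles depths that make the ContT version hang (stepSet and stepNSet are hypothetical helper names for this sketch):

```haskell
import Data.Set (Set, fromList, singleton, union, empty)
import qualified Data.Set as Set

setBind :: Ord b => Set a -> (a -> Set b) -> Set b
setBind set f = Set.foldl' (\s x -> union s (f x)) empty set

stepSet :: Int -> Set Int
stepSet i = fromList [i, i + 1]

-- fold the steps left-associatively: each union happens immediately,
-- so intermediate sets stay small
stepNSet :: Int -> Int -> Set Int
stepNSet times start = foldl setBind (singleton start) (replicate times stepSet)
-- stepNSet 30 0 == fromList [0..30], computed instantly
```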

It turns out that other solutions shoehorning Set into a monad suffer from the same problem. In particular, the set-monad package yields similar runtimes. The reason is that it, too, rewrites left-associative expressions into right-associative ones.

I think you have put your finger on a very important yet rather subtle problem with insisting that Set obey a Monad interface. And I don't think it can be solved. The problem is that the type of a monad's bind needs to be

(>>=) :: m a -> (a -> m b) -> m b

i.e. no class constraint is allowed on either a or b. That means that we cannot nest binds on the left without first invoking the monad laws to rewrite into a right-associative chain. Here's why: given (m >>= f) >>= g, the computation (m >>= f) has a type of the form m b, so a value of that computation has type b. But because we can't hang any class constraint onto the type variable b, we can't know that the value we got satisfies an Ord constraint, and therefore cannot use it as an element of a set on which we want to be able to compute unions.
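One way to see this point from the other side (a sketch only, going beyond what the answer proposes) is a "restricted monad" class whose bind does carry the constraint. Left-nested binds then typecheck again, at the cost of losing the standard Monad interface and do-notation:

```haskell
{-# LANGUAGE ConstraintKinds, TypeFamilies #-}
import Data.Kind (Constraint)
import Data.Set (Set)
import qualified Data.Set as Set

-- NOT the standard Monad class: bind may demand a per-type constraint
class RMonad m where
  type RCtx m a :: Constraint
  rreturn :: RCtx m a => a -> m a
  rbind   :: (RCtx m a, RCtx m b) => m a -> (a -> m b) -> m b

instance RMonad Set where
  type RCtx Set a = Ord a
  rreturn = Set.singleton
  rbind s f = Set.foldl' (\acc x -> Set.union acc (f x)) Set.empty s

-- left-nested binds are legal here, because Ord is available at every step
leftNested :: Set Int
leftNested = (rreturn 0 `rbind` stepSet) `rbind` stepSet
  where stepSet i = Set.fromList [i, i + 1]
-- leftNested == fromList [0,1,2]
```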
