使用continuation monad在`Set`(以及其他具有约束的容器)上构建高效monad实例 [英] Constructing efficient monad instances on `Set` (and other containers with constraints) using the continuation monad
问题描述
[]
类似, 设置具有完美定义的monadic操作。问题是它们要求值满足
Ord
约束,所以不可能定义 return
和>> =
,没有任何限制。同样的问题适用于许多其他需要对可能值进行约束的数据结构。
标准技巧(在haskell-cafe post )将 Set
包装为继续monad。 ContT
并不关心底层类型函子是否有任何约束。只有在将 Set
s设置为/从延续中取出时才需要约束:
import Control.Monad.Cont
import Data.Foldable(foldrM)
import Data.Set
setReturn :: a - >设置
setReturn =单身
setBind ::(Ord b)=>设置 - > (a - > Set b) - >设置b
setBind set f = foldl'(\ s - > union s。f)空集
类型SetM ra = ContT r设置
fromSet ::(Ord r)=>设置 - >从Set = ContT SetM r a
。 setBind
toSet :: SetM r r - >将r
设置为Set c = runContT c setReturn
这可以根据需要使用。例如,我们可以模拟一个非确定性函数,它可以将其参数增加1或保持不变:
step :: (Ord r)=> Int - > SetM r Int
step i = fromSet $ fromList [i,i + 1]
- 重复应用步骤:
stepN :: Int - > Int - > Set Int
stepN times start = toSet $ foldrM($)start(replicate times step)
确实, stepN 5 0
会从列表[0,1,2,3,4,5] 中产生。如果我们使用
[]
monad来代替,我们会得到
[0 ,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4,1,2,2,3,2,3,3,4,2,3 ,3,4,3,4,4,5]
改为。
问题在于效率。如果我们调用 有没有办法解决这个问题?我尝试过,但没有成功。 (我甚至怀疑可能存在一些类似于Curry-Howard同构和格利文科定理格利文科定理说,对任何命题重言式而言,公式¬¬φ可以用直觉逻辑来证明,但是,我怀疑证明的长度(以常规形式)可能是指数级的长度,那么,也许有可能会出现这样的情况,即将计算包装到连续monad中会使其指数级更长?) Monads是结构化和排序计算的一种特殊方式。单子的绑定不能神奇地重构你的计算,以便以更有效的方式发生。计算结构的方式有两个问题。 当评估 一个更大的问题是 此表达式的所有括号与左边关联。这很好,因为这意味着每次出现 stepN 20 0
,输出需要几秒钟, stepN 30 0
没有在合理的数量内完成的时间。事实证明,所有 Set.union
操作都是在最后执行的,而不是在每次单次计算之后执行它们。其结果是构造了许多 Set
s,并且
stepN 20 0
,
step 0
的结果将被计算20次。这是因为计算的每一步都会产生0作为一个选择,然后将它馈送到下一步,这也产生0作为选择,等等......
ContT
的影响,在你的计算结构上。通过一些等同推理,扩展了复制20步
的结果,定义了 foldrM
并简化了许多如果需要的话,我们可以看到 stepN 20 0
相当于:
<$ c $ (步骤)>> = ...)
< / code> (>> =)
的RHS是基本计算,即 step
,而不是一个合成的。但是,放大 ContT
,(>> =)
的定义
$ b
m>>> = k = ContT $ \c - > runContT m(\ a - > runContT(ka)c)
我们看到,链的(>> =)
关联到左边,每个绑定会将一个新的计算推送到当前的继续 c
。为了说明正在发生的事情,我们可以再次使用一些等式推理,扩展出(>> =)
的定义和 runContT
,并简化,产生:
setReturn 0`setBind`
\ x1 - > step x1`setBind`
(\x2 - > step x2`setBind`(\x3 - > ...)...)
$ c现在,对于每一个出现的 setBind
,让我们问自己什么是RHS的论点是。对于最左边的事件,RHS参数是在 setReturn 0
之后的整个剩余计算。对于第二次出现,它是 step x1
等。让我们放大 setBind
的定义: setBind set f = foldl'(\ s - > union s。f)empty set
这里 f
表示所有其余的计算, everything 在oc的右侧 setBind
的结果。这意味着在每一步中,我们都将其余的计算捕获为 f
,并多次应用 f
因为 set
中有元素。这些计算并不像以前那样简单,而是组成的,而且这些计算将被复制很多次。
症结问题在于 ContT
monad转换器正在改变计算的初始结构,这意味着它是一个左结合链 setBind $转化为具有不同结构的计算,即右联合链。这毕竟是完美无缺的,因为其中一条法律规定,对于每一个 m
, f
和 g
我们有
(m>> = f)>> ; = g = m>> =(\ x - > fx>> = g)
$ b $然而,单子法并没有强调复杂性在每个法则方程的每一侧都保持不变。事实上,在这种情况下,构造这种计算的左边联想方式更有效率。左边的 setBind
的关联链很快就会进行评估,因为只有基本的子计算被复制出来。
将其他解决方案shoehorning Set
设置为monad也会遇到同样的问题。特别是, set-monad 软件包会产生类似的运行时间。原因是,它也将左关联表达式重写为正确的关联表达式。
我认为你已经将手指放在一个非常重要但很微妙的问题上强调 Set
服从 Monad
界面。我不认为这是可以解决的。问题是monad绑定的类型需要是
(>> =):: ma - > (a - > m b) - > mb
即 a
或 b
。这意味着我们不能在左边嵌套绑定,没有先调用单子法来重写成正确的关联链。这是为什么:给定(m>> = f)>> = g
,计算的类型(m>> = f)
形式为 mb
。计算(m>> = f)
的值是类型 b
。但是因为我们不能将任何类的约束挂到类型变量 b
上,所以我们不知道我们得到的值满足 Ord 约束,因此不能使用这个值作为我们希望能够计算 union
的集合的元素。
Set
, similarly to []
has a perfectly defined monadic operations. The problem is that they require that the values satisfy Ord
constraint, and so it's impossible to define return
and >>=
without any constraints. The same problem applies to many other data structures that require some kind of constraints on possible values.
The standard trick (suggested to me in a haskell-cafe post) is to wrap Set
into the continuation monad. ContT
doesn't care if the underlying type functor has any constraints. The constraints become only needed when wrapping/unwrapping Set
s into/from continuations:
import Control.Monad.Cont
import Data.Foldable (foldrM)
import Data.Set
setReturn :: a -> Set a
setReturn = singleton
setBind :: (Ord b) => Set a -> (a -> Set b) -> Set b
setBind set f = foldl' (\s -> union s . f) empty set
type SetM r a = ContT r Set a
fromSet :: (Ord r) => Set a -> SetM r a
fromSet = ContT . setBind
toSet :: SetM r r -> Set r
toSet c = runContT c setReturn
This works as needed. For example, we can simulate a non-deterministic function that either increases its argument by 1 or leaves it intact:
step :: (Ord r) => Int -> SetM r Int
step i = fromSet $ fromList [i, i + 1]
-- repeated application of step:
stepN :: Int -> Int -> Set Int
stepN times start = toSet $ foldrM ($) start (replicate times step)
Indeed, stepN 5 0
yields fromList [0,1,2,3,4,5]
. If we used []
monad instead, we would get
[0,1,1,2,1,2,2,3,1,2,2,3,2,3,3,4,1,2,2,3,2,3,3,4,2,3,3,4,3,4,4,5]
instead.
The problem is efficiency. If we call stepN 20 0
the output takes a few seconds and stepN 30 0
doesn't finish within a reasonable amount of time. It turns out that all Set.union
operations are performed at the end, instead of performing them after each monadic computation. The result is that exponentially many Set
s are constructed and union
ed only at the end, which is unacceptable for most tasks.
Is there any way around it, to make this construction efficient? I tried but without success.
(I even suspect that there could be some kinds of theoretical limits following from Curry-Howard isomorphism and Glivenko's theorem. Glivenko's theorem says that for any propositional tautology φ the formula ¬¬φ can be proved in intuitionistic logic. However, I suspect that the length of the proof (in normal form) can be exponentially long. So, perhaps, there could be cases when wrapping a computation into the continuation monad will make it exponentially longer?)
解决方案 Monads are one particular way of structuring and sequencing computations. The bind of a monad cannot magically restructure your computation so as to happen in a more efficient way. There are two problems with the way you structure your computation.
When evaluating stepN 20 0
, the result of step 0
will be computed 20 times. This is because each step of the computation produces 0 as one alternative, which is then fed to the next step, which also produces 0 as alternative, and so on...
Perhaps a bit of memoization here can help.
A much bigger problem is the effect of ContT
on the structure of your computation. With a bit of equational reasoning, expanding out the result of replicate 20 step
, the definition of foldrM
and simplifying as many times as necessary, we can see that stepN 20 0
is equivalent to:
(...(return 0 >>= step) >>= step) >>= step) >>= ...)
All parentheses of this expression associate to the left. That's great, because it means that the RHS of each occurrence of (>>=)
is an elementary computation, namely step
, rather than a composed one. However, zooming in on the definition of (>>=)
for ContT
,
m >>= k = ContT $ \c -> runContT m (\a -> runContT (k a) c)
we see that when evaluating a chain of (>>=)
associating to the left, each bind will push a new computation onto the current continuation c
. To illustrate what is going on, we can use again a bit of equational reasoning, expanding out this definition for (>>=)
and the definition for runContT
, and simplifying, yielding:
setReturn 0 `setBind`
(\x1 -> step x1 `setBind`
(\x2 -> step x2 `setBind` (\x3 -> ...)...)
Now, for each occurrence of setBind
, let's ask ourselves what the RHS argument is. For the leftmost occurrence, the RHS argument is the whole rest of the computation after setReturn 0
. For the second occurrence, it's everything after step x1
, etc. Let's zoom in to the definition of setBind
:
setBind set f = foldl' (\s -> union s . f) empty set
Here f
represents all the rest of the computation, everything on the right hand side of an occurrence of setBind
. That means that at each step, we are capturing the rest of the computation as f
, and applying f
as many times as there are elements in set
. The computations are not elementary as before, but rather composed, and these computations will be duplicated many times.
The crux of the problem is that the ContT
monad transformer is transforming the initial structure of the computation, which you meant as a left associative chain of setBind
's, into a computation with a different structure, ie a right associative chain. This is after all perfectly fine, because one of the monad laws says that, for every m
, f
and g
we have
(m >>= f) >>= g = m >>= (\x -> f x >>= g)
However, the monad laws do not impose that the complexity remain the same on each side of the equations of each law. And indeed, in this case, the left associative way of structuring this computation is a lot more efficient. The left associative chain of setBind
's evaluates in no time, because only elementary subcomputations are duplicated.
It turns out that other solutions shoehorning Set
into a monad also suffer from the same problem. In particular, the set-monad package, yields similar runtimes. The reason being, that it too, rewrites left associative expressions into right associative ones.
I think you have put the finger on a very important yet rather subtle problem with insisting that Set
obeys a Monad
interface. And I don't think it can be solved. The problem is that the type of the bind of a monad needs to be
(>>=) :: m a -> (a -> m b) -> m b
ie no class constraint allowed on either a
or b
. That means that we cannot nest binds on the left, without first invoking the monad laws to rewrite into a right associative chain. Here's why: given (m >>= f) >>= g
, the type of the computation (m >>= f)
is of the form m b
. A value of the computation (m >>= f)
is of type b
. But because we can't hang any class constraint onto the type variable b
, we can't know that the value we got satisfies an Ord
constraint, and therefore cannot use this value as the element of a set on which we want to be able to compute union
's.
这篇关于使用continuation monad在`Set`(以及其他具有约束的容器)上构建高效monad实例的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!