为什么包装 Data.Binary.Put monad 会造成内存泄漏?(第2部分) [英] Why wrapping the Data.Binary.Put monad creates a memory leak? (Part 2)

查看:22
本文介绍了为什么包装 Data.Binary.Put monad 会造成内存泄漏?(第2部分)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

如我的上一个问题,我正在尝试将 Data.Binary.Put monad 包装到另一个 monad 中,以便稍后我可以问它诸如它将要写入多少字节"或文件中的当前位置"之类的问题.

之前,我认为理解为什么在使用简单的(IdentityT?)包装器时会泄漏内存将引导我解决我的问题.但是即使你们帮我解决了这个简单的包装器的问题,用像 StateT 或 WriterT 这样有用的东西来包装它仍然消耗太多内存(通常会崩溃).

例如,这是我尝试包装它的一种方式,它会为大输入泄漏内存:

<前>type Out = StateT Integer P.PutM()writeToFile :: String -> Out -> IO ()writeToFile 路径输出 = BL.writeFile 路径 $ P.runPut $ do runStateT 输出 0返回 ()

这里是演示问题的更完整的代码示例.

我想知道的是:

  1. 程序内部发生了什么导致内存泄漏?
  2. 我能做些什么来修复它?

对于我的第二个问题,我想我应该更详细地解释我希望数据在磁盘上查看的内容:它基本上是一个树结构,其中树的每个节点都表示为其子节点的偏移表(加上一些额外的数据).因此,要计算第 n 个子节点在偏移表中的偏移量,我需要知道 0 到 n-1 子节点的大小加上当前偏移量(为了简化事情,假设每个节点都有固定数量的子节点).

感谢您的关注.

更新:感谢 nominolo,我现在可以创建一个环绕 Data.Binary.Put 的 monad,跟踪当前偏移并且几乎不使用内存.这是通过放弃使用 StateT 转换器,转而使用使用 Continuations 的不同状态线程机制来实现的.

像这样:

<前>类型偏移 = Intnewtype MyPut a = MyPut{ unS :: forall r .(偏移量 -> a -> P.PutM r) -> 偏移量 -> P.PutM r }实例 Monad MyPut where返回 a = MyPut $ f s -> f s ama >>= f = MyPut $ fb s -> unS ma (s' a -> unS (f a) fb s') swriteToFile :: String -> MyPut () -> IO ()writeToFile 路径 put =BL.writeFile 路径 $P.runPut $peal put >> return()其中 peal myput = unS myput (o -> return) 0getCurrentOffset :: MyPut IntgetCurrentOffset = MyPut $ f o -> f o oLift' n ma = MyPut $ f s -> ma >>= f (s+n)

但是,我仍然无法跟踪 MyPut 将在磁盘上写入多少字节.特别是,我需要一个带有签名的函数,如下所示:<前>getSize :: MyPut a -> MyPut Int

getSize :: MyPut a -> Int

我的方法是将 MyPut monad 包裹在 WriterT 转换器中(类似于 this).但这又开始消耗太多内存.正如 sclv 在 nominolos 回答下的评论中提到的那样,WriterT 以某种方式抵消了延续的影响.他还提到应该可以直接从我已经拥有的 MyPut monad 获取大小,但我所有的尝试都以不可编译的代码或无限循环结束:-|.

有人可以帮忙吗?

解决方案

看来 monad 转换器太懒了.您可以通过使用以下命令运行程序来创建堆配置文件(无需专门构建它):

$ ./myprog +RTS -hT$ hp2ps myprog.hp$ open hp2ps.ps # 或者你有的任何查看器

在这种情况下,它不是特别有用,因为它只显示了很多 PAPFUN_1_0FUN_2_0.这意味着堆由许多部分应用的函数以及一个参数和两个参数的函数组成.这通常意味着某些东西没有得到足够的评估.Monad 转换器因此而臭名昭著.

解决方法是使用更严格的 monad 转换器,使用 连续传递样式.(他需要 {-# LANGUAGE Rank2Types #-}.

newtype MyStateT s m a =MyStateT { unMyStateT :: forall r.(s -> a -> m r) ->s ->先生 }

Continuation 传递风格意味着我们不直接返回结果,而是调用另一个函数,continuation,带有我们的结果,在本例中为 sa.实例定义看起来有点滑稽.要了解它,请阅读上面的链接(维基百科).

instance Monad m =>Monad (MyStateTs m) 其中返回 x = MyStateT (k s -> k s x)MyStateT f >>= kk = MyStateT (k s ->f (s' a -> unMyStateT (kk a) k s') s)runMyStateT :: Monad m =>MyStateT s m a ->s ->米 (a, s)runMyStateT (MyStateT f) s0 = f (s a -> return (a, s)) s0实例 MonadTrans (MyStateT s) 其中Lift act = MyStateT (k s -> do a <- act; k s a)type Out = MyStateT Integer P.PutM()

现在运行它会提供恒定的空间(最大驻留时间"位):

$ ./so1 +RTS -s开始结尾在堆中分配了 8,001,343,308 个字节GC 期间复制了 877,696,096 字节46,628 字节最大驻留(861 个样本)33,196 字节最大斜率正在使用的总内存为 2 MB(由于碎片而丢失了 0 MB)第 0 代:14345 个集合,0 个并行,3.32 秒,3.38 秒过去第 1 代:861 个集合,0 个并行,0.08 秒,0.08 秒过去

使用这种严格的转换器的缺点是你不能再定义 MonadFix 实例并且某些懒惰的技巧不再起作用.

As in my previous question, I'm trying to wrap the Data.Binary.Put monad into another monad so that later I can ask it questions like "how many bytes it's going to write" or "what is the current position in file".

Before, I thought that understanding why it leaks memory while using a trivial (IdentityT?) wrapper would lead me to solving my problem. But even though you guys have helped me resolve the problem with the trivial wrapper, wrapping it with something usefull like StateT or WriterT still consumes too much memory (and usually crashes).

For example, this is one way I'm trying to wrap it and which leaks memory for big input:

type Out = StateT Integer P.PutM ()

writeToFile :: String -> Out -> IO ()
writeToFile path out = BL.writeFile path $ P.runPut $ do runStateT out 0
                                                         return ()

Here is a more complete code sample that demonstrates the problem.

What I would like to know is this:

  1. What is happending inside the program that causes the memory leak?
  2. What can I do to fix it?

For my second question I think I should explain in more details what I intend the data to look on disk: It is basically a tree structure where each node of the tree is represented as an offset table to it's children (plus some additional data). So to calculate offset of n-th children into the offset table I need to know the sizes of children 0 to n-1 plus the current offset (to simplify things, let's say each node has fixed number of childs).

Thanks for looking.

UPDATE: Thanks to nominolo I can now create a monad that wraps around the Data.Binary.Put, tracks current offset and uses almost no memory. This is done by dropping the use of StateT transformer in favor of a different state threading mechanism that uses Continuations.

Like this:

type Offset = Int

newtype MyPut a = MyPut
  { unS :: forall r . (Offset -> a -> P.PutM r) -> Offset -> P.PutM r }

instance Monad MyPut where
  return a = MyPut $ f s -> f s a
  ma >>= f = MyPut $ fb s -> unS ma (s' a -> unS (f a) fb s') s

writeToFile :: String -> MyPut () -> IO ()
writeToFile path put =
  BL.writeFile path $ P.runPut $ peal put >> return ()
  where peal myput = unS myput (o -> return) 0

getCurrentOffset :: MyPut Int
getCurrentOffset = MyPut $ f o -> f o o

lift' n ma = MyPut $ f s -> ma >>= f (s+n)

However I still have a problem with tracking how many bytes is MyPut going to write on disk. In particular, I need to have a function with signature like this:

getSize :: MyPut a -> MyPut Int

or

getSize :: MyPut a -> Int

My aproach was to wrap the MyPut monad inside WriterT transformer (something like this). But that started to consume too much memory again. As sclv mentions in comments under nominolos answer, WriterT somehow cancels out the effect of continuations. He also mentions that getting the size should be possible directly from the MyPut monad that I already have, but all my attempts to do so ended in non compilable code or an infinite loop :-|.

Could someone please help further?

解决方案

It looks like the monad transformer is too lazy. You can create a heap profile (without having to build it specially) by running the program with:

$ ./myprog +RTS -hT
$ hp2ps myprog.hp
$ open hp2ps.ps    # Or whichever viewer you have

In this case it's not particularly helpful, because it only shows lots of PAPs, FUN_1_0s and FUN_2_0s. This means the heap is made up of lots of partially applied functions, and functions of one argument and two arguments. This usually means that something is not evaluated enough. Monad transformers are somewhat notorious for this.

The workaround is to use a more strict monad transformers using continuation passing style. (his requires {-# LANGUAGE Rank2Types #-}.

newtype MyStateT s m a =
  MyStateT { unMyStateT :: forall r. (s -> a -> m r) -> s -> m r }

Continuation passing style means that instead of returning a result directly, we call another function, the continuation, with our result, in this case s and a. The instance definitions look a bit funny. To understand it read the link above (Wikipedia).

instance Monad m => Monad (MyStateT s m) where
  return x = MyStateT (k s -> k s x)
  MyStateT f >>= kk = MyStateT (k s ->
    f (s' a -> unMyStateT (kk a) k s') s)

runMyStateT :: Monad m => MyStateT s m a -> s -> m (a, s)
runMyStateT (MyStateT f) s0 = f (s a -> return (a, s)) s0

instance MonadTrans (MyStateT s) where
  lift act = MyStateT (k s -> do a <- act; k s a)

type Out = MyStateT Integer P.PutM ()

Running it now gives constant space (the "maximum residency" bit):

$ ./so1 +RTS -s 
begin
end
   8,001,343,308 bytes allocated in the heap
     877,696,096 bytes copied during GC
          46,628 bytes maximum residency (861 sample(s))
          33,196 bytes maximum slop
            2 MB total memory in use (0 MB lost due to fragmentation)

Generation 0: 14345 collections,     0 parallel,  3.32s,  3.38s elapsed
Generation 1:   861 collections,     0 parallel,  0.08s,  0.08s elapsed

The downside of using such strict transformers is that you can no longer define MonadFix instances and certain laziness tricks no longer work.

这篇关于为什么包装 Data.Binary.Put monad 会造成内存泄漏?(第2部分)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆