为什么包裹Data.Binary.Put单子造成内存泄漏? (第2部分) [英] Why wrapping the Data.Binary.Put monad creates a memory leak? (Part 2)

查看:118
本文介绍了为什么包裹Data.Binary.Put单子造成内存泄漏? (第2部分)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

正如我在<一个href=\"http://stackoverflow.com/questions/4828902/why-wrapping-the-data-binary-put-monad-creates-a-memory-leak\">$p$pvious问题,我想换行Data.Binary.Put单子到另一个单子以便以后我可以问它问题,如它会如何写入的字节数或什么是在文件的当前位置

As in my previous question, I'm trying to wrap the Data.Binary.Put monad into another monad so that later I can ask it questions like "how many bytes it's going to write" or "what is the current position in file".

之前,我认为理解为什么它同时使用一个普通(IdentityT?)包装将导致我解决我的问题出现内存泄漏。但是,即使你们已经帮我解决了问题,琐碎的包装,东西环绕它有用喜欢StateT或WriterT仍然消耗了太多的内存(通常崩溃)。

Before, I thought that understanding why it leaks memory while using a trivial (IdentityT?) wrapper would lead me to solving my problem. But even though you guys have helped me resolve the problem with the trivial wrapper, wrapping it with something usefull like StateT or WriterT still consumes too much memory (and usually crashes).

例如,这是我试图把它包起来的一种方式和泄漏的大输入存储器:

For example, this is one way I'm trying to wrap it and which leaks memory for big input:


type Out = StateT Integer P.PutM ()

writeToFile :: String -> Out -> IO ()
writeToFile path out = BL.writeFile path $ P.runPut $ do runStateT out 0
                                                         return ()

这里是演示该问题的更完整code样本。

Here is a more complete code sample that demonstrates the problem.

我想知道的是:


  1. 什么是导致内存泄漏的程序中happending?

  2. 我能做些什么来解决这个问题?

有关我的第二个问题,我想我应该在更多的细节我打算将数据看磁盘上的解释:它基本上是一个树状结构,其中树的每个节点都是psented作为偏移表到它的孩子们重新$ P $ (加上一些额外的数据)。所以,要计算偏移n个儿童纳入偏移表,我需要知道孩子0到n-1加上当前偏移量(以简化的东西,比方说每个节点都有固定的孩子的数量)的大小。

For my second question I think I should explain in more details what I intend the data to look on disk: It is basically a tree structure where each node of the tree is represented as an offset table to it's children (plus some additional data). So to calculate offset of n-th children into the offset table I need to know the sizes of children 0 to n-1 plus the current offset (to simplify things, let's say each node has fixed number of childs).

感谢您寻找。

更新:
感谢nominolo我现在可以创建围绕Data.Binary.Put包装了一个单子,跟踪当前的偏移和使用几乎没有任何记忆。这是由下降赞成使用延续一个不同的状态线程机制的使用StateT变压器完成。

UPDATE: Thanks to nominolo I can now create a monad that wraps around the Data.Binary.Put, tracks current offset and uses almost no memory. This is done by dropping the use of StateT transformer in favor of a different state threading mechanism that uses Continuations.

这样的:


type Offset = Int

newtype MyPut a = MyPut
  { unS :: forall r . (Offset -> a -> P.PutM r) -> Offset -> P.PutM r }

instance Monad MyPut where
  return a = MyPut $ \f s -> f s a
  ma >>= f = MyPut $ \fb s -> unS ma (\s' a -> unS (f a) fb s') s

writeToFile :: String -> MyPut () -> IO ()
writeToFile path put =
  BL.writeFile path $ P.runPut $ peal put >> return ()
  where peal myput = unS myput (\o -> return) 0

getCurrentOffset :: MyPut Int
getCurrentOffset = MyPut $ \f o -> f o o

lift' n ma = MyPut $ \f s -> ma >>= f (s+n)

不过,我仍然有跟踪MyPut去多少字节磁盘上写一个问题。特别是,我需要有这样的签名的函数:

However I still have a problem with tracking how many bytes is MyPut going to write on disk. In particular, I need to have a function with signature like this:


getSize :: MyPut a -> MyPut Int


getSize :: MyPut a -> Int

我的形式给出了包裹MyPut单子WriterT变压器内部(像这个)。但是,这又开始占用太多内存。如sclv中提及nominolos答案下评论,WriterT某种程度上抵消延续的效果。他还提到,获得规模应该可以直接从我已经有了MyPut单子,但我所有的尝试以非编译code或一个无限循环,这样做的结束: - |。

My aproach was to wrap the MyPut monad inside WriterT transformer (something like this). But that started to consume too much memory again. As sclv mentions in comments under nominolos answer, WriterT somehow cancels out the effect of continuations. He also mentions that getting the size should be possible directly from the MyPut monad that I already have, but all my attempts to do so ended in non compilable code or an infinite loop :-|.

可能有人请进一步帮助吗?

Could someone please help further?

推荐答案

它看起来像单子变压器是太懒惰。您可以创建一个堆轮廓(无需专门建立它)通过运行程序:

It looks like the monad transformer is too lazy. You can create a heap profile (without having to build it specially) by running the program with:

$ ./myprog +RTS -hT
$ hp2ps myprog.hp
$ open hp2ps.ps    # Or whichever viewer you have

在这种情况下,它不是特别有用,因为它只能显示大量 PAP S, FUN_1_0 和的 FUN_2_0 秒。这意味着堆是由很多部分应用功能,以及一个参数和两个参数的功能。这通常意味着什么不计算不够。 Monad的变压器是这个有点臭名昭著。

In this case it's not particularly helpful, because it only shows lots of PAPs, FUN_1_0s and FUN_2_0s. This means the heap is made up of lots of partially applied functions, and functions of one argument and two arguments. This usually means that something is not evaluated enough. Monad transformers are somewhat notorious for this.

解决方法是使用使用延续传递风格更严格的单子变压器。 (他的要求 { - #语言Rank2Types# - }

The workaround is to use a more strict monad transformers using continuation passing style. (his requires {-# LANGUAGE Rank2Types #-}.

newtype MyStateT s m a =
  MyStateT { unMyStateT :: forall r. (s -> a -> m r) -> s -> m r }

延续传球方式意味着,而不是结果直接返回,我们调用另一个函数时,的继续的,与我们的结果,在这种情况下,取值 A 。实例定义看起来有点滑稽。要理解它上面阅读(维基百科)的链接。

Continuation passing style means that instead of returning a result directly, we call another function, the continuation, with our result, in this case s and a. The instance definitions look a bit funny. To understand it read the link above (Wikipedia).

instance Monad m => Monad (MyStateT s m) where
  return x = MyStateT (\k s -> k s x)
  MyStateT f >>= kk = MyStateT (\k s ->
    f (\s' a -> unMyStateT (kk a) k s') s)

runMyStateT :: Monad m => MyStateT s m a -> s -> m (a, s)
runMyStateT (MyStateT f) s0 = f (\s a -> return (a, s)) s0

instance MonadTrans (MyStateT s) where
  lift act = MyStateT (\k s -> do a <- act; k s a)

type Out = MyStateT Integer P.PutM ()

运行它现在给定空间(最大居住位):

Running it now gives constant space (the "maximum residency" bit):

$ ./so1 +RTS -s 
begin
end
   8,001,343,308 bytes allocated in the heap
     877,696,096 bytes copied during GC
          46,628 bytes maximum residency (861 sample(s))
          33,196 bytes maximum slop
            2 MB total memory in use (0 MB lost due to fragmentation)

Generation 0: 14345 collections,     0 parallel,  3.32s,  3.38s elapsed
Generation 1:   861 collections,     0 parallel,  0.08s,  0.08s elapsed

使用这种严格的变压器的缺点是,你不能再定义 MonadFix 实例和某些懒惰的技巧不再起作用。

The downside of using such strict transformers is that you can no longer define MonadFix instances and certain laziness tricks no longer work.

这篇关于为什么包裹Data.Binary.Put单子造成内存泄漏? (第2部分)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆