Binary Serialization for Lists of Undefined Length in Haskell

Question

I've been using Data.Binary to serialize data to files. In my application I incrementally add items to these files. The two most popular serialization packages, binary and cereal, both serialize lists as a count followed by the list items. Because of this, I can't append to my serialized files. I currently read in the whole file, deserialize the list, append to the list, re-serialize the list, and write it back out to the file. However, my data set is getting large and I'm starting to run out of memory. I could probably go around unboxing my data structures to gain some space, but that approach doesn't scale.

One solution would be to get down and dirty with the file format to change the initial count, then just append my elements. But that's not very satisfying, not to mention being sensitive to future changes in the file format as a result of breaking the abstraction. Iteratees/Enumerators come to mind as an attractive option here. I looked for a library combining them with a binary serialization, but didn't find anything. Anyone know if this has been done already? If not, would a library for this be useful? Or am I missing something?

Recommended answer

So I say stick with Data.Binary but write a new instance for growable lists. Here's the current (strict) instance:

instance Binary a => Binary [a] where
    put l  = put (length l) >> mapM_ put l
    get    = do n <- get :: Get Int
                getMany n

-- | 'getMany n' get 'n' elements in order, without blowing the stack.
getMany :: Binary a => Int -> Get [a]
getMany n = go [] n
 where
    go xs 0 = return $! reverse xs
    go xs i = do x <- get
                 x `seq` go (x:xs) (i-1)
{-# INLINE getMany #-}
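
To see why this layout rules out appending, it can help to look at the bytes the stock list instance produces. The following is a minimal sketch; `listBytes` is a hypothetical helper name, not part of binary.

```haskell
import Data.Binary (encode)
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word8)

-- Raw bytes produced by the stock list instance (hypothetical helper).
-- The first 8 bytes hold the element count (an Int written as a
-- big-endian 64-bit value); the elements follow.
listBytes :: [Word8] -> [Word8]
listBytes = BL.unpack . encode

-- listBytes [10, 20, 30] == [0,0,0,0,0,0,0,3, 10,20,30]
```

Because the count sits at the front of the file, adding an element means rewriting those first 8 bytes, not just writing more data at the end.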

Now, a version that lets you stream (in binary) and append to a file could be either eager or lazy. The lazy version is the simplest. Something like:

import Data.Binary

newtype Stream a = Stream { unstream :: [a] }

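instance Binary a => Binary (Stream a) where
    -- Tag 0 ends the stream; tag 1 means one more element follows.
    put (Stream [])     = putWord8 0
    put (Stream (x:xs)) = putWord8 1 >> put x >> put (Stream xs)

    get = do
        t <- getWord8
        case t of
            0 -> return (Stream [])
            1 -> do x         <- get
                    Stream xs <- get
                    return (Stream (x:xs))
            -- Reject anything else instead of hitting an incomplete pattern.
            _ -> fail ("Stream: unexpected tag " ++ show t)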
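
One caveat, as far as I know: in recent versions of binary the `Get` monad is strict, so decoding a whole `Stream a` in one go would build the entire list before returning anything. A possible way around that is to run the decoder once per element. The sketch below assumes the same tag layout as above (1 before each element, 0 at the end). `encodeTagged`, `step`, and `decodeTagged` are hypothetical names, and malformed input simply raises an `error`.

```haskell
import Data.Binary (Binary, get, put, putWord8, getWord8)
import Data.Binary.Get (Get, runGetOrFail)
import Data.Binary.Put (runPut)
import qualified Data.ByteString.Lazy as BL

-- Write elements in the tagged format: 1 before each element, 0 at the end.
encodeTagged :: Binary a => [a] -> BL.ByteString
encodeTagged xs = runPut (mapM_ (\x -> putWord8 1 >> put x) xs >> putWord8 0)

-- Decode one step: Nothing at the end tag, Just x for an element.
step :: Binary a => Get (Maybe a)
step = do
    t <- getWord8
    case t of
        0 -> return Nothing
        1 -> Just <$> get
        _ -> fail ("unexpected tag " ++ show t)

-- Each element comes from its own decoder run, so a consumer can
-- process a prefix of the list without decoding the rest of the input.
decodeTagged :: Binary a => BL.ByteString -> [a]
decodeTagged bs = case runGetOrFail step bs of
    Left (_, _, err)        -> error err
    Right (_, _, Nothing)   -> []
    Right (rest, _, Just x) -> x : decodeTagged rest
```

With a lazy `ByteString` read from a file, something like `take 10 (decodeTagged contents)` should only need the first ten elements' worth of input.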

Massaged appropriately, this works for streaming. Now, to append silently, we'll need to seek to the end of the file and overwrite the final 0 tag before adding more elements.
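
The seek-and-overwrite step could look roughly like this. It is only a sketch: `initStreamFile` and `appendStreamFile` are hypothetical helpers, and it assumes the file already exists, is non-empty, and ends with the 0 tag written in the same layout as the `Stream` instance above.

```haskell
import Data.Binary (Binary, put, putWord8)
import Data.Binary.Put (runPut)
import qualified Data.ByteString.Lazy as BL
import System.IO

-- Create a file holding an empty stream: just the end tag 0.
-- (Hypothetical helper, not part of binary.)
initStreamFile :: FilePath -> IO ()
initStreamFile path = BL.writeFile path (runPut (putWord8 0))

-- Append elements to an existing stream file: position the handle on the
-- final 0 tag, overwrite it with the new tagged elements, then write a
-- fresh 0 tag at the end.
appendStreamFile :: Binary a => FilePath -> [a] -> IO ()
appendStreamFile path xs =
    -- ReadWriteMode opens the file without truncating it and allows seeking.
    withBinaryFile path ReadWriteMode $ \h -> do
        size <- hFileSize h
        -- Assumption: the last byte of the file is the 0 end tag.
        hSeek h AbsoluteSeek (size - 1)
        BL.hPut h (runPut (mapM_ (\x -> putWord8 1 >> put x) xs >> putWord8 0))
```

Only the new elements are held in memory here; the existing contents of the file are never read. A more careful version would probably check that the byte being overwritten really is 0 before writing.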
