Binary Serialization for Lists of Undefined Length in Haskell

Question

I've been using Data.Binary to serialize data to files. In my application I incrementally add items to these files. The two most popular serialization packages, binary and cereal, both serialize lists as a count followed by the list items. Because of this, I can't append to my serialized files. I currently read in the whole file, deserialize the list, append to the list, re-serialize the list, and write it back out to the file. However, my data set is getting large and I'm starting to run out of memory. I could probably go around unboxing my data structures to gain some space, but that approach doesn't scale.

One solution would be to get down and dirty with the file format to change the initial count, then just append my elements. But that's not very satisfying, not to mention being sensitive to future changes in the file format as a result of breaking the abstraction. Iteratees/Enumerators come to mind as an attractive option here. I looked for a library combining them with a binary serialization, but didn't find anything. Anyone know if this has been done already? If not, would a library for this be useful? Or am I missing something?

Recommended answer

So I say stick with Data.Binary but write a new instance for growable lists. Here's the current (strict) instance:

instance Binary a => Binary [a] where
    put l  = put (length l) >> mapM_ put l
    get    = do n <- get :: Get Int
                getMany n

-- | 'getMany n' get 'n' elements in order, without blowing the stack.
getMany :: Binary a => Int -> Get [a]
getMany n = go [] n
 where
    go xs 0 = return $! reverse xs
    go xs i = do x <- get
                 x `seq` go (x:xs) (i-1)
{-# INLINE getMany #-}
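
To make the layout concrete, here is a small sketch that encodes a three-element list with the stock list instance and exposes the resulting bytes. The name `countPrefixed` is hypothetical and only for illustration. With the `binary` package, the length is written as an `Int`, which is serialized as a 64-bit big-endian value, so the element count sits in the first 8 bytes of the output and would have to be rewritten on every append.

```haskell
import Data.Binary (encode)
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word8)

-- Bytes produced by the stock list instance for a three-element list.
-- (countPrefixed is a hypothetical name used only for this illustration.)
countPrefixed :: [Word8]
countPrefixed = BL.unpack (encode ([10, 20, 30] :: [Word8]))
-- The first 8 bytes are the element count (3); the elements follow:
-- [0,0,0,0,0,0,0,3,10,20,30]
```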

Now, a version that lets you stream (in binary) to append to a file would need to be eager or lazy. The lazy version is the most trivial. Something like:

import Data.Binary

newtype Stream a = Stream { unstream :: [a] }

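instance Binary a => Binary (Stream a) where
    -- Each element is preceded by a 1 tag; a 0 tag marks the end of the list.
    put (Stream [])     = putWord8 0
    put (Stream (x:xs)) = putWord8 1 >> put x >> put (Stream xs)

    get = do
        t <- getWord8
        case t of
            0 -> return (Stream [])
            1 -> do x         <- get
                    Stream xs <- get
                    return (Stream (x:xs))
            -- Any other tag means the input is not in this format.
            _ -> fail ("Stream: unexpected tag " ++ show t)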

Massaged appropriately, this works for streaming. Now, to handle appending silently, we'll need to be able to seek to the end of the file and overwrite the final 0 tag before adding more elements.
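
The seek-and-overwrite step could look roughly like the sketch below. `appendStream` is a hypothetical helper, not part of `binary`. It assumes the file is either empty or already a valid encoding in the tagged format above (so its last byte is the 0 terminator), and it does not verify that assumption.

```haskell
import Data.Binary (Binary, put)
import Data.Binary.Put (putWord8, runPut)
import qualified Data.ByteString.Lazy as BL
import System.IO

-- Hypothetical helper: append elements to a file in the tagged format
-- (1 <element> ... 0). Assumes the file is empty or ends with the 0 tag.
appendStream :: Binary a => FilePath -> [a] -> IO ()
appendStream path xs =
  withBinaryFile path ReadWriteMode $ \h -> do
    size <- hFileSize h
    -- Step back onto the trailing 0 tag so the new data overwrites it.
    -- An empty file has no terminator yet, so writing starts at offset 0.
    hSeek h AbsoluteSeek (max 0 (size - 1))
    BL.hPut h . runPut $ do
      mapM_ (\x -> putWord8 1 >> put x) xs  -- tag each new element with 1
      putWord8 0                            -- write the new terminator
```

A file built up this way should decode with the `Stream` instance shown earlier, for example via `decodeFile`. Only the last byte of the existing file is touched, so the cost of an append does not depend on how much data is already stored.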