用Data.Binary对列表进行延迟解码 [英] Lazy decoding of a list with Data.Binary
问题描述
我使用此代码懒惰地编码列表(摘自此 SO问题):
I am lazily encoding lists using this code (taken from this SO question):
import Data.Binary
newtype Stream a = Stream { unstream :: [a] }
instance Binary a => Binary (Stream a) where
put (Stream []) = putWord8 0
put (Stream (x:xs)) = putWord8 1 >> put x >> put (Stream xs)
问题是解码实现不是懒惰的:
The problem is that the decoding implementation is not lazy:
get = do
t <- getWord8
case t of
0 -> return (Stream [])
1 -> do x <- get
Stream xs <- get
return (Stream (x:xs))
这在我看来应该是懒惰的,但是如果我们运行这个测试代码:
This looks to me like it should be lazy, but if we run this test code:
head $ unstream (decode $ encode $ Stream [1..10000000::Integer] :: Stream Integer)
<内存使用情况爆炸。出于某种原因,它想先解码整个列表,然后再让我看看第一个元素。
memory usage explodes. For some reason it wants to decode the whole list before letting me look at the first element.
为什么这不是懒惰,我怎么能使它懒惰?因为获取
monad是一个严格的状态monad(p>
Why is this not lazy, and how I can make it lazy?
推荐答案
在 binary-0.5.0.2到0.5.1.1 ;这是一个懒惰的状态monad之前和 binary-0.6。* 它已经成为继续monad,我没有分析这种变化的严格影响):
It is not lazy because the Get
monad is a strict state monad (in binary-0.5.0.2 to 0.5.1.1; it was a lazy state monad before, and in binary-0.6.* it has become a continuation monad, I haven't analysed the strictness implications of that change):
-- | The parse state
data S = S {-# UNPACK #-} !B.ByteString -- current chunk
L.ByteString -- the rest of the input
{-# UNPACK #-} !Int64 -- bytes read
-- | The Get monad is just a State monad carrying around the input ByteString
-- We treat it as a strict state monad.
newtype Get a = Get { unGet :: S -> (# a, S #) }
-- Definition directly from Control.Monad.State.Strict
instance Monad Get where
return a = Get $ \s -> (# a, s #)
{-# INLINE return #-}
m >>= k = Get $ \s -> case unGet m s of
(# a, s' #) -> unGet (k a) s'
{-# INLINE (>>=) #-}
因此最终递归
thus the final recursive
get >>= \x ->
get >>= \(Stream xs) ->
return (Stream (x:xs))
强制整个 Stream
在它可以被返回之前被读取。
forces the entire Stream
to be read before it can be returned.
我不认为可以懒惰地解码 Stream
在获取
monad(所以不适用于 Binary
实例)。但你可以使用 runGetState
编写一个懒惰的解码函数:
I don't think it's possible to lazily decode a Stream
in the Get
monad (so a fortiori not with the Binary
instance). But you can write a lazy decoding function using runGetState
:
-- | Run the Get monad applies a 'get'-based parser on the input
-- ByteString. Additional to the result of get it returns the number of
-- consumed bytes and the rest of the input.
runGetState :: Get a -> L.ByteString -> Int64 -> (a, L.ByteString, Int64)
runGetState m str off =
case unGet m (mkState str off) of
(# a, ~(S s ss newOff) #) -> (a, s `join` ss, newOff)
首先写一个 Get
解析器返回一个也许一个
,
First write a Get
parser that returns a Maybe a
,
getMaybe :: Binary a => Get (Maybe a)
getMaybe = do
t <- getWord8
case t of
0 -> return Nothing
_ -> fmap Just get
然后使用它来创建类型的函数(ByteString, Int64) - >可能(a,(ByteString,Int64))
:
step :: Binary a => (ByteString,Int64) -> Maybe (a,(ByteString,Int64))
step (xs,offset) = case runGetState getMaybe xs offset of
(Just v, ys, newOffset) -> Just (v,(ys,newOffset))
_ -> Nothing
然后您可以使用 Data.List.unfoldr $ c
and then you can use Data.List.unfoldr
to lazily decode a list,
lazyDecodeList :: Binary a => ByteString -> [a]
lazyDecodeList xs = unfoldr step (xs,0)
a Stream
lazyDecodeStream :: Binary a => ByteString -> Stream a
lazyDecodeStream = Stream . lazyDecodeList
这篇关于用Data.Binary对列表进行延迟解码的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!