Haskell Lazy ByteString + read/write progress function

Question

I am learning Haskell lazy IO.

I am looking for an elegant way to copy a large file (8 GB) while printing the copy progress to the console.

Consider the following simple program that copies a file silently.

module Main where

import System.Environment (getArgs)
import qualified Data.ByteString.Lazy as B

main = do [from, to] <- getArgs
          body <- B.readFile from
          B.writeFile to body

Imagine there is a callback function you want to use for reporting:

onReadBytes :: Integer -> IO ()
onReadBytes count = putStrLn $ "Bytes read: " ++ (show count)

QUESTION: How can I weave the onReadBytes function into the lazy ByteString so that it is called back on each successful read? Or, if this design is not good, what is the Haskell way to do it?

NOTE: The frequency of the callback is not important; it can be called every 1024 bytes or every 1 MB.

ANSWER: Many thanks to camccann for the answer. I suggest reading it in its entirety.

Below is my version of the code, based on camccann's code; you may find it useful.

module Main where

import Control.Monad (unless)
import System.Environment (getArgs)
import System.IO
import qualified Data.ByteString.Lazy as B

main = do [from, to] <- getArgs
          withFile from ReadMode $ \fromH ->
            withFile to WriteMode $ \toH ->
              copyH fromH toH $ \x -> putStrLn $ "Bytes copied: " ++ show x

copyH :: Handle -> Handle -> (Integer -> IO()) -> IO ()
copyH fromH toH onProgress =
    copy (B.hGet fromH (256 * 1024)) (write toH) B.null onProgress
    where write o x  = do B.hPut o x
                          return . fromIntegral $ B.length x

copy :: (Monad m) => m a -> (a -> m Integer) -> (a -> Bool) -> (Integer -> m()) -> m()
copy = copy_ 0

copy_ :: (Monad m) => Integer -> m a -> (a -> m Integer) -> (a -> Bool) -> (Integer -> m()) -> m()
copy_ count inp outp done onProgress = do x <- inp
                                          unless (done x) $
                                            do n <- outp x
                                               onProgress (n + count)
                                               copy_ (n + count) inp outp done onProgress

Recommended answer

First, I'd like to note that a fair number of Haskell programmers regard lazy IO in general with some suspicion. It technically violates purity, but in a limited way that (as far as I'm aware) isn't noticeable when running a single program on consistent input[0]. On the other hand, plenty of people are fine with it, again because it involves only a very restricted kind of impurity.

To create the illusion of a lazy data structure that's actually created with on-demand I/O, functions like readFile are implemented using sneaky shenanigans behind the scenes. Weaving in the on-demand I/O is inherent to the function, and it's not really extensible for pretty much the same reasons that the illusion of getting a regular ByteString from it is convincing.

Handwaving the details and writing pseudocode, something like readFile basically works like this:

lazyInput inp = lazyIO (lazyInput' inp)
lazyInput' inp = do x <- readFrom inp
                    if (endOfInput inp)
                        then return []
                        else do xs <- lazyInput inp
                                return (x:xs)

...where each time lazyIO is called, it defers the I/O until the value is actually used. To invoke your reporting function each time the actual read occurs, you'd need to weave it in directly, and while a generalized version of such a function could be written, to my knowledge none exist.
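As a rough illustration of what "weaving it in directly" could look like, here is a minimal sketch built on unsafeInterleaveIO from System.IO.Unsafe. The names lazyReadWithProgress and lazyReadFileWithProgress are hypothetical helpers made up for this example, and the code is not how the library's own readFile is implemented; it only shows the general shape under those assumptions. The callback receives the running total of bytes read so far.

```haskell
import qualified Data.ByteString as S
import qualified Data.ByteString.Lazy as L
import System.IO (Handle, IOMode (ReadMode), hClose, openBinaryFile)
import System.IO.Unsafe (unsafeInterleaveIO)

-- Hypothetical helper (not a library function): lazily read a handle in
-- chunks of at most chunkSize bytes. The callback runs each time a chunk
-- is actually read, and is given the running total of bytes read so far.
lazyReadWithProgress :: Int -> (Integer -> IO ()) -> Handle -> IO L.ByteString
lazyReadWithProgress chunkSize onProgress h = L.fromChunks <$> go 0
  where
    -- unsafeInterleaveIO defers each read until the corresponding part of
    -- the chunk list is demanded by the consumer.
    go total = unsafeInterleaveIO $ do
      chunk <- S.hGet h chunkSize
      if S.null chunk
        then hClose h >> return []   -- end of input: close the handle
        else do
          let total' = total + fromIntegral (S.length chunk)
          onProgress total'
          rest <- go total'
          return (chunk : rest)

-- Hypothetical convenience wrapper that opens the file itself. The handle
-- is only closed once the whole result has been consumed.
lazyReadFileWithProgress :: Int -> (Integer -> IO ()) -> FilePath -> IO L.ByteString
lazyReadFileWithProgress chunkSize onProgress path =
  openBinaryFile path ReadMode >>= lazyReadWithProgress chunkSize onProgress
```

With this sketch, something like `lazyReadFileWithProgress (256 * 1024) onReadBytes from >>= L.writeFile to` would print progress as the writer pulls chunks through. It carries the usual lazy I/O caveats: the callback fires whenever the data happens to be demanded, and the file stays open if the result is never fully consumed.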

Given the above, you have a few options:

  • Look up the implementation of the lazy I/O functions you're using, and implement your own that include the progress reporting function. If this feels like a dirty hack, that's because it pretty much is, but there you go.

  • Abandon lazy I/O and switch to something more explicit and composable. This is the direction that the Haskell community as a whole seems to be heading in, specifically toward some variation on Iteratees, which give you nicely composable little stream processor building blocks that have more predictable behavior. The downside is that the concept is still under active development so there's no consensus on implementation or single starting point for learning to use them.

  • Abandon lazy I/O and switch to plain old regular I/O: Write an IO action that reads a chunk, prints the reporting info, and processes as much input as it can; then invoke it in a loop until done. Depending on what you're doing with the input and how much you're relying on laziness in your processing, this could involve anything from writing a couple of nearly trivial functions to building a bunch of finite-state-machine stream processors and getting 90% of the way to reinventing Iteratees.

[0]: The underlying function here is called unsafeInterleaveIO, and to the best of my knowledge the only ways to observe impurity from it require either running the program on different input (in which case it's entitled to behave differently anyhow, it just may be doing so in ways that don't make sense in pure code), or changing the code in certain ways (i.e., refactorings that should have no effect can have non-local effects).

Here's a rough example of doing things the "plain old regular I/O" way, using more composable functions:

import System.Environment (getArgs)
import System.IO
import qualified Data.ByteString.Lazy as B

main = do [from, to] <- getArgs
          -- withFile closes the handle for us after the action completes
          withFile from ReadMode $ \inH ->
            withFile to WriteMode $ \outH ->
                -- run the loop with the appropriate actions
                runloop (B.hGet inH 128) (processBytes outH) B.null
-- note the very generic type; this is useful, because it proves that the
-- runloop function can only execute what it's given, not do anything else
-- behind our backs.
runloop :: (Monad m) => m a -> (a -> m ()) -> (a -> Bool) -> m ()
runloop inp outp done = do x <- inp
                           if done x
                             then return ()
                             else do outp x
                                     runloop inp outp done

-- write the output and report progress to stdout. note that this can be easily
-- modified, or composed with other output functions.
processBytes :: Handle -> B.ByteString -> IO ()
processBytes h bs | B.null bs = return ()
                  | otherwise = do onReadBytes (fromIntegral $ B.length bs)
                                   B.hPut h bs

onReadBytes :: Integer -> IO ()
onReadBytes count = putStrLn $ "Bytes read: " ++ (show count)

The "128" up there is how many bytes to read at a time. Running this on a random source file in my "Stack Overflow snippets" directory:

$ runhaskell ReadBStr.hs Corec.hs temp
Bytes read: 128
Bytes read: 128
Bytes read: 128
Bytes read: 128
Bytes read: 128
Bytes read: 128
Bytes read: 128
Bytes read: 128
Bytes read: 128
Bytes read: 128
Bytes read: 83
$
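The run above reports the size of each chunk. For a large file it may be more useful to report a running total and a percentage. The following is a minimal sketch of that variant, assuming strict ByteString chunks and hFileSize to obtain the input size; copyWithProgress, printPercent and percent are hypothetical names introduced only for this example.

```haskell
import Control.Monad (unless)
import qualified Data.ByteString as S
import System.IO (IOMode (ReadMode, WriteMode), hFileSize, withBinaryFile)

-- Hypothetical helper: copy a file in chunks of at most chunkSize bytes.
-- After each chunk is written, the callback is given the bytes copied so
-- far and the total size of the input file.
copyWithProgress :: Int -> (Integer -> Integer -> IO ()) -> FilePath -> FilePath -> IO ()
copyWithProgress chunkSize report from to =
  withBinaryFile from ReadMode $ \inH ->
    withBinaryFile to WriteMode $ \outH -> do
      total <- hFileSize inH
      let loop copied = do
            chunk <- S.hGet inH chunkSize
            -- an empty chunk means end of input, so the loop stops
            unless (S.null chunk) $ do
              S.hPut outH chunk
              let copied' = copied + fromIntegral (S.length chunk)
              report copied' total
              loop copied'
      loop 0

-- Example reporting callback: prints the running total and a percentage.
printPercent :: Integer -> Integer -> IO ()
printPercent copied total =
  putStrLn $ "Bytes copied: " ++ show copied ++ " of " ++ show total
    ++ " (" ++ show (percent copied total) ++ "%)"

-- Integer percentage, rounded down; an empty input counts as complete.
percent :: Integer -> Integer -> Integer
percent _ 0 = 100
percent copied total = copied * 100 `div` total
```

A call such as `copyWithProgress (1024 * 1024) printPercent from to` would print one line per megabyte copied. Because each chunk is a strict ByteString that is written out before the next read, memory use stays bounded by the chunk size regardless of the file size.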
