Haskell Lazy ByteString + read/write progress function

Views: 111  Published: 2018/6/4 15:06:20  Tags: haskell, io, progress, lazy-evaluation, bytestring


Problem description


I am learning Haskell lazy I/O.

I am looking for an elegant way to copy a large file (8 GB) while printing the copy progress to the console.

Consider the following simple program that copies a file silently.
```haskell
module Main where

import System.Environment (getArgs)
import qualified Data.ByteString.Lazy as B

main :: IO ()
main = do
  [from, to] <- getArgs
  body <- B.readFile from
  B.writeFile to body
```
Imagine there is a callback function you want to use for reporting:
```haskell
onReadBytes :: Integer -> IO ()
onReadBytes count = putStrLn $ "Bytes read: " ++ show count
```
QUESTION: How can I weave the onReadBytes function into a lazy ByteString so that it is called back on each successful read? Or, if this design is not good, what is the Haskell way to do it?

NOTE: The frequency of the callback is not important; it can be called every 1024 bytes or every 1 MB.

ANSWER: Many thanks to camccann for the answer. I suggest reading it in full.

Below is my version of the code, based on camccann's code; you may find it useful.
```haskell
module Main where

import Control.Monad (unless)
import System.Environment (getArgs)
import System.IO
import qualified Data.ByteString.Lazy as B

main :: IO ()
main = do
  [from, to] <- getArgs
  withFile from ReadMode $ \fromH ->
    withFile to WriteMode $ \toH ->
      copyH fromH toH $ \x -> putStrLn $ "Bytes copied: " ++ show x

-- Copy fromH to toH in 256 KiB chunks, reporting the running total after each chunk.
copyH :: Handle -> Handle -> (Integer -> IO ()) -> IO ()
copyH fromH toH onProgress =
    copy (B.hGet fromH (256 * 1024)) (write toH) B.null onProgress
  where
    write o x = do
      B.hPut o x
      return . fromIntegral $ B.length x

-- Generic copy loop: read with inp, stop when done says so, otherwise write
-- with outp and report the cumulative byte count.
copy :: (Monad m) => m a -> (a -> m Integer) -> (a -> Bool) -> (Integer -> m ()) -> m ()
copy = copy_ 0

copy_ :: (Monad m) => Integer -> m a -> (a -> m Integer) -> (a -> Bool) -> (Integer -> m ()) -> m ()
copy_ count inp outp done onProgress = do
  x <- inp
  unless (done x) $ do
    n <- outp x
    onProgress (n + count)
    copy_ (n + count) inp outp done onProgress
```

Solution
First, I'd like to note that a fair number of Haskell programmers regard lazy IO in general with some suspicion. It technically violates purity, but in a limited way that (as far as I'm aware) isn't noticeable when running a single program on consistent input [0]. On the other hand, plenty of people are fine with it, again because it involves only a very restricted kind of impurity.

To create the illusion of a lazy data structure that's actually created with on-demand I/O, functions like readFile are implemented using sneaky shenanigans behind the scenes. Weaving in the on-demand I/O is inherent to the function, and it's not really extensible for pretty much the same reasons that the illusion of getting a regular ByteString from it is convincing.

Handwaving the details and writing pseudocode, something like readFile basically works like this:
```haskell
lazyInput inp = lazyIO (lazyInput' inp)

lazyInput' inp = do
  x <- readFrom inp
  if (endOfInput inp)
    then return []
    else do
      xs <- lazyInput inp
      return (x:xs)
```
...where each time lazyIO is called, it defers the I/O until the value is actually used. To invoke your reporting function each time the actual read occurs, you'd need to weave it in directly, and while a generalized version of such a function could be written, to my knowledge none exist.
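To make the "weave it in directly" option concrete, here is a minimal sketch that turns the pseudocode above into real code, with `unsafeInterleaveIO` (from `System.IO.Unsafe`) standing in for `lazyIO` and the progress callback called right after each chunk is read. The name `hGetContentsWithProgress` is hypothetical, not part of the bytestring package, and this is an illustration under those assumptions rather than how `B.readFile` is actually implemented:

```haskell
import qualified Data.ByteString as S
import qualified Data.ByteString.Lazy as L
import System.Environment (getArgs)
import System.IO
import System.IO.Unsafe (unsafeInterleaveIO)

-- | Hypothetical helper (not in the bytestring package): a lazy hGetContents
-- that reads strict chunks of at most chunkSize bytes and calls onProgress
-- with the running total each time a chunk is actually read.
-- Like L.hGetContents, it closes the handle once it reaches end of file.
hGetContentsWithProgress :: Int -> (Integer -> IO ()) -> Handle -> IO L.ByteString
hGetContentsWithProgress chunkSize onProgress h = L.fromChunks <$> go 0
  where
    -- Mirrors lazyInput/lazyInput' above: each step is wrapped in
    -- unsafeInterleaveIO, so the read (and the report) only happens when the
    -- consumer forces this part of the chunk list.
    go :: Integer -> IO [S.ByteString]
    go total = unsafeInterleaveIO $ do
      chunk <- S.hGetSome h chunkSize
      if S.null chunk
        then hClose h >> return []
        else do
          let total' = total + fromIntegral (S.length chunk)
          onProgress total'
          rest <- go total'
          return (chunk : rest)

main :: IO ()
main = do
  [from, to] <- getArgs
  h <- openBinaryFile from ReadMode
  body <- hGetContentsWithProgress (256 * 1024) report h
  -- L.writeFile forces body chunk by chunk, so progress is printed as the
  -- copy proceeds rather than all at once.
  L.writeFile to body
  where
    report n = putStrLn $ "Bytes read: " ++ show n
```

The progress report fires when the data is consumed, not when `hGetContentsWithProgress` returns, which is exactly the deferred behaviour described above; any I/O error also surfaces at that later point.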

Given the above, you have a few options:

Look up the implementation of the lazy I/O functions you're using, and implement your own that include the progress reporting function. If this feels like a dirty hack, that's because it pretty much is, but there you go.

Abandon lazy I/O and switch to something more explicit and composable. This is the direction that the Haskell community as a whole seems to be heading in, specifically toward some variation on Iteratees, which give you nicely composable little stream processor building blocks that have more predictable behavior. The downside is that the concept is still under active development so there's no consensus on implementation or single starting point for learning to use them.

Abandon lazy I/O and switch to plain old regular I/O: Write an IO action that reads a chunk, prints the reporting info, and processes as much input as it can; then invoke it in a loop until done. Depending on what you're doing with the input and how much you're relying on laziness in your processing, this could involve anything from writing a couple nearly-trivial functions to building a bunch of finite-state-machine stream processors and getting 90% of the way to reinventing Iteratees.

[0]: The underlying function here is called unsafeInterleaveIO, and to the best of my knowledge the only ways to observe impurity from it require either running the program on different input (in which case it's entitled to behave differently anyhow, it just may be doing so in ways that don't make sense in pure code), or changing the code in certain ways (i.e., refactorings that should have no effect can have non-local effects).
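A small illustration of the "changing the code" case, using an `IORef` rather than a file so the effect is easy to see. This is not from the original answer; `demo` is a hypothetical name. The deferred read returns a different value depending only on where the result is first forced:

```haskell
import Control.Exception (evaluate)
import Data.IORef
import System.IO.Unsafe (unsafeInterleaveIO)

-- Read an IORef lazily, then overwrite it. If forceEarly is True the deferred
-- read is forced before the write; otherwise it only runs at `return $! x`.
demo :: Bool -> IO Int
demo forceEarly = do
  ref <- newIORef 0
  x <- unsafeInterleaveIO (readIORef ref)  -- the read is deferred
  if forceEarly then evaluate x >> return () else return ()
  writeIORef ref 1
  return $! x                              -- force x at the latest here

main :: IO ()
main = do
  demo True  >>= print  -- forced before the write: prints 0
  demo False >>= print  -- forced after the write: prints 1
```

The two calls differ only in when `x` is evaluated, which in pure code would never matter.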

Here's a rough example of doing things the "plain old regular I/O" way, using more composable functions:
```haskell
import System.Environment (getArgs)
import System.IO
import qualified Data.ByteString.Lazy as B

main :: IO ()
main = do
  [from, to] <- getArgs
  -- withFile closes the handle for us after the action completes
  withFile from ReadMode $ \inH ->
    withFile to WriteMode $ \outH ->
      -- run the loop with the appropriate actions
      runloop (B.hGet inH 128) (processBytes outH) B.null

-- note the very generic type; this is useful, because it proves that the
-- runloop function can only execute what it's given, not do anything else
-- behind our backs.
runloop :: (Monad m) => m a -> (a -> m ()) -> (a -> Bool) -> m ()
runloop inp outp done = do
  x <- inp
  if done x
    then return ()
    else do
      outp x
      runloop inp outp done

-- write the output and report progress to stdout. note that this can be easily
-- modified, or composed with other output functions.
processBytes :: Handle -> B.ByteString -> IO ()
processBytes h bs
  | B.null bs = return ()
  | otherwise = do
      onReadBytes (fromIntegral $ B.length bs)
      B.hPut h bs

onReadBytes :: Integer -> IO ()
onReadBytes count = putStrLn $ "Bytes read: " ++ show count
```
The "128" up there is how many bytes to read at a time. Running this on a random source file in my "Stack Overflow snippets" directory:
```
$ runhaskell ReadBStr.hs Corec.hs temp
Bytes read: 128
Bytes read: 128
Bytes read: 128
Bytes read: 128
Bytes read: 128
Bytes read: 128
Bytes read: 128
Bytes read: 128
Bytes read: 128
Bytes read: 128
Bytes read: 83
$
```
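Because `runloop` only requires `Monad m`, it can be exercised without any files at all. The sketch below reuses `runloop` unchanged and runs it in the `ST` monad, reading chunks from an in-memory list and collecting what it "writes". The helper `copyChunks` is hypothetical, and `String` chunks stand in for ByteStrings, with an empty chunk playing the role of end of input just as `B.null` does above:

```haskell
import Control.Monad.ST
import Data.STRef

runloop :: (Monad m) => m a -> (a -> m ()) -> (a -> Bool) -> m ()
runloop inp outp done = do
  x <- inp
  if done x
    then return ()
    else do
      outp x
      runloop inp outp done

-- Hypothetical test harness: feed runloop from a list of chunks and collect
-- the output chunks, entirely inside ST (no handles, no IO).
copyChunks :: [String] -> [String]
copyChunks input = runST $ do
  src <- newSTRef input
  dst <- newSTRef []
  let next = do
        xs <- readSTRef src
        case xs of
          []       -> return ""  -- empty chunk signals end of input
          (y : ys) -> writeSTRef src ys >> return y
  runloop next (\c -> modifySTRef dst (c :)) null
  reverse <$> readSTRef dst

main :: IO ()
main = print (copyChunks ["abc", "de", "f"])  -- prints ["abc","de","f"]
```

Note that, as with `B.null`, an empty chunk in the middle of the input also stops the loop, so `copyChunks ["ab", "", "c"]` yields only `["ab"]`.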
