Haskell Lazy ByteString + read/write progress function


Problem description

I am learning Haskell lazy IO.

I am looking for an elegant way to copy a large file (8 GB) while printing the copy progress to the console.

Consider the following simple program that copies a file silently.

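module Main where

-- getArgs lives in System.Environment (the old System module is gone from modern GHC)
import System.Environment (getArgs)
import qualified Data.ByteString.Lazy as B

main = do [from, to] <- getArgs
          body <- B.readFile from
          B.writeFile to body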

Imagine there is a callback function you want to use for reporting:

onReadBytes :: Integer -> IO ()
onReadBytes count = putStrLn $ "Bytes read: " ++ (show count)

QUESTION: How can the onReadBytes function be woven into the lazy ByteString so that it is called back on each successful read? Or, if this design is not good, what is the Haskell way to do it?

NOTE: The frequency of the callback is not important; it can be called every 1024 bytes or every 1 MB.

ANSWER: Many thanks to camccann for the answer. I suggest reading it in full.

Below is my version of the code, based on camccann's code. You may find it useful.

module Main where

import Control.Monad (unless)
import System.Environment (getArgs)
import System.IO
import qualified Data.ByteString.Lazy as B

main = do [from, to] <- getArgs
          withFile from ReadMode $ \fromH ->
            withFile to WriteMode $ \toH ->
              copyH fromH toH $ \x -> putStrLn $ "Bytes copied: " ++ show x

-- copy between two handles in 256 KiB chunks, reporting the running total
copyH :: Handle -> Handle -> (Integer -> IO ()) -> IO ()
copyH fromH toH onProgress =
    copy (B.hGet fromH (256 * 1024)) (write toH) B.null onProgress
    where write o x  = do B.hPut o x
                          return . fromIntegral $ B.length x

copy :: (Monad m) => m a -> (a -> m Integer) -> (a -> Bool) -> (Integer -> m ()) -> m ()
copy = copy_ 0

-- count is the number of bytes written so far
copy_ :: (Monad m) => Integer -> m a -> (a -> m Integer) -> (a -> Bool) -> (Integer -> m ()) -> m ()
copy_ count inp outp done onProgress = do x <- inp
                                          unless (done x) $
                                            do n <- outp x
                                               onProgress (n + count)
                                               copy_ (n + count) inp outp done onProgress
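
The copy_ loop above calls onProgress after every chunk, which for an 8 GB file read in 256 KiB chunks means tens of thousands of lines on the console. Since the question says the callback frequency does not matter, one possible refinement is to wrap the callback so that it fires only after the total has advanced by some step. The following is a minimal sketch; `throttled` is a hypothetical helper name, not something from the answer or from any library.

```haskell
import Control.Monad (when)
import Data.IORef

-- | Hypothetical helper: wrap a progress callback so that it only fires
-- when the running total has advanced by at least `step` bytes since the
-- last report that was actually made.
throttled :: Integer -> (Integer -> IO ()) -> IO (Integer -> IO ())
throttled step report = do
  lastRef <- newIORef 0              -- total at the time of the last report
  return $ \total -> do
    prev <- readIORef lastRef
    when (total - prev >= step) $ do
      writeIORef lastRef total
      report total
```

With this, something like `throttled (1024 * 1024) (\x -> putStrLn $ "Bytes copied: " ++ show x)` yields a callback that could be passed to copyH in place of the plain one. Note that the final total is only printed if it happens to cross a step boundary, so a caller who wants a closing line would print it separately.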

Solution

First, I'd like to note that a fair number of Haskell programmers regard lazy IO in general with some suspicion. It technically violates purity, but in a limited way that (as far as I'm aware) isn't noticeable when running a single program on consistent input[0]. On the other hand, plenty of people are fine with it, again because it involves only a very restricted kind of impurity.

To create the illusion of a lazy data structure that's actually created with on-demand I/O, functions like readFile are implemented using sneaky shenanigans behind the scenes. Weaving in the on-demand I/O is inherent to the function, and it's not really extensible for pretty much the same reasons that the illusion of getting a regular ByteString from it is convincing.

Handwaving the details and writing pseudocode, something like readFile basically works like this:

lazyInput inp = lazyIO (lazyInput' inp)
lazyInput' inp = do x <- readFrom inp
                    if (endOfInput inp)
                        then return []
                        else do xs <- lazyInput inp
                                return (x:xs)

...where each time lazyIO is called, it defers the I/O until the value is actually used. To invoke your reporting function each time an actual read occurs, you'd need to weave it directly into the lazy reading function itself, and while a generalized version of such a function could be written, to my knowledge none exists.
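
To make the pseudocode concrete, here is a rough sketch of what "weaving it in directly" could look like. It assumes that unsafeInterleaveIO plays the role of lazyIO, and both `hGetContentsWithProgress` and `copyWithLazyProgress` are hypothetical names introduced for illustration; neither is part of the bytestring library.

```haskell
import qualified Data.ByteString as S
import qualified Data.ByteString.Lazy as L
import System.IO (Handle, IOMode (ReadMode), hClose, openBinaryFile)
import System.IO.Unsafe (unsafeInterleaveIO)

-- | Hypothetical helper: read a handle lazily in chunks of at most
-- chunkSize bytes, calling onRead with the running byte total each time
-- a chunk is actually read. The handle is closed at end of input.
hGetContentsWithProgress
  :: Int -> (Integer -> IO ()) -> Handle -> IO L.ByteString
hGetContentsWithProgress chunkSize onRead h = L.fromChunks <$> go 0
  where
    -- unsafeInterleaveIO defers each read until that part of the
    -- chunk list is demanded by the consumer
    go total = unsafeInterleaveIO $ do
      chunk <- S.hGetSome h chunkSize
      if S.null chunk
        then hClose h >> return []
        else do
          let total' = total + fromIntegral (S.length chunk)
          onRead total'
          rest <- go total'
          return (chunk : rest)

-- | Copy a file through the lazy reader, printing the running total.
copyWithLazyProgress :: FilePath -> FilePath -> IO ()
copyWithLazyProgress from to = do
  h <- openBinaryFile from ReadMode
  body <- hGetContentsWithProgress (64 * 1024) report h
  L.writeFile to body
  where
    report n = putStrLn ("Bytes read: " ++ show n)
```

The callback runs whenever the consumer happens to force the next chunk, so its timing is driven by the writer rather than by the reader. That is the same caveat that applies to lazy IO in general.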

Given the above, you have a few options:

[0]: The underlying function here is called unsafeInterleaveIO, and to the best of my knowledge the only ways to observe impurity from it require either running the program on different input (in which case it's entitled to behave differently anyhow; it just may be doing so in ways that don't make sense in pure code), or changing the code in certain ways (i.e., refactorings that should have no effect can have non-local effects).


Here's a rough example of doing things the "plain old regular I/O" way, using more composable functions:

import System.Environment (getArgs)
import System.IO
import qualified Data.ByteString.Lazy as B

main = do [from, to] <- getArgs
          -- withFile closes the handle for us after the action completes
          withFile from ReadMode $ \inH ->
            withFile to WriteMode $ \outH ->
                -- run the loop with the appropriate actions
                runloop (B.hGet inH 128) (processBytes outH) B.null

-- note the very generic type; this is useful, because it proves that the
-- runloop function can only execute what it's given, not do anything else
-- behind our backs.
runloop :: (Monad m) => m a -> (a -> m ()) -> (a -> Bool) -> m ()
runloop inp outp done = do x <- inp
                           if done x
                             then return ()
                             else do outp x
                                     runloop inp outp done

-- write the output and report progress to stdout. note that this can be easily
-- modified, or composed with other output functions.
processBytes :: Handle -> B.ByteString -> IO ()
processBytes h bs | B.null bs = return ()
                  | otherwise = do onReadBytes (fromIntegral $ B.length bs)
                                   B.hPut h bs

onReadBytes :: Integer -> IO ()
onReadBytes count = putStrLn $ "Bytes read: " ++ (show count)

The "128" up there is how many bytes to read at a time. Running this on a random source file in my "Stack Overflow snippets" directory:

$ runhaskell ReadBStr.hs Corec.hs temp
Bytes read: 128
Bytes read: 128
Bytes read: 128
Bytes read: 128
Bytes read: 128
Bytes read: 128
Bytes read: 128
Bytes read: 128
Bytes read: 128
Bytes read: 128
Bytes read: 83
$
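
The comment about runloop's generic type can be illustrated with a small experiment: because the loop only ever runs the actions it is handed, it can be driven by in-memory stand-ins instead of file handles. The sketch below repeats runloop so that it stands alone; `runOnList` is a hypothetical helper, and using IORef-backed actions is just one assumed way of faking the input and output.

```haskell
import Data.IORef (atomicModifyIORef', modifyIORef, newIORef, readIORef)

-- Same loop as in the answer, repeated so this block is self-contained.
runloop :: Monad m => m a -> (a -> m ()) -> (a -> Bool) -> m ()
runloop inp outp done = do
  x <- inp
  if done x
    then return ()
    else outp x >> runloop inp outp done

-- | Hypothetical helper: feed runloop from a list of chunks and collect
-- everything it passes to the output action. An empty chunk (or running
-- out of input) plays the role of end of file.
runOnList :: [String] -> IO [String]
runOnList chunks = do
  input  <- newIORef chunks
  output <- newIORef []
  let next = atomicModifyIORef' input $ \xs -> case xs of
        []       -> ([], "")
        (y : ys) -> (ys, y)
  runloop next (\x -> modifyIORef output (x :)) null
  reverse <$> readIORef output
```

Calling `runOnList ["ab", "cd", "", "ef"]` stops at the first empty chunk, mirroring how the file version stops when B.hGet returns an empty ByteString.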
