Haskell lazy I/O and closing files


Problem description

I've written a small Haskell program to print the MD5 checksums of all files in the current directory (searched recursively). Basically a Haskell version of md5deep. All is fine and dandy except if the current directory has a very large number of files, in which case I get an error like:

<program>: <currentFile>: openBinaryFile: resource exhausted (Too many open files)

It seems Haskell's laziness is causing it not to close files, even after its corresponding line of output has been completed.

The relevant code is below. The function of interest is getList.

import qualified Data.ByteString.Lazy as BS

main :: IO ()
main = putStr . unlines =<< getList "."

getList :: FilePath -> IO [String]
getList p =
    let getFileLine path = liftM (\c -> (hex $ hash $ BS.unpack c) ++ " " ++ path) (BS.readFile path)
    in mapM getFileLine =<< getRecursiveContents p

hex :: [Word8] -> String
hex = concatMap (\x -> printf "%0.2x" (toInteger x))

getRecursiveContents :: FilePath -> IO [FilePath]
-- ^ Just gets the paths to all the files in the given directory.

Are there any ideas on how I could solve this problem?

The entire program is available here: http://haskell.pastebin.com/PAZm0Dcb

Edit: I have plenty of files that don't fit into RAM, so I am not looking for a solution that reads the entire file into memory at once.

Solution

Lazy IO is very bug-prone.
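
To illustrate the failure mode, here is a minimal sketch that assumes only the base and bytestring packages. With a lazy readFile, the handle stays open until the contents have been consumed to end-of-file, so a mapM over many paths (as in the question's getList) opens every file before any of them is actually read. Forcing a result that depends on the whole file inside the per-file action lets each handle be closed before the next one is opened. The names byteCount and countAll are hypothetical, and BL.length merely stands in for a real checksum.

```haskell
import Control.Exception (evaluate)
import qualified Data.ByteString.Lazy as BL
import Data.Int (Int64)

-- Hypothetical helper: read a file lazily, but force the result before
-- returning, so the read reaches EOF and the handle is closed here.
byteCount :: FilePath -> IO Int64
byteCount path = do
    contents <- BL.readFile path      -- opens the file, reads nothing yet
    evaluate (BL.length contents)     -- consumes to EOF, chunk by chunk

-- Each count is forced before mapM moves on to the next path,
-- so at most one file is open at a time.
countAll :: [FilePath] -> IO [Int64]
countAll = mapM byteCount
```

This keeps lazy IO but relies on remembering to force every result, which is exactly the kind of detail that is easy to get wrong.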

As dons suggested, you should use strict IO.

You can use a tool such as Iteratee to help you structure strict IO code. My favorite tool for this job is monadic lists.
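
As a point of comparison, strict chunked reading can also be sketched with nothing beyond base and bytestring. This is an illustrative alternative, not the monadic-list approach used in the code below: foldFileChunks and fileSize are hypothetical names, and the step function plays the role that a streaming hash update would play. It assumes a positive chunk size.

```haskell
import qualified Data.ByteString as B
import System.IO (IOMode(ReadMode), withBinaryFile)

-- Hypothetical helper: strictly fold over a file in fixed-size chunks.
-- withBinaryFile closes the handle when the action finishes, even if an
-- exception is thrown, and only one chunk is held in memory at a time.
foldFileChunks :: Int -> (acc -> B.ByteString -> acc) -> acc -> FilePath -> IO acc
foldFileChunks chunkSize step initial path =
    withBinaryFile path ReadMode (loop initial)
  where
    loop acc handle = do
        chunk <- B.hGet handle chunkSize    -- strict read of up to chunkSize bytes
        if B.null chunk
            then return acc                 -- an empty chunk means end of file
            else do
                let acc' = step acc chunk
                acc' `seq` loop acc' handle -- force the accumulator at each step

-- Example fold: total size in bytes, standing in for a streaming hash.
fileSize :: FilePath -> IO Int
fileSize = foldFileChunks 4096 (\n chunk -> n + B.length chunk) 0
```

Because the handle's lifetime is tied to withBinaryFile rather than to when the data happens to be demanded, running this over a long list of paths with mapM keeps only one file open at a time.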

import Control.Monad (when) -- base
import Control.Monad.ListT (ListT) -- List
import Control.Monad.IO.Class (liftIO) -- transformers
import Data.Binary (encode) -- binary
import Data.Digest.Pure.MD5 -- pureMD5
import Data.List.Class (repeat, takeWhile, foldlL) -- List
import System.IO (IOMode(ReadMode), openFile, hClose)
import qualified Data.ByteString.Lazy as BS
import Prelude hiding (repeat, takeWhile)

hashFile :: FilePath -> IO BS.ByteString
hashFile =
    fmap (encode . md5Finalize) . foldlL md5Update md5InitialContext . strictReadFileChunks 1024

strictReadFileChunks :: Int -> FilePath -> ListT IO BS.ByteString
strictReadFileChunks chunkSize filename =
    takeWhile (not . BS.null) $ do
        handle <- liftIO $ openFile filename ReadMode
        repeat () -- this makes the lines below loop
        chunk <- liftIO $ BS.hGet handle chunkSize
        when (BS.null chunk) . liftIO $ hClose handle
        return chunk

I used the "pureMD5" package here because "Crypto" doesn't seem to offer a "streaming" md5 implementation.

Monadic lists/ListT come from the "List" package on Hackage (transformers' and mtl's ListT are broken, and they also don't come with useful functions like takeWhile).
