Can I read n files lazily as a single IO operation in Haskell?

Question

How can I read multiple files as a single ByteString lazily with constant memory?

readFiles :: [FilePath] -> IO ByteString

I currently have the following implementation, but both profiling and my own understanding tell me that I will end up with n-1 of the files in memory.

readFiles = foldl1 joinIOStrings . map ByteString.readFile
    where joinIOStrings ml mr = do
                                l <- ml
                                r <- mr
                                return $ l `ByteString.append` r

I understand that the flaw here is that I am running the IO actions and then rewrapping their results, so I think what I need is a replacement for foldl1 joinIOStrings that combines the actions without running them.

Solution

How can I read multiple files as a single ByteString lazily with constant memory?

If you want constant memory usage, you need Data.ByteString.Lazy. A strict ByteString cannot be read lazily, and would require O(sum of file sizes) memory.

For a not too large number of files, simply reading them all (Data.ByteString.Lazy.readFile reads lazily) and concatenating the results works well:

import qualified Data.ByteString.Lazy as L

readFiles :: [FilePath] -> IO L.ByteString
readFiles = fmap L.concat . mapM L.readFile

The mapM L.readFile will open the files, but only read the contents of each file when it is demanded.
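
Constant memory only holds if the concatenated result is consumed incrementally and nothing else keeps a reference to it. A minimal sketch of such a consumer, assuming the readFiles definition above (totalBytes is a hypothetical name used for illustration, not part of any library):

```haskell
import qualified Data.ByteString.Lazy as L
import Data.Int (Int64)

-- Same definition as above: opens every file, reads contents on demand.
readFiles :: [FilePath] -> IO L.ByteString
readFiles = fmap L.concat . mapM L.readFile

-- | Hypothetical consumer: the combined size of all files.
-- L.length walks the chunks one at a time, so chunks that have already
-- been counted can be garbage collected, provided nothing else holds
-- on to the lazy ByteString.
totalBytes :: [FilePath] -> IO Int64
totalBytes paths = do
  bytes <- readFiles paths
  -- Force the count here so the files are read to the end before returning.
  return $! L.length bytes
```

If the same lazy ByteString is traversed twice (say, once for its length and once to write it out), the whole contents stay alive between the two traversals and the memory benefit is lost.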

If the number of files is large enough that the OS limit on open file handles per process could be exhausted, you need something more complicated. You can cook up your own lazy version of mapM (given its type, strictly speaking a lazy sequence):

import System.IO.Unsafe (unsafeInterleaveIO)

mapM_lazy :: [IO a] -> IO [a]
mapM_lazy [] = return []
mapM_lazy (x:xs) = do
              r <- x
              rs <- unsafeInterleaveIO (mapM_lazy xs)
              return (r:rs)

With this, each file is opened only when its contents are demanded, by which time the previously read files can already be closed (a handle opened by L.readFile is closed once the file has been read to the end). There is still a slight possibility of running into resource limits, since the time at which the handles are closed is not guaranteed.
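
The answer does not spell out how mapM_lazy plugs into readFiles; one plausible way to wire it up is sketched below. The name readFilesLazy is an assumption made for this example.

```haskell
import qualified Data.ByteString.Lazy as L
import System.IO.Unsafe (unsafeInterleaveIO)

-- Same definition as above: runs the first action immediately and
-- defers the remaining ones until the tail of the list is demanded.
mapM_lazy :: [IO a] -> IO [a]
mapM_lazy [] = return []
mapM_lazy (x:xs) = do
  r <- x
  rs <- unsafeInterleaveIO (mapM_lazy xs)
  return (r:rs)

-- | Hypothetical name: like readFiles, but each file after the first is
-- only opened when the consumer reaches that part of the result.
readFilesLazy :: [FilePath] -> IO L.ByteString
readFilesLazy = fmap L.concat . mapM_lazy . map L.readFile
```

One consequence of deferring the actions is that an error such as a missing file surfaces when the consumer reaches that file, not when readFilesLazy is called.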

Or you can use your favourite iteratee, enumerator, conduit or similar package, which solve the problem in a systematic way. Each of them has its own advantages and disadvantages with respect to the others and, if coded correctly, eliminates the possibility of accidentally hitting the resource limit.
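
To give a feel for what those packages systematise, here is a hand-rolled sketch that uses only base and bytestring. It is not the API of iteratee, enumerator or conduit; foldFiles and its 32 KiB chunk size are assumptions made for this example. It folds a step function over strict chunks while keeping exactly one handle open at a time, and withBinaryFile closes each handle deterministically.

```haskell
import Control.Monad (foldM)
import qualified Data.ByteString as S
import System.IO (IOMode (ReadMode), withBinaryFile)

-- | Hypothetical helper: feed every chunk of every file, in order, to a
-- step function. Only one file is open at any moment, and at most one
-- chunk is held in memory besides the accumulator.
foldFiles :: (acc -> S.ByteString -> IO acc) -> acc -> [FilePath] -> IO acc
foldFiles step = foldM oneFile
  where
    -- withBinaryFile closes the handle when the loop finishes or throws.
    oneFile acc path = withBinaryFile path ReadMode (loop acc)
    loop acc h = do
      chunk <- S.hGetSome h chunkSize
      if S.null chunk
        then return acc                -- an empty chunk signals end of file
        else do
          acc' <- step acc chunk
          acc' `seq` loop acc' h       -- force the accumulator each round
    chunkSize = 32 * 1024              -- assumed size, tune as needed
```

For example, foldFiles (\n c -> return (n + S.length c)) 0 paths sums the sizes of all files. The trade-off against the lazy ByteString approach is that the consumer has to be written as a step function instead of receiving one combined value.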
