Fastest way to read large binary file in Haskell?
Question
I want to process a binary file that is too large to read into memory. Currently I use ByteString.Lazy.readFile to stream the bytes. I thought it would be a good idea to use the streaming package to make my program faster. However, the documentation for readFile says:
readFile :: FilePath -> (Stream (Of String) IO () -> IO a) -> IO a
Read the lines of a file, using a function of the type: 'Stream (Of String) IO () -> IO a' to turn the stream into a value of type 'IO a'.
So the streaming package only reads ASCII text files? Can I use this package to read a binary file as bytes?
Recommended answer
To elaborate on @Cubic's comment, while there's a general consensus that lazy I/O should be avoided in production code and replaced with a streaming approach, this is not directly related to performance. If you're writing a program to do some one-off processing of a large file, as long as you have a lazy I/O version running fine now, there's probably no good performance reason to convert it over to a streaming package.
In fact, streaming is more likely to add some overhead, so I suspect that, in most cases, a well-optimized lazy I/O solution would outperform a well-optimized streaming solution.
The main reasons for avoiding Lazy I/O have been previously discussed on SO. In a nutshell, lazy I/O makes it difficult to consistently manage resources (e.g., file handles and network sockets), makes it hard to reason about space usage (e.g., a small program change can cause your memory usage to explode), and is occasionally "unsafe" if the timing and ordering of the I/O in question matters (usually not a problem if you're just reading in one set of files and/or writing out another set of files).
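To make the resource-management point concrete, here is a minimal sketch of the non-lazy alternative using only the bytestring package and System.IO, not the streaming package from the question. The helper name foldChunks and its parameters are hypothetical, chosen only for illustration. It reads strict chunks of a fixed size inside withBinaryFile, so the handle is closed as soon as the fold finishes or an exception is thrown.

```haskell
import qualified Data.ByteString as BS
import System.IO (IOMode (ReadMode), withBinaryFile)

-- | Fold over a file in strict chunks of at most 'chunkSize' bytes.
-- (Hypothetical helper; 'chunkSize' should be positive.)
-- Only one chunk is held in memory at a time, and 'withBinaryFile'
-- closes the handle when the loop returns or throws.
foldChunks :: Int -> (a -> BS.ByteString -> a) -> a -> FilePath -> IO a
foldChunks chunkSize step z path =
  withBinaryFile path ReadMode (loop z)
  where
    loop acc h = do
      chunk <- BS.hGet h chunkSize
      if BS.null chunk
        then return acc -- an empty chunk signals end of file
        else
          let acc' = step acc chunk
           in acc' `seq` loop acc' h -- force the accumulator at each step
```

For example, foldChunks 65536 (\n c -> n + BS.length c) 0 "input.bin" would compute the file size without ever holding more than one 64 KiB chunk; the file name here is a placeholder.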
Short-running utility programs for reading and/or writing large files are probably good candidates to be written in a lazy I/O style. As long as they don't have any obvious space leaks when they're run, they're probably fine.
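As an illustration of such a utility, the following is a minimal sketch in the lazy I/O style the question already uses. The function name byteSum is hypothetical. It assumes the whole computation can be written as a single strict left fold, which is what lets it run in roughly constant memory: Data.ByteString.Lazy.readFile reads the file chunk by chunk on demand, and foldl' consumes each chunk before the next one is read.

```haskell
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word64)

-- | Sum all bytes of a file, read lazily.
-- (Hypothetical example of a one-pass computation.)
-- The strict fold consumes the lazy ByteString chunk by chunk, so
-- chunks that have been processed can be garbage collected.
byteSum :: FilePath -> IO Word64
byteSum path = do
  contents <- BL.readFile path
  -- '$!' forces the fold to run here rather than returning a thunk
  -- that still refers to the unread file.
  return $! BL.foldl' (\acc w -> acc + fromIntegral w) 0 contents
```

The space-leak caveat above applies directly: if contents were used a second time after the fold (for instance, to also compute BL.length contents), the whole file could be retained in memory.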