Fastest way to read large binary file in Haskell?

Question

I want to process a binary file that is too large to read into memory. Currently I use ByteString.Lazy.readFile to stream the bytes. I thought it would be a good idea to use the streaming package to make my program faster. However, the documentation for that package's readFile says:

readFile :: FilePath -> (Stream (Of String) IO () -> IO a) -> IO a

Read the lines of a file, using a function of the type: 'Stream (Of String) IO () -> IO a' to turn the stream into a value of type 'IO a'.

So does the streaming package only read text files, line by line? Can I use this package to read a binary file as bytes?

Recommended answer

To elaborate on @Cubic's comment: while there's a general consensus that lazy I/O should be avoided in production code and replaced with a streaming approach, this is not directly related to performance. If you're writing a program to do some one-off processing of a large file, and you already have a lazy I/O version that runs fine, there's probably no good performance reason to convert it to a streaming package.

In fact, streaming is more likely to add some overhead, so I suspect that in most cases a well-optimized lazy I/O solution would outperform a well-optimized streaming solution.

The main reasons for avoiding lazy I/O have been discussed previously on SO. In a nutshell, lazy I/O makes it difficult to manage resources consistently (e.g., file handles and network sockets), makes it hard to reason about space usage (e.g., a small program change can cause your memory usage to explode), and is occasionally "unsafe" if the timing and ordering of the I/O in question matters (usually not a problem if you're just reading in one set of files and/or writing out another set of files).
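If those concerns do matter for your program, one option that needs no streaming library is to read fixed-size strict chunks inside withBinaryFile, so the handle's lifetime is tied to a scope. The following is only a minimal sketch, not something the answer above proposes: foldChunks, countBytes, and the 64 KiB chunk size are hypothetical names and values chosen for illustration. It relies only on the base and bytestring packages.

```haskell
module Main (main) where

import qualified Data.ByteString as BS
import System.Environment (getArgs)
import System.IO (IOMode (ReadMode), withBinaryFile)

-- Hypothetical chunk size, chosen for illustration only; tune for your workload.
chunkSize :: Int
chunkSize = 64 * 1024

-- | Fold a function over the strict chunks of a file.
-- withBinaryFile closes the handle when the action finishes,
-- whether it returns normally or throws.
foldChunks :: (a -> BS.ByteString -> a) -> a -> FilePath -> IO a
foldChunks step initial path = withBinaryFile path ReadMode (go initial)
  where
    go acc h = do
      chunk <- BS.hGetSome h chunkSize
      if BS.null chunk
        then pure acc -- hGetSome returns an empty chunk only at end of file
        else do
          let acc' = step acc chunk
          -- Force the accumulator (to weak head normal form) on every
          -- step so that thunks do not pile up across chunks.
          acc' `seq` go acc' h

-- | Example consumer: count the bytes in a file.
countBytes :: FilePath -> IO Int
countBytes = foldChunks (\n chunk -> n + BS.length chunk) 0

main :: IO ()
main = do
  args <- getArgs
  case args of
    [path] -> countBytes path >>= print
    _ -> putStrLn "usage: countbytes FILE"
```

Only one chunk is live at a time, so memory use does not grow with file size as long as the accumulator itself stays small. The trade-off is that you write the loop yourself instead of composing list-like operations over a lazy ByteString.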

Short-running utility programs for reading and/or writing large files are probably good candidates to be written in a lazy I/O style. As long as they don't have any obvious space leaks when they're run, they're probably fine.
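As a rough illustration of that style, here is a minimal sketch of such a utility built on Data.ByteString.Lazy.readFile. The byteSum function and the command-line handling are hypothetical, made up for this example; the point is only that a strict left fold consumes the lazily read chunks one after another.

```haskell
module Main (main) where

import qualified Data.ByteString.Lazy as BL
import System.Environment (getArgs)

-- | Sum of all byte values in the input (an arbitrary example computation).
-- BL.foldl' is strict in the accumulator, so no chain of thunks builds up.
byteSum :: BL.ByteString -> Int
byteSum = BL.foldl' (\acc w -> acc + fromIntegral w) 0

main :: IO ()
main = do
  args <- getArgs
  case args of
    [path] -> do
      -- readFile returns immediately; chunks are read from disk on demand
      -- as the fold consumes them.
      contents <- BL.readFile path
      print (byteSum contents)
    _ -> putStrLn "usage: bytesum FILE"
```

Because `contents` is consumed exactly once and nothing else holds a reference to its start, chunks that have already been processed can be garbage collected. Using `contents` a second time (for example, also calling BL.length on it) would typically keep the whole file in memory, which is the kind of space leak to watch for.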
