Haskell read/write binary files complete working example

Problem description

I would appreciate it if someone could give complete working code that does the following in Haskell:

Read a very large sequence (more than 1 billion elements) of 32-bit int values from a binary file into an appropriate container (certainly not a list, for performance reasons), double each number that is less than 1000 (decimal), and then write the resulting 32-bit int values to another binary file. I don't want to read the entire contents of the binary file into memory at once; I want to read one chunk after another.

I am confused because I could find very little documentation about this. Data.Binary, ByteString, Word8 and so on only add to the confusion. In C/C++ there is a pretty straightforward solution to such problems: take an array (e.g. of unsigned int) of the desired size, use the read/write library calls, and be done with it. In Haskell it didn't seem so easy, at least to me.

I'd appreciate it if your solution used the best standard packages available with mainstream Haskell (> GHC 7.10), not obscure or obsolete ones.

I have read these pages:

https://wiki.haskell.org/Binary_IO

https://wiki.haskell.org/Dealing_with_binary_data

Solution

If you're doing binary I/O, you almost certainly want ByteString for the actual input/output part. Have a look at the hGet and hPut functions it provides. (Or, if you only need strictly linear access, you can try using lazy I/O, but it's easy to get that wrong.)
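As a rough illustration of how hGet and hPut behave, here is a minimal sketch using the strict Data.ByteString module. The file name "demo.bin" and the helper chunkLengths are hypothetical names made up for this sketch, not part of the original answer. It relies on hGet returning fewer bytes than requested near the end of the file and an empty ByteString at end of file.

```haskell
import qualified Data.ByteString as BS
import System.IO

-- Hypothetical helper: read chunks of at most n bytes until hGet
-- returns an empty ByteString, which is how it signals end of file.
-- Returns the length of every chunk that was read.
chunkLengths :: Handle -> Int -> IO [Int]
chunkLengths h n = do
  chunk <- BS.hGet h n
  if BS.null chunk
    then return []
    else do
      rest <- chunkLengths h n
      return (BS.length chunk : rest)

main :: IO ()
main = do
  -- "demo.bin" is a hypothetical scratch file used only for this sketch.
  withBinaryFile "demo.bin" WriteMode $ \h ->
    BS.hPut h (BS.pack [0 .. 9])   -- write 10 bytes
  lens <- withBinaryFile "demo.bin" ReadMode $ \h ->
    chunkLengths h 4
  print lens                        -- prints [4,4,2]
```

The last chunk is short because only two bytes remain, which is the case any chunked reader has to handle.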

Of course, a byte string is just an array of bytes; your next problem is interpreting those bytes as characters / integers / doubles / whatever else they're supposed to be. There are a couple of packages for that, but Data.Binary seems to be the most mainstream one.

The documentation for binary seems to want to steer you towards using the Binary class, where you write code to serialise and deserialise whole objects. But you can use the functions in Data.Binary.Get and Data.Binary.Put to deal with individual items. There you will find functions such as getWord32be (get Word32 big-endian) and so forth.
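To make the per-item functions concrete, here is a small sketch that converts between a lazy ByteString and a list of Word32 values. The names decodeWords and encodeWords are hypothetical helpers invented for this sketch, and little-endian order is an assumption (use getWord32be / putWord32be for big-endian data).

```haskell
import Control.Monad (replicateM)
import qualified Data.ByteString.Lazy as BL
import Data.Binary.Get (runGet, getWord32le)
import Data.Binary.Put (runPut, putWord32le)
import Data.Word (Word32)

-- Hypothetical helper: decode as many little-endian Word32 values as
-- fit in the input. Up to three trailing bytes are silently ignored.
decodeWords :: BL.ByteString -> [Word32]
decodeWords bytes = runGet (replicateM n getWord32le) bytes
  where
    n = fromIntegral (BL.length bytes `div` 4)

-- Hypothetical helper: encode each value as four little-endian bytes.
encodeWords :: [Word32] -> BL.ByteString
encodeWords ws = runPut (mapM_ putWord32le ws)

main :: IO ()
main = do
  print (BL.unpack (encodeWords [1, 1000]))
  -- prints [1,0,0,0,232,3,0,0]
  print (decodeWords (BL.pack [1, 0, 0, 0, 232, 3, 0, 0]))
  -- prints [1,1000]
```

Computing the element count from the input length keeps runGet from failing on input that is not a multiple of four bytes.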

I don't have time to write a working code example right now, but basically look at the functions I mention above and ignore everything else, and you should get some idea.

Now with working code:

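module Main where

import Data.Word
import qualified Data.ByteString.Lazy as BIN
import Data.Binary.Get
import Data.Binary.Put
import Control.Monad
import System.IO

main :: IO ()
main = do
  -- Open both files in binary mode.
  h_in  <- openBinaryFile "Foo.bin" ReadMode
  h_out <- openBinaryFile "Bar.bin" WriteMode
  -- Process 1000 chunks; replicateM_ discards the unit results.
  replicateM_ 1000 (process_chunk h_in h_out)
  hClose h_in
  hClose h_out

-- Number of bytes read per chunk; must be a multiple of int_size.
chunk_size :: Int
chunk_size = 1000

-- Size of one 32-bit value in bytes.
int_size :: Int
int_size = 4

process_chunk :: Handle -> Handle -> IO ()
process_chunk h_in h_out = do
  bin1 <- BIN.hGet h_in chunk_size
  -- Decode the chunk as little-endian Word32 values.
  let ints1 = runGet (replicateM (chunk_size `div` int_size) getWord32le) bin1
  -- Double every value below 1000.
  let ints2 = map (\ x -> if x < 1000 then 2*x else x) ints1
  let bin2 = runPut (mapM_ putWord32le ints2)
  BIN.hPut h_out bin2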

This, I believe, does what you asked for. It reads 1000 chunks of chunk_size bytes, converts each one into a list of Word32 (so it only ever has chunk_size / 4 integers in memory at once), does the calculation you specified, and writes the result back out again.

Obviously if you did this "for real" you'd want EOF checking and such.
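One possible way to add the EOF handling is sketched below. It is not the original author's code: the names chunkBytes, doubleSmall, transformChunk and processAll are hypothetical, the 64 KiB chunk size is an arbitrary assumption, and the data is assumed to be little-endian. The loop stops when hGet returns an empty chunk, and the number of integers is derived from the bytes actually read, so a short final chunk is handled.

```haskell
module Main where

import Control.Monad (replicateM, unless)
import qualified Data.ByteString.Lazy as BL
import Data.Binary.Get (runGet, getWord32le)
import Data.Binary.Put (runPut, putWord32le)
import Data.Word (Word32)
import System.IO

-- Assumed chunk size in bytes (a multiple of 4); tune as needed.
chunkBytes :: Int
chunkBytes = 64 * 1024

-- The calculation from the question: double values below 1000.
doubleSmall :: Word32 -> Word32
doubleSmall x = if x < 1000 then 2 * x else x

-- Decode a chunk, transform every value, and encode it again.
-- Trailing bytes that do not form a full Word32 are dropped.
transformChunk :: BL.ByteString -> BL.ByteString
transformChunk bytes = runPut (mapM_ (putWord32le . doubleSmall) ws)
  where
    n  = fromIntegral (BL.length bytes `div` 4)
    ws = runGet (replicateM n getWord32le) bytes

-- Keep reading chunks until hGet returns an empty one (end of file).
processAll :: Handle -> Handle -> IO ()
processAll hIn hOut = do
  chunk <- BL.hGet hIn chunkBytes
  unless (BL.null chunk) $ do
    BL.hPut hOut (transformChunk chunk)
    processAll hIn hOut

main :: IO ()
main =
  -- withBinaryFile closes each handle even if an exception is thrown.
  withBinaryFile "Foo.bin" ReadMode $ \hIn ->
    withBinaryFile "Bar.bin" WriteMode $ \hOut ->
      processAll hIn hOut
```

Only one chunk is held in memory at a time, so the size of the input file does not matter. The intermediate container is still a list, as in the answer above; for tighter memory use per chunk, an unboxed array or vector would be the usual alternative.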
