Parsing of binary data with Scala

Question

I need to parse some simple binary files. (The files contain n entries, each consisting of several signed/unsigned integers of different sizes, etc.)

At the moment I do the parsing "by hand". Does anybody know a library that helps with this kind of parsing?

"By hand" means that I read the data byte by byte, put the bytes into the correct order, and convert them to an Int, Byte, etc. Some of the data is also unsigned.
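For reference, the by-hand approach described above can stay fairly compact with java.nio.ByteBuffer, which takes care of byte order, plus the JDK's toUnsigned* helpers for unsigned fields. This is a minimal sketch only: the field layout (u8, s16, u16, u32) and the names ManualParse and Entry are hypothetical, chosen just to show one field of each kind.

```scala
import java.nio.{ByteBuffer, ByteOrder}

// Hypothetical record layout: u8 flags, s16 delta, u16 count, u32 size.
object ManualParse {
  final case class Entry(flags: Int, delta: Short, count: Int, size: Long)

  // Reads one entry from the buffer's current position, advancing it.
  def readEntry(buf: ByteBuffer): Entry = {
    val flags = java.lang.Byte.toUnsignedInt(buf.get())        // unsigned 8-bit -> Int
    val delta = buf.getShort()                                   // signed 16-bit
    val count = java.lang.Short.toUnsignedInt(buf.getShort())    // unsigned 16-bit -> Int
    val size  = java.lang.Integer.toUnsignedLong(buf.getInt())   // unsigned 32-bit -> Long
    Entry(flags, delta, count, size)
  }

  // The byte order is set once on the buffer instead of reordering bytes manually.
  def parse(bytes: Array[Byte], order: ByteOrder = ByteOrder.BIG_ENDIAN): Entry =
    readEntry(ByteBuffer.wrap(bytes).order(order))
}
```

Unsigned values are widened to the next larger signed type (u32 becomes Long), since Scala has no unsigned integer types.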

Recommended answer

I've used the sbinary library before and it's very nice. The documentation is a little sparse, but I would suggest first looking at the old wiki page, as that gives you a starting point. Then check out the test specifications, which contain some very nice examples.

The primary benefit of sbinary is that it lets you describe the wire format of each type as a Format object. You can then compose those formats into higher-level Format objects, and Scala does the heavy lifting of finding the right Format for each type, as long as it is in the current scope as an implicit object.
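To illustrate the pattern (this is not sbinary's actual API), here is a minimal sketch: a hypothetical BinFormat typeclass with implicit instances for primitive types, and a format for a hypothetical Point case class that is composed from those instances and located by implicit resolution. java.io.DataInputStream/DataOutputStream do the underlying big-endian reads and writes.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}

// Hypothetical typeclass describing how one type is laid out on the wire.
trait BinFormat[T] {
  def read(in: DataInputStream): T
  def write(out: DataOutputStream, value: T): Unit
}

object BinFormat {
  // Low-level formats for primitives (DataOutputStream is big-endian).
  implicit object IntFormat extends BinFormat[Int] {
    def read(in: DataInputStream): Int = in.readInt()
    def write(out: DataOutputStream, value: Int): Unit = out.writeInt(value)
  }
  implicit object ShortFormat extends BinFormat[Short] {
    def read(in: DataInputStream): Short = in.readShort()
    def write(out: DataOutputStream, value: Short): Unit = out.writeShort(value)
  }

  // Entry points: the compiler finds the BinFormat[T] for us.
  def toBytes[T](value: T)(implicit f: BinFormat[T]): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new DataOutputStream(bytes)
    f.write(out, value)
    out.flush()
    bytes.toByteArray
  }

  def fromBytes[T](data: Array[Byte])(implicit f: BinFormat[T]): T =
    f.read(new DataInputStream(new ByteArrayInputStream(data)))
}

final case class Point(x: Int, y: Short)

object Point {
  // Higher-level format composed from the implicit lower-level ones.
  implicit def pointFormat(implicit fi: BinFormat[Int], fs: BinFormat[Short]): BinFormat[Point] =
    new BinFormat[Point] {
      def read(in: DataInputStream): Point = Point(fi.read(in), fs.read(in))
      def write(out: DataOutputStream, value: Point): Unit = {
        fi.write(out, value.x)
        fs.write(out, value.y)
      }
    }
}
```

For example, BinFormat.toBytes(Point(1, 2)) produces the six bytes 00 00 00 01 00 02, and fromBytes[Point] reverses it.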

I'd now recommend that people use scodec instead of sbinary. As an example of how to use scodec, I'll implement reading the in-memory binary representation of the following C struct:

struct ST
{
   long long ll; // @ 0
   int i;        // @ 8
   short s;      // @ 12
   char ch1;     // @ 14
   char ch2;     // @ 15
} ST;

A matching Scala case class would be:

case class ST(ll: Long, i: Int, s: Short, ch1: String, ch2: String)

To keep things simple, I store Strings instead of Chars and treat each char in the struct as a single-byte UTF-8 character. I'm also ignoring endianness (scodec's int64, int32 and short16 codecs are big-endian) and the actual sizes of the long long and int types on the target architecture, and just assume they are 64 and 32 bits respectively.
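If endianness does matter, it helps to see what a different byte order does to the same bytes. The sketch below decodes the struct by hand with java.nio.ByteBuffer at the byte offsets given in the C comments (assuming no padding, as those offsets imply), with the byte order as a parameter. In scodec itself the analogous change would be to use little-endian codec variants such as int64L and int32L. STLayout is a hypothetical name.

```scala
import java.nio.{ByteBuffer, ByteOrder}
import java.nio.charset.StandardCharsets

final case class ST(ll: Long, i: Int, s: Short, ch1: String, ch2: String)

object STLayout {
  // Absolute reads at the offsets from the C struct comments (@0, @8, @12, @14, @15).
  def decode(bytes: Array[Byte], order: ByteOrder): ST = {
    val buf = ByteBuffer.wrap(bytes).order(order)
    // Treat a single byte as a one-character UTF-8 string, as assumed above.
    def byteAsString(at: Int): String = new String(Array(buf.get(at)), StandardCharsets.UTF_8)
    ST(buf.getLong(0), buf.getInt(8), buf.getShort(12), byteAsString(14), byteAsString(15))
  }
}
```

With the 16 bytes from the encoding example in this answer (0x00000000000000010000000200034849), BIG_ENDIAN yields ST(1, 2, 3, H, I), while LITTLE_ENDIAN reads ll as 1L << 56, i as 0x02000000 and s as 0x0300.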

Scodec parsers generally use combinators to build higher-level parsers from lower-level ones. So below, we'll define a parser that combines an 8-byte value, a 4-byte value, a 2-byte value, a 1-byte value and another 1-byte value. The result of this combination is a codec of nested tuples:

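import scodec.Codec
import scodec.codecs._ // int64, int32, short16, fixedSizeBits, utf8 and the ~ combinator
val myCodec: Codec[Long ~ Int ~ Short ~ String ~ String] =
  int64 ~ int32 ~ short16 ~ fixedSizeBits(8L, utf8) ~ fixedSizeBits(8L, utf8)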

We can then transform this into a codec for the ST case class by calling xmap on it. xmap takes two functions: one that turns the decoded tuple into the destination type, and one that takes the destination type and turns it back into the tuple form:

val stCodec: Codec[ST] = myCodec.xmap[ST]({case ll ~ i ~ s ~ ch1 ~ ch2 => ST(ll, i, s, ch1, ch2)}, st => st.ll ~ st.i ~ st.s ~ st.ch1 ~ st.ch2)

Now, you can use the codec like so:

stCodec.encode(ST(1L, 2, 3.toShort, "H", "I"))
res0: scodec.Attempt[scodec.bits.BitVector] = Successful(BitVector(128 bits, 0x00000000000000010000000200034849))

res0.flatMap(stCodec.decode)
res1: scodec.Attempt[scodec.DecodeResult[ST]] = Successful(DecodeResult(ST(1,2,3,H,I),BitVector(empty)))
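Since the question is about a file of n such entries, here is a dependency-free sketch of reading all of them, assuming fixed 16-byte big-endian records with no padding and no file header (the question doesn't specify either). STFile, RecordSize and readFile are hypothetical names. With scodec, wrapping the codec in a repeating combinator such as list(stCodec) should achieve the same thing declaratively.

```scala
import java.nio.{ByteBuffer, ByteOrder}
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

final case class ST(ll: Long, i: Int, s: Short, ch1: String, ch2: String)

object STFile {
  val RecordSize = 16 // 8 + 4 + 2 + 1 + 1 bytes, assuming no padding

  private def byteAsString(b: Byte): String = new String(Array(b), StandardCharsets.UTF_8)

  // Decodes consecutive big-endian records; rejects a trailing partial record.
  def decodeAll(bytes: Array[Byte]): Vector[ST] = {
    require(bytes.length % RecordSize == 0, s"${bytes.length} is not a multiple of $RecordSize")
    val buf = ByteBuffer.wrap(bytes).order(ByteOrder.BIG_ENDIAN)
    // Arguments are evaluated left to right, so the relative gets follow the field order.
    Vector.fill(bytes.length / RecordSize) {
      ST(buf.getLong(), buf.getInt(), buf.getShort(), byteAsString(buf.get()), byteAsString(buf.get()))
    }
  }

  def readFile(path: Path): Vector[ST] = decodeAll(Files.readAllBytes(path))
}
```

Reading the whole file into memory keeps the sketch simple; for very large files a streaming read would be preferable.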

I'd encourage you to look at the Scaladocs rather than the guide, as the Scaladocs have much more detail. The guide is a good start for the very basics, but it doesn't go into composition much, whereas the Scaladocs cover that pretty well.