Parsing a binary file. What is a modern way?


Question

I have a binary file with some layout I know. For example let format be like this:

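  • 2 bytes (unsigned short) - length of a string
  • 5 bytes (5 chars) - the string - some ID name
  • 4 bytes (unsigned int) - a stride
  • 24 bytes (6 floats - 2 strides of 3 floats each) - float data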

The file should look like (I added spaces for readability):

5 hello 3 0.0 0.1 0.2 -0.3 -0.4 -0.5

Here 5 is stored as 2 bytes (0x05 0x00), "hello" as 5 bytes, and so on.

Now I want to read this file. Currently I do it like this:

  • load file to ifstream
  • read this stream to char buffer[2]
  • cast it to unsigned short: unsigned short len{ *((unsigned short*)buffer) };. Now I have length of a string.
  • read a stream to vector<char> and create a std::string from this vector. Now I have string id.
  • the same way read next 4 bytes and cast them to unsigned int. Now I have a stride.
  • while not end of file read floats the same way - create a char bufferFloat[4] and cast *((float*)bufferFloat) for every float.

This works, but to me it looks ugly. Can I read directly into an unsigned short, float, string, etc. without creating a char[x] buffer? If not, what is the correct way to cast (I have read that the style I'm using is an old one)?

P.S.: While writing the question, a clearer way of putting it came to mind: how do I cast an arbitrary number of bytes from an arbitrary position in a char[x]?

Update: I forgot to mention explicitly that the string and float data lengths are not known at compile time and are variable.

Recommended answer

The C way, which would work fine in C++, would be to declare a struct:

#pragma pack(1)

struct contents {
   // data members;
};

Note that:

  • You need a pragma so that the compiler lays out the data exactly as it appears in the struct, without inserting padding;
  • This technique only works with POD types
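
As a hedged illustration of both points, here is a sketch of a packed header for the layout in the question, with compile-time checks on its size and member offsets. The struct name `FixedHeader` and its member names are hypothetical, and the string length is fixed at 5 only for the sake of the example (the real format has a variable-length string). It assumes a compiler that supports `#pragma pack(push, 1)`, as MSVC, GCC and Clang do.

```cpp
#include <cstddef>      // offsetof
#include <cstdint>      // fixed-width integer types
#include <type_traits>  // std::is_trivially_copyable

// Hypothetical header matching the example layout, assuming a 5-char id.
#pragma pack(push, 1)   // no padding between members
struct FixedHeader {
    std::uint16_t id_length;  // 2 bytes: length of the id string
    char          id[5];      // 5 bytes: the id itself (not null-terminated)
    std::uint32_t stride;     // 4 bytes: number of floats per stride
};
#pragma pack(pop)       // restore default packing for later declarations

// Fail the build if the in-memory layout drifts from the file layout.
static_assert(sizeof(FixedHeader) == 11, "unexpected padding in FixedHeader");
static_assert(offsetof(FixedHeader, id) == 2, "id is not at byte 2");
static_assert(offsetof(FixedHeader, stride) == 7, "stride is not at byte 7");
static_assert(std::is_trivially_copyable<FixedHeader>::value,
              "FixedHeader must be safe to copy byte by byte");
```

Without the pragma, most compilers would insert a padding byte before `stride`, and the first `static_assert` would fail instead of the parser silently reading garbage.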

And then cast the read buffer directly into the struct type:

std::vector<char> buf(sizeof(contents));
file.read(buf.data(), buf.size());
contents *stuff = reinterpret_cast<contents *>(buf.data());

Now, if your data's size is variable, you can split it into several chunks. To read a single binary object from the buffer, a reader function comes in handy:

template<typename T>
const char *read_object(const char *buffer, T& target) {
    target = *reinterpret_cast<const T*>(buffer);
    return buffer + sizeof(T);
}

The main advantage is that such a reader can be specialized for more advanced C++ objects:

template<typename CT>
const char *read_object(const char *buffer, std::vector<CT>& target) {
    size_t size = target.size();
    CT const *buf_start = reinterpret_cast<const CT*>(buffer);
    std::copy(buf_start, buf_start + size, target.begin());
    return buffer + size * sizeof(CT);
}

Now in your main parser:

int n_floats;
iter = read_object(iter, n_floats);
std::vector<float> my_floats(n_floats);
iter = read_object(iter, my_floats);
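
For the concrete format in the question, the same field-by-field idea can also be applied directly to a stream, which avoids the intermediate `char[x]` buffers the asker disliked. The following is only a sketch under stated assumptions: the names `Record`, `read_value`, `parse_record` and `parse_sample` are hypothetical, the file is assumed to have been written on a machine with the same byte order and the same 4-byte `float` representation, and the float data is assumed to run to the end of the input.

```cpp
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical in-memory form of one record from the question.
struct Record {
    std::string id;
    std::uint32_t stride = 0;
    std::vector<float> data;
};

// Read one trivially copyable value straight from the stream.
template <typename T>
void read_value(std::istream& in, T& target) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "raw reads only make sense for trivially copyable types");
    if (!in.read(reinterpret_cast<char*>(&target), sizeof(T)))
        throw std::runtime_error("unexpected end of input");
}

Record parse_record(std::istream& in) {
    Record rec;
    std::uint16_t id_length = 0;
    read_value(in, id_length);                 // 2 bytes: string length
    rec.id.resize(id_length);
    if (id_length > 0 && !in.read(&rec.id[0], id_length))
        throw std::runtime_error("truncated id string");
    read_value(in, rec.stride);                // 4 bytes: stride
    float value = 0.0f;
    // Read floats until the input runs out; a trailing partial float is ignored.
    while (in.read(reinterpret_cast<char*>(&value), sizeof(value)))
        rec.data.push_back(value);
    return rec;
}

// Build the sample from the question in memory and parse it back.
Record parse_sample() {
    std::string bytes;
    auto append = [&bytes](const void* p, std::size_t n) {
        bytes.append(static_cast<const char*>(p), n);
    };
    const std::uint16_t len = 5;
    const std::uint32_t stride = 3;
    const float floats[] = {0.0f, 0.1f, 0.2f, -0.3f, -0.4f, -0.5f};
    append(&len, sizeof(len));
    append("hello", len);
    append(&stride, sizeof(stride));
    append(floats, sizeof(floats));
    std::istringstream in(bytes);
    return parse_record(in);
}
```

With a real file, `parse_record` would be called on a `std::ifstream` opened with `std::ios::binary`. Reading into the object's own bytes through `char*` sidesteps the pointer casts from a separate buffer used in the question.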

Note: As Tony D observed, even if you can get the alignment right via #pragma directives and manual padding (if needed), you may still encounter incompatibility with your processor's alignment, in the form of (best case) performance issues or (worst case) trap signals. This method is probably interesting only if you have control over the file's format.
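
One common way to sidestep the alignment concern is to copy the bytes with `std::memcpy` instead of dereferencing a casted pointer. The sketch below is a variation on the reader function above, not part of the original answer; the names `read_object_copy` and `read_float_at_odd_offset` are hypothetical. It also speaks to the question's P.S. about pulling a value out of an arbitrary position in a `char` buffer.

```cpp
#include <cstring>      // std::memcpy
#include <type_traits>  // std::is_trivially_copyable

// Same shape as read_object, but copies the bytes into the target, so
// `buffer` may point at any offset regardless of T's alignment.
template <typename T>
const char* read_object_copy(const char* buffer, T& target) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy is only valid for trivially copyable types");
    std::memcpy(&target, buffer, sizeof(T));
    return buffer + sizeof(T);
}

// Usage: store a float at a deliberately misaligned offset and read it back.
float read_float_at_odd_offset() {
    alignas(float) char storage[1 + sizeof(float)] = {};
    const float original = 0.1f;
    std::memcpy(storage + 1, &original, sizeof(original));  // offset 1 is misaligned

    float result = 0.0f;
    read_object_copy(storage + 1, result);
    return result;
}
```

Compilers typically turn a fixed-size `std::memcpy` like this into a plain load on platforms that allow unaligned access, so the copy usually costs little in practice.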
