How to efficiently read binary data from files that have a complex structure in C++


Question

I am writing a piece of code to read in several GB of data that spans multiple files using C++ IOStreams, which I've chosen over the C API for a number of design reasons that I won't bore you with. Since the data is produced by a separate program on the same machine where my code will run, I am confident that issues such as those relating to endianness can, for the most part, be ignored.

The files have a reasonably complicated structure. For example, there is a header that describes the number of records of a particular binary configuration. Later in the file, I must make the code conditionally read that number of lines. This sort of pattern is repeated in a complicated, but well-documented way.

My question is related to how to do this efficiently - I'm sure my process is going to be IO-limited, so my instinct is that rather than reading in data in smallish blocks, such as the following approach:

std::vector<int> buffer(500);  // size the vector up front; reserve() alone leaves it empty
file.read(reinterpret_cast<char*>(buffer.data()), buffer.size() * sizeof(int));

I should read in one file entirely at a time and try to process it in memory. So my interrelated questions:

  • Given that this would seem to mean reading in a char* or std::vector array, how would you best go about converting this array into the data format required to correctly represent the file structure?
  • Are my assumptions incorrect?

I know the obvious answer is to try and then to profile later, and profile I certainly will. But this question is more about how to pick the right approach at the beginning - a sort of "pick the right algorithm" optimisation, rather than the sort of optimisations that I could envisage doing after identifying bottlenecks later on!

I'll be interested in the answers offered up - I tend to only be able to find answers for relatively simple binary files, for which the approach above is suitable. My problem is that the bulk of the binary data is structured conditionally on the numbers in the header to the file (even the header is formatted this way!) so I need to be able to process the file a little more carefully.

Thanks in advance.

EDIT: Some comments coming through about memory mapping - looks good, but not sure how to do it and all I've read tells me it isn't portable. I'm interested in trying an mmap, but also in more portable solutions (if any!)

Solution

Use a 64-bit OS and memory map the file. If you need to support a 32-bit OS as well, use a compatibility layer that maps chunks of the file as needed.
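
A minimal sketch of the memory-mapping route, assuming a POSIX system (`open`, `fstat`, `mmap`). Windows would need `CreateFileMapping`/`MapViewOfFile` instead, which is the portability concern raised in the question. `MappedFile` and `read_at` are hypothetical names used only for illustration, and the sketch assumes the file was written on the same machine, so native byte order is used as-is.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <type_traits>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Read-only mapping of a whole file (hypothetical helper, POSIX only).
class MappedFile {
public:
    explicit MappedFile(const char* path) {
        int fd = ::open(path, O_RDONLY);
        if (fd < 0) throw std::runtime_error("open failed");
        struct stat st;
        if (::fstat(fd, &st) != 0) {
            ::close(fd);
            throw std::runtime_error("fstat failed");
        }
        size_ = static_cast<std::size_t>(st.st_size);
        if (size_ > 0) {  // mmap rejects zero-length mappings
            void* p = ::mmap(nullptr, size_, PROT_READ, MAP_PRIVATE, fd, 0);
            if (p == MAP_FAILED) {
                ::close(fd);
                throw std::runtime_error("mmap failed");
            }
            data_ = static_cast<const unsigned char*>(p);
        }
        ::close(fd);  // the mapping stays valid after the descriptor is closed
    }
    ~MappedFile() {
        if (data_) ::munmap(const_cast<unsigned char*>(data_), size_);
    }
    MappedFile(const MappedFile&) = delete;
    MappedFile& operator=(const MappedFile&) = delete;

    const unsigned char* data() const { return data_; }
    std::size_t size() const { return size_; }

private:
    const unsigned char* data_ = nullptr;
    std::size_t size_ = 0;
};

// Copy a trivially copyable value out of the mapping at a byte offset.
// memcpy avoids alignment and aliasing problems that a pointer cast could cause.
template <typename T>
T read_at(const MappedFile& f, std::size_t offset) {
    static_assert(std::is_trivially_copyable<T>::value, "T must be trivially copyable");
    if (offset > f.size() || sizeof(T) > f.size() - offset)
        throw std::out_of_range("read past end of file");
    assert(f.data() != nullptr);  // a non-empty file always has a mapping
    T value;
    std::memcpy(&value, f.data() + offset, sizeof(T));
    return value;
}
```

With something like this, a header field could be fetched with `read_at<std::uint32_t>(file, 0)` and the result used to decide how many records to read from the following offsets.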

Alternatively, if you always need the objects in file order, just write a sane parser to handle the objects in chunks. Like this:

1) Read in 512KB of file.

2) Extract as many objects as possible from the data we read.

3) Read in as many bytes as needed to fill the buffer back up to 512KB. If we read no bytes at all, stop.

4) Go to step 2.
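
The loop above can be sketched roughly as follows. The record layout here is purely an assumption for illustration (a native-endian 32-bit count followed by that many 32-bit integers), and `Record`, `try_parse`, `parse_stream` and `parse_bytes` are hypothetical names, not part of any library. The sketch reads a chunk, extracts every complete record, tops the buffer back up, and loops back to the extraction step until a read returns no bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Assumed layout: uint32 count, then `count` int32 values (native byte order).
struct Record {
    std::vector<std::int32_t> values;
};

// Try to extract one record from [data, data + avail).
// Returns the number of bytes consumed, or 0 if the record is not complete yet.
std::size_t try_parse(const char* data, std::size_t avail, Record& out) {
    std::uint32_t count;
    if (avail < sizeof count) return 0;
    std::memcpy(&count, data, sizeof count);
    const std::size_t need = sizeof count + std::size_t(count) * sizeof(std::int32_t);
    if (avail < need) return 0;
    out.values.resize(count);
    if (count > 0)
        std::memcpy(out.values.data(), data + sizeof count, count * sizeof(std::int32_t));
    return need;
}

std::vector<Record> parse_stream(std::istream& in, std::size_t chunk = 512 * 1024) {
    assert(chunk > 0);
    std::vector<char> buf(chunk);
    std::size_t filled = 0;  // bytes in buf that are not yet parsed
    std::vector<Record> records;
    for (;;) {
        // Steps 1 and 3: fill the buffer (back) up.
        in.read(buf.data() + filled, static_cast<std::streamsize>(buf.size() - filled));
        const std::size_t got = static_cast<std::size_t>(in.gcount());
        filled += got;

        // Step 2: extract as many complete records as possible.
        std::size_t pos = 0;
        Record r;
        while (std::size_t used = try_parse(buf.data() + pos, filled - pos, r)) {
            records.push_back(std::move(r));
            pos += used;
        }

        // Move the unparsed tail to the front so the next read appends to it.
        std::memmove(buf.data(), buf.data() + pos, filled - pos);
        filled -= pos;

        if (got == 0) break;  // nothing more to read: stop
        // A single record larger than the buffer: grow so progress is possible.
        if (filled == buf.size()) buf.resize(buf.size() * 2);
    }
    if (filled != 0) throw std::runtime_error("truncated record at end of input");
    return records;
}

// Convenience wrapper for data that is already in memory (handy for trying the parser).
std::vector<Record> parse_bytes(const std::string& bytes, std::size_t chunk) {
    std::istringstream in(bytes);
    return parse_stream(in, chunk);
}
```

A real format with conditional sections would replace `try_parse` with something that understands the header, but the buffer handling stays the same. Note that a corrupt count would make this sketch keep growing the buffer, so a sanity limit on record size is worth adding.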
