How to read blocks of data from a file and then read from that block into a vector?


Question

Suppose I have a file containing x records. One 'block' holds m records, so the file contains n = x/m blocks. If I know the size of one record, say b bytes (so one block is b*m bytes), I can read a complete block at once with the read() system call (is there any other method?). Now, how do I read each record from this block and put each one into a vector as a separate element?
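The block-then-split approach described in the question could look roughly like the sketch below. It is only an illustration under assumptions: the `Record` layout and the function name `read_records_blockwise` are hypothetical (the question does not give the record format), the records are assumed to be fixed-size and trivially copyable, and error handling is kept minimal.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical fixed-size record; b = sizeof(Record) bytes.
struct Record {
    std::uint32_t id;
    std::uint32_t value;
};

// Reads the file in blocks of m records and appends every complete record
// to a vector. Returns an empty vector if the file cannot be opened.
std::vector<Record> read_records_blockwise(const char* path, std::size_t m) {
    assert(m > 0);
    std::vector<Record> records;
    int fd = open(path, O_RDONLY);
    if (fd < 0) return records;

    std::vector<char> block(m * sizeof(Record));  // one block = b * m bytes
    std::size_t filled = 0;                       // bytes currently buffered
    for (;;) {
        ssize_t got = read(fd, block.data() + filled, block.size() - filled);
        if (got < 0) break;  // read error; real code would inspect errno
        filled += static_cast<std::size_t>(got);
        // read() may return fewer bytes than requested: keep filling the block.
        if (got != 0 && filled < block.size()) continue;

        // Block is full or we hit end of file: copy out the complete records.
        std::size_t whole = filled / sizeof(Record);
        for (std::size_t i = 0; i < whole; ++i) {
            Record r;
            std::memcpy(&r, block.data() + i * sizeof(Record), sizeof r);
            records.push_back(r);
        }
        if (got == 0) break;  // end of file (a trailing partial record is dropped)
        filled = 0;
    }
    close(fd);
    return records;
}
```

Calling `read_records_blockwise("data.bin", 64)` would issue roughly one `read()` per 64 records. Copying with `memcpy` rather than casting the buffer pointer avoids alignment and aliasing problems.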

The reason I want to do this in the first place is to reduce the number of disk I/O operations, which, from what I have learned, are much more expensive than in-memory operations. Or will it take the same amount of time as reading the file record by record and putting each record directly into the vector? Reading block by block, I will have only n disk I/Os, whereas reading record by record I will have x.
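As a small worked example of the n versus x comparison: n = x/m is exact only when x is a multiple of m; otherwise a partial last block adds one more read. The helper name `blocks_needed` is hypothetical. The counts are numbers of read calls, which need not equal physical disk operations, because the operating system typically caches and reads ahead.

```cpp
#include <cassert>
#include <cstddef>

// Number of block reads needed for x records at m records per block,
// rounding up so that a partial last block is counted.
constexpr std::size_t blocks_needed(std::size_t x, std::size_t m) {
    return (x + m - 1) / m;
}

// 1000 records in blocks of 64: 15 full blocks plus one partial block.
static_assert(blocks_needed(1000, 64) == 16, "expected 16 block reads");
```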

Thanks.

Recommended answer

You should consider using mmap() instead of reading your files with read().

What's nice about mmap() is that the file contents are simply mapped into your process's address space, as if you already had a pointer to them. By inspecting the memory and treating it as an array, or by copying data out with memcpy(), you implicitly perform read operations, but only as necessary: the operating system's virtual memory subsystem is smart enough to do this very efficiently.
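A minimal sketch of this idea, assuming a POSIX system and the same kind of fixed-size, trivially copyable record as in the question. The `Record` layout and the function name `load_records_mmap` are hypothetical; the code maps the whole file read-only and copies the records into a vector with memcpy().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical fixed-size record; the real layout is not given in the question.
struct Record {
    std::uint32_t id;
    std::uint32_t value;
};

// Maps the file read-only and copies all complete records into a vector.
// Returns an empty vector on any failure or if the file is empty.
std::vector<Record> load_records_mmap(const char* path) {
    std::vector<Record> records;
    int fd = open(path, O_RDONLY);
    if (fd < 0) return records;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {  // mmap rejects length 0
        close(fd);
        return records;
    }
    std::size_t size = static_cast<std::size_t>(st.st_size);

    void* base = mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (base == MAP_FAILED) return records;

    // Pages are faulted in by the OS as memcpy touches them.
    records.resize(size / sizeof(Record));
    if (!records.empty()) {
        std::memcpy(records.data(), base, records.size() * sizeof(Record));
    }
    munmap(base, size);
    return records;
}
```

If the data only needs to be inspected rather than owned, the copy into the vector could be skipped and the mapping read in place, at the cost of keeping the mapping alive for as long as the data is used.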

The only likely reason to avoid mmap() is if you are running on a 32-bit OS and the file size exceeds 2 gigabytes (or slightly less than that). In that case the OS may have trouble allocating address space for your mmap-ed memory. On a 64-bit OS, using mmap() should never be a problem.

Also, mmap() can be cumbersome if you are writing a lot of data and the size of the data is not known up front. Other than that, for reading it is usually better and faster than read().

In fact, most modern operating systems rely on mmap extensively. For example, on Linux, to run a binary, the executable is simply mmap-ed and executed from memory as if it had been copied there by read(), without actually being read in up front.

