Why is reading blocks of data faster than reading byte by byte in file I/O?


Question

I have noticed that reading a file byte by byte takes more time to get through the whole file than reading it with fread.

According to cplusplus:

size_t fread(void *ptr, size_t size, size_t count, FILE *stream);

Reads an array of count elements, each one with a size of size bytes, from the stream and stores them in the block of memory specified by ptr.

Q1) So fread also reads the file in units of 1 byte. Isn't that the same as reading with the byte-by-byte method?

Q2) The results show that fread still takes less time.

From here:


I ran this with a file of approximately 44 megabytes as input. When compiled with VC++ 2012, I got the following results:

using getc   Count: 400000   Time: 2.034
using fread  Count: 400000   Time: 0.257

Also, a few posts on SO say that it depends on the OS.

Q3) What is the role of the OS?

Why is this so, and what exactly goes on behind the scenes?

Recommended answer

fread does not read a file one byte at a time. The interface, which lets you specify size and count separately, is purely for your convenience. Behind the scenes, fread simply reads size * count bytes.
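To make the size/count point concrete, here is a minimal sketch. The helper names read_as_bytes and read_as_one_element are hypothetical and chosen only for illustration. Both calls request the same size * count bytes. What differs is the return value, which counts complete elements rather than bytes.

```c
#include <stdio.h>

/* Requests n bytes as n elements of 1 byte each.
   Returns the number of bytes actually read (0..n). */
size_t read_as_bytes(FILE *fp, unsigned char *buf, size_t n)
{
    return fread(buf, 1, n, fp);
}

/* Requests the same n bytes as a single element of n bytes.
   Returns 1 if the whole element was read, 0 if fewer than
   n bytes were available. */
size_t read_as_one_element(FILE *fp, unsigned char *buf, size_t n)
{
    return fread(buf, n, 1, fp);
}
```

With size set to 1, a short read near the end of the file still tells you how many bytes arrived, which is why fread(buf, 1, n, fp) is the common form when reading raw bytes.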

The number of bytes that fread tries to read at once depends heavily on your C implementation and the underlying filesystem. Unless you are intimately familiar with both, it is usually safe to assume that fread will be closer to optimal than anything you invent yourself.
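If you do want to experiment with how much the C library reads at once, standard C offers setvbuf for requesting a stream buffer size. The sketch below is an illustration under assumptions, not a recommendation: open_with_buffer is a hypothetical helper, and when the buffer pointer is NULL the size argument is only a request that an implementation may honor or ignore.

```c
#include <stdio.h>

/* Opens `path` for binary reading and asks stdio for a fully
   buffered stream with a buffer of about `buf_size` bytes.
   Returns NULL if the file cannot be opened or the request
   is rejected. setvbuf has to be called before any other
   operation on the stream, so it is done right after fopen. */
FILE *open_with_buffer(const char *path, size_t buf_size)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;
    if (setvbuf(fp, NULL, _IOFBF, buf_size) != 0) {
        fclose(fp);
        return NULL;
    }
    return fp;
}
```

Any gain from a different buffer size has to be measured on the target system. The default (BUFSIZ in stdio.h) is often a reasonable starting point.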

Edit: Physical disks tend to have a relatively high seek time compared to their throughput. In other words, they take relatively long to start reading, but once started they can read consecutive bytes relatively fast. Without any OS/filesystem support, every call to fread would therefore pay a severe overhead to start the read. To use the disk efficiently, you want to read as many bytes at once as possible. On the other hand, disks are slow compared to the CPU, RAM and hardware caches. Reading too much at once means your program spends a lot of time waiting for the disk to finish, when it could have been doing something useful (such as processing bytes it has already read).
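The two reading styles being compared can be sketched as follows. This is an illustrative sketch rather than the benchmark from the question: the function names, the 64 KiB CHUNK_SIZE, and the choice to count occurrences of one byte value are all assumptions made for the example.

```c
#include <stdio.h>

#define CHUNK_SIZE 65536  /* assumed block size; tune by measurement */

/* Counts occurrences of `target` with one getc call per byte. */
long count_with_getc(FILE *fp, int target)
{
    long count = 0;
    int ch;
    while ((ch = getc(fp)) != EOF) {
        if (ch == target)
            count++;
    }
    return count;
}

/* Counts occurrences of `target` with one fread call per block,
   then scans the block in memory. The static buffer keeps the
   stack small but makes this function non-reentrant. */
long count_with_fread(FILE *fp, int target)
{
    static unsigned char buf[CHUNK_SIZE];
    long count = 0;
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0) {
        for (size_t i = 0; i < n; i++) {
            if (buf[i] == target)
                count++;
        }
    }
    return count;
}
```

Both functions return the same count for the same file. The block version makes far fewer library calls, so the per-call overhead is spread over many bytes, while a moderate block size still lets processing start before the whole file has been read.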

This is where the OS/filesystem comes in. The smart people who work on those have spent a lot of time figuring out the right number of bytes to request from a disk. So when you call fread and request X bytes, the OS/filesystem translates that into N requests of Y bytes each, where Y is some generally optimal value that depends on more variables than can be mentioned here.

Another role of the OS/filesystem is what is called "readahead". The basic idea is that most I/O occurs inside loops, so if a program requests some bytes from disk, there is a very good chance it will request the next bytes shortly afterwards. Because of this, the OS/filesystem typically reads somewhat more than you actually requested at first. Again, the exact amount depends on too many variables to mention. This is why reading a single byte at a time is still somewhat efficient (it would be another ~10x slower without readahead).

In the end, it is best to think of fread as giving the OS/filesystem hints about how many bytes you will want to read. The more accurate those hints are (the closer to the total number of bytes you will want to read), the better the OS/filesystem can optimize the disk I/O.
