What goes on behind the curtains during disk I/O?

The question

When I seek to some position in a file and write a small amount of data (20 bytes), what goes on behind the scenes?

My understanding

To my knowledge, the smallest unit of data that can be written or read from a disk is one sector (traditionally 512 bytes, but that standard is now changing). That means to write 20 bytes I need to read a whole sector, modify some of it in memory and write it back to disk.
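As a rough illustration of that read-modify-write cycle (a sketch, not from the question; SECTOR_SIZE, the function name and the single-sector assumption are mine):

#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define SECTOR_SIZE 512  /* traditional sector size assumed */

/* Sketch: overwrite `len` bytes at absolute file offset `offset` by reading
   the containing sector, patching it in memory, and writing it back.
   Assumes the write does not straddle a sector boundary. */
int patch_sector( int fd, off_t offset, const void *buf, size_t len )
{
    char sector[SECTOR_SIZE];
    off_t base = offset - ( offset % SECTOR_SIZE );     /* start of the sector */

    if ( pread( fd, sector, SECTOR_SIZE, base ) < 0 )   /* read the whole sector */
        return -1;
    memcpy( sector + ( offset - base ), buf, len );     /* modify e.g. 20 bytes in memory */
    if ( pwrite( fd, sector, SECTOR_SIZE, base ) < 0 )  /* write it back */
        return -1;
    return 0;
}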

This is what I expect to be happening in unbuffered I/O. I also expect buffered I/O to do roughly the same thing, but be clever about its cache. So I would have thought that if I blow locality out the window by doing random seeks and writes, both buffered and unbuffered I/O ought to have similar performance... maybe with unbuffered coming out slightly better.

Then again, I know it's crazy for buffered I/O to only buffer one sector, so I might also expect it to perform terribly.

My application

I am storing values gathered by a SCADA device driver that receives remote telemetry for upwards of a hundred thousand points. There is extra data in the file such that each record is 40 bytes, but only 20 bytes of that needs to be written during an update.

Pre-implementation benchmark

To check that I don't need to dream up some brilliantly over-engineered solution, I have run a test using a few million random records written to a file that could contain a total of 200,000 records. Each test seeds the random number generator with the same value to be fair. First I erase the file and pad it to the total length (about 7.6 meg), then loop a few million times, passing a random file offset and some data to one of two test functions:

/* Low-level POSIX I/O: seek, then write straight through the file descriptor. */
void WriteOldSchool( void *context, long offset, Data *data )
{
    int fd = (int)context;
    lseek( fd, offset, SEEK_SET );
    write( fd, (void*)data, sizeof(Data) );
}

/* Buffered stdio I/O: seek, write through the stream, then flush it. */
void WriteStandard( void *context, long offset, Data *data )
{
    FILE *fp = (FILE*)context;
    fseek( fp, offset, SEEK_SET );
    fwrite( (void*)data, sizeof(Data), 1, fp );
    fflush(fp);
}
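The driver loop itself isn't shown; a minimal sketch of how it might invoke the two functions (RunTest, the seed value and the constants are assumptions, not the original harness) could look like:

#include <stdlib.h>

#define RECORD_COUNT 200000L    /* records the padded file can hold */
#define ITERATIONS   2000000L   /* "a few million" random writes */

typedef struct { char bytes[20]; } Data;   /* stand-in for the 20-byte payload */

typedef void (*WriteFn)( void *context, long offset, Data *data );

/* Sketch: seed the RNG identically for every run, then hit the file at
   random record offsets through the supplied write function. */
void RunTest( void *context, WriteFn writeFn )
{
    Data data = { {0} };
    srand( 1 );                                   /* same seed keeps runs comparable */
    for ( long i = 0; i < ITERATIONS; i++ )
    {
        long record = rand() % RECORD_COUNT;      /* random record index */
        writeFn( context, record * 40L, &data );  /* 40-byte records, 20 bytes written */
    }
}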

Perhaps no surprise?

The OldSchool method came out on top - by a lot. It was over 6 times faster (1.48 million versus 232000 records per second). To make sure I hadn't run into hardware caching, I expanded my database size to 20 million records (file size of 763 meg) and got the same results.

Before you point out the obvious call to fflush, let me say that removing it had no effect. I imagine this is because the cache must be committed when I seek sufficiently far away, which is what I'm doing most of the time.

So, what's going on?

It seems to me that the buffered I/O must be reading (and possibly writing all of) a large chunk of the file whenever I try to write. Because I am hardly ever taking advantage of its cache, this is extremely wasteful.
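(If you want to keep the stdio interface but stop it caching, one option worth testing is to disable buffering with setvbuf right after opening the stream. Whether that avoids the read-before-write entirely is implementation-dependent; this is a sketch, not something from the original post.)

#include <stdio.h>

/* Sketch: open the database file and turn off stdio buffering so that
   each fwrite passes straight through to a write() syscall. */
FILE *OpenUnbuffered( const char *path )
{
    FILE *fp = fopen( path, "r+b" );
    if ( fp != NULL )
        setvbuf( fp, NULL, _IONBF, 0 );   /* no user-space buffer */
    return fp;
}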

In addition (and I don't know the details of hardware caching on disk), if the buffered I/O is trying to write a bunch of sectors when I change only one, that would reduce the effectiveness of the hardware cache.

Are there any disk experts out there who can comment and explain this better than my experimental findings? =)

Recommended answer

Indeed, at least on my system with GNU libc, it looks like stdio is reading 4kB blocks before writing back the changed portion. Seems bogus to me, but I imagine somebody thought it was a good idea at the time.

I checked by writing a trivial C program to open a file, write a small amount of data once, and exit; then ran it under strace to see which syscalls it actually triggered. Writing at an offset of 10000, I saw these syscalls:

lseek(3, 8192, SEEK_SET)                = 8192
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1808) = 1808
write(3, "hello", 5)                    = 5
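The answer doesn't include the test program itself; a minimal reconstruction that would exercise the same code path (the filename and payload are assumptions) looks roughly like this, run under strace:

#include <stdio.h>

/* Sketch: open a file with stdio, seek to offset 10000, write a few
   bytes, and exit -- then inspect the syscalls with `strace ./a.out`. */
int main( void )
{
    FILE *fp = fopen( "testfile", "r+b" );
    if ( fp == NULL )
        return 1;
    fseek( fp, 10000, SEEK_SET );
    fwrite( "hello", 1, 5, fp );
    fclose( fp );
    return 0;
}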

Seems that you'll want to stick with the low-level Unix-style I/O for this project, eh?
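If you do stay with the low-level interface, pwrite lets you fold the seek and the write into a single call; a sketch using a stand-in for the question's 20-byte Data type and its 40-byte record layout:

#include <sys/types.h>
#include <unistd.h>

typedef struct { char bytes[20]; } Data;   /* stand-in for the question's Data */

/* Sketch: update one record's 20 live bytes in place. pwrite() takes the
   offset directly, so there is no separate lseek() and no shared file
   position to juggle. */
int UpdateRecord( int fd, long recordIndex, const Data *data )
{
    off_t offset = (off_t)recordIndex * 40;              /* 40-byte on-disk records */
    ssize_t n = pwrite( fd, data, sizeof(Data), offset );
    return ( n == (ssize_t)sizeof(Data) ) ? 0 : -1;
}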
