fwrite() performance well below disk capacity


Problem description


I have a dynamically allocated array of a struct with 17 million elements. To save it to disk, I write

fwrite(StructList, sizeof(Struct), NumStructs, FilePointer);

In a later step I read it with an equivalent fread statement, that is, using sizeof(Struct) and a count of NumStructs. I expect the resulting file will be around 3.5 GB (this is all x64).

Is it possible instead to pass sizeof(Struct) * NumStructs as the size and 1 as the count to speed this up? I am scratching my head as to why the write operation could possibly take minutes on a fast computer with 32 GB RAM (plenty of write cache). I've run home-brew benchmarks and the cache is aggressive enough that 400 MB/sec for the first 800 MB to 1 GB is typical. PerfMon shows it is consuming 100% of one core during the fwrite.
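
For concreteness, the two call forms under comparison look like this (a sketch reusing the identifiers from the question; StructList, NumStructs and FilePointer are assumed to be declared and initialized elsewhere):

/* current form: NumStructs elements of sizeof(Struct) bytes each */
fwrite(StructList, sizeof(Struct), NumStructs, FilePointer);

/* proposed form: one element of sizeof(Struct) * NumStructs bytes */
fwrite(StructList, sizeof(Struct) * (size_t)NumStructs, 1, FilePointer);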

I saw the question here, so what I'm asking is whether there is some loop inside fwrite that can be "tricked" into going faster by telling it to write 1 element of size n*s as opposed to n elements of size s.

EDIT

I ran this twice in release mode and both times I gave up waiting. Then I ran it in debug mode knowing that typically the fwrite operations take way longer. The exact size of the data to be written is 4,368,892,928 bytes. In all three cases, PerfMon shows two bursts of disk write activity about 30 seconds apart, after which the CPU goes to 100% of one core. The file is at that point 73,924,608 bytes. I have breakpoints on either side of the fwrite so I know that's where it's sitting. It certainly seems that something is stuck but I will leave it running overnight and see.

EDIT

Left this overnight and it definitely hung in fwrite, the file never went past 70 MB.

Solution

This is definitely a problem with fwrite (I tried both VS2012 and 2010).

Starting with a standard C++ project, I changed only the settings to use a multi-byte character set, an x64 target, and the multithreaded debug version of the standard library in a static link.
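
(For reference, these options correspond roughly to defining _MBCS and linking the static debug runtime with /MTd from an x64 tools prompt; the exact project settings are not shown in the post, so the command below is only an approximation.)

cl /MTd /D_MBCS /EHsc test.cpp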

The following code succeeds (no error checking for conciseness):

#define _CRT_SECURE_NO_WARNINGS
#include <stdio.h>
#include <stdlib.h>

int main()
{
    FILE *fp;
    long long n;
    unsigned char *data;

    n = 4LL * 1024 * 1024 * 1024 - 1;   /* one byte under 4 GB: this size succeeds */

    data = (unsigned char *)malloc(n * sizeof(unsigned char));

    fp = fopen("T:\\test.bin", "wb");

    fwrite(data, sizeof(unsigned char), n, fp);

    fclose(fp);
}

In the debug version on my machine, the program finishes in about 1 minute (the malloc takes only a few seconds, so this is mostly fwrite), consuming on average 30% CPU. PerfMon shows the write occurs entirely at the end, as a single "flash" of 4 GB (write cache).

Change the - 1 to a + 1 in the assignment of n and you reproduce the problem: instantaneous 100% CPU usage and nothing is ever written. After several minutes, the size of the file was still 0 bytes (recall in my actual code it manages to dump 70 MB or so).

This is definitely a problem in fwrite, as the following code can write the file just fine:

#define _CRT_SECURE_NO_WARNINGS
#include <stdio.h>
#include <stdlib.h>

/* min() is the macro from the MSVC headers; defined here so the example stands alone */
#ifndef min
#define min(a, b) ((a) < (b) ? (a) : (b))
#endif

int main()
{
    FILE *fp;
    long long n;
    long long counter = 0;
    long long chunk;
    unsigned char *data;

    n = 4LL * 1024 * 1024 * 1024 + 1;       /* one byte past 4 GB: the size that hangs a single fwrite */

    data = (unsigned char *)malloc(n * sizeof(unsigned char));

    fp = fopen("T:\\test.bin", "wb");

    while (counter < n)
    {
        chunk = min(n - counter, 100*1000); /* at most 100,000 bytes per call */
        fwrite(data + counter, sizeof(unsigned char), chunk, fp);
        counter += chunk;
    }

    fclose(fp);
}

On my machine, this took 45 seconds instead of 1 minute. CPU usage is not constant; it comes in bursts, and the reported IO writes are more spread out than in the "single chunk" method.

I would be really surprised if the increase in speed were false (that is, due to caching), because I've done tests before, writing several files containing all the same data versus files containing randomized data, and the reported write speeds (with caching) were the same. So I'm willing to bet that at least this implementation of fwrite does not like huge chunks passed to it at a time.
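
Based on that observation, one way to avoid the problem is to wrap the chunking in a small helper so callers never hand fwrite a huge block at once. This is a minimal sketch, not code from the original answer; the 100*1000-byte chunk size simply mirrors the loop above:

#include <stdio.h>

/* Writes nbytes from buf in pieces of at most CHUNK bytes per fwrite call.
   Returns the total number of bytes written (equal to nbytes on success). */
static size_t chunked_fwrite(const void *buf, size_t nbytes, FILE *fp)
{
    const size_t CHUNK = 100 * 1000;          /* same chunk size as the loop above */
    const unsigned char *p = (const unsigned char *)buf;
    size_t written = 0;

    while (written < nbytes)
    {
        size_t n = nbytes - written;
        if (n > CHUNK)
            n = CHUNK;
        size_t w = fwrite(p + written, 1, n, fp);
        written += w;
        if (w < n)                            /* short write: stop and let the caller check ferror() */
            break;
    }
    return written;
}

A call such as chunked_fwrite(data, n, fp) would then replace the single large fwrite in the first example.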

I also tested fread, reading the file back immediately after closing it for writing in the 4 GB+1 case, and it returns in a timely manner - a few seconds at most (there is no real data here, so I didn't check it).
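
For reference, that read-back test amounts to little more than the following (a sketch; the post does not show its exact code, and fp, data and n are as in the examples above):

fp = fopen("T:\\test.bin", "rb");
fread(data, sizeof(unsigned char), n, fp);   /* one call with the full count; this returned promptly */
fclose(fp);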

EDIT

I ran some tests comparing the chunk-writing method against a single fwrite call for a 4 GB-1 file (the largest size both methods can handle). Running the program several times (with code such that the file is opened, written with multiple fwrite calls, closed, then opened again, written in a single call, and closed), there is no question that the chunk-writing method returns faster. In the worst case it takes 68% of the time of the single call, and at best I got just 20%.
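
The post does not show its timing code, but the comparison could be structured roughly like this (a sketch assuming the chunked_fwrite helper sketched above; clock() is coarse but adequate for runs lasting tens of seconds):

#include <stdio.h>
#include <time.h>

/* Times one write strategy: "chunked" selects the chunked_fwrite helper,
   otherwise a single large fwrite is issued. Returns elapsed seconds. */
double time_write(const unsigned char *data, size_t n, const char *path, int chunked)
{
    FILE *fp = fopen(path, "wb");
    clock_t t0 = clock();

    if (chunked)
        chunked_fwrite(data, n, fp);   /* helper from the sketch above */
    else
        fwrite(data, 1, n, fp);        /* single call with the full size */

    fclose(fp);                        /* include the flush/close in the timing */
    clock_t t1 = clock();

    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}

Calling time_write(data, n, "T:\\test.bin", 1) and time_write(data, n, "T:\\test.bin", 0) back to back would reproduce the comparison described above.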
