How to write a large buffer into a binary file in C++, fast?
Question
I'm trying to write huge amounts of data onto my SSD(solid state drive). And by huge amounts I mean 80GB.
I browsed the web for solutions, but the best I came up with was this:
#include <fstream>

const unsigned long long size = 64ULL*1024ULL*1024ULL;
unsigned long long a[size];

int main()
{
    std::fstream myfile;
    myfile = std::fstream("file.binary", std::ios::out | std::ios::binary);
    // Here would be some error handling
    for (int i = 0; i < 32; ++i) {
        // Some calculations to fill a[]
        myfile.write((char*)&a, size*sizeof(unsigned long long));
    }
    myfile.close();
}
Compiled with Visual Studio 2010 with full optimizations and run under Windows 7, this program maxes out around 20MB/s. What really bothers me is that Windows can copy files from another SSD to this SSD at somewhere between 150MB/s and 200MB/s, so at least 7 times faster. That's why I think I should be able to go faster.
Any ideas how I can speed up my writing?
Accepted answer
This did the job (in the year 2012):
#include <stdio.h>

const unsigned long long size = 8ULL*1024ULL*1024ULL;
unsigned long long a[size];

int main()
{
    FILE* pFile;
    pFile = fopen("file.binary", "wb");
    if (pFile == NULL)
        return 1; // could not open the file
    for (unsigned long long j = 0; j < 1024; ++j) {
        // Some calculations to fill a[]
        fwrite(a, 1, size*sizeof(unsigned long long), pFile);
    }
    fclose(pFile);
    return 0;
}
I just timed 8GB in 36 sec, which is about 220MB/s, and I think that maxes out my SSD. Also worth noting: the code in the question kept one core at 100%, whereas this code only uses 2-5%.
Thanks a lot, everybody.
Update: 5 years have passed; it's 2017 now. Compilers, hardware, libraries and my requirements have changed. That's why I made some changes to the code and did some new measurements.
First the code:
#include <fstream>
#include <chrono>
#include <vector>
#include <cstdint>
#include <numeric>
#include <random>
#include <algorithm>
#include <iostream>
#include <cassert>

std::vector<uint64_t> GenerateData(std::size_t bytes)
{
    assert(bytes % sizeof(uint64_t) == 0);
    std::vector<uint64_t> data(bytes / sizeof(uint64_t));
    std::iota(data.begin(), data.end(), 0);
    std::shuffle(data.begin(), data.end(), std::mt19937{ std::random_device{}() });
    return data;
}

long long option_1(std::size_t bytes)
{
    std::vector<uint64_t> data = GenerateData(bytes);

    auto startTime = std::chrono::high_resolution_clock::now();
    auto myfile = std::fstream("file.binary", std::ios::out | std::ios::binary);
    myfile.write((char*)&data[0], bytes);
    myfile.close();
    auto endTime = std::chrono::high_resolution_clock::now();

    return std::chrono::duration_cast<std::chrono::milliseconds>(endTime - startTime).count();
}

long long option_2(std::size_t bytes)
{
    std::vector<uint64_t> data = GenerateData(bytes);

    auto startTime = std::chrono::high_resolution_clock::now();
    FILE* file = fopen("file.binary", "wb");
    fwrite(&data[0], 1, bytes, file);
    fclose(file);
    auto endTime = std::chrono::high_resolution_clock::now();

    return std::chrono::duration_cast<std::chrono::milliseconds>(endTime - startTime).count();
}

long long option_3(std::size_t bytes)
{
    std::vector<uint64_t> data = GenerateData(bytes);

    std::ios_base::sync_with_stdio(false);
    auto startTime = std::chrono::high_resolution_clock::now();
    auto myfile = std::fstream("file.binary", std::ios::out | std::ios::binary);
    myfile.write((char*)&data[0], bytes);
    myfile.close();
    auto endTime = std::chrono::high_resolution_clock::now();

    return std::chrono::duration_cast<std::chrono::milliseconds>(endTime - startTime).count();
}

int main()
{
    const std::size_t kB = 1024;
    const std::size_t MB = 1024 * kB;
    const std::size_t GB = 1024 * MB;

    for (std::size_t size = 1 * MB; size <= 4 * GB; size *= 2)
        std::cout << "option1, " << size / MB << "MB: " << option_1(size) << "ms" << std::endl;
    for (std::size_t size = 1 * MB; size <= 4 * GB; size *= 2)
        std::cout << "option2, " << size / MB << "MB: " << option_2(size) << "ms" << std::endl;
    for (std::size_t size = 1 * MB; size <= 4 * GB; size *= 2)
        std::cout << "option3, " << size / MB << "MB: " << option_3(size) << "ms" << std::endl;

    return 0;
}
This code compiles with Visual Studio 2017 and g++ 7.2.0 (a new requirement). I ran the code with two setups:
- Laptop: Core i7, SSD, Ubuntu 16.04, g++ version 7.2.0 with -std=c++11 -march=native -O3
- Desktop: Core i7, SSD, Windows 10, Visual Studio 2017 version 15.3.1 with /Ox /Ob2 /Oi /Ot /GT /GL /Gy
This gave the following measurements (after ditching the values for 1MB, because they were obvious outliers). Both times, option1 and option3 max out my SSD. I didn't expect to see this, because option2 used to be the fastest code on my old machine back then.
TL;DR: My measurements indicate to use std::fstream over FILE.