boost file_mapping performance


Problem description

I wrote a small test to compare file writing operations between boost file_mapping and std::ofstream. I was under the impression that file_mapping performance would be superior but it is apparently not the case.

Can someone explain why I would get better numbers with std::ofstream?

Edit: So I profiled my benchmark and noticed that boost::iostreams::detail::direct_streambuf was spending a lot of time copying bytes. I've added a new test which uses std::copy_n instead of ostream.write. The performance seems much better now. I have also updated the test code to compare different file sizes.

The boost iostream direct_streambuf is really struggling on high volumes compared to std::copy_n. I'd like to find a better alternative, as my app is based on ostream and I can't afford the refactoring.

#include <boost/interprocess/file_mapping.hpp>
#include <boost/interprocess/mapped_region.hpp>
#include <boost/iostreams/device/array.hpp>
#include <boost/iostreams/stream.hpp>
#include <algorithm>   // std::copy_n
#include <chrono>
#include <fstream>
#include <iostream>
#include <memory>      // std::shared_ptr
#include <string>      // std::stoull
#include <vector>

int test_mapped_file_ostream(size_t TOTAL_SIZE, size_t BLOCK_SIZE, size_t N)
{
    const std::string filename = "test_filemapping.dat";
    boost::interprocess::file_mapping::remove(filename.data());

    {
    std::ofstream file(filename, std::ios::binary | std::ios::trunc);
    file.seekp(static_cast<std::streamoff>(TOTAL_SIZE-1));
    file.write("", 1);
    }

    std::chrono::system_clock::time_point start;
    std::chrono::system_clock::time_point end;
    {
        boost::interprocess::file_mapping fmap(filename.data(), boost::interprocess::read_write);
        boost::interprocess::mapped_region mreg(fmap, boost::interprocess::read_write);
        mreg.advise( boost::interprocess::mapped_region::advice_sequential );

        std::shared_ptr<std::streambuf> buf( new boost::iostreams::stream_buffer<boost::iostreams::array_sink>((char*)(mreg.get_address()), mreg.get_size()));
        std::ostream ostream( buf.get() );

        const std::vector<char> data(BLOCK_SIZE,1);

        start=std::chrono::system_clock::now();     
        for ( size_t i=0; i<N; i++ ) {
            ostream.write( data.data(), data.size() );
        }
        end=std::chrono::system_clock::now();       
    }

    auto total = end-start;
    std::cout << "test_mapped_file_ostream (ms): " << std::chrono::duration_cast<std::chrono::milliseconds>(total).count() << std::endl;

    return 0;
}

int test_mapped_file_stdcopy_n(size_t TOTAL_SIZE, size_t BLOCK_SIZE, size_t N)
{
    const std::string filename = "test_filemapping_stdcopy.dat";
    boost::interprocess::file_mapping::remove(filename.data());

    {
    std::ofstream file(filename, std::ios::binary | std::ios::trunc);
    file.seekp(static_cast<std::streamoff>(TOTAL_SIZE-1));
    file.write("", 1);
    }

    std::chrono::system_clock::time_point start;
    std::chrono::system_clock::time_point end;
    {
        boost::interprocess::file_mapping fmap(filename.data(), boost::interprocess::read_write);
        boost::interprocess::mapped_region mreg(fmap, boost::interprocess::read_write);
        mreg.advise( boost::interprocess::mapped_region::advice_sequential );

        char* regptr = (char*)mreg.get_address();
        const std::vector<char> data(BLOCK_SIZE,1);

        start=std::chrono::system_clock::now();     
        for ( size_t i=0; i<N; i++ ) {
            std::copy_n( data.data(), data.size(), regptr );
            regptr += data.size();
        }
        end=std::chrono::system_clock::now();       
    }

    auto total = end-start;
    std::cout << "test_mapped_file_stdcopy_n (ms): " << std::chrono::duration_cast<std::chrono::milliseconds>(total).count() << std::endl;

    return 0;
}

int test_fstream_file(size_t TOTAL_SIZE, size_t BLOCK_SIZE, size_t N)
{
    const std::string filename = "test_fstream.dat";

    std::chrono::system_clock::time_point start;
    std::chrono::system_clock::time_point end;
    {
        const std::vector<char> data(BLOCK_SIZE,1);
        std::ofstream file(filename, std::ios::binary | std::ios::trunc);
        start=std::chrono::system_clock::now();     
        for ( size_t i=0; i<N; i++ ) {
            file.write( data.data(), data.size() );
        }
        end=std::chrono::system_clock::now();       
    }
    auto total = end-start;
    std::cout << "test_fstream_file (ms): " << std::chrono::duration_cast<std::chrono::milliseconds>(total).count() << std::endl;

    return 0;
}

int main(int argc, char **argv)
{
    if ( argc != 2 ) {
        std::cout << "Usage: " << argv[0] << " <size of output file in gigabytes>" << std::endl;
        exit(1);
    }

    uint64_t totalsize = std::stoull(argv[1]);
    if (totalsize==0) {
        totalsize = 1;
    }

    const std::size_t GB = (uint64_t)1 << 30; 
    const std::size_t TOTAL_SIZE = totalsize << 30; 
    const std::size_t BLOCK_SIZE = (uint64_t)1 << 20;
    const std::size_t N = TOTAL_SIZE/BLOCK_SIZE;

    std::cout << "TOTAL_SIZE (GB)=" << TOTAL_SIZE/GB << std::endl;
    test_mapped_file_ostream(TOTAL_SIZE,BLOCK_SIZE,N);
    test_mapped_file_stdcopy_n(TOTAL_SIZE,BLOCK_SIZE,N);
    test_fstream_file(TOTAL_SIZE,BLOCK_SIZE,N);
    return 0;
}

Results: Windows 7, HDD, 64 GB RAM

Performance ratios compared to fstream.write (times in ms):

TOTAL_SIZE (GB)=5
test_mapped_file_ostream (ms): 24610 (-1.88x)
test_mapped_file_stdcopy_n (ms): 3307 (3.9x)
test_fstream_file (ms): 13052

TOTAL_SIZE (GB)=10
test_mapped_file_ostream (ms): 49524 (-1.3x)
test_mapped_file_stdcopy_n (ms): 6610 (5.8x)
test_fstream_file (ms): 38219

TOTAL_SIZE (GB)=15
test_mapped_file_ostream (ms): 85041 (1.52x)
test_mapped_file_stdcopy_n (ms): 12387 (10.5x)
test_fstream_file (ms): 129964

TOTAL_SIZE (GB)=20
test_mapped_file_ostream (ms): 122897 (1.7x)
test_mapped_file_stdcopy_n (ms): 17542 (12.2x)
test_fstream_file (ms): 213697

Profiling

Answer

You're using a text-oriented ostream anyway. This is going to account for a large portion of the time taken formatting to the stream.

Other than that, consider madvise for sequential access.

Finally, profile to find your bottlenecks.

I've hit this problem with all the tricks I know and came up with the following really bare-bones POSIX mmap vs. write comparison.

I used madvise and fadvise with SEQUENTIAL|WILLNEED where applicable, and made sure that sparseness wasn't a cause of slowness.

The short summary of it all is:

  • Your code can really be simplified a lot (see revision 176f546ea8f65050c)
  • memory maps are fast for smaller volumes
  • buffering is likely what makes the stream-based implementation shine over the mmap-based one

Live On Coliru

#include <boost/chrono.hpp>
#include <boost/chrono/chrono_io.hpp>
#include <algorithm>   // std::copy_n
#include <cassert>     // assert
#include <iostream>
#include <vector>

// mmap the manual way
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#ifndef COLIRU
const std::size_t TOTAL_SIZE = 5ul << 30;
const std::size_t BLOCK_SIZE = 1ul << 20;
#else
const std::size_t TOTAL_SIZE = 1ul << 20;
const std::size_t BLOCK_SIZE = 1ul <<  9;
#endif
static_assert(0 == TOTAL_SIZE%BLOCK_SIZE, "not divisable by block size");
const std::size_t N = TOTAL_SIZE/BLOCK_SIZE;

template <typename Caption, typename F>
auto timed(Caption const& task, F&& f) {
    using namespace boost::chrono;
    struct _ {
        high_resolution_clock::time_point s;
        Caption const& task;
        ~_() { std::cout << " -- (" << task << " completed in " << duration_cast<milliseconds>(high_resolution_clock::now() - s) << ")\n"; }
    } timing { high_resolution_clock::now(), task };

    return f();
}

void test_mapped_file() {
    std::vector<char> const data(BLOCK_SIZE, 1);
    const std::string filename = "test_filemapping.dat";

    std::remove(filename.c_str());

    int fd = open(filename.c_str(), O_RDWR|O_CREAT, 0644);

    if (fd==-1) {
        perror("open");
        exit(255);
    }

    if(posix_fallocate64(fd, 0, TOTAL_SIZE)) {
        perror("fallocate64");
        exit(255);
    }

    posix_fadvise64(fd, 0, TOTAL_SIZE, POSIX_FADV_WILLNEED | POSIX_FADV_SEQUENTIAL);

    char* fmap = static_cast<char*>(mmap64(nullptr, TOTAL_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0));

    if (!fmap || fmap == MAP_FAILED) {
        perror("mmap");
        exit(255);
    }

    madvise(fmap, TOTAL_SIZE, MADV_SEQUENTIAL | MADV_WILLNEED);

    timed(filename, [output=fmap, &data] () mutable {
        for (size_t i = 0; i < N; i++) {
            std::copy_n(data.data(), data.size(), output);
            output += data.size();
        }
    });

    munmap(fmap, TOTAL_SIZE);
    close(fd);
}

void test_posix_write() {
    std::vector<char> const data(BLOCK_SIZE, 1);
    const std::string filename = "test_posix.dat";

    std::remove(filename.c_str());

    int fd = open(filename.c_str(), O_RDWR|O_CREAT, 0644);

    if (fd==-1) {
        perror("open");
        exit(255);
    }

    posix_fadvise64(fd, 0, TOTAL_SIZE, POSIX_FADV_WILLNEED | POSIX_FADV_SEQUENTIAL);

    timed(filename, [&] () mutable {
        for (size_t i = 0; i < N; i++) {
            ptrdiff_t count = ::write(fd, data.data(), data.size());
            if (-1 == count) { 
                perror("write");
                exit(255);
            }
            assert(count == BLOCK_SIZE);
        }
    });

    close(fd);
}

int main() {
    test_mapped_file();
    test_posix_write();
}

When tested on Coliru, it prints:

./a.out; md5sum *.dat
 -- (test_filemapping.dat completed in 0 milliseconds)
 -- (test_posix.dat completed in 8 milliseconds)
d35bb2e58b602d94ccd9628f249ae7e5  test_filemapping.dat
d35bb2e58b602d94ccd9628f249ae7e5  test_posix.dat

Run locally (5GiB volumes):

$ ./test
 -- (test_filemapping.dat completed in 1950 milliseconds)
 -- (test_posix.dat completed in 1307 milliseconds)
