提高file_mapping性能 [英] boost file_mapping performance
问题描述
我写了一个小测试来比较boost file_mapping
和 std :: ofstream
之间的文件写入操作.我的印象是file_mapping性能会更好,但事实并非如此.
I wrote a small test to compare file writing operations between boost file_mapping
and std::ofstream
. I was under the impression that file_mapping performance would be superior but it is apparently not the case.
有人可以解释为什么我使用 std :: ofstream
可以获得更好的数字吗?
Can someone explain why I would get better numbers with std::ofstream
?
:因此,我对基准测试进行了分析,并发现 boost :: iostreams :: detail :: direct_streambuf
花费了大量时间来复制字节.我添加了一个新测试,该测试使用的是 std :: copy_n
而不是 ostream.write
.现在的性能似乎要好得多.我还更新了测试代码,以与不同的文件大小进行比较.
: So I did a profiling of my benchmark test and noticed that boost::iostreams::detail::direct_streambuf
was spending lots of time copying bytes. I've added a new test which is using std::copy_n
instead of ostream.write
. The performance seems much better now. I have also updated the test code to compare with different file size.
与 std :: copy_n
相比,boost iostream direct_streambuf
确实在高容量上苦苦挣扎.我想找到一个更好的替代方案,因为我的应用程序基于ostream,而且我负担不起重构的费用.
The boost iostream direct_streambuf
is really struggling on high volumes compared to std::copy_n
. I'd like to find a better alternative instead as my app is based on ostream and I can't afford the refactoring.
#include <boost/interprocess/file_mapping.hpp>
#include <boost/interprocess/mapped_region.hpp>
#include <boost/iostreams/device/array.hpp>
#include <boost/iostreams/stream.hpp>
#include <vector>
#include <chrono>
#include <iostream>
#include <fstream>
int test_mapped_file_ostream(size_t TOTAL_SIZE, size_t BLOCK_SIZE, size_t N)
{
const std::string filename = "test_filemapping.dat";
boost::interprocess::file_mapping::remove(filename.data());
{
std::ofstream file(filename, std::ios::binary | std::ios::trunc);
file.seekp(static_cast<std::streamoff>(TOTAL_SIZE-1));
file.write("", 1);
}
std::chrono::system_clock::time_point start;
std::chrono::system_clock::time_point end;
{
boost::interprocess::file_mapping fmap(filename.data(), boost::interprocess::read_write);
boost::interprocess::mapped_region mreg(fmap, boost::interprocess::read_write);
mreg.advise( boost::interprocess::mapped_region::advice_sequential );
std::shared_ptr<std::streambuf> buf( new boost::iostreams::stream_buffer<boost::iostreams::array_sink>((char*)(mreg.get_address()), mreg.get_size()));
std::ostream ostream( buf.get() );
const std::vector<char> data(BLOCK_SIZE,1);
start=std::chrono::system_clock::now();
for ( size_t i=0; i<N; i++ ) {
ostream.write( data.data(), data.size() );
}
end=std::chrono::system_clock::now();
}
auto total = end-start;
std::cout << "test_mapped_file_ostream (ms): " << std::chrono::duration_cast<std::chrono::milliseconds>(total).count() << std::endl;
return 0;
}
int test_mapped_file_stdcopy_n(size_t TOTAL_SIZE, size_t BLOCK_SIZE, size_t N)
{
const std::string filename = "test_filemapping_stdcopy.dat";
boost::interprocess::file_mapping::remove(filename.data());
{
std::ofstream file(filename, std::ios::binary | std::ios::trunc);
file.seekp(static_cast<std::streamoff>(TOTAL_SIZE-1));
file.write("", 1);
}
std::chrono::system_clock::time_point start;
std::chrono::system_clock::time_point end;
{
boost::interprocess::file_mapping fmap(filename.data(), boost::interprocess::read_write);
boost::interprocess::mapped_region mreg(fmap, boost::interprocess::read_write);
mreg.advise( boost::interprocess::mapped_region::advice_sequential );
char* regptr = (char*)mreg.get_address();
const std::vector<char> data(BLOCK_SIZE,1);
start=std::chrono::system_clock::now();
for ( size_t i=0; i<N; i++ ) {
std::copy_n( data.data(), data.size(), regptr );
regptr += data.size();
}
end=std::chrono::system_clock::now();
}
auto total = end-start;
std::cout << "test_mapped_file_stdcopy_n (ms): " << std::chrono::duration_cast<std::chrono::milliseconds>(total).count() << std::endl;
return 0;
}
int test_fstream_file(size_t TOTAL_SIZE, size_t BLOCK_SIZE, size_t N)
{
const std::string filename = "test_fstream.dat";
std::chrono::system_clock::time_point start;
std::chrono::system_clock::time_point end;
{
const std::vector<char> data(BLOCK_SIZE,1);
std::ofstream file(filename, std::ios::binary | std::ios::trunc);
start=std::chrono::system_clock::now();
for ( size_t i=0; i<N; i++ ) {
file.write( data.data(), data.size() );
}
end=std::chrono::system_clock::now();
}
auto total = end-start;
std::cout << "test_fstream_file (ms): " << std::chrono::duration_cast<std::chrono::milliseconds>(total).count() << std::endl;
return 0;
}
int main(int argc, char **argv)
{
if ( argc != 2 ) {
std::cout << "Usage: " << argv[0] << " <size of output file in gigabytes>" << std::endl;
exit(1);
}
uint64_t totalsize = std::stoull(argv[1]);
if (totalsize==0) {
totalsize = 1;
}
const std::size_t GB = (uint64_t)1 << 30;
const std::size_t TOTAL_SIZE = totalsize << 30;
const std::size_t BLOCK_SIZE = (uint64_t)1 << 20;
const std::size_t N = TOTAL_SIZE/BLOCK_SIZE;
std::cout << "TOTAL_SIZE (GB)=" << TOTAL_SIZE/GB << std::endl;
test_mapped_file_ostream(TOTAL_SIZE,BLOCK_SIZE,N);
test_mapped_file_stdcopy_n(TOTAL_SIZE,BLOCK_SIZE,N);
test_fstream_file(TOTAL_SIZE,BLOCK_SIZE,N);
return 0;
}
结果:Windows 7,HHD,64GB RAM
与(code)fstream.write 的性能比(以毫秒为单位):
Performance ratios compared to fstream.write
in (ms):
TOTAL_SIZE (GB)=5
test_mapped_file_ostream (ms): 24610 (-1.88x)
test_mapped_file_stdcopy_n (ms): 3307 (3.9x)
test_fstream_file (ms): 13052
TOTAL_SIZE (GB)=10
test_mapped_file_ostream (ms): 49524 (-1.3x)
test_mapped_file_stdcopy_n (ms): 6610 (5.8x)
test_fstream_file (ms): 38219
TOTAL_SIZE (GB)=15
test_mapped_file_ostream (ms): 85041 (1.52x)
test_mapped_file_stdcopy_n (ms): 12387 (10.5x)
test_fstream_file (ms): 129964
TOTAL_SIZE (GB)=20
test_mapped_file_ostream (ms): 122897 (1.7x)
test_mapped_file_stdcopy_n (ms): 17542 (12.2x)
test_fstream_file (ms): 213697
分析
推荐答案
无论如何,您都在使用面向文本的 ostream
.这将占格式化流所花费时间的很大一部分.
You're using a text oriented ostream
anyways. This is going to account for a large portion of the time taken formatting to the stream.
除此之外,请考虑对顺序访问进行恶意调整.
Other than that consider madvising for sequential access.
最后通过配置文件找到瓶颈
Finally profile to find your bottle necks
我已经用所有已知的技巧解决了这个问题,并提出了以下真正的POSIX mmap
vs. write
比较.
I've hit this problem with all the tricks I know and came up with the following really bare-bones POSIX mmap
vs.write
comparison.
在适用的情况下,我将 madvise
和 fadvise
与 SEQUENTIAL | WILL_NEED
一起使用,并确保稀疏不是导致速度变慢的原因.
I used madvise
and fadvise
with SEQUENTIAL|WILL_NEED
where applicable, and made sure that sparseness wasn't a cause for slowness.
这一切的简短摘要是:
- 您的代码确实可以简化很多(请参阅修订版176f546ea8f65050c)
- 对于较小的体积,地图速度很快
- 缓冲可能是使基于流的实现而不是基于mmap的实现闪耀的原因
#include <boost/chrono.hpp>
#include <boost/chrono/chrono_io.hpp>
#include <iostream>
#include <vector>
#include <algorithm>
// mmap the manual way
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#ifndef COLIRU
const std::size_t TOTAL_SIZE = 5ul << 30;
const std::size_t BLOCK_SIZE = 1ul << 20;
#else
const std::size_t TOTAL_SIZE = 1ul << 20;
const std::size_t BLOCK_SIZE = 1ul << 9;
#endif
static_assert(0 == TOTAL_SIZE%BLOCK_SIZE, "not divisable by block size");
const int N = TOTAL_SIZE/BLOCK_SIZE;
template <typename Caption, typename F>
auto timed(Caption const& task, F&& f) {
using namespace boost::chrono;
struct _ {
high_resolution_clock::time_point s;
Caption const& task;
~_() { std::cout << " -- (" << task << " completed in " << duration_cast<milliseconds>(high_resolution_clock::now() - s) << ")\n"; }
} timing { high_resolution_clock::now(), task };
return f();
}
void test_mapped_file() {
std::vector<char> const data(BLOCK_SIZE, 1);
const std::string filename = "test_filemapping.dat";
std::remove(filename.c_str());
int fd = open(filename.c_str(), O_RDWR|O_CREAT, 0644);
if (fd==-1) {
perror("open");
exit(255);
}
if(posix_fallocate64(fd, 0, TOTAL_SIZE)) {
perror("fallocate64");
exit(255);
}
posix_fadvise64(fd, 0, TOTAL_SIZE, POSIX_FADV_WILLNEED | POSIX_FADV_SEQUENTIAL);
char* fmap = static_cast<char*>(mmap64(nullptr, TOTAL_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0));
if (!fmap || fmap == MAP_FAILED) {
perror("mmap");
exit(255);
}
madvise(fmap, TOTAL_SIZE, MADV_SEQUENTIAL | MADV_WILLNEED);
timed(filename, [output=fmap, &data] () mutable {
for (size_t i = 0; i < N; i++) {
std::copy_n(data.data(), data.size(), output);
output += data.size();
}
});
munmap(fmap, TOTAL_SIZE);
close(fd);
}
void test_posix_write() {
std::vector<char> const data(BLOCK_SIZE, 1);
const std::string filename = "test_posix.dat";
std::remove(filename.c_str());
int fd = open(filename.c_str(), O_RDWR|O_CREAT, 0644);
if (fd==-1) {
perror("open");
exit(255);
}
posix_fadvise64(fd, 0, TOTAL_SIZE, POSIX_FADV_WILLNEED | POSIX_FADV_SEQUENTIAL);
timed(filename, [&] () mutable {
for (size_t i = 0; i < N; i++) {
ptrdiff_t count = ::write(fd, data.data(), data.size());
if (-1 == count) {
perror("write");
exit(255);
}
assert(count == BLOCK_SIZE);
}
});
close(fd);
}
int main() {
test_mapped_file();
test_posix_write();
}
在Coliru打印件上进行测试时:
When tested on Coliru prints:
./a.out; md5sum *.dat
-- (test_filemapping.dat completed in 0 milliseconds)
-- (test_posix.dat completed in 8 milliseconds)
d35bb2e58b602d94ccd9628f249ae7e5 test_filemapping.dat
d35bb2e58b602d94ccd9628f249ae7e5 test_posix.dat
本地运行(5GiB卷):
Run locally (5GiB volumes):
$ ./test
-- (test_filemapping.dat completed in 1950 milliseconds)
-- (test_posix.dat completed in 1307 milliseconds)
这篇关于提高file_mapping性能的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!