mmap for writing sequential log file for speed?

Problem description

I want to write log file, unstructured format (one line at a time), using mmap (for speed). What is the best procedure? Do I open empty file, truncate to 1 page size (write empty string to resize file?), then mmap - and repeat when mmaped area full?

I usually use mmap for writing fixed size structures, usually just one page at a time, however this is for writing log files (anywhere from 0.5 - 10 Gb) using mmap but not sure what's the best practice once the first mmaped area is filled - munmap, resize file truncate and mmap next page ?

While writing logs to the memory area, I would track the size and msync. What is the proper handling once I get to the end of the mapped memory area?

Let's say I never need to go back or overwrite existing data, so I only write new data to file.

Q1: When I get to the end of mapped area do I munmap, ftruncate file to resize by another page size and mmap the next page ?

Q2: Is there a standard way to pre-empt and have the next page ready in memory for next write? Do this on another thread when we get close to the end of mapped area ?

Q3: Do I madvise for sequential access?

This is for real time data processing with requirement to keep log file - currently I just write to file. Log file is unstructured, text format, line based.

This is for linux/c++/c optionally testing on Mac (so no remap [?]).

Any links/pointers to best practices appreciated.

Solution

I wrote my bachelor thesis about the comparison of fwrite vs. mmap ("An Experiment to Measure the Performance Trade-off between Traditional I/O and Memory-mapped Files"). First of all, for writing, you don't have to go for memory-mapped files, especially for large files. fwrite is totally fine and will nearly always outperform approaches using mmap. mmap will give you the biggest performance boost for parallel data reading; for sequential data writing, your real limitation with fwrite is your hardware.
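For reference, the plain buffered approach recommended above can be sketched roughly as follows. This is only an illustration, not code from the thesis: the helper names `log_open` and `log_line` are hypothetical, and the sketch relies on the default stdio buffering.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: open a log file in append mode.
   stdio buffers the writes in user space, so most calls to
   log_line() do not result in a system call. */
FILE *log_open(const char *path) {
    return fopen(path, "ab");
}

/* Hypothetical helper: append one line plus a newline.
   Returns 0 on success, -1 if the write failed. */
int log_line(FILE *f, const char *line) {
    assert(f != NULL && line != NULL);
    size_t len = strlen(line);
    if (fwrite(line, 1, len, f) != len) return -1;
    if (fputc('\n', f) == EOF) return -1;
    return 0;
}
```

Call `fflush` (and `fsync` on the descriptor returned by `fileno`) at the points where the log has to be durable, and `fclose` when done.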


In my examples remapSize is the initial size of the file and the size by which the file gets increased on each remapping. fileSize keeps track of the size of the file, mappedSpace represents the size of the current mmap (its length), and alreadyWrittenBytes is the number of bytes that have already been written to the file.

Here is the example initialization:

void init() {
  fileDescriptor = open(outputPath, O_RDWR | O_CREAT | O_TRUNC, (mode_t) 0600); // Open file
  result = ftruncate(fileDescriptor, remapSize); // Init size
  fsync(fileDescriptor); // Flush
  memoryMappedFile = (char*) mmap64(0, remapSize, PROT_WRITE, MAP_SHARED, fileDescriptor, 0); // Create mmap
  fileSize = remapSize; // Store file size
  mappedSpace = remapSize; // Store mapped size
}

Ad Q1:

I used an "Unmap-Remap"-mechanism.

Unmap

  • first flushes (msync)
  • and then unmaps the memory-mapped file.

This could look like the following:

void unmap() {
  msync(memoryMappedFile, mappedSpace, MS_SYNC); // Flush
  munmap(memoryMappedFile, mappedSpace); // Unmap
}

For Remap, you have the choice to remap the whole file or only the newly appended part.

Remap basically

  • increases the file size
  • creates the new memory map

Example implementation for a full remap:

void fullRemap() {
  ftruncate(fileDescriptor, mappedSpace + remapSize); // Make file bigger
  fsync(fileDescriptor); // Flush file
  memoryMappedFile = (char*) mmap64(0, mappedSpace + remapSize, PROT_WRITE, MAP_SHARED, fileDescriptor, 0); // Create new mapping on the bigger file
  fileSize += remapSize;
  mappedSpace += remapSize; // Set mappedSpace to new size
}

Example implementation for the small remap:

void smallRemap() {
  ftruncate(fileDescriptor, fileSize + remapSize); // Make file bigger
  fsync(fileDescriptor); // Flush file
  remapAt = alreadyWrittenBytes % pageSize == 0 
            ? alreadyWrittenBytes 
            : alreadyWrittenBytes - (alreadyWrittenBytes % pageSize); // Adjust remap location to pagesize
  memoryMappedFile = (char*) mmap64(0, fileSize + remapSize - remapAt, PROT_WRITE, MAP_SHARED, fileDescriptor, remapAt); // Create memory-map
  fileSize += remapSize;
  mappedSpace = fileSize - remapAt;
}
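Putting the pieces above together, an append-only logger based on the small remap might look roughly like the sketch below. It is not the thesis code: the `MappedLog` struct and the `mlog_*` names are hypothetical, plain `mmap` with `off_t` is used instead of `mmap64`, the mapping is opened with `PROT_READ | PROT_WRITE`, and trimming the file back to the written length on close is my own assumption about what a caller would want.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical state for an append-only mapped log. */
typedef struct {
    int    fd;
    char  *map;          /* start of the current mapping, or NULL */
    size_t remapSize;    /* growth step per remap */
    size_t fileSize;     /* current size of the file */
    size_t mapOffset;    /* file offset where the mapping starts */
    size_t mappedSpace;  /* length of the current mapping */
    size_t written;      /* alreadyWrittenBytes */
} MappedLog;

/* Unmap (if mapped), grow the file by remapSize, and map only the
   tail of the file, starting at a page boundary. */
static int mlog_remap(MappedLog *l) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    if (l->map != NULL) {
        msync(l->map, l->mappedSpace, MS_SYNC);
        munmap(l->map, l->mappedSpace);
        l->map = NULL;
    }
    if (ftruncate(l->fd, (off_t)(l->fileSize + l->remapSize)) != 0) return -1;
    l->fileSize += l->remapSize;
    l->mapOffset = l->written - (l->written % page); /* round down to a page */
    l->mappedSpace = l->fileSize - l->mapOffset;
    void *p = mmap(NULL, l->mappedSpace, PROT_READ | PROT_WRITE,
                   MAP_SHARED, l->fd, (off_t)l->mapOffset);
    if (p == MAP_FAILED) return -1;
    l->map = p;
    return 0;
}

int mlog_open(MappedLog *l, const char *path, size_t remapSize) {
    assert(remapSize > 0);
    memset(l, 0, sizeof *l);
    l->remapSize = remapSize;
    l->fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (l->fd < 0) return -1;
    if (mlog_remap(l) != 0) { close(l->fd); return -1; }
    return 0;
}

/* Copy len bytes to the end of the log, remapping whenever the
   mapped region is exhausted. */
int mlog_append(MappedLog *l, const char *data, size_t len) {
    while (len > 0) {
        size_t room = l->fileSize - l->written;
        if (room == 0) {
            if (mlog_remap(l) != 0) return -1;
            continue;
        }
        size_t n = len < room ? len : room;
        memcpy(l->map + (l->written - l->mapOffset), data, n);
        l->written += n;
        data += n;
        len -= n;
    }
    return 0;
}

/* Flush, unmap, and (assumption) trim the file to the bytes written. */
int mlog_close(MappedLog *l) {
    int rc = 0;
    if (l->map != NULL) {
        if (msync(l->map, l->mappedSpace, MS_SYNC) != 0) rc = -1;
        if (munmap(l->map, l->mappedSpace) != 0) rc = -1;
        l->map = NULL;
    }
    if (ftruncate(l->fd, (off_t)l->written) != 0) rc = -1;
    if (close(l->fd) != 0) rc = -1;
    return rc;
}
```

A caller would use `mlog_open(&log, path, remapSize)`, then `mlog_append` per log line, then `mlog_close`. A larger remapSize means fewer msync/munmap/mmap round trips, which is where most of the extra overhead compared to fwrite comes from.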

There is also an mremap function, but its man page states:

This call is Linux-specific, and should not be used in programs intended to be portable.

Ad Q2:

I'm not sure if I understood that point right. If you want to tell the kernel "and now load the next page", then no, this is not possible (at least to my knowledge). But see Ad Q3 on how to advise the kernel.

Ad Q3:

You can use madvise with the flag MADV_SEQUENTIAL, but keep in mind that this does not force the kernel to read ahead; it only advises it to.

Excerpt from the man page:

This may cause the kernel to aggressively read-ahead

Personal conclusion:

Do not use mmap for sequential data writing. It will just cause much more overhead and will lead to much more "unnatural" code than a simple writing algorithm using fwrite.

Use mmap for random access reads to large files.

These are also the results that were obtained during my thesis. I was not able to achieve any speedup by using mmap for sequential writing; in fact, it was always slower for this purpose.
