mmap for writing sequential log file for speed?
Problem description
I want to write a log file, unstructured format (one line at a time), using mmap (for speed). What is the best procedure? Do I open an empty file, truncate it to 1 page size (write an empty string to resize the file?), then mmap, and repeat when the mmapped area is full?
I usually use mmap for writing fixed-size structures, usually just one page at a time. However, this is for writing log files (anywhere from 0.5 to 10 GB) using mmap, and I'm not sure what the best practice is once the first mmapped area is filled: munmap, resize the file with truncate, and mmap the next page?
While writing logs to the memory area, I would track the size and msync. What is the proper handling once I get to the end of the mapped memory area?
Let's say I never need to go back or overwrite existing data, so I only write new data to the file.
Q1: When I get to the end of the mapped area, do I munmap, ftruncate the file to grow it by another page size, and mmap the next page?
Q2: Is there a standard way to pre-empt this and have the next page ready in memory for the next write? Do this on another thread when we get close to the end of the mapped area?
Q3: Do I madvise for sequential access?
This is for real-time data processing with a requirement to keep a log file; currently I just write to a file. The log file is unstructured, text format, line based.
This is for Linux/C++/C, optionally testing on Mac (so no remap [?]).
Any links/pointers to best practices appreciated.
Solution
I wrote my bachelor thesis about the comparison of fwrite vs. mmap ("An Experiment to Measure the Performance Trade-off between Traditional I/O and Memory-mapped Files"). First of all, for writing, you don't have to go for memory-mapped files, especially for large files. fwrite is totally fine and will nearly always outperform approaches using mmap. mmap will give you the biggest performance boost for parallel data reading; for sequential data writing, your real limitation with fwrite is your hardware.
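As a point of comparison, a plain buffered fwrite logger is only a few lines. This is a sketch, not code from the thesis: log_open and log_line are hypothetical helper names, and the buffer size is an assumption to tune for your workload.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: open a log file for appending and give stdio
 * a fully buffered buffer of the requested size. Returns NULL on error. */
static FILE *log_open(const char *path, size_t buffer_size) {
    FILE *f = fopen(path, "a");
    if (f == NULL)
        return NULL;
    /* Passing NULL lets stdio allocate the buffer itself. */
    if (setvbuf(f, NULL, _IOFBF, buffer_size) != 0) {
        fclose(f);
        return NULL;
    }
    return f;
}

/* Hypothetical helper: append one line plus a newline.
 * Returns 0 on success, -1 if the write came up short. */
static int log_line(FILE *f, const char *line) {
    size_t len = strlen(line);
    if (fwrite(line, 1, len, f) != len)
        return -1;
    return fputc('\n', f) == EOF ? -1 : 0;
}
```

Data sits in the stdio buffer until it fills or the stream is flushed or closed, so a caller that needs lines on disk at a specific point would call fflush (and fsync on the underlying descriptor) there.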
In my examples, remapSize is the initial size of the file and the size by which the file is increased on each remapping. fileSize keeps track of the size of the file, mappedSpace represents the size of the current mmap (its length), and alreadyWrittenBytes is the number of bytes that have already been written to the file.
Here is the example initialization:
    void init() {
        fileDescriptor = open(outputPath, O_RDWR | O_CREAT | O_TRUNC, (mode_t) 0600); // Open file
        result = ftruncate(fileDescriptor, remapSize); // Init size
        fsync(fileDescriptor); // Flush
        memoryMappedFile = (char*) mmap64(0, remapSize, PROT_WRITE, MAP_SHARED, fileDescriptor, 0); // Create mmap
        fileSize = remapSize; // Store file size
        mappedSpace = remapSize; // Store mapped size
    }
Ad Q1:
I used an "Unmap-Remap" mechanism.
Unmap
- first flushes (msync)
- and then unmaps the memory-mapped file.
This could look like the following:
    void unmap() {
        msync(memoryMappedFile, mappedSpace, MS_SYNC); // Flush
        munmap(memoryMappedFile, mappedSpace); // Unmap
    }
For Remap, you have the choice to remap the whole file or only the newly appended part.
Remap basically
- increases the file size
- creates the new memory map
Example implementation for a full remap:
    void fullRemap() {
        ftruncate(fileDescriptor, mappedSpace + remapSize); // Make file bigger
        fsync(fileDescriptor); // Flush file
        memoryMappedFile = (char*) mmap64(0, mappedSpace + remapSize, PROT_WRITE, MAP_SHARED, fileDescriptor, 0); // Create new mapping on the bigger file
        fileSize += remapSize;
        mappedSpace += remapSize; // Set mappedSpace to new size
    }
Example implementation for the small remap:
    void smallRemap() {
        ftruncate(fileDescriptor, fileSize + remapSize); // Make file bigger
        fsync(fileDescriptor); // Flush file
        remapAt = alreadyWrittenBytes % pageSize == 0
            ? alreadyWrittenBytes
            : alreadyWrittenBytes - (alreadyWrittenBytes % pageSize); // Adjust remap location to pagesize
        memoryMappedFile = (char*) mmap64(0, fileSize + remapSize - remapAt, PROT_WRITE, MAP_SHARED, fileDescriptor, remapAt); // Create memory-map
        fileSize += remapSize;
        mappedSpace = fileSize - remapAt;
    }
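For readers who want something they can compile, here is a minimal sketch of the small-remap idea as one self-contained unit. It is not the thesis code: the MmapLog struct and the mlog_* names are hypothetical, it uses plain mmap with PROT_READ | PROT_WRITE rather than mmap64 with PROT_WRITE, and it trims the over-allocated file on close, which the snippets above do not show.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical state for an append-only log written through a sliding mmap window. */
typedef struct {
    int fd;
    char *map;       /* start of the current mapping, or NULL */
    size_t map_len;  /* length of the current mapping */
    off_t map_off;   /* file offset where the mapping starts (page aligned) */
    off_t written;   /* logical end of the log data */
    off_t file_size; /* current, over-allocated file size */
    size_t grow;     /* number of bytes the file grows by on each remap */
} MmapLog;

/* Flush and drop the old window, grow the file, then map from the
 * page that contains the current write position to the new end of file. */
static int mlog_remap(MmapLog *l) {
    long page = sysconf(_SC_PAGESIZE);
    if (l->map != NULL) {
        msync(l->map, l->map_len, MS_SYNC);
        munmap(l->map, l->map_len);
        l->map = NULL;
    }
    if (ftruncate(l->fd, l->file_size + (off_t)l->grow) != 0)
        return -1;
    l->file_size += (off_t)l->grow;
    l->map_off = l->written - (l->written % page); /* round down to a page */
    l->map_len = (size_t)(l->file_size - l->map_off);
    void *p = mmap(NULL, l->map_len, PROT_READ | PROT_WRITE, MAP_SHARED,
                   l->fd, l->map_off);
    if (p == MAP_FAILED)
        return -1;
    l->map = p;
    return 0;
}

/* Create (or truncate) the log file and map the first window. */
static int mlog_open(MmapLog *l, const char *path, size_t grow) {
    memset(l, 0, sizeof *l);
    l->grow = grow;
    l->fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (l->fd < 0)
        return -1;
    return mlog_remap(l);
}

/* Copy data into the window, remapping whenever the window is full.
 * After a -1 return the log must not be appended to again. */
static int mlog_append(MmapLog *l, const char *data, size_t len) {
    while (len > 0) {
        size_t used = (size_t)(l->written - l->map_off);
        size_t room = l->map_len - used;
        if (room == 0) {
            if (mlog_remap(l) != 0)
                return -1;
            continue;
        }
        size_t n = len < room ? len : room;
        memcpy(l->map + used, data, n);
        l->written += (off_t)n;
        data += n;
        len -= n;
    }
    return 0;
}

/* Flush, unmap, cut the file back to the bytes actually written, and close. */
static int mlog_close(MmapLog *l) {
    int rc = 0;
    if (l->map != NULL) {
        if (msync(l->map, l->map_len, MS_SYNC) != 0)
            rc = -1;
        munmap(l->map, l->map_len);
    }
    if (ftruncate(l->fd, l->written) != 0)
        rc = -1;
    if (close(l->fd) != 0)
        rc = -1;
    return rc;
}
```

A caller would run mlog_open once, mlog_append for every log line, and mlog_close at shutdown. A larger grow value means fewer msync/munmap/mmap cycles, at the cost of a file that is temporarily larger than its contents until it is trimmed on close.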
There is an mremap function out there, yet its man page states:
This call is Linux-specific, and should not be used in programs intended to be portable.
Ad Q2:
I'm not sure if I understood that point right. If you want to tell the kernel "and now load the next page", then no, this is not possible (at least to my knowledge). But see Ad Q3 on how to advise the kernel.
Ad Q3:
You can use madvise with the flag MADV_SEQUENTIAL, yet keep in mind that this does not force the kernel to read ahead, but only advises it. Excerpt from the man page:
This may cause the kernel to aggressively read-ahead
Personal conclusion:
Do not use mmap for sequential data writing. It will just cause much more overhead and will lead to much more "unnatural" code than a simple writing algorithm using fwrite.
Use mmap for random access reads to large files.
These are also the results that were obtained during my thesis. I was not able to achieve any speedup by using mmap for sequential writing; in fact, it was always slower for this purpose.