Fast resize of a mmap file

Problem description

I need a copy-free re-size of a very large mmap file while still allowing concurrent access to reader threads.

The simple way is to use two MAP_SHARED mappings (grow the file, then create a second mapping that includes the grown region) in the same process over the same file and then unmap the old mapping once all readers that could access it are finished. However, I am curious if the scheme below could work, and if so, is there any advantage to it.
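For reference, the two-mapping approach described above might look roughly like the following. This is a minimal sketch, not a complete design: `map_grown_file` and `retire_old_mapping` are hypothetical names, and how the program learns that all readers of the old mapping are finished (reference count, epoch scheme, and so on) is assumed to be handled elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Grow the file to new_len and return a second MAP_SHARED mapping that
 * covers the whole file. The old mapping is left untouched, so readers
 * that still hold pointers into it keep working.
 * Returns MAP_FAILED on error. */
static void *map_grown_file(int fd, size_t old_len, size_t new_len)
{
    assert(new_len >= old_len);
    if (ftruncate(fd, (off_t)new_len) != 0)
        return MAP_FAILED;
    return mmap(NULL, new_len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
}

/* Call only after every reader of the old mapping has finished.
 * Detecting that is assumed to be the caller's job. */
static int retire_old_mapping(void *old_base, size_t old_len)
{
    return munmap(old_base, old_len);
}
```

Both mappings refer to the same file pages, so nothing is copied; the cost is that the process briefly holds address space for both the old and the new range.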

  1. mmap a file with MAP_PRIVATE
  2. do read-only access to this memory in multiple threads
  3. either acquire a mutex for the file, write to the memory (assume this is done in a way that the readers, which may be reading that memory, are not messed up by it)
  4. or acquire the mutex, but increase the size of the file and use mremap to move it to a new address (resize the mapping without copying or unnecessary file IO.)

The crazy part comes in at (4). If you move the memory, the old addresses become invalid, and readers that are still reading it may suddenly hit an access violation. What if we modify the readers to trap this access violation and then restart the operation (i.e. don't re-read the bad address; re-calculate the address from the offset and the new base address returned by mremap)? Yes, I know that's evil, but to my mind the readers can only either successfully read the data at the old address or fail with an access violation and retry. If sufficient care is taken, that should be safe. Since re-sizing would not happen often, the readers would eventually succeed and not get stuck in a retry loop.
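To make step (4) concrete, here is a minimal, Linux-only sketch of the grow-and-move operation. `grow_with_mremap` and `at_offset` are hypothetical helper names. The mutex and the trap-and-retry logic in the readers are deliberately left out, so this shows only the resize itself and the "keep offsets, not pointers" idea.

```c
#define _GNU_SOURCE /* mremap() and MREMAP_MAYMOVE are Linux/glibc extensions */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Grow the file, then resize the mapping. The kernel extends it in place
 * if it can and moves it otherwise; no file data is copied by this code.
 * Returns the (possibly different) base address, or MAP_FAILED on error.
 * If the mapping moved, every pointer into the old range is invalid. */
static void *grow_with_mremap(int fd, void *old_base, size_t old_len,
                              size_t new_len)
{
    assert(new_len >= old_len);
    if (ftruncate(fd, (off_t)new_len) != 0)
        return MAP_FAILED;
    return mremap(old_base, old_len, new_len, MREMAP_MAYMOVE);
}

/* Readers store offsets and recompute the address from the current base. */
static const char *at_offset(const void *base, size_t offset)
{
    return (const char *)base + offset;
}
```

Whether the mapping actually moves depends on what happens to sit after it in the address space, so a reader cannot assume either outcome.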

A problem could occur if that old address space is re-used while a reader still has a pointer to it. Then there will be no access violation, but the data will be incorrect and the program enters the unicorn and candy filled land of undefined behavior (wherein there are usually neither unicorns nor candy).

But if you controlled allocations completely and could make certain that any allocations that happen during this period do not ever re-use that old address space, then this shouldn't be a problem and the behavior shouldn't be undefined.

Am I right? Could this work? Is there any advantage to this over using two MAP_SHARED mappings?

Solution

It is hard for me to imagine a case where you don't know the upper bound on how large the file can be. Assuming that's true, you could "reserve" the address space for the maximum size of the file by providing that size when the file is first mapped in with mmap(). Of course, any accesses beyond the actual size of the file will cause an access violation, but that's how you want it to work anyway -- you could argue that reserving the extra address space ensures the access violation rather than leaving that address range open to being used by other calls to things like mmap() or malloc().
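A minimal sketch of that reservation, assuming a read-only shared mapping; `map_reserved` is a hypothetical name, and `max_len` stands for whatever upper bound the application picks. On Linux, touching a page that lies wholly beyond the end of the file is reported as SIGBUS.

```c
#include <stddef.h>
#include <sys/mman.h>

/* Map max_len bytes of fd even though the file is currently shorter.
 * This claims the address range up front; only offsets below the current
 * file size may be touched until the file is grown.
 * Returns MAP_FAILED on error. */
static void *map_reserved(int fd, size_t max_len)
{
    return mmap(NULL, max_len, PROT_READ, MAP_SHARED, fd, 0);
}
```

Something like `map_reserved(fd, (size_t)1 << 40)` would claim 1 TiB of address space for one file. Whether a request that large succeeds depends on the system's address-space and overcommit limits.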

Anyway, the point is with my solution, you never move the address range, you only change its size and now your locking is around the data structure that provides the current valid size to each thread.

My solution doesn't work if you have so many files that the maximum mapping for each file runs you out of address space, but this is the age of the 64-bit address space so hopefully your maximum mapping size is no problem.

(Just to make sure I wasn't forgetting something stupid, I did write a small program to convince myself creating the larger-than-file-size mapping gives an access violation when you try to access beyond the file size, and then works fine once you ftruncate() the file to be larger, all with the same address returned from the first mmap() call.)
