How to portably extend a file accessed using mmap()


Question

We're experimenting with changing SQLite, an embedded database system, to use mmap() instead of the usual read() and write() calls to access the database file on disk, using a single large mapping for the entire file. Assume that the file is small enough that we have no trouble finding space for it in virtual memory.

So far so good. In many cases using mmap() seems to be a little faster than read() and write(), and in some cases much faster.

Resizing the mapping in order to commit a write transaction that extends the database file seems to be a problem. In order to extend the database file, the code could do something like this:

  ftruncate();    // extend the database file on disk 
  munmap();       // unmap the current mapping (it's now too small)
  mmap();         // create a new, larger, mapping

and then copy the new data into the end of the new memory mapping. However, the munmap()/mmap() pair is undesirable: the next time each page of the database file is accessed, a minor page fault occurs and the system has to search the OS page cache for the correct frame to associate with the virtual memory address. In other words, it slows down subsequent database reads.
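For reference, the three-call sequence above could be written out roughly as follows. This is only a sketch: the helper name `grow_by_remapping` and its parameters are made up for illustration, and it assumes a read/write `MAP_SHARED` mapping that starts at offset 0 of the file.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: grow the file behind fd to new_size bytes and swap
 * the old mapping for a larger one. Pass old_map == NULL if nothing is
 * mapped yet. Returns the new base address, or MAP_FAILED on error. */
void *grow_by_remapping(int fd, void *old_map, size_t old_size, size_t new_size)
{
    assert(new_size >= old_size);

    /* 1. Extend the database file on disk. */
    if (ftruncate(fd, (off_t)new_size) != 0)
        return MAP_FAILED;

    /* 2. Drop the current mapping (it is now too small). */
    if (old_map != NULL && munmap(old_map, old_size) != 0)
        return MAP_FAILED;

    /* 3. Create a new, larger mapping. The kernel may pick a different
     *    address, so any pointers into the old mapping become invalid. */
    return mmap(NULL, new_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
}
```

Every page of the new mapping starts out without a page-table entry, which is where the minor faults described above come from.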

On Linux, we can use the non-standard mremap() system call instead of munmap()/mmap() to resize the mapping. This seems to avoid the minor page faults.
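A minimal sketch of the Linux path might look like this. The function name `resize_mapping` is hypothetical, and the non-Linux branch simply falls back to munmap()/mmap(); `MREMAP_MAYMOVE` is used on the assumption that the caller can tolerate the mapping moving to a new address.

```c
#define _GNU_SOURCE            /* exposes mremap() in glibc's <sys/mman.h> */
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: grow the file to new_size and resize the mapping.
 * Returns the (possibly moved) base address, or MAP_FAILED on error. */
void *resize_mapping(int fd, void *old_map, size_t old_size, size_t new_size)
{
    assert(new_size >= old_size);

    if (ftruncate(fd, (off_t)new_size) != 0)
        return MAP_FAILED;

#ifdef __linux__
    /* MREMAP_MAYMOVE lets the kernel relocate the mapping when it cannot
     * be grown in place; existing page-table entries are carried over. */
    return mremap(old_map, old_size, new_size, MREMAP_MAYMOVE);
#else
    /* No mremap() here: fall back to the unmap/map sequence. */
    if (munmap(old_map, old_size) != 0)
        return MAP_FAILED;
    return mmap(NULL, new_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
#endif
}
```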

QUESTION: How should this be dealt with on other systems, like OSX, that do not have mremap()?

We have two ideas at present, and a question regarding each:

1) Create mappings larger than the database file. Then, when extending the database file, simply call ftruncate() to extend the file on disk and continue using the same mapping.

This would be ideal, and seems to work in practice. However, we're worried about this warning in the man page:

"The effect of changing the size of the underlying file of a mapping on the pages that correspond to added or removed regions of the file is unspecified."

QUESTION: Is this something we should be worried about? Or is it an anachronism at this point?
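Idea 1 could be sketched as below. Both helper names and the `reserve` parameter are illustrative, not SQLite code. Note that this relies on exactly the behaviour the man page calls unspecified, so it shows what "seems to work in practice" means rather than something guaranteed to be portable.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: map `reserve` bytes of fd even though the file is
 * currently shorter. Touching pages that lie wholly beyond end-of-file
 * raises SIGBUS until the file has grown to cover them. */
void *map_with_headroom(int fd, size_t reserve)
{
    return mmap(NULL, reserve, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
}

/* Hypothetical helper: grow the file underneath the existing mapping.
 * Returns 0 on success, -1 if the headroom is exhausted or ftruncate()
 * fails. The mapping itself is left untouched. */
int grow_file_under_mapping(int fd, size_t new_size, size_t reserve)
{
    if (new_size > reserve)
        return -1;             /* caller has to remap with more headroom */
    return ftruncate(fd, (off_t)new_size);
}
```

Once `grow_file_under_mapping()` succeeds, the caller simply starts using the bytes up to `new_size` through the original pointer.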

2) When extending the database file, use the first argument to mmap() to request that the mapping for the new pages of the database file be placed immediately after the current mapping in virtual memory, effectively extending the initial mapping. If the system can't honour the request to place the new mapping immediately after the first, fall back to munmap/mmap.

In practice, we've found that OSX is pretty good about positioning mappings in this way, so this trick works there.

QUESTION: If the system does allocate the second mapping immediately following the first in virtual memory, is it then safe to eventually unmap them both using a single big call to munmap()?
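Idea 2 might be sketched like this. The helper names are hypothetical, and the code assumes `old_len` is a multiple of the page size so that it is a valid file offset for mmap(). The address is passed as a hint only, without `MAP_FIXED`, so an existing mapping at that address is never overwritten.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: try to map file range [old_len, old_len + add_len)
 * directly after the existing mapping. Returns 1 if the kernel honoured
 * the hint, 0 otherwise (nothing extra is left mapped in that case). */
int extend_in_place(int fd, void *base, size_t old_len, size_t add_len)
{
    void *want = (char *)base + old_len;
    void *got = mmap(want, add_len, PROT_READ | PROT_WRITE, MAP_SHARED,
                     fd, (off_t)old_len);
    if (got == MAP_FAILED)
        return 0;
    if (got != want) {         /* placed elsewhere: undo and report failure */
        munmap(got, add_len);
        return 0;
    }
    return 1;
}

/* Hypothetical helper: grow file and mapping, preferring the in-place
 * extension and falling back to munmap()/mmap(). */
void *grow_mapping_hinted(int fd, void *base, size_t old_len, size_t new_len)
{
    assert(new_len > old_len);
    if (ftruncate(fd, (off_t)new_len) != 0)
        return MAP_FAILED;
    if (extend_in_place(fd, base, old_len, new_len - old_len))
        return base;           /* same address, now new_len bytes long */
    if (munmap(base, old_len) != 0)
        return MAP_FAILED;
    return mmap(NULL, new_len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
}
```

On the closing question: POSIX describes munmap() as removing any mappings for the pages in the given address range, so a single `munmap(base, new_len)` over two adjacent mappings should be acceptable, though it is worth confirming on each target platform.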

Recommended answer

  1. Use fallocate() instead of ftruncate() where available. If it is not available, open the file in O_APPEND mode and grow it by writing some amount of zeroes. This greatly reduces fragmentation.

  2. Use "huge pages" if available; this greatly reduces the overhead of big mappings.

  3. pread()/pwrite()/pwritev()/preadv() with a not-so-small block size are not really slow. They are much faster than the I/O itself can actually be performed.

  4. I/O errors when using mmap() surface as a fatal signal (SIGBUS or a segfault) rather than as an error code such as EIO.

  5. Most SQLite write performance problems come down to good transaction usage (i.e. you should check when COMMIT is actually performed).
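Point 1 could be sketched as follows. This is a hedged illustration rather than the answerer's code: it swaps in posix_fallocate(), the POSIX counterpart of the Linux-specific fallocate(), and the zero-writing fallback uses pwrite() at an explicit offset instead of an O_APPEND descriptor. The name `grow_file` is made up.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: grow the file behind fd from old_size to new_size
 * bytes with real allocated blocks. Returns 0 on success, -1 on error. */
int grow_file(int fd, off_t old_size, off_t new_size)
{
#if defined(__linux__)
    /* posix_fallocate() returns an error number instead of setting errno.
     * On failure, fall through to the portable path below. */
    if (posix_fallocate(fd, old_size, new_size - old_size) == 0)
        return 0;
#endif
    /* Fallback: append zeroes in 4 KiB chunks. */
    static const char zeros[4096];
    off_t pos = old_size;
    while (pos < new_size) {
        size_t n = sizeof zeros;
        if ((off_t)n > new_size - pos)
            n = (size_t)(new_size - pos);
        ssize_t w = pwrite(fd, zeros, n, pos);
        if (w < 0 && errno == EINTR)
            continue;          /* interrupted before writing: retry */
        if (w <= 0)
            return -1;
        pos += w;
    }
    return 0;
}
```

Unlike ftruncate(), which typically just creates a sparse hole, both paths here ask the filesystem to allocate the new blocks up front.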
