mmap vs O_DIRECT for random reads (what are the buffers involved?)
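Problem description
I am implementing a disk-based hashtable supporting a large number of keys (26+ million). The value is deserialized. Reads are essentially random throughout the file, values are smaller than the page size, and I am optimising for SSDs. Safety/consistency are not such huge issues (performance matters).
My current solution involves using a mmap() file with MADV_RANDOM | MADV_DONTNEED set to disable prefetching by the kernel and only load data on demand. I get the idea that the kernel reads from disk into a memory buffer, and I deserialize from there.
What about O_DIRECT? If I call read(), I'm still copying into a buffer (which I deserialize from), so can I gain any advantage?
Where can I find more info on the buffers involved with a mmap() file and with calling read() on a file opened with O_DIRECT?
I am not interested in read-ahead or caching. It has nothing to offer for my use case.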
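For reference, the mmap() approach described in the question might look roughly like the following. This is a minimal sketch, not the asker's code: the names table_map, table_map_open, table_map_read and table_map_close are hypothetical. Note that madvise() takes a single advice value per call rather than a bit mask, so only MADV_RANDOM is applied here. In terms of buffers, the kernel reads the block into the page cache, the mapped pages are that cache, and the memcpy is the only copy made in user space.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* makes madvise() and MADV_RANDOM visible on glibc */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical read-only view of the table file. */
typedef struct {
    const unsigned char *base; /* start of the mapping */
    size_t len;                /* file size in bytes */
} table_map;

/* Map the whole file and hint random access. Returns 0 on success, -1 on error. */
int table_map_open(table_map *t, const char *path)
{
    assert(t != NULL && path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return -1;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return -1;

    /* Advisory only: ask the kernel not to read ahead. Failure is not fatal. */
    (void)madvise(p, (size_t)st.st_size, MADV_RANDOM);

    t->base = p;
    t->len = (size_t)st.st_size;
    return 0;
}

/* Copy one record out of the mapped pages into the caller's buffer. */
int table_map_read(const table_map *t, size_t off, void *out, size_t n)
{
    if (off > t->len || n > t->len - off)
        return -1; /* record would run past the end of the file */
    memcpy(out, t->base + off, n);
    return 0;
}

void table_map_close(table_map *t)
{
    munmap((void *)t->base, t->len);
    t->base = NULL;
    t->len = 0;
}
```

If the deserializer can work directly on t->base + off, even the memcpy can be dropped.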
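Recommended answer
O_DIRECT is an option for read/write operations in which the data bypasses the system buffers and is copied directly between your buffer and the disk controller. To get the advantages of O_DIRECT you need to meet some conditions: the buffer address must be aligned to the memory page size, and the buffer size must be a multiple of the I/O block size.
Anyway, if you use mmap, you do not use read/write. Moreover, after mmap you can close the file descriptor and the mapping will still work. So O_DIRECT is useless with mmap.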
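The alignment conditions above can be illustrated with a small sketch of a single O_DIRECT read. The helper names dio_open and dio_read and the 4096-byte alignment are assumptions made for illustration; the real requirement depends on the device and filesystem. The sketch also falls back to a buffered open when the filesystem rejects O_DIRECT, which is a convenience choice and not part of the technique.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes O_DIRECT on Linux/glibc */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0 /* platform without O_DIRECT: plain buffered reads */
#endif

#define DIO_ALIGN 4096u /* assumed: covers common page and block sizes */

/* Open for direct I/O; retry buffered if the filesystem rejects O_DIRECT. */
int dio_open(const char *path)
{
    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0 && errno == EINVAL)
        fd = open(path, O_RDONLY);
    return fd;
}

/* Read n bytes at byte offset off into out. Returns 0 on success, -1 on error. */
int dio_read(int fd, uint64_t off, void *out, size_t n)
{
    assert(out != NULL);
    if (n == 0)
        return 0;

    /* Widen the request to whole aligned blocks around the record. */
    uint64_t mask = ~(uint64_t)(DIO_ALIGN - 1);
    uint64_t start = off & mask;
    uint64_t end = (off + n + DIO_ALIGN - 1) & mask;
    size_t span = (size_t)(end - start);

    /* The buffer address must be aligned as well as the offset and length. */
    void *buf = NULL;
    if (posix_memalign(&buf, DIO_ALIGN, span) != 0)
        return -1;

    int rc = -1;
    ssize_t got = pread(fd, buf, span, (off_t)start);
    if (got >= 0 && (uint64_t)got >= (off - start) + n) {
        /* The device filled buf directly; this is the only CPU copy. */
        memcpy(out, (char *)buf + (off - start), n);
        rc = 0;
    }
    free(buf);
    return rc;
}
```

In a real table the aligned buffer would likely be allocated once and reused, and the deserializer could read from it in place instead of copying into out.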
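What I can recommend to increase performance:
1. If your subsystem gets many requests for missing keys, you can build a Bloom filter in memory (http://en.wikipedia.org/wiki/Bloom_filter). You then check each search key against the Bloom filter and reject missing keys without an actual request to disk.
2. To conserve memory, use a two-level scheme: keep the bucket heads in mmap-ed memory, but read the buckets themselves from the file with pread().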
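The first suggestion can be sketched as a small in-memory Bloom filter. This is an illustrative sketch only: the bloom type and its functions are hypothetical names, and FNV-1a with double hashing is an assumption chosen for brevity, not something the answer specifies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    uint8_t *bits;  /* bit array, nbits bits long */
    uint64_t nbits;
    unsigned k;     /* number of probes per key */
} bloom;

/* 64-bit FNV-1a; the seed perturbs the offset basis to get a second hash. */
static uint64_t fnv1a(const void *data, size_t len, uint64_t seed)
{
    const uint8_t *p = data;
    uint64_t h = 14695981039346656037ULL ^ seed;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;
    }
    return h;
}

int bloom_init(bloom *b, uint64_t nbits, unsigned k)
{
    assert(nbits > 0 && k > 0);
    b->bits = calloc((size_t)((nbits + 7) / 8), 1);
    if (b->bits == NULL)
        return -1;
    b->nbits = nbits;
    b->k = k;
    return 0;
}

void bloom_free(bloom *b)
{
    free(b->bits);
    b->bits = NULL;
}

/* Double hashing: probe i lands on (h1 + i * h2) mod nbits. */
static uint64_t bloom_bit(const bloom *b, const void *key, size_t len, unsigned i)
{
    uint64_t h1 = fnv1a(key, len, 0);
    uint64_t h2 = fnv1a(key, len, 0x9e3779b97f4a7c15ULL) | 1;
    return (h1 + (uint64_t)i * h2) % b->nbits;
}

void bloom_add(bloom *b, const void *key, size_t len)
{
    for (unsigned i = 0; i < b->k; i++) {
        uint64_t bit = bloom_bit(b, key, len, i);
        b->bits[bit >> 3] |= (uint8_t)(1u << (bit & 7));
    }
}

/* Returns 0 if the key is definitely absent, 1 if it may be present. */
int bloom_maybe_contains(const bloom *b, const void *key, size_t len)
{
    for (unsigned i = 0; i < b->k; i++) {
        uint64_t bit = bloom_bit(b, key, len, i);
        if (!(b->bits[bit >> 3] & (1u << (bit & 7))))
            return 0;
    }
    return 1;
}
```

A lookup would call bloom_maybe_contains first and touch the disk only when it returns 1. A result of 1 can still be a false positive, so the on-disk lookup remains the source of truth.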
I implemented both options in my autocomplete subsystem. You can see it online here: http://olegh.ftp.sh/autocomplete.html and estimate the performance on a slow old computer, a Celeron-300.
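The two-level scheme from the answer (bucket heads in mmap-ed memory, buckets fetched with pread()) could be sketched like this. The file layout is entirely hypothetical: an array of fixed-size bucket_head records at the start of the file, each giving the offset and length of its bucket. The names two_level, tl_open, tl_read_bucket and tl_close are likewise invented for illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* makes pread() visible under strict compiler modes */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical on-disk bucket head: where the bucket lives and how long it is. */
typedef struct {
    uint64_t off; /* byte offset of the bucket in the file */
    uint32_t len; /* bucket length in bytes, 0 for an empty bucket */
    uint32_t pad; /* keeps the record at 16 bytes */
} bucket_head;

typedef struct {
    int fd;                   /* kept open for pread() on the buckets */
    const bucket_head *heads; /* level 1: mmap-ed array at the file start */
    size_t nbuckets;
} two_level;

/* The file must hold at least nbuckets heads, or touching them raises SIGBUS. */
int tl_open(two_level *t, const char *path, size_t nbuckets)
{
    assert(t != NULL && nbuckets > 0);
    t->fd = open(path, O_RDONLY);
    if (t->fd < 0)
        return -1;

    void *p = mmap(NULL, nbuckets * sizeof(bucket_head), PROT_READ,
                   MAP_SHARED, t->fd, 0);
    if (p == MAP_FAILED) {
        close(t->fd);
        return -1;
    }
    t->heads = p;
    t->nbuckets = nbuckets;
    return 0;
}

/* Level 2: fetch the bucket for a hash. Returns bytes read, 0 if empty, -1 on error. */
ssize_t tl_read_bucket(const two_level *t, uint64_t hash, void *out, size_t cap)
{
    const bucket_head *h = &t->heads[hash % t->nbuckets];
    if (h->len == 0)
        return 0; /* empty bucket: answered from memory, no disk access */
    if (h->len > cap)
        return -1;
    ssize_t got = pread(t->fd, out, h->len, (off_t)h->off);
    return got == (ssize_t)h->len ? got : -1;
}

void tl_close(two_level *t)
{
    munmap((void *)t->heads, t->nbuckets * sizeof(bucket_head));
    close(t->fd);
}
```

Only the head array stays resident, at 16 bytes per bucket in this layout, while each lookup costs at most one pread() of exactly the bucket's size.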