mmap vs O_DIRECT for random reads (what are the buffers involved?)

Views: 353 · Posted: 2017/11/4 22:00:18 · Tags: c file-io buffer hashtable mmap


Question


I am implementing a disk-based hashtable supporting a large number of keys (26+ million). The value is deserialized. Reads are essentially random throughout the file, values are smaller than the page size, and I am optimising for SSDs. Safety/consistency are not such huge issues (performance matters).

My current solution involves using a mmap() file with MADV_RANDOM | MADV_DONTNEED set to disable prefetching by the kernel and load data only on demand.

As I understand it, the kernel reads from disk into a memory buffer, and I deserialize from there.

What about O_DIRECT? If I call read(), I'm still copying into a buffer (which I deserialize from) so can I gain any advantage?

Where can I find more info on the buffers involved with a mmap() file and calling read() on a file opened with O_DIRECT?

I am not interested in read-ahead or caching. They have nothing to offer for my use case.

Solution

O_DIRECT is an option for read/write operations in which the data bypasses the system buffers and is transferred directly between your buffer and the disk controller. To get the advantages of O_DIRECT you need to meet some conditions: the buffer address must be aligned to the memory page size, and the buffer size must be aligned to the I/O block size.
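
The alignment conditions above can be sketched roughly as follows. This is only an illustration under assumptions: Linux with glibc, a 4096-byte alignment (`IO_ALIGN` is a made-up constant; real code should query the device), and the hypothetical helper names `open_direct` and `direct_read`. The request is widened to whole aligned blocks, read into a `posix_memalign()` buffer, and the wanted bytes are copied out.

```c
#define _GNU_SOURCE            /* glibc only exposes O_DIRECT with this defined */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define IO_ALIGN 4096          /* assumed alignment, not taken from the original post */

/* Open a file for unbuffered reads. Where O_DIRECT is not defined the
 * flag is left out, so this falls back to ordinary buffered I/O. */
int open_direct(const char *path)
{
    int flags = O_RDONLY;
#ifdef O_DIRECT
    flags |= O_DIRECT;
#endif
    return open(path, flags);
}

/* Copy `len` bytes starting at byte `off` (off >= 0) of `fd` into `out`.
 * The buffer address, the file offset and the transfer size passed to
 * pread() are all multiples of IO_ALIGN.
 * Returns 0 on success, -1 on error or if the file is too short. */
int direct_read(int fd, off_t off, void *out, size_t len)
{
    off_t  start = off - (off % IO_ALIGN);                 /* round offset down */
    size_t skip  = (size_t)(off - start);
    size_t total = (skip + len + IO_ALIGN - 1) / IO_ALIGN * IO_ALIGN; /* round size up */

    void *buf = NULL;
    if (posix_memalign(&buf, IO_ALIGN, total) != 0)
        return -1;

    ssize_t n = pread(fd, buf, total, start);
    int ok = (n >= 0 && (size_t)n >= skip + len);
    if (ok)
        memcpy(out, (char *)buf + skip, len);
    free(buf);
    return ok ? 0 : -1;
}
```

With values smaller than a page, a lookup then costs one aligned block read (two if the value straddles a block boundary) plus one small copy. Some filesystems reject O_DIRECT when the file is opened, so the return value of `open_direct()` should be checked.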

Anyway, if you use mmap, you do not use read/write. Moreover, after mmap you can close the file descriptor and the mapping will still work. So O_DIRECT is useless in combination with mmap.
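
A small sketch of that point, assuming Linux and a hypothetical helper name `map_file_random`: the file is mapped, the descriptor is closed straight away, and the mapping remains usable until `munmap()`. Note that `madvise()` takes a single advice value per call rather than a mask of flags, so only `MADV_RANDOM` is applied here.

```c
#define _GNU_SOURCE            /* for madvise() and MADV_RANDOM on glibc */
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a whole file read-only for random access.
 * On success returns the mapping and stores its length in *len_out;
 * on failure returns NULL. The descriptor is closed before returning. */
void *map_file_random(const char *path, size_t *len_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                 /* the mapping stays valid without the descriptor */
    if (p == MAP_FAILED)
        return NULL;

    /* One advice value per call. A failure here is not fatal, it only
     * means the kernel keeps its default read-ahead behaviour. */
    (void)madvise(p, (size_t)st.st_size, MADV_RANDOM);

    *len_out = (size_t)st.st_size;
    return p;
}
```

The caller reads values directly through the returned pointer and releases the mapping with `munmap(p, len)` when done.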

What I can recommend to increase performance:

1. If your subsystem gets many requests that search for missing keys, you can keep a Bloom filter (http://en.wikipedia.org/wiki/Bloom_filter) in memory. You then check each search key against the Bloom filter and reject missing keys without an actual request to disk.
2. To conserve memory, use a two-level scheme: keep the bucket heads in mmap-ed memory, but read the buckets themselves from the file with pread().
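
As a rough illustration of the Bloom filter suggestion, here is a minimal in-memory filter. Everything in it is an assumption for the sake of the example, not the answerer's implementation: the `bloom_t` type, the function names, the use of 64-bit FNV-1a, and deriving the probes by double hashing.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    uint8_t *bits;     /* bit array of (nbits + 7) / 8 bytes */
    size_t   nbits;    /* number of bits, > 0 */
    unsigned nhashes;  /* probes per key, > 0 */
} bloom_t;

/* 64-bit FNV-1a; the seed is folded into the start state so the same
 * routine can produce two different hashes of one key. */
static uint64_t fnv1a(const void *key, size_t len, uint64_t seed)
{
    const uint8_t *p = key;
    uint64_t h = 14695981039346656037ULL ^ seed;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/* Bit position of probe i, computed as h1 + i * h2 (double hashing). */
static size_t bloom_pos(const bloom_t *b, uint64_t h1, uint64_t h2, unsigned i)
{
    return (size_t)((h1 + (uint64_t)i * h2) % b->nbits);
}

int bloom_init(bloom_t *b, size_t nbits, unsigned nhashes)
{
    if (nbits == 0 || nhashes == 0)
        return -1;
    b->bits = calloc((nbits + 7) / 8, 1);
    b->nbits = nbits;
    b->nhashes = nhashes;
    return b->bits ? 0 : -1;
}

void bloom_add(bloom_t *b, const void *key, size_t len)
{
    uint64_t h1 = fnv1a(key, len, 0);
    uint64_t h2 = fnv1a(key, len, 0x9e3779b97f4a7c15ULL) | 1;
    for (unsigned i = 0; i < b->nhashes; i++) {
        size_t pos = bloom_pos(b, h1, h2, i);
        b->bits[pos / 8] |= (uint8_t)(1u << (pos % 8));
    }
}

/* Returns 0 if the key is definitely absent (skip the disk read),
 * 1 if it may be present (go to disk to find out). */
int bloom_maybe_contains(const bloom_t *b, const void *key, size_t len)
{
    uint64_t h1 = fnv1a(key, len, 0);
    uint64_t h2 = fnv1a(key, len, 0x9e3779b97f4a7c15ULL) | 1;
    for (unsigned i = 0; i < b->nhashes; i++) {
        size_t pos = bloom_pos(b, h1, h2, i);
        if (!(b->bits[pos / 8] & (1u << (pos % 8))))
            return 0;
    }
    return 1;
}
```

A lookup would call `bloom_maybe_contains()` first and touch the file only when it returns 1. False positives are possible, false negatives are not, so the filter never hides a key that exists.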

I implemented both options in my autocomplete subsystem. You can see it online here: http://olegh.ftp.sh/autocomplete.html and estimate the performance on a slow old computer, a Celeron-300.
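
The two-level scheme could look something like the sketch below. The on-disk layout is invented purely for illustration (fixed-size buckets of 64-bit key/value pairs, a `heads` array of file offsets, the names `bucket_t`, `NO_BUCKET` and `bucket_lookup`); the linked autocomplete code may be organised quite differently. The `heads` pointer can point into an mmap-ed region, while each bucket is fetched with a single `pread()`.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_ENTRIES 255                 /* assumed: a bucket fits in one 4 KiB block */
#define NO_BUCKET   UINT64_MAX          /* head value marking an empty slot */

typedef struct { uint64_t key, value; } entry_t;   /* hypothetical record */

typedef struct {
    uint32_t count;                     /* entries in use */
    uint32_t pad;
    entry_t  entries[MAX_ENTRIES];
} bucket_t;                             /* 8 + 255 * 16 = 4088 bytes */

/* heads[i] holds the file offset of bucket i and is the only part that
 * stays in memory. Returns 1 if the key was found (value stored in
 * *value_out), 0 if it is absent, -1 on an I/O or format error. */
int bucket_lookup(int fd, const uint64_t *heads, size_t nheads,
                  uint64_t key, uint64_t *value_out)
{
    uint64_t off = heads[key % nheads];
    if (off == NO_BUCKET)
        return 0;                       /* answered without touching the disk */

    bucket_t b;
    ssize_t n = pread(fd, &b, sizeof b, (off_t)off);
    if (n != (ssize_t)sizeof b || b.count > MAX_ENTRIES)
        return -1;

    for (uint32_t i = 0; i < b.count; i++) {
        if (b.entries[i].key == key) {
            *value_out = b.entries[i].value;
            return 1;
        }
    }
    return 0;
}
```

Only eight bytes per bucket head stay resident, and a successful lookup costs exactly one `pread()` of the bucket.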

