C++ io streams versus mmap
Question
I am starting a small project for a key-value store in C++. I am wondering how C++ std streams compare to mmap in terms of scalability and performance. How does using ifstream::seekg on a file that wouldn't fit in RAM compare to using mmap/lseek?
Recommended answer
Ultimately, any Linux user-land application, including the C++ I/O library, does its I/O through system calls; see syscalls(2).
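To make the stream side of the comparison concrete, here is a minimal sketch of a positioned read through ifstream. The helper name read_at_stream and its signature are made up for illustration; they are not part of any library. On a typical Linux implementation the seekg and read calls below end up as lseek(2) and read(2) underneath, which you can check by running the program under strace.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper (not a library API): read `len` bytes starting at
// byte `offset` of the file at `path` into `out`.
// Returns false if the file cannot be opened or the range is not fully readable.
bool read_at_stream(const std::string& path, std::streamoff offset,
                    std::size_t len, std::string& out) {
    assert(offset >= 0);
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;

    // Position the stream; the file buffer forwards this to the kernel.
    in.seekg(offset, std::ios::beg);

    std::string buf(len, '\0');
    in.read(&buf[0], static_cast<std::streamsize>(len));

    // gcount() tells how many bytes the last read actually delivered.
    if (in.gcount() != static_cast<std::streamsize>(len)) return false;

    out.swap(buf);
    return true;
}
```

A call such as read_at_stream("store.db", 4096, 128, value) only touches the requested range (plus whatever the stream buffer reads ahead), so the file does not need to fit in RAM. The file name here is just a placeholder.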
With great care, mmap and madvise (or lseek + read and posix_fadvise) could be more efficient than C++ streams (which themselves use read and other syscalls(2)). But misusing syscalls, for example calling read with too small a buffer, can give catastrophic performance.
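As a rough illustration of the mmap and madvise route, the sketch below maps a whole file read-only and copies out a byte range. The class name MappedFile and its methods are assumptions invented for this example. The choice of MADV_RANDOM is also an assumption: it suits random key lookups, but the hint is advisory and the right value depends on your access pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

#include <fcntl.h>     // open
#include <sys/mman.h>  // mmap, madvise, munmap
#include <sys/stat.h>  // fstat
#include <unistd.h>    // close

// Hypothetical RAII wrapper (illustrative only): read-only mapping of a whole file.
class MappedFile {
public:
    explicit MappedFile(const std::string& path) {
        int fd = ::open(path.c_str(), O_RDONLY);
        if (fd < 0) return;
        struct stat st;
        // A zero-length mapping is rejected by mmap, so empty files stay unmapped.
        if (::fstat(fd, &st) == 0 && st.st_size > 0) {
            std::size_t len = static_cast<std::size_t>(st.st_size);
            void* p = ::mmap(nullptr, len, PROT_READ, MAP_PRIVATE, fd, 0);
            if (p != MAP_FAILED) {
                data_ = static_cast<const char*>(p);
                size_ = len;
                // Assumption: lookups are random, so ask the kernel to limit read-ahead.
                // The call is only a hint; its result is deliberately ignored.
                ::madvise(p, len, MADV_RANDOM);
            }
        }
        ::close(fd);  // the mapping remains valid after the descriptor is closed
    }

    ~MappedFile() {
        if (data_) {
            int rc = ::munmap(const_cast<char*>(data_), size_);
            assert(rc == 0);
            (void)rc;
        }
    }

    MappedFile(const MappedFile&) = delete;
    MappedFile& operator=(const MappedFile&) = delete;

    bool ok() const { return data_ != nullptr; }
    std::size_t size() const { return size_; }

    // Copy `len` bytes starting at `offset` into `out`; false if out of range.
    bool read_at(std::size_t offset, std::size_t len, std::string& out) const {
        if (!data_ || offset > size_ || len > size_ - offset) return false;
        out.assign(data_ + offset, len);
        return true;
    }

private:
    const char* data_ = nullptr;
    std::size_t size_ = 0;
};
```

On a 64-bit system the mapped file may be larger than RAM, because pages are only brought in when they are touched and can be dropped again under memory pressure. Error handling here is reduced to an ok() flag to keep the sketch short.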
Also, Linux has a very good page cache (used to hold parts of recently accessed file data). Performance also depends on the file system and on the hardware: SSDs and mechanical hard disks are different beasts, and the rest of the computer matters too.
Maybe you should not reinvent the wheel and should instead use sqlite, gdbm, redis, mongodb, postgresql, memcached, etc.
Performance and trade-offs depend strongly on the actual use (a single 4 GB log file on your laptop is not the same as petabytes of video or genomics data in a datacenter). So benchmark, and note that many tools like the ones I mentioned can be tuned.
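A benchmark can start very small. The sketch below reads a file sequentially with read(2) using a caller-chosen buffer size and reports the byte count, the number of read calls, and the elapsed wall-clock time. The names ReadStats and read_whole_file are hypothetical, and the measurement is deliberately naive: a second run will mostly hit the page cache, so compare like with like.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>
#include <vector>

#include <fcntl.h>   // open
#include <unistd.h>  // read, close

// Hypothetical result record for one sequential pass over a file.
struct ReadStats {
    std::size_t bytes = 0;  // total bytes delivered by read(2)
    std::size_t calls = 0;  // number of read(2) calls issued, including the final one
    double millis = 0.0;    // wall-clock time for the read loop
    bool ok = false;        // false if open or read failed
};

// Read the whole file with a buffer of `buf_size` bytes and time the loop.
ReadStats read_whole_file(const std::string& path, std::size_t buf_size) {
    assert(buf_size > 0);
    ReadStats s;
    int fd = ::open(path.c_str(), O_RDONLY);
    if (fd < 0) return s;

    std::vector<char> buf(buf_size);
    auto t0 = std::chrono::steady_clock::now();
    for (;;) {
        ssize_t n = ::read(fd, buf.data(), buf.size());
        ++s.calls;
        if (n < 0) {  // sketch: any error (even EINTR) aborts the run
            ::close(fd);
            return s;
        }
        if (n == 0) break;  // end of file
        s.bytes += static_cast<std::size_t>(n);
    }
    auto t1 = std::chrono::steady_clock::now();

    s.millis = std::chrono::duration<double, std::milli>(t1 - t0).count();
    s.ok = true;
    ::close(fd);
    return s;
}
```

Running it on the same file with, say, a 16-byte buffer and a 64 KiB buffer shows how the number of system calls grows as the buffer shrinks. The timings themselves depend on your machine and on whether the data is already cached, so treat them as relative, not absolute.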