Does mmap really copy data to memory?

Question

It is said that mmap() maps files into memory, and that the mapping consumes virtual address space of the calling process. Does it really copy the data into memory, or does the data remain on disk? Is mmap() faster than read()?

Recommended answer

The only thing the mmap function really does is change some kernel data structures, and possibly the page table. It does not put anything into physical memory at all. After you call mmap, the mapped region probably does not even point to physical memory yet: accessing it will cause a page fault. The kernel handles this kind of page fault transparently; in fact, this is one of the kernel's primary duties.

What happens with mmap is that the data remains on disk, and it is copied from disk to memory as your process reads it. It can also be copied to physical memory speculatively. When your process gets swapped out, the pages in the mmap region do not have to be written to swap because they are already backed by long-term storage -- unless you have modified them, of course.
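The lazy behavior described above can be sketched with Python's mmap module, which wraps the same system call. This is only an illustration: the helper name peek_ends is hypothetical, and the sketch assumes a non-empty regular file (mapping an empty file with length 0 raises an error on most platforms).

```python
import mmap


def peek_ends(path):
    """Map a file read-only and return its first and last byte as integers."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file. Setting up the mapping does not
        # read the file's contents.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # Indexing touches only the pages that hold these two bytes;
            # the kernel brings them in on demand when they are accessed.
            return m[0], m[len(m) - 1]
```

However large the file is, only the pages that are actually touched need to be resident, although the kernel may read neighboring pages ahead speculatively.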

However, mmap will consume virtual address space, just like malloc and other similar functions (which mostly use mmap behind the scenes, or sbrk, which is basically a special version of mmap). The main difference between using mmap to read a file and using read to read a file is that unmodified pages in an mmap region do not contribute to overall memory pressure: they are almost "free", memory-wise, as long as they are not being used. In contrast, files read with the read function will always contribute to memory pressure, whether they are being used or not, and whether they have been modified or not.
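To make the contrast concrete, here is a minimal sketch of the two styles in Python. The function names are hypothetical. The read-based version copies file data into a buffer the process owns, while the mmap-based version iterates over file-backed pages through a memoryview, without building a full copy of the file.

```python
import mmap


def sum_with_read(path, bufsize=64 * 1024):
    """Sum all bytes of a file using read-style I/O into a private buffer."""
    total = 0
    buf = bytearray(bufsize)  # memory owned by this process
    with open(path, "rb", buffering=0) as f:
        while True:
            n = f.readinto(buf)  # copies file data into our buffer
            if not n:
                break
            total += sum(memoryview(buf)[:n])
    return total


def sum_with_mmap(path):
    """Sum all bytes of a non-empty file by walking a read-only mapping."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # The memoryview must be released before the mapping is closed,
            # which the nested with-blocks guarantee.
            with memoryview(m) as view:
                return sum(view)
```

Both functions return the same result; they differ in where the bytes live while they are being processed.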

Finally, mmap is faster than read only in the use cases which it favors -- random access and page reuse. For linearly traversing a file, especially a small file, read will generally be faster since it does not require modifying the page tables, and it takes fewer system calls.
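The random-access case can be sketched as follows, assuming a file made of fixed-size records. RECORD_SIZE and both function names are hypothetical, and os.pread is available on Unix-like systems only. The first version issues one system call per record; the second maps the file once, after which each lookup is a plain memory access that reuses any page already resident.

```python
import mmap
import os

RECORD_SIZE = 16  # hypothetical fixed record size, in bytes


def records_via_pread(path, indices):
    """Fetch records with one pread system call per lookup."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return [os.pread(fd, RECORD_SIZE, i * RECORD_SIZE) for i in indices]
    finally:
        os.close(fd)


def records_via_mmap(path, indices):
    """Fetch records from a single read-only mapping of the whole file."""
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        # Slicing copies just the requested record out of the mapping.
        return [m[i * RECORD_SIZE:(i + 1) * RECORD_SIZE] for i in indices]
```

Which version is faster in practice depends on the access pattern and the system, so it is worth measuring both on real data.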

As a recommendation, I can say that any large file which you will be scanning through should generally be read in its entirety with mmap on 64-bit systems, and you can mmap it in chunks on 32-bit systems, where virtual address space is scarcer.
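Mapping a file in chunks might look like the sketch below. The names iter_mapped_chunks and count_newlines and the 64 MiB default are assumptions chosen for illustration. The one real constraint is that each mapping offset must be a multiple of mmap.ALLOCATIONGRANULARITY, so the chunk size is rounded down to that granularity.

```python
import mmap
import os


def iter_mapped_chunks(path, chunk_size=64 * 1024 * 1024):
    """Yield successive read-only mappings covering the file piece by piece."""
    gran = mmap.ALLOCATIONGRANULARITY
    # Offsets passed to mmap must be multiples of the granularity.
    chunk_size = max(gran, chunk_size - chunk_size % gran)
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        offset = 0
        while offset < size:
            length = min(chunk_size, size - offset)
            with mmap.mmap(f.fileno(), length,
                           access=mmap.ACCESS_READ, offset=offset) as m:
                # The mapping is closed when the caller asks for the next one.
                yield m
            offset += length


def count_newlines(path, chunk_size=64 * 1024 * 1024):
    """Count newline bytes while mapping only one chunk at a time."""
    total = 0
    for m in iter_mapped_chunks(path, chunk_size):
        pos = m.find(b"\n")
        while pos != -1:
            total += 1
            pos = m.find(b"\n", pos + 1)
    return total
```

Only one chunk is mapped at any moment, so the address space used stays bounded by the chunk size. Patterns longer than one byte that may straddle a chunk boundary would need extra handling that this sketch omits.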

See also: mmap() vs. reading blocks

See also (thanks to James): When should I use mmap for file access?
