mmap and memory usage


Question

I am writing a program that receives huge amounts of data (in pieces of varying size) from the network, processes them, and writes them to memory. Since some pieces can be very large, my current approach is to limit the buffer size. If a piece is larger than the maximum buffer size, I write the data to a temporary file and later read the file back in chunks for processing and permanent storage.

I'm wondering if this can be improved. I've been reading about mmap for a while, but I'm not one hundred percent sure whether it can help me. My idea is to use mmap for reading the temporary file. Does this help in any way? My main concern is that an occasional large piece of data should not fill up main memory and cause everything else to be swapped out.

Also, do you think the temporary-file approach is useful? Should I even be doing that, or should I trust the Linux memory manager to do the job for me? Or should I do something else altogether?

Recommended answer

mmap can help you in some ways. I'll explain with some hypothetical examples:

First: let's say the system is running out of memory, and your application, which holds a 100MB chunk of malloc'ed memory, gets 50% of it swapped out. That means the OS had to write 50MB to the swapfile, and if you need that data again, it has to be read back. You have written, occupied, and then read back 50MB of swap.

If the memory was mmap'ed from a file instead, the operating system will not write that data to the swapfile (it knows the data is identical to the file itself). It will simply discard the 50MB (again, assuming you have not written anything to it) and that's that. If you ever need to read that memory again, the OS will fetch the contents not from the swapfile but from the original file you mmap'ed, so if any other program needs 50MB of swap, it is still available. There is also no swapfile overhead at all.

Another example: let's say you read a 100MB chunk of data and, according to the initial 1MB of header data, the information you want is located at offset 75MB, so you don't need anything between 1MB and 74.9MB. You have read it for nothing but to make your code simpler. With mmap, only the data you actually access is read from disk (rounded to the OS page size, which is usually 4KB, plus whatever readahead the kernel decides to do), so it would read roughly the first and the 75th MB. I think it is very hard to find a simpler and more effective way to avoid disk reads than mmapping files. And if for some reason you later need the data at offset 37MB, you can just use it. You don't have to mmap it again, as the whole file is accessible in memory (limited, of course, by your process's address space).
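To make the "only touched pages are read" point concrete, here is a minimal sketch in C. The helper name `peek_byte` is hypothetical, not something from the question; it maps a whole file read-only and reads a single byte at a given offset, so the kernel only has to fault in the page around that offset rather than the whole file.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: return the byte stored at `offset` in the file at
 * `path`, or -1 on any error. The whole file is mapped, but only the page
 * holding `offset` (plus any kernel readahead) is actually read from disk. */
int peek_byte(const char *path, off_t offset)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0 || offset < 0 || offset >= st.st_size) {
        close(fd);
        return -1;
    }

    /* Mapping reserves address space only; nothing is read yet. */
    unsigned char *base = mmap(NULL, (size_t)st.st_size, PROT_READ,
                               MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping remains valid after the descriptor is closed */
    if (base == MAP_FAILED)
        return -1;

    int value = base[offset]; /* first touch triggers the page fault */

    munmap(base, (size_t)st.st_size);
    return value;
}
```

For the scenario above, you would read the header through the same mapping and then index straight to the 75MB offset; nothing in between needs to be touched.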

All mmap'ed files are backed by themselves, not by the swapfile. The swapfile exists to back data that has no file behind it, which usually is malloc'ed data, or data that came from a file but was altered and cannot (or must not) be written back to it, as with a private (MAP_PRIVATE) mapping. For a shared (MAP_SHARED) mapping, altered pages are written back to the file itself; the OS does this at its own pace, and the program can force it with an msync call.
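As a rough illustration of the write-back path, the sketch below modifies a file through a MAP_SHARED mapping and then calls msync with MS_SYNC to flush the change. The function name `patch_file_start` is an assumption made up for this example.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: overwrite the first `len` bytes of an existing file
 * through a shared mapping. Returns 0 on success, -1 on error. */
int patch_file_start(const char *path, const void *data, size_t len)
{
    assert(path != NULL && data != NULL);

    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0 || len == 0 || (off_t)len > st.st_size) {
        close(fd);
        return -1; /* refuse to write past the end of the file */
    }

    /* MAP_SHARED: modified pages are backed by the file, not by swap. */
    char *base = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    if (base == MAP_FAILED)
        return -1;

    memcpy(base, data, len);

    /* MS_SYNC blocks until the dirty pages have been written to the file. */
    int rc = msync(base, len, MS_SYNC);

    munmap(base, len);
    return rc;
}
```

With MAP_PRIVATE instead, the same memcpy would change only the process's private copy and the file on disk would stay untouched.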

Note that you don't need to map the whole file into memory. You can map any amount (the 2nd argument, "size_t length") starting from any page-aligned position (the 6th argument, "off_t offset", which must be a multiple of the page size). Still, unless your file is likely to be enormous, you can safely map 1GB of data without fear, even if the system only has 64MB of physical memory (given enough free address space in your process). That holds for reading; if you plan on writing, you should be more conservative and map only what you need.
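Since the offset passed to mmap must be a multiple of the page size, mapping an arbitrary window takes a little arithmetic. The following is a sketch under that assumption; `struct window`, `window_open`, and `window_close` are hypothetical names for illustration. It rounds the requested offset down to a page boundary and hands back a pointer adjusted to the byte the caller asked for.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical bookkeeping for a partial, read-only mapping. */
struct window {
    void                *map_base; /* address returned by mmap (for munmap) */
    size_t               map_len;  /* length actually mapped */
    const unsigned char *data;     /* points at the requested file offset */
};

/* Map `length` bytes of `fd` starting at an arbitrary `offset`.
 * The caller must ensure offset + length does not exceed the file size;
 * touching pages wholly beyond end-of-file raises SIGBUS.
 * Returns 0 on success, -1 on error. */
int window_open(int fd, off_t offset, size_t length, struct window *w)
{
    assert(w != NULL);

    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || offset < 0 || length == 0)
        return -1;

    /* Round the offset down to a page boundary, as mmap requires. */
    off_t  aligned = offset - (offset % page);
    size_t slack   = (size_t)(offset - aligned);

    void *base = mmap(NULL, length + slack, PROT_READ, MAP_PRIVATE,
                      fd, aligned);
    if (base == MAP_FAILED)
        return -1;

    w->map_base = base;
    w->map_len  = length + slack;
    w->data     = (const unsigned char *)base + slack;
    return 0;
}

void window_close(struct window *w)
{
    munmap(w->map_base, w->map_len);
}
```

For the temporary-file case in the question, one option would be to open a window per chunk, process it, and close it before moving on, which keeps the mapped range small at any one time.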

Mapping files will help make your code simpler (the file contents are already in memory, ready to use, with much less memory pressure since it is not anonymous memory) and faster (only the data your program actually accesses is read).
