mmap problem, allocates huge amounts of memory


Question

I have some huge files that I need to parse, and people have been recommending mmap because it should avoid having to hold the entire file in memory.

But looking at top, it does look like I'm loading the entire file into memory, so I think I must be doing something wrong: top shows more than 2.1 GB.

This is a code snippet that shows what I'm doing.

Thanks

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/mman.h>

int main (int argc, char *argv[]) {
  struct stat sb;
  char *p;
  if (argc < 2) {
    fprintf (stderr, "usage: %s FILE\n", argv[0]);
    return 1;
  }
  //open the file descriptor
  int fd = open (argv[1], O_RDONLY);
  if (fd == -1) {
    perror ("open");
    return 1;
  }
  //stat the file to get its size
  if (fstat (fd, &sb) == -1) {
    perror ("fstat");
    return 1;
  }
  //do the actual mmap, and keep a pointer to the first byte
  p = (char *) mmap (0, sb.st_size, PROT_READ, MAP_SHARED, fd, 0);
  //something went wrong (mmap also fails for an empty file)
  if (p == MAP_FAILED) {
    perror ("mmap");
    return 1;
  }
  //count the lines; the mapping is not NUL-terminated,
  //so the loop is bounded by the file size
  size_t numlines = 0;
  for (off_t i = 0; i < sb.st_size; i++)
    if (p[i] == '\n')
      numlines++;
  fprintf (stderr, "numlines:%zu\n", numlines);
  //unmap it
  if (munmap (p, sb.st_size) == -1) {
    perror ("munmap");
    return 1;
  }
  if (close (fd) == -1) {
    perror ("close");
    return 1;
  }
  return 0;
}

Solution

No, what you're doing is mapping the file into memory. This is different from actually reading the file into memory.

Were you to read it in, you would have to transfer the entire contents into memory. By mapping it, you let the operating system handle it. If you attempt to read or write to a location in that memory area, the OS will load the relevant section for you first. It will not load the entire file unless the entire file is needed.
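One way to watch this on-demand loading happen is to compare the process's resident page count right after mmap() with the count after each page has been read once. The sketch below is not part of the original answer: `measure_mapping` and `resident_pages` are hypothetical helper names, and reading `/proc/self/statm` assumes Linux.

```c
#define _DEFAULT_SOURCE /* assumption: glibc on Linux */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Resident set size of this process in pages, or -1 on error (Linux-only). */
static long resident_pages(void) {
    long total = 0, resident = -1;
    FILE *f = fopen("/proc/self/statm", "r");
    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld %ld", &total, &resident) != 2)
        resident = -1;
    fclose(f);
    return resident;
}

/* Hypothetical helper: maps `path` read-only and records the resident page
 * count right after mmap() and again after reading one byte per page.
 * Returns 0 on success, -1 on failure. */
int measure_mapping(const char *path, long *after_map, long *after_touch) {
    assert(path != NULL && after_map != NULL && after_touch != NULL);
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;
    struct stat sb;
    if (fstat(fd, &sb) == -1 || sb.st_size == 0) {
        close(fd);
        return -1;
    }
    size_t len = (size_t)sb.st_size;
    unsigned char *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return -1;

    *after_map = resident_pages(); /* nothing has been touched yet */

    long page = sysconf(_SC_PAGESIZE);
    volatile unsigned char sink = 0; /* keeps the reads from being optimized away */
    for (size_t i = 0; i < len; i += (size_t)page)
        sink ^= p[i]; /* each first touch of a page triggers a page fault */
    (void)sink;

    *after_touch = resident_pages();
    munmap(p, len);
    return (*after_map < 0 || *after_touch < 0) ? -1 : 0;
}
```

On a file of a few megabytes, the second count should come out noticeably higher than the first, which is the growth that top reports as the scan proceeds.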

That is where you get your performance gain. If you map the entire file but only change one byte then unmap it, you'll find that there's not much disk I/O at all.
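The one-byte case might look like the following. This is a minimal sketch under assumptions, not code from the original post: `set_byte_mapped` is a hypothetical helper, and the msync() call is optional, included only to force the write-back immediately.

```c
#define _DEFAULT_SOURCE /* assumption: glibc on Linux */
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: overwrites one byte of `path` through a shared
 * mapping. Returns 0 on success, -1 on failure. */
int set_byte_mapped(const char *path, off_t offset, char value) {
    assert(path != NULL);
    int fd = open(path, O_RDWR);
    if (fd == -1)
        return -1;
    struct stat sb;
    if (fstat(fd, &sb) == -1 || offset < 0 || offset >= sb.st_size) {
        close(fd);
        return -1;
    }
    size_t len = (size_t)sb.st_size;
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return -1;

    p[offset] = value; /* only the page containing `offset` is dirtied */

    /* Optional: flush the dirty page now instead of waiting for the OS. */
    int rc = msync(p, len, MS_SYNC);
    if (munmap(p, len) == -1)
        rc = -1;
    return rc;
}
```

Calling `set_byte_mapped("data.bin", 0, 'j')` (the file name is illustrative) maps the whole file, but only the page holding the changed byte needs to be read and written back.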

Of course, if you touch every byte in the file, then yes, it will all be loaded at some point but not necessarily in physical RAM all at once. But that's the case even if you load the entire file up front. The OS will swap out parts of your data if there's not enough physical memory to contain it all, along with that of the other processes in the system.

The main advantages of memory mapping are:

  • You defer reading sections of the file until they're needed (and, if they're never needed, they don't get loaded). So there's no big upfront cost of loading the entire file; the cost of loading is amortised.
  • The writes are automated: you don't have to write out every byte yourself. Just unmap the file and the OS will write out the changed sections. I think this also happens when the memory is paged out (in low physical memory situations), since your buffer is simply a window onto the file.

Keep in mind that there is most likely a disconnect between your address space usage and your physical memory usage. You can allocate an address space of 4 GB (ideally, though there may be OS, BIOS or hardware limitations) on a 32-bit machine with only 1 GB of RAM. The OS handles the paging to and from disk.
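The gap between address space and physical memory can be made visible by reserving a large region without touching it. The sketch below assumes Linux: `reserve_and_measure` and `resident_pages` are hypothetical helpers, `MAP_NORESERVE` is a Linux flag, and the resident count comes from `/proc/self/statm`.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS and MAP_NORESERVE on glibc */
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>

/* Resident set size of this process in pages, or -1 on error (Linux-only). */
static long resident_pages(void) {
    long total = 0, resident = -1;
    FILE *f = fopen("/proc/self/statm", "r");
    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld %ld", &total, &resident) != 2)
        resident = -1;
    fclose(f);
    return resident;
}

/* Hypothetical helper: reserves `bytes` of address space without touching
 * it, recording the resident page count before and after the reservation.
 * Returns 0 on success, -1 on failure. */
int reserve_and_measure(size_t bytes, long *before, long *after) {
    assert(before != NULL && after != NULL);
    *before = resident_pages();
    void *p = mmap(NULL, bytes, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    *after = resident_pages(); /* address space grew; resident pages barely move */
    munmap(p, bytes);
    return (*before < 0 || *after < 0) ? -1 : 0;
}
```

Reserving, say, 256 MB this way should leave the two counts almost identical, because no page of the region has been backed by RAM yet.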

And to answer your further request for clarification:

"Just to clarify. So if I need the entire file, mmap will actually load the entire file?"

Yes, but it may not be in physical memory all at once. The OS will page parts of the mapping back out to the filesystem in order to bring in new parts.

But it will also do that if you've read the entire file in manually. The difference between those two situations is as follows.

With the file read into memory manually, the OS will swap parts of your address space (which may or may not include the file data) out to the swap file. And you will need to manually rewrite the file when you're finished with it.
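For illustration, the manual approach to changing a single byte could be sketched as follows. `set_byte_manual` is a hypothetical helper written for this comparison, not code from the original post; it deliberately reads and rewrites the whole file to show the extra work involved.

```c
#define _DEFAULT_SOURCE /* assumption: glibc on Linux */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: overwrites one byte of `path` by reading the whole
 * file into a heap buffer and writing the whole buffer back.
 * Returns 0 on success, -1 on failure. */
int set_byte_manual(const char *path, off_t offset, char value) {
    assert(path != NULL);
    int fd = open(path, O_RDWR);
    if (fd == -1)
        return -1;
    struct stat sb;
    if (fstat(fd, &sb) == -1 || offset < 0 || offset >= sb.st_size) {
        close(fd);
        return -1;
    }
    size_t len = (size_t)sb.st_size;
    char *buf = malloc(len);
    if (buf == NULL) {
        close(fd);
        return -1;
    }

    int rc = 0;
    /* Read the entire file up front; read() may return short counts. */
    for (size_t done = 0; done < len && rc == 0; ) {
        ssize_t n = read(fd, buf + done, len - done);
        if (n <= 0)
            rc = -1;
        else
            done += (size_t)n;
    }
    if (rc == 0) {
        buf[offset] = value; /* changes only the private heap copy */
        if (lseek(fd, 0, SEEK_SET) == (off_t)-1)
            rc = -1;
    }
    /* Nothing reaches the file until the buffer is written back explicitly. */
    for (size_t done = 0; done < len && rc == 0; ) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n <= 0)
            rc = -1;
        else
            done += (size_t)n;
    }
    free(buf);
    if (close(fd) == -1)
        rc = -1;
    return rc;
}
```

Here the heap buffer is ordinary anonymous memory, so under memory pressure it goes to the swap file, and the file on disk stays unchanged until the explicit write loop runs.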

With memory mapping, you have effectively told the OS to use the original file as an extra swap area for that file/memory only. And, when data is written to that swap area, it affects the actual file immediately. So you don't have to manually rewrite anything when you're done, and the normal swap is (usually) not affected.

It really is just a window onto the file.
