Allocating copy-on-write memory within a process


Problem description

I have a memory segment which was obtained via mmap with MAP_ANONYMOUS.

How can I allocate a second memory segment of the same size which references the first one and make both copy-on write in Linux (Working Linux 2.6.36 at the moment)?

I want to have exactly the same effect as fork, just without creating a new process. I want the new mapping to stay in the same process.

The whole process has to be repeatable on both the origin and copy pages (as if parent and child would continue to fork).

The reason why I don't want to allocate a straight copy of the whole segment is because they are multiple gigabytes large and I don't want to use memory which could be copy-on-write shared.

What I have tried:

mmap the segment shared, anonymous. On duplication mprotect it to read-only and create a second mapping with remap_file_pages also read-only.

Then use libsigsegv to intercept write attempts, manually make a copy of the page and then mprotect both to read-write.

Does the trick, but is very dirty. I am essentially implementing my own VM.

Sadly, mmapping /proc/self/mem is not supported on current Linux; otherwise a MAP_PRIVATE mapping of it could do the trick.

Copy-on-write mechanics are part of the Linux VM, there has to be a way to make use of them without creating a new process.

As a note: I have found the appropriate mechanics in the Mach VM.

The following code compiles on my OS X 10.7.5 and has the expected behaviour: Darwin 11.4.2 Darwin Kernel Version 11.4.2: Thu Aug 23 16:25:48 PDT 2012; root:xnu-1699.32.7~1/RELEASE_X86_64 x86_64 i386

gcc version 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.11.00)

#include <sys/mman.h>
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#ifdef __MACH__
#include <mach/mach.h>
#endif


int main() {

    mach_port_t this_task = mach_task_self();

    struct {
        size_t rss;
        size_t vms;
        void * a1;
        void * a2;
        char p1;
        char p2;
        } results[3];

    size_t length = sysconf(_SC_PAGE_SIZE);
    vm_address_t first_address;
    kern_return_t result = vm_allocate(this_task, &first_address, length, VM_FLAGS_ANYWHERE);

    if ( result != KERN_SUCCESS ) {
        fprintf(stderr, "Error allocating initial %zu bytes.\n", length);
        return -1;
    }

    char * first_address_p = (char *) first_address;
    char * mirror_address_p;
    *first_address_p = 'a';

    struct task_basic_info t_info;
    mach_msg_type_number_t t_info_count = TASK_BASIC_INFO_COUNT;

    task_info(this_task, TASK_BASIC_INFO, (task_info_t)&t_info, &t_info_count);
    results[0].rss = t_info.resident_size;
    results[0].vms = t_info.virtual_size;
    results[0].a1 = first_address_p;
    results[0].p1 = *first_address_p;

    vm_address_t mirrorAddress;
    vm_prot_t cur_prot, max_prot;
    result = vm_remap(this_task,
                      &mirrorAddress,   // mirror target
                      length,    // size of mirror
                      0,                 // auto alignment
                      1,                 // remap anywhere
                      this_task,  // same task
                      first_address,     // mirror source
                      1,                 // Copy
                      &cur_prot,         // unused protection struct
                      &max_prot,         // unused protection struct
                      VM_INHERIT_COPY);

    if ( result != KERN_SUCCESS ) {
        /* vm_remap returns a kern_return_t, so perror/errno do not apply */
        fprintf(stderr, "Error remapping pages: kern_return_t %d.\n", result);
        return -1;
    }

    mirror_address_p = (char *) mirrorAddress;

    task_info(this_task, TASK_BASIC_INFO, (task_info_t)&t_info, &t_info_count);
    results[1].rss = t_info.resident_size;
    results[1].vms = t_info.virtual_size;
    results[1].a1 = first_address_p;
    results[1].p1 = *first_address_p;
    results[1].a2 = mirror_address_p;
    results[1].p2 = *mirror_address_p;

    *mirror_address_p = 'b';

    task_info(this_task, TASK_BASIC_INFO, (task_info_t)&t_info, &t_info_count);
    results[2].rss = t_info.resident_size;
    results[2].vms = t_info.virtual_size;
    results[2].a1 = first_address_p;
    results[2].p1 = *first_address_p;
    results[2].a2 = mirror_address_p;
    results[2].p2 = *mirror_address_p;

    printf("Allocated one page of memory and wrote to it.\n");
    printf("*%p = '%c'\nRSS: %zu\tVMS: %zu\n",results[0].a1, results[0].p1, results[0].rss, results[0].vms);
    printf("Cloned that page copy-on-write.\n");
    printf("*%p = '%c'\n*%p = '%c'\nRSS: %zu\tVMS: %zu\n",results[1].a1, results[1].p1,results[1].a2, results[1].p2, results[1].rss, results[1].vms);
    printf("Wrote to the new cloned page.\n");
    printf("*%p = '%c'\n*%p = '%c'\nRSS: %zu\tVMS: %zu\n",results[2].a1, results[2].p1,results[2].a2, results[2].p2, results[2].rss, results[2].vms);

    return 0;
}

I want the same effect in Linux.

Answer

I tried to achieve the same thing (in fact, my case is slightly simpler, as I only need to take snapshots of a live region; I do not need to take copies of the copies). I did not find a good solution for this.

Direct kernel support (or the lack thereof): By modifying or adding a kernel module it should be possible to achieve this. However, there is no simple way to set up a new COW region from an existing one. The code used by fork (copy_page_range) copies a vm_area_struct from one process/virtual address space to another (new) one, but assumes that the address of the new mapping is the same as the address of the old one. If one wants to implement a "remap" feature, that function must be modified/duplicated so it can copy a vm_area_struct with address translation.

BTRFS: I thought of using COW on btrfs for this. I wrote a simple program mapping two reflink-ed files and compared their mappings. However, looking at the page information with /proc/self/pagemap shows that the two instances of the file do not share the same cache pages (at least, unless my test is wrong). So you will not gain much by doing this: the physical pages of the same data will not be shared among different instances.

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>
#include <inttypes.h>
#include <stdio.h>

void* map_file(const char* file) {
  struct stat file_stat;
  int fd = open(file, O_RDWR);
  assert(fd>=0);
  int temp = fstat(fd, &file_stat);
  assert(temp==0);
  void* res = mmap(NULL, file_stat.st_size, PROT_READ, MAP_SHARED, fd, 0);
  assert(res!=MAP_FAILED);
  close(fd);
  return res;
}

static int pagemap_fd = -1;

uint64_t pagemap_info(void* p) {
  if(pagemap_fd<0) {
    pagemap_fd = open("/proc/self/pagemap", O_RDONLY);
    if(pagemap_fd<0) {
      perror("open pagemap");
      exit(1);
    }
  }
  size_t page = ((uintptr_t) p) / getpagesize();
  off_t temp = lseek(pagemap_fd, page*sizeof(uint64_t), SEEK_SET);
  if(temp==(off_t) -1) {
    perror("lseek");
    exit(1);
  }
  uint64_t value;
  ssize_t n = read(pagemap_fd, (char*)&value, sizeof(uint64_t));
  if(n<0) {
    perror("read");
    exit(1);
  }
  if(n!=(ssize_t)sizeof(uint64_t)) {
    exit(1);
  }
  return value;
}

int main(int argc, char** argv) {

  char* a = (char*) map_file(argv[1]);
  char* b = (char*) map_file(argv[2]);

  /* touch each mapping so its page is actually faulted in */
  volatile int x = a[0]; (void)x;
  uint64_t info1 = pagemap_info(a);

  volatile int y = b[0]; (void)y;
  uint64_t info2 = pagemap_info(b);

  fprintf(stderr, "%" PRIx64 " %" PRIx64 "\n", info1, info2);

  assert(info1==info2);

  return 0;
}

mprotect + mmap anonymous pages: It does not work in your case, but a solution is to use a MAP_SHARED file for my main memory region. On a snapshot, the file is mapped somewhere else and both instances are mprotect-ed. On a write, an anonymous page is mapped in the snapshot, the data is copied into this new page, and the original page is unprotected. However, this solution does not work in your case, as you will not be able to repeat the process on the snapshot (because it is not a plain MAP_SHARED area, but a MAP_SHARED area with some MAP_ANONYMOUS pages). Moreover, it does not scale with the number of copies: if I have many COW copies, I will have to repeat the same process for each copy, and the page will not be duplicated for the copies. And I can't map the anonymous page in the original area, as it would then not be possible to map the anonymous pages in the copies. This solution does not work anyway.

mprotect + remap_file_pages: This looks like the only way to do this without touching the Linux kernel. The downside is that, in general, you will probably have to make a remap_file_pages syscall for each page when doing a copy: it might not be that efficient to make that many syscalls. When deduplicating a shared page, you need at least to remap_file_pages a new/free page for the newly written-to page and m-un-protect the new page. It is also necessary to reference-count each page.

I do not think the mprotect()-based approaches would scale very well (if you handle a lot of memory like this). On Linux, mprotect() does not work at memory-page granularity but at vm_area_struct granularity (the entries you find in /proc/&lt;pid&gt;/maps). Doing mprotect() at page granularity will cause the kernel to constantly split and merge vm_area_structs:


  • you will end up with a very large mm_struct;

  • looking up a vm_area_struct (which is needed for a lot of virtual-memory-related operations) is O(log #vm_area_struct), but it might still have a negative performance impact;

  • these structures consume memory themselves.

For this kind of reason, the remap_file_pages() syscall was created [http://lwn.net/Articles/24468/] in order to do non-linear memory mappings of a file: doing the same with mmap would require a lot of vm_area_structs. I do not even think it was designed for page-granularity mapping: remap_file_pages() is not very optimised for this use case, as it would need one syscall per page.

I think the only viable solution is to let the kernel do it. It is possible to do it in userspace with remap_file_pages, but it will probably be quite inefficient, as a snapshot will in general need a number of syscalls proportional to the number of pages. A variant of remap_file_pages might do the trick.

This approach, however, duplicates the page logic of the kernel. I tend to think we should let the kernel do this. All in all, an implementation in the kernel seems to be the better solution. For someone who knows this part of the kernel, it should be quite easy to do.

KSM (Kernel Samepage Merging): There is one thing the kernel can do: it can try to deduplicate the pages. You will still have to copy the data, but the kernel should be able to merge the pages afterwards. You need to mmap a new anonymous area for your copy, copy the data into it manually with memcpy, and madvise(start, length, MADV_MERGEABLE) the areas. You also need to enable KSM (as root):

echo 1 > /sys/kernel/mm/ksm/run
echo 10000 > /sys/kernel/mm/ksm/pages_to_scan

It works; it doesn't work so well with my workload, but that is probably because the pages are not shared much in the end. The downside is that you still have to do the copy (you cannot have an efficient COW), and the kernel will then unmerge the pages on write. Doing the copies will generate page and cache faults, and the KSM daemon thread will consume a lot of CPU (I have one CPU running at 100% for the whole simulation) and probably a lot of cache. So you will not gain time when doing the copy, but you might gain some memory. If your main motivation is to use less memory in the long run and you do not care that much about avoiding the copies, this solution might work for you.
