R and shared memory for parallel::mclapply


Problem Description

I am trying to take advantage of a quad-core machine by parallelizing a costly operation that is performed on a list of about 1000 items.

I am using R's parallel::mclapply function currently:

# rbind.fill comes from the plyr package
res <- plyr::rbind.fill(parallel::mclapply(lst, fun, mc.cores = 3, mc.preschedule = TRUE))

This works. The problem is that any additional subprocess that is spawned has to allocate a large chunk of memory.

Ideally, I would like each core to access shared memory from the parent R process, so that as I increase the number of cores used in mclapply, I don't hit RAM limitations before core limitations.

I'm currently at a loss on how to debug this issue. All of the large data structures that each process accesses are globals (currently). Is that somehow the issue?
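
If you want to see whether copy-on-write is actually duplicating pages, one low-tech check is to log each worker's resident set size (RSS). This is only a sketch: it assumes a Unix-like OS with ps on the PATH and reuses lst and fun from above; fun_logged is a hypothetical wrapper, not part of any API.

library(parallel)

# Hypothetical wrapper: report this child's RSS (ps prints it in KB) before
# doing the real work, so you can watch memory grow per process.
fun_logged <- function(x) {
  rss_kb <- as.integer(system(sprintf("ps -o rss= -p %d", Sys.getpid()),
                              intern = TRUE))
  message(sprintf("pid %d: RSS %.1f MB", Sys.getpid(), rss_kb / 1024))
  fun(x)
}

res <- parallel::mclapply(lst, fun_logged, mc.cores = 3, mc.preschedule = TRUE)

If each child's RSS climbs toward the full size of the globals, the pages are being copied rather than shared.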

I did increase my shared memory max setting for the OS to 20 GB (available RAM):

$ cat /etc/sysctl.conf 
kern.sysv.shmmax=21474836480
kern.sysv.shmall=5242880
kern.sysv.shmmin=1
kern.sysv.shmmni=32
kern.sysv.shmseg=8
kern.maxprocperuid=512
kern.maxproc=2048
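
The kern.sysv.* keys above are macOS names, so assuming that is the platform, it is worth confirming that the running kernel actually picked the values up (an edited /etc/sysctl.conf that was never applied is a common gotcha). From R:

# Print the live kernel values to compare against /etc/sysctl.conf
system("sysctl kern.sysv.shmmax kern.sysv.shmall")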

I thought that would fix things, but the issue still occurs.

Any other ideas?

Recommended Answer

Just a hint at what might have been going on: R-devel Digest, Vol 149, Issue 22.

Radford Neal's answer from Jul 26, 2015:

When mclapply forks to start a new process, the memory is initially shared with the parent process. However, a memory page has to be copied whenever either process writes to it. Unfortunately, R's garbage collector writes to each object to mark and unmark it whenever a full garbage collection is done, so it's quite possible that every R object will be duplicated in each process, even though many of them are not actually changed (from the point of view of the R programs).
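
If that is what is happening here, one common mitigation, shown below as a sketch rather than anything from the thread, is to move large read-only data out of R's garbage-collected heap, for example into a shared-memory big.matrix from the bigmemory package, so that a full GC in a child never touches the payload pages. mat is a hypothetical stand-in for one of the large globals; note this only works for data that fits an atomic-typed matrix.

library(bigmemory)
library(parallel)

mat  <- matrix(rnorm(1e7), ncol = 10)      # stand-in for a large global
bm   <- as.big.matrix(mat, shared = TRUE)  # payload lives in shared memory,
                                           # outside the heap the GC marks
desc <- describe(bm)                       # small descriptor, cheap to copy

res <- mclapply(seq_len(ncol(bm)), function(j) {
  m <- attach.big.matrix(desc)             # re-attach in the forked child
  sum(m[, j])                              # read-only work on shared pages
}, mc.cores = 3)

Because the matrix contents are never R objects, the garbage collector's mark/unmark writes cannot trigger copy-on-write on them, no matter how many children you fork.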
