R parallel computing and zombie processes


Problem description

This is basically a follow-up to this more specialized question. There have been some posts about the creation of zombie processes when doing parallel computing in R:

  1. How to stop R from leaving zombie processes behind
  2. How to kill a doMC worker when it's done?
  3. Remove zombie processes using parallel package

There are several ways of doing parallel computing, and I will focus on the three that I have used so far on a local machine. I used doMC and doParallel with the foreach package on a local computer with 4 cores:

(a) Registering a fork cluster:

library(doParallel)
cl <- makeForkCluster(4)
# equivalently: cl <- makeForkCluster(nnodes = getOption("mc.cores", 4L))
registerDoParallel(cl)
out <- foreach(i = 1:1000, .combine = "c") %dopar% {
    print(i)
}
stopCluster(cl)

(b) Registering a PSOCK cluster:

library(doParallel)
cl <- makePSOCKcluster(4)
registerDoParallel(cl)
out <- foreach(i = 1:1000, .combine = "c") %dopar% {
    print(i)
}
stopCluster(cl)

(c) Using doMC:

library(doMC)
registerDoMC(4)
out <- foreach(i = 1:1000, .combine = "c") %dopar% {
    print(i)
}

Several users have observed that the doMC method -- which is just a wrapper around the mclapply function, so it's not doMC's fault (see here: How to kill a doMC worker when it's done?) -- leaves zombie processes behind. In an answer to a previous question (How to stop R from leaving zombie processes behind) it was suggested that using a fork cluster might not leave zombie processes behind. In another question (Remove zombie processes using parallel package) it was suggested that using a PSOCK cluster might not leave them behind either. However, it seems that all three methods leave zombie processes behind. While zombie processes per se are usually not a problem, because they do not (normally) tie up resources, they clutter the process tree. I can still get rid of them by closing and re-opening R, but that is not a good option when I'm in the middle of a session. Is there an explanation for why this happens (or even: is there a reason why this has to happen)? And is there something that can be done so that no zombie processes are left behind?

My system info (R is used in a simple REPL session with xterm and tmux):

> library(devtools)
> session_info()
Session info-------------------------------------------------------------------
 setting  value                                             
 version  R Under development (unstable) (2014-08-16 r66404)
 system   x86_64, linux-gnu                                 
 ui       X11                                               
 language (EN)                                              
 collate  en_IE.UTF-8                                       
 tz       <NA>                                              

Packages-----------------------------------------------------------------------
 package    * version  source          
 codetools    0.2.8    CRAN (R 3.2.0)  
 devtools   * 1.5.0.99 Github (c429ae2)
 digest       0.6.4    CRAN (R 3.2.0)  
 doMC       * 1.3.3    CRAN (R 3.2.0)  
 evaluate     0.5.5    CRAN (R 3.2.0)  
 foreach    * 1.4.2    CRAN (R 3.2.0)  
 httr         0.4      CRAN (R 3.2.0)  
 iterators  * 1.0.7    CRAN (R 3.2.0)  
 memoise      0.2.1    CRAN (R 3.2.0)  
 RCurl        1.95.4.3 CRAN (R 3.2.0)  
 rstudioapi   0.1      CRAN (R 3.2.0)  
 stringr      0.6.2    CRAN (R 3.2.0)  
 whisker      0.3.2    CRAN (R 3.2.0)  


Small edit: At least for makeForkCluster() it seems that sometimes the forks it spawns get killed and reaped by the parent correctly, and sometimes they do not get reaped and become zombies. It seems this only happens when the cluster is not closed quickly enough after the loop is aborted or finishes; at least that is when it happened the last few times.

Answer

You could get rid of the zombie processes using the "inline" package. Just implement a function that calls "waitpid":

library(inline)

# Reap all dead child processes without blocking:
# waitpid(-1, ..., WNOHANG) returns > 0 for each zombie reaped,
# 0 while live children remain, and -1 when there are none left.
includes <- '#include <sys/wait.h>'
code <- 'int wstat; while (waitpid(-1, &wstat, WNOHANG) > 0) {};'
wait <- cfunction(body=code, includes=includes, convention='.C')

I tested this by first creating some zombies with the mclapply function:

> library(parallel)
> pids <- unlist(mclapply(1:4, function(i) Sys.getpid(), mc.cores=4))
> system(paste0('ps --pid=', paste(pids, collapse=',')))
  PID TTY          TIME CMD
17447 pts/4    00:00:00 R <defunct>
17448 pts/4    00:00:00 R <defunct>
17449 pts/4    00:00:00 R <defunct>
17450 pts/4    00:00:00 R <defunct>

(Note that I'm using the GNU version of "ps" which supports the "--pid" option.)
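As an aside (my addition, not part of the answer): implementations of `ps` that lack the GNU `--pid` long option generally accept the POSIX `-p` flag with the same comma-separated PID list, and zombies show up with state `Z` in the state column:

```shell
# POSIX-style equivalent of `ps --pid=...`: -p takes a comma-separated PID list.
# The state column ("stat" on Linux procps) shows "Z" for zombie processes;
# here we query the current shell itself, which reports a live state instead.
ps -o pid,stat,comm -p "$$"
```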

Then I called my "wait" function and called "ps" again to verify that the zombies are gone:

> wait()
list()
> system(paste0('ps --pid=', paste(pids, collapse=',')))
  PID TTY          TIME CMD

It appears that the worker processes created by mclapply are now gone. This should work as long as the processes were created by the current R process.
