R system() process always uses same CPU, not multi-threaded/multi-core
Question
In R 3.0.2 on Linux 3.12.0, I am using the system() function to execute a number of tasks. The desired effect is for each of these tasks to run as it would if I had executed it on the command line via Rscript, outside of R's system().
However, when I execute them inside R via system(), each task is tied to the same single CPU as the master R process.
In other words:
When launched via Rscript directly from a bash shell, outside of R, each task runs on its own core where possible (this is desired).
When launched inside R via system(), each task runs on the same single core. There is no multicore sharing. If I have 100 tasks, they are all stuck on one core.
I cannot figure out how to spawn a process inside of R so that each process will use its own core.
I am using a simple test to consume CPU cycles so I can measure the effect using top/htop:
dd if=/dev/urandom bs=32k count=1000 | bzip2 -9 >> /dev/null
When this simple test is launched outside of R multiple times, each iteration gets its own core. But when I launch it inside of R:
system("dd if=/dev/urandom bs=32k count=2000 | bzip2 -9 >> /dev/null", ignore.stdout=TRUE,ignore.stderr=TRUE,wait=FALSE)
They are all stuck on a single core.
Here is a visualization after running 4 simultaneous/concurrent iterations of system() (screenshot not preserved).
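One way to make the same observation without top/htop is to read it from /proc. The sketch below is an illustration, not part of the original question: the helper names `last_cpu` and `allowed_cpus` are hypothetical, a plain busy loop stands in for the dd | bzip2 test, and it assumes a Linux system with /proc mounted. For each worker it prints the CPU the process last ran on and the set of CPUs it is allowed to use.

```shell
#!/bin/sh
# Hypothetical helper: print the CPU a process last ran on.
# This is field 39 of /proc/PID/stat. The command name (field 2) may contain
# spaces, so everything up to the closing ")" is dropped first; the remaining
# fields then start at field 3, which makes field 39 the 37th one.
last_cpu() {
    sed 's/^.*) //' "/proc/$1/stat" | awk '{ print $37 }'
}

# Hypothetical helper: print the CPUs the scheduler may place the process on.
allowed_cpus() {
    awk '/^Cpus_allowed_list/ { print $2 }' "/proc/$1/status"
}

# Start 4 CPU-bound workers in the background and remember their PIDs.
# The busy loop is a stand-in for the dd | bzip2 test.
pids=""
for i in 1 2 3 4; do
    sh -c 'while :; do :; done' &
    pids="$pids $!"
done

sleep 1  # give the scheduler a moment to place the workers

for pid in $pids; do
    echo "pid=$pid last_cpu=$(last_cpu "$pid") allowed=$(allowed_cpus "$pid")"
done

# Clean up the workers.
kill $pids
wait $pids 2>/dev/null || true
```

If every worker reports the same last_cpu and the allowed list contains only that one CPU, the workers are pinned by their affinity mask rather than by scheduler choice. Running the same script through system() inside R would show what R's child processes inherit.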
Please help me. I need to be able to tell R to launch new tasks, with each of them running on its own core.
UPDATE DEC 4 2013:
I tried a test in Python using this:
import os      # needed for os.system
import thread  # Python 2 module (renamed _thread in Python 3)
thread.start_new_thread(os.system,("/bin/dd if=/dev/urandom of=/dev/null bs=32k count=2000",))
I started the new thread several times, and as expected everything worked (multiple cores used, one per thread).
So I installed the rPython package in R and tried the same from within R:
library(rPython)
python.exec("import os, thread")  # os is needed for os.system
python.exec("thread.start_new_thread(os.system,('/bin/dd if=/dev/urandom of=/dev/null bs=32k count=2000',))")
Unfortunately, once again it was limited to a single core even after repeated calls. Why is it that everything launched is limited to a single core when executed from R?
Recommended answer
Following on @agstudy's comment, you should get parallel to work first. On my system, this uses multiple cores:
f<-function(x)system("dd if=/dev/urandom bs=32k count=2000 | bzip2 -9 >> /dev/null", ignore.stdout=TRUE,ignore.stderr=TRUE,wait=FALSE)
library(parallel)
mclapply(1:4,f,mc.cores=4)
I would have written this in a comment myself, but it is too long. I know you said that you have tried the parallel package, but I wanted to confirm that you are using it correctly. If it doesn't work, can you confirm that a non-system call works correctly with mclapply, like this one?
a<-mclapply(rep(1e8,4),rnorm,mc.cores=4)
Reading your comments, I suspect that your pthreads Linux package is out of date and broken. On my system, I am using libpthread-2.15.so (not 2.13). If you're on Ubuntu, you can grab the latest with apt-get install libpthread-stubs0.
Also, note that you should be using parallel, not multicore. If you look at the docs for parallel, you'll note that it has incorporated the work from multicore.
Reading your next set of comments, I must insist that it is parallel and not multicore that has been included in R since 2.14. You can read about this on the CRAN Task View.
Getting parallel to work is crucial. I previously told you that you could compile it directly from source, but this is not correct. I guess the only way to recompile it would be to compile R itself from source.
Can you also verify that your CPU affinity is set correctly? And can you check whether R can detect the number of cores? Just run:
library(parallel)
mcaffinity()
# Should be c(1,2,3,4) for you.
detectCores()
# Should be 4 for you.
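The same two checks can be cross-checked from a shell, outside of R. The sketch below is an assumption-laden illustration rather than part of the original answer: the function name `affinity_report` is hypothetical, and it assumes GNU coreutils (for nproc) and a Linux /proc filesystem. `nproc --all` counts the installed CPUs, while plain `nproc` counts only those available to the calling process, so a gap between the two points to a restricted affinity mask.

```shell
#!/bin/sh
# Hypothetical helper: summarise how many CPUs exist versus how many the
# current process may actually use.
affinity_report() {
    total=$(nproc --all)   # CPUs installed in the machine
    usable=$(nproc)        # CPUs available to this process
    # awk reads its own status file here; it inherits the affinity mask
    # of the shell that started it.
    allowed=$(awk '/^Cpus_allowed_list/ { print $2 }' /proc/self/status)
    echo "total=$total usable=$usable allowed=$allowed"
    if [ "$usable" -lt "$total" ]; then
        echo "note: this process is restricted to a subset of the CPUs" >&2
    fi
}

affinity_report
```

On Linux, a child process inherits the affinity mask of its parent, so running this script through system() inside R reports the mask that every task launched from that R session starts with. If it shows a single CPU there but all CPUs in a fresh shell, the restriction comes from the R process itself.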