R system() process always uses same CPU, not multi-threaded/multi-core


Problem description

In R 3.0.2 on Linux 3.12.0, I am using the system() function to execute a number of tasks. The desired effect is for each of these tasks to run as it would if I had executed it on the command line via Rscript, outside of R's system().

However, when I execute them inside R via system(), each task is tied to the same single CPU as the master R process.

In other words:

When launched via Rscript directly from a bash shell, outside of R, each task runs on its own core where possible (this is the desired behavior).

When launched inside R via system(), each task runs on the same single core. There is no multicore sharing. If I have 100 tasks, they are all stuck on one core.

I cannot figure out how to spawn processes from inside R so that each process uses its own core.

I am using a simple test that consumes CPU cycles so that I can measure the effect with top/htop:

dd if=/dev/urandom bs=32k count=1000 | bzip2 -9 >> /dev/null

When this simple test is launched outside of R multiple times, each instance gets its own core. But when I launch it inside of R:

system("dd if=/dev/urandom bs=32k count=2000 | bzip2 -9 >> /dev/null", ignore.stdout=TRUE,ignore.stderr=TRUE,wait=FALSE)

the instances are all stuck on a single core.

[Figure: CPU usage after running 4 simultaneous/concurrent iterations of system().]

Please help me. I need to be able to tell R to launch new tasks, each running on its own core.

UPDATE DEC 4 2013:

I tried a test in Python using this:

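import os
import thread  # Python 2 module; renamed _thread in Python 3
# Run the CPU-heavy shell command from a new thread.
thread.start_new_thread(os.system, ("/bin/dd if=/dev/urandom of=/dev/null bs=32k count=2000",))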

I started a new thread this way several times, and as expected everything worked (multiple cores were used, one per thread).

So I installed the rPython package in R and tried the same from within R:

python.exec("import os")
python.exec("import thread")
python.exec("thread.start_new_thread(os.system,('/bin/dd if=/dev/urandom of=/dev/null bs=32k count=2000',))")

Unfortunately, once again it was limited to a single core, even after repeated calls. Why is everything launched from R limited to a single core?

Recommended answer

Following on @agstudy's comment, you should get parallel to work first. On my system, this uses multiple cores:

f<-function(x)system("dd if=/dev/urandom bs=32k count=2000 | bzip2 -9 >> /dev/null", ignore.stdout=TRUE,ignore.stderr=TRUE,wait=FALSE)
library(parallel)
mclapply(1:4,f,mc.cores=4)

I would have written this in a comment myself, but it is too long. I know you have said that you have tried the parallel package, but I wanted to confirm that you are using it correctly. If it doesn't work, can you confirm that a non-system call uses mclapply correctly, like this one?

a<-mclapply(rep(1e8,4),rnorm,mc.cores=4)
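One way to confirm that mclapply really forks separate workers, regardless of how busy each core looks in top, is to have each task report its process ID. This is a minimal sketch, assuming a Unix-like system where forking is available; the variable names are only for illustration.

```r
library(parallel)

# Each task returns the PID of the process that ran it.
# With 4 tasks and mc.cores = 4, each task should run in its own forked child.
worker_pids <- unlist(mclapply(1:4, function(i) Sys.getpid(), mc.cores = 4))
print(worker_pids)

# Number of distinct worker processes that did the work.
n_workers <- length(unique(worker_pids))
print(n_workers)

# TRUE if none of the tasks ran in the parent R process.
print(!(Sys.getpid() %in% worker_pids))
```

Distinct PIDs only show that separate processes were created. Whether they end up on different cores still depends on the CPU affinity they inherit from the parent R process.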


Reading your comments, I suspect that your pthreads Linux package is out of date and broken. On my system, I am using libpthread-2.15.so (not 2.13). If you're on Ubuntu, you can grab the latest with apt-get install libpthread-stubs0.

Also, note that you should be using parallel, not multicore. If you look at the docs for parallel, you'll see that it has incorporated the work from multicore.

Reading your next set of comments, I must insist that it is parallel, not multicore, that has been included in R since 2.14. You can read about this on the CRAN Task View.

Getting parallel to work is crucial. I previously told you that you could compile it directly from source, but this is not correct. I guess the only way to recompile it would be to compile R itself from source.

Can you also verify that your CPU affinity is set correctly? And can you check whether R can detect the number of cores? Just run:

library(parallel)
mcaffinity()
# Should be c(1,2,3,4) for you.
detectCores()
# Should be 4 for you.
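If mcaffinity() reports fewer CPUs than detectCores(), child processes started with system() will be confined to the same CPUs, because on Linux a child inherits its parent's affinity mask. The sketch below assumes Linux with /proc mounted and a platform where mcaffinity() is supported (it returns NULL otherwise). It compares the two values, shows the CPU list that a system() child actually sees, and then tries to widen the mask to all detected cores. Whether widening the mask fixes the behavior described in the question is an assumption, not something confirmed above.

```r
library(parallel)

n_cores <- detectCores()  # number of logical CPUs R can detect (may be NA)
current <- mcaffinity()   # CPUs this R process may run on (NULL if unsupported)
print(n_cores)
print(current)

# Show the CPU list inherited by a child started via system() (Linux only).
system("grep Cpus_allowed_list /proc/self/status")

# If the mask is narrower than the machine, try to widen it to every detected core.
if (!is.null(current) && !is.na(n_cores) && length(current) < n_cores) {
  try(mcaffinity(seq_len(n_cores)))
  print(mcaffinity())
}
```

After widening the mask, rerunning the dd | bzip2 test through system() and watching top/htop would show whether the children now spread across cores.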
