How can I accomplish parallel processing in R?

Question

If I have two datasets (with equal numbers of rows and columns) and I want to run a piece of code that I have written on both of them, there are obviously two options: sequential execution or parallel programming.

Now, the algorithm (code) that I have written is a big one and consists of multiple for loops. Is there any way to use it directly on both datasets, or will I have to transform the code in some way? A heads-up would be great.

Recommended answer

To answer your question: you do not have to transform the code to run it on two datasets in parallel; it should work fine as it is.

The need for parallel processing usually arises in two ways (for most users, I would imagine):

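  1. You have code that you can run sequentially, but you would like to run it in parallel.
  2. You have a function that takes a very long time to execute on a large dataset, and you would like to run it in parallel to speed things up.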

For the first case, you do not have to do anything special: you can execute the code in parallel using one of the libraries designed for it, or simply run two instances of R on the same computer and run the same code in each of them with a different dataset. It doesn't matter how many for loops the code contains, and the datasets don't even need to have the same number of rows or columns. If the code runs fine sequentially on each dataset by itself, there is no dependence between the parallel chains and thus no problem. Since your question falls into the first case, you can run it in parallel.
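
As a minimal sketch of the first case, the following uses the `parallel` package that ships with R. The function `my_algorithm` and the two toy data frames are hypothetical stand-ins for your own code and data; the point is only that the loop-heavy function is passed to the workers unchanged.

```r
library(parallel)  # ships with base R

# Hypothetical stand-in for your own loop-heavy function; it is used unchanged.
my_algorithm <- function(df) {
  total <- 0
  for (i in seq_len(nrow(df))) {
    for (j in seq_len(ncol(df))) {
      total <- total + df[i, j]
    }
  }
  total
}

# Two toy datasets of different sizes (assumed data, for illustration only).
dataset1 <- data.frame(a = 1:3, b = 4:6)
dataset2 <- data.frame(a = 1:5, b = 6:10, c = 11:15)

cl <- makeCluster(2)  # one worker per dataset
results <- parLapply(cl, list(dataset1, dataset2), my_algorithm)
stopCluster(cl)       # release the workers when done

# results[[1]] holds the output for dataset1, results[[2]] for dataset2
```

Each worker is a separate R process, so this is close to running two instances of R by hand, except that the results come back into one session as a list.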

If you have the second case, you can sometimes turn it into the first case by splitting your dataset into pieces (each of which can be processed sequentially) and then running the pieces in parallel. This is easier said than done, and it won't always be possible. It is also why not every function simply has a run.in.parallel=TRUE option: it is not always obvious how the data should be split, nor is it always possible.
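
A rough sketch of that split-then-combine idea, assuming the computation treats every row independently (which is exactly the condition that will not always hold). The function `row_score` and the data frame `big_data` are made up for illustration; `splitIndices()` comes from the `parallel` package.

```r
library(parallel)

# Hypothetical per-row computation; each row is independent of the others.
row_score <- function(chunk) {
  scores <- numeric(nrow(chunk))
  for (i in seq_len(nrow(chunk))) {
    scores[i] <- sum(chunk[i, ]^2)
  }
  scores
}

big_data <- data.frame(x = 1:100, y = 101:200)  # toy data

n_workers <- 2
# splitIndices() divides the row numbers into contiguous blocks, one per worker
index_blocks <- splitIndices(nrow(big_data), n_workers)
chunks <- lapply(index_blocks, function(idx) big_data[idx, , drop = FALSE])

cl <- makeCluster(n_workers)
chunk_scores <- parLapply(cl, chunks, row_score)
stopCluster(cl)

# Combine step: the blocks are in row order, so concatenating them
# restores the result for the full dataset.
parallel_scores <- unlist(chunk_scores)
```

If rows depended on one another (for example a running total), this naive split would give wrong answers, which is why the splitting step needs thought for each problem.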

So you have already done most of the work by writing the function and splitting the data. Here is a general way of doing parallel processing with one function on two datasets:

library(doParallel)
cl <- makeCluster(2)  # 2 worker processes, i.e. 2 parallel chains
registerDoParallel(cl)

datalist <- list(mydataset1, mydataset2)

# now start the chains, one per dataset
nchains <- 2

results_list <- foreach(i = 1:nchains,
                        .packages = c('packages_you_need')) %dopar% {
    # apply your function (here find.string) to dataset i;
    # the value of the last expression is returned for this chain
    find.string(datalist[[i]])
}

stopCluster(cl)  # shut the workers down when finished

The result will be a list with two elements, each containing the results from one chain. You can then combine them as you wish, or pass a function to foreach through its .combine argument. See the foreach help for details.
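
For instance, here is a small sketch of the `.combine` argument, assuming each chain returns a one-row data frame. The toy datasets and the column names are invented for illustration and are not part of the original answer.

```r
library(doParallel)  # also attaches foreach and parallel

cl <- makeCluster(2)
registerDoParallel(cl)

# Toy datasets standing in for mydataset1 and mydataset2.
datalist <- list(data.frame(v = 1:4), data.frame(v = 5:10))

# Each iteration returns a one-row data frame; rbind stacks them into one table.
summary_table <- foreach(i = seq_along(datalist), .combine = rbind) %dopar% {
  data.frame(dataset = i,
             n_rows  = nrow(datalist[[i]]),
             mean_v  = mean(datalist[[i]]$v))
}

stopCluster(cl)
```

Without `.combine`, the same call would return a list of two data frames, which you could stack afterwards with `do.call(rbind, ...)`.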

You can use this code any time you have a case like number 1 described above. Most of the time you can also use it for cases like number 2, if you spend some time thinking about how to divide the data and then combine the results. Think of it as a "parallel wrapper". It should work on Windows, GNU/Linux, and Mac OS, but I haven't tested it on all of them.

I keep this script handy whenever I need a quick speed-up, but I still always start out by writing code I can run sequentially. Thinking in parallel hurts my brain.
