Parallel version of transform (or mutate) in R?


Question

I have a slow function that I want to apply to each row of a data.frame. The computation is embarrassingly parallel.

I have 4 cores, but R's built-in functions only use one.

All I want is a parallel equivalent of:

data$c = slow.foo(data$a, data$b)

I can't find clear instructions on which library to use (I'm overwhelmed by choice) or how to use it. Any help would be greatly appreciated.

Recommended answer

The parallel package is included with base R. Here's a quick example using parApply from that package:

library(parallel)

# Some dummy data
d <- data.frame(x1=runif(1000), x2=runif(1000))

# Create a cluster with one fewer worker than there are cores available. Adjust as necessary
cl <- makeCluster(detectCores() - 1)

# Just like regular apply, but rows get sent to the various processes
out <- parApply(cl, d, 1, function(x) x[1] - x[2])

stopCluster(cl)

# Same as x1 - x2?
identical(out, d$x1 - d$x2)

# [1] TRUE

You also have, e.g., parSapply and parLapply at your disposal.
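As a rough illustration of how parSapply might map onto the two-column call from the question, here is a minimal sketch. The slow.foo body below is a hypothetical stand-in (the question does not show the real function), and the worker count of 2 is arbitrary. The data frame and the function are passed as extra arguments so the worker processes receive them explicitly.

```r
library(parallel)

# Hypothetical stand-in for the asker's slow.foo(): takes two scalars
slow.foo <- function(a, b) {
  Sys.sleep(0.001)  # simulate expensive work
  a + b
}

data <- data.frame(a = runif(200), b = runif(200))

# Two workers chosen arbitrarily for the sketch
cl <- makeCluster(2)

# Iterate over row indices; df and f are shipped to the workers via `...`
data$c <- parSapply(cl, seq_len(nrow(data)),
                    function(i, df, f) f(df$a[i], df$b[i]),
                    df = data, f = slow.foo)

stopCluster(cl)
```

parLapply could be used the same way; it returns a list, so the result would need unlist() before being assigned to a column.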

Of course, for the example I've given, the vectorised operation d$x1 - d$x2 is much faster. Think about whether your computation can be vectorised rather than performed row by row.
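If the function is already vectorised but still slow, one possible middle ground is to give each worker a whole block of rows and make a single vectorised call per block. The sketch below assumes a hypothetical vectorised slow.foo and an arbitrary two-worker cluster; splitIndices comes from the parallel package and returns contiguous index chunks.

```r
library(parallel)

# Hypothetical vectorised function: accepts whole vectors at once
slow.foo <- function(a, b) a - b

data <- data.frame(a = runif(1000), b = runif(1000))

cl <- makeCluster(2)

# One contiguous chunk of row indices per worker
chunks <- splitIndices(nrow(data), length(cl))

# Each worker processes its chunk with a single vectorised call
pieces <- parLapply(cl, chunks,
                    function(idx, df, f) f(df$a[idx], df$b[idx]),
                    df = data, f = slow.foo)

stopCluster(cl)

# Chunks are contiguous and in order, so unlist() restores row order
data$c <- unlist(pieces)
```

Sending a few large chunks rather than many single rows keeps the communication overhead between processes small, which usually matters more than the number of workers for cheap per-row work.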
