R - Parallelizing multiple model learning (with dplyr and purrr)


Question


This is a follow-up to a previous question about learning multiple models.

The use case is that I have multiple observations for each subject, and I want to train a model for each of them. See Hadley's excellent presentation on how to do this.

In short, this can be done with dplyr and purrr like so:

library(purrr)
library(dplyr)
library(fitdistrplus)
dt %>% 
    split(dt$subject_id) %>%
    map( ~ fitdist(.$observation, "norm")) 

Since model building is an embarrassingly parallel task, I was wondering whether dplyr or purrr has an easy-to-use parallelization mechanism for such tasks (like a parallel map).

If these libraries don't provide easy parallelization, could it be done using the classic R parallelization libraries (parallel, foreach, etc.)?

Solution

Just adding an answer for completeness. You will need to install multidplyr from Hadley's repo to run this; there is more info in the vignette (https://github.com/hadley/multidplyr/blob/master/vignettes/multidplyr.md):

library(dplyr)
library(multidplyr)
library(purrr)

cluster <- create_cluster(4)
set_default_cluster(cluster)
cluster_library(cluster, "fitdistrplus")

# dt is a dataframe, subject_id identifies observations from each subject
by_subject <- partition(dt, subject_id)

fits <- by_subject %>%
    do(fit = fitdist(.$observation, "norm"))

collected_fits <- collect(fits)$fit
collected_summaries <- collected_fits %>% map(summary)
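
The question also asks whether the classic parallelization libraries could do the job. Below is a minimal sketch using base R's parallel package, not part of the original answer. It assumes fitdistrplus is installed on the machine. The toy data frame stands in for dt and its values are made up, and the worker count of 2 is arbitrary.

```r
library(parallel)

# Hypothetical toy data standing in for `dt`: 4 subjects, 50 observations each.
set.seed(1)
dt <- data.frame(
  subject_id  = rep(1:4, each = 50),
  observation = rnorm(200, mean = rep(c(0, 2, 4, 6), each = 50))
)

# One data frame per subject, named by subject_id.
by_subject <- split(dt, dt$subject_id)

# Start a local cluster of worker processes (2 is an arbitrary choice).
cl <- makeCluster(2)

# Fit one normal distribution per subject on the workers.
# The namespace-qualified call avoids having to attach fitdistrplus on each worker.
fits <- parLapply(cl, by_subject, function(d) {
  fitdistrplus::fitdist(d$observation, "norm")
})

# Always release the worker processes when done.
stopCluster(cl)

# Summaries are cheap, so they are computed sequentially.
summaries <- lapply(fits, summary)
```

parLapply keeps the names produced by split, so fits[["1"]] is the fit for subject 1. On Unix-like systems, mclapply(by_subject, ..., mc.cores = 2) from the same package is a fork-based alternative that needs no explicit cluster setup.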
