Foreach Parallel - Combine function for Multiple Outputs


Question

I have a set of ratings for 45000 users and 40-odd movies. I need to predict new ratings for each user based on their Pearson correlation with other users. I also need to store the set of similar users and their similarities for each user-movie combination. I am using the foreach package to execute the loops in parallel. The code that I have managed to write is this:

library(foreach)
library(doParallel)  # provides registerDoParallel(); attaches parallel for makeCluster()

x <- matrix(rnorm(1000), nrow = 100, ncol = 10)
df <- list()

# correlation matrix (user by user), absolute values
cor_mat <- abs(cor(t(x)))

# similarity limits
upper <- 1
lower <- 0.04

# Initiating parallel environment
cl <- makeCluster(3)
registerDoParallel(cl)

res <- foreach(i = 1:nrow(x), .combine = rbind, .packages = c('base', 'foreach')) %dopar% {
  foreach(j = 1:ncol(x), .combine = c, .packages = c('base', 'foreach')) %do% {

    sim_user <- which(cor_mat[i, ] >= lower & cor_mat[i, ] < upper)

    bx <- as.numeric(t(x[sim_user, j]) %*%
                       cor_mat[sim_user, j] / sum(cor_mat[sim_user, j]))
    df[[length(df) + 1]] <- data.frame(i, j, sim_user, cor_mat[sim_user, j])

    return(bx)
  }
}
stopCluster(cl)

I am able to accomplish half of my task, i.e., creating a matrix of predicted ratings from the foreach output 'res'. But the list df, to which I append the similar users, is still empty at the end of the foreach loop.

What custom combine function can I write to output both the matrix of predicted ratings and the list of similar users?

Recommended answer

When each iteration produces more than one output, it is generally easiest to return everything inside a list. You then need to specify your own functions to combine the results. Here, each iteration returns two elements: bx and df. My combine functions therefore combine each of those two elements separately and return them in a length-2 list.

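# Inner loop (over j): concatenate the predictions and the data frames
combine_custom_j <- function(LL1, LL2) {
  bx <- c(LL1$bx, LL2$bx)
  dfs <- c(LL1$df, LL2$df)  # c() on data frames yields one flat list of their columns
  return(list(bx = bx, df = dfs))
}

# Outer loop (over i): stack the prediction vectors row-wise into a matrix
combine_custom_i <- function(LL1, LL2) {
  bx <- rbind(LL1$bx, LL2$bx)
  dfs <- c(LL1$df, LL2$df)
  return(list(bx = bx, df = dfs))
}

# Assumes the cluster from the question is still registered
# (i.e. run this before stopCluster(cl)).
res <- foreach(i = 1:nrow(x), .combine = combine_custom_i, .packages = c('base', 'foreach')) %dopar% {
  foreach(j = 1:ncol(x), .combine = combine_custom_j, .packages = c('base', 'foreach')) %do% {

    sim_user <- which(cor_mat[i, ] >= lower & cor_mat[i, ] < upper)

    bx <- as.numeric(t(x[sim_user, j]) %*%
                       cor_mat[sim_user, j] / sum(cor_mat[sim_user, j]))

    # Return both outputs together; the combine functions split them again
    return(list(bx = bx, df = data.frame(i, j, sim_user, cor_mat[sim_user, j])))
  }
}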
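To see what a two-argument combine function does with these list results, here is a minimal sketch in base R. It imitates the pairwise folding that foreach applies by default to a user-supplied .combine function (two arguments at a time unless .multicombine = TRUE) by using Reduce(). The three results r1, r2 and r3 are made-up values for illustration, not output of the code above.

```r
# Same inner combine function as in the answer
combine_custom_j <- function(LL1, LL2) {
  bx <- c(LL1$bx, LL2$bx)
  dfs <- c(LL1$df, LL2$df)
  list(bx = bx, df = dfs)
}

# Hypothetical per-iteration results, shaped like the loop body's return value
r1 <- list(bx = 1.5, df = data.frame(i = 1, j = 1, sim_user = c(2, 3)))
r2 <- list(bx = 2.5, df = data.frame(i = 1, j = 2, sim_user = c(4, 5, 6)))
r3 <- list(bx = 3.5, df = data.frame(i = 1, j = 3, sim_user = 7))

# Fold the results pairwise: combine(combine(r1, r2), r3)
out <- Reduce(combine_custom_j, list(r1, r2, r3))

out$bx                  # the three predictions in iteration order
is.data.frame(out$df)   # FALSE: c() flattened the data frames
length(out$df)          # one list element per column of each data frame
```

The bx element comes out as expected, but out$df is a flat list of columns rather than a list of data frames, which is worth keeping in mind when deciding how to combine the df element.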

Although I have returned your data frames in a list, as your code suggested, I believe you might want to rbind them instead? In that case, you can simply replace c(LL1$df, LL2$df) with rbind(LL1$df, LL2$df) in both combine functions. Note that c() applied to two data frames concatenates their columns into one plain list, so the individual data frames are not kept as separate elements; rbind gives you a single long data frame.
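As a rough sketch of the rbind variant, the example below runs the same nested loop on a small toy matrix. The sizes (12 users, 4 movies), the seed, the column name similarity and the function names combine_j and combine_i are assumptions chosen for illustration. It uses %do% for both loops so that it runs without a cluster; replacing the outer %do% with %dopar% after registering a backend should behave the same way, since the combine functions only see the returned lists.

```r
library(foreach)

set.seed(1)
x <- matrix(rnorm(48), nrow = 12, ncol = 4)  # toy data: 12 users, 4 movies
cor_mat <- abs(cor(t(x)))                    # user-by-user similarity
lower <- 0.04
upper <- 1

# Inner combine: predictions into a vector, data frames stacked row-wise
combine_j <- function(LL1, LL2) {
  list(bx = c(LL1$bx, LL2$bx), df = rbind(LL1$df, LL2$df))
}

# Outer combine: prediction vectors into matrix rows, data frames stacked
combine_i <- function(LL1, LL2) {
  list(bx = rbind(LL1$bx, LL2$bx), df = rbind(LL1$df, LL2$df))
}

res <- foreach(i = 1:nrow(x), .combine = combine_i) %do% {
  foreach(j = 1:ncol(x), .combine = combine_j) %do% {
    sim_user <- which(cor_mat[i, ] >= lower & cor_mat[i, ] < upper)
    w <- cor_mat[sim_user, j]                # same weights as the original code
    bx <- sum(x[sim_user, j] * w) / sum(w)   # weighted average rating
    n <- length(sim_user)
    list(bx = bx,
         df = data.frame(i = rep(i, n), j = rep(j, n),
                         sim_user = sim_user, similarity = w))
  }
}

dim(res$bx)   # one row per user, one column per movie
head(res$df)  # long table of (i, j, similar user, similarity)
```

With this layout, res$bx is the matrix of predicted ratings and res$df is one data frame that can be filtered by i and j to recover the similar users for any user-movie pair.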
