Incorrect number of dimensions error using parLapply

Question

I am trying to parallelize a function across the 4 cores of my machine using parLapply. The function contains two nested loops that fill in some empty columns of a predefined matrix M. However, when I run the code below I get the following error:

2 nodes produced errors; first error: incorrect number of dimensions 

Code:

require("parallel")
TheData<-list(E,T)        # list of 2 matrices of different dimensions, T is longer and wider than E

myfunction <- function(TheData) {
for (k in 1:length(TheData[[1]][,1])) {
    distance<-matrix(,nrow=length(TheData[[1]][,1]),ncol=1)
     for (j in 1:length(TheData[[2]][,1])) {
    distance[j]<-sqrt((as.numeric(TheData[[2]][j,1])-as.numeric(TheData[[1]][k,2]))^2+(as.numeric(TheData[[2]][j,2])-as.numeric(TheData[[1]][k,1]))^2)              
    }         
    index<-which(distance == min(distance))
    M[k,4:9]<-c(as.numeric(TheData[[2]][index,1]),as.numeric(TheData[[2]][index,2]),as.numeric(TheData[[2]][index,3]),as.numeric(TheData[[2]][index,4]),as.numeric(TheData[[2]][index,5]),as.numeric(TheData[[2]][index,6]))   
rm(distance)
gc() 
}  
}
n_cores <- 4
Cl = makeCluster(n_cores)
Results <- parLapplyLB(Cl, TheData, myfunction)
# I also tried: Results <- parLapply(Cl, TheData, myfunction)

Recommended answer

In your example, parLapply iterates over a list of matrices and passes each matrix, one at a time, as the argument to "myfunction". However, "myfunction" seems to expect its argument to be a list of two matrices, so an error occurs. I can reproduce that error with:

> E <- matrix(0, 4, 4)
> E[[1]][,1]
Error in E[[1]][, 1] : incorrect number of dimensions
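
To make the mismatch concrete, here is a minimal sketch of what each call actually receives. It uses plain lapply, which hands elements to the function the same way parLapply does, and "T2" is a hypothetical stand-in matrix, not data from the question:

```r
# Hypothetical stand-in data; the sizes are arbitrary
E  <- matrix(0, 4, 4)
T2 <- matrix(0, 6, 6)
TheData <- list(E, T2)

# Each call receives ONE element of the list, i.e. a single matrix
received <- lapply(TheData, function(x) dim(x))

# Inside the function, x[[1]] is therefore a single number, not a matrix,
# so indexing it with [, 1] cannot work
first <- lapply(TheData, function(x) x[[1]])
```

Because `x[[1]]` is a length-one vector with no dimensions, `x[[1]][, 1]` fails in the way shown above.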

I'm not sure what you're really trying to do, but with the current implementation of "myfunction", I would expect you to call parLapply with a list of lists, each containing two matrices, such as:

TheDataList <- list(list(A,B), list(C,D), list(E,F), list(G,H))

Passing this as the second argument to parLapply would result in "myfunction" being called four times, each time with a list containing two matrices.
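
As a rough illustration of that calling pattern, the sketch below uses two pairs instead of four, with made-up matrices and a hypothetical helper "pair_rows" that only reports the row counts of the pair it receives:

```r
library(parallel)

# Hypothetical stand-in matrices; the sizes are arbitrary
A <- matrix(0, 3, 2); B <- matrix(0, 5, 6)
C <- matrix(0, 4, 2); D <- matrix(0, 7, 6)
TheDataList <- list(list(A, B), list(C, D))

# Hypothetical helper: each call receives one list of two matrices
pair_rows <- function(pair) c(nrow(pair[[1]]), nrow(pair[[2]]))

cl <- makeCluster(2)
res <- parLapply(cl, TheDataList, pair_rows)
stopCluster(cl)
```

Here `res` holds one result per pair, in the same order as `TheDataList`.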

But your example has another problem. It looks like you expect parLapply to modify the matrix "M" as a side-effect, but it can't: the function runs in separate worker processes, so assignments made there never reach "M" in the master session. I think you should change "myfunction" to return a matrix. parLapply will return the matrices in a list, which you can then bind together into the desired result.
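
A minimal sketch of that return-and-bind pattern, assuming a hypothetical task "make_block" that builds the result rows for a block of row indices (it stands in for whatever "myfunction" would compute):

```r
library(parallel)
cl <- makeCluster(2)

# Hypothetical task: return the rows for this block instead of assigning into "M"
make_block <- function(rows) cbind(rows, rows^2)

# One block of row indices per task
blocks <- parLapply(cl, list(1:2, 3:4), make_block)

# Bind the per-task matrices together on the master
M <- do.call(rbind, blocks)
stopCluster(cl)
```

Since parLapply returns results in the order of its input list, rbind reassembles the rows in their original order.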

Update

From your comment, I now believe that you essentially want to parallelize "myfunction". Here's my attempt to do that:

library(parallel)
cl <- makeCluster(4)

myfunction <- function(Exy) {
  iM <- integer(nrow(Exy))
  for (k in 1:nrow(Exy)) {
    distance <- sqrt((Txy[,1] - Exy[k,2])^2 + (Txy[,2] - Exy[k,1])^2)
    iM[k] <- which.min(distance)
  }
  iM
}

# Random example data for testing
T <- matrix(rnorm(150), 10)
E <- matrix(rnorm(120), 10)

# Only export the first two columns of T to the workers
Txy <- T[,1:2]
clusterExport(cl, c('Txy'))

# Parallelize "myfunction" by calling it in parallel on block rows of "E".
ExyList <- parallel:::splitRows(E[,1:2], length(cl))
iM <- do.call('c', clusterApply(cl, ExyList, myfunction))

# Update "M" using data from "T" indexed by "iM":
# row k of "M" gets the row of "T" nearest to row k of "E"
M <- matrix(0, nrow(E), 9)  # more fake data
for (k in seq_along(iM)) {
  M[k,4:9] <- T[iM[k], 1:6]
}
print(M)

stopCluster(cl)
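
As a sanity check on the vectorized distance, the following sketch compares it with the explicit inner loop from the question for a single row k. The data are random and purely illustrative, and the column pairing (column 1 of "T" against column 2 of "E") is copied from the question as written:

```r
set.seed(1)
Txy <- matrix(rnorm(20), 10)   # illustrative data only
Exy <- matrix(rnorm(20), 10)
k <- 3

# Loop version, as in the question
d_loop <- numeric(nrow(Txy))
for (j in seq_len(nrow(Txy))) {
  d_loop[j] <- sqrt((Txy[j, 1] - Exy[k, 2])^2 + (Txy[j, 2] - Exy[k, 1])^2)
}

# Vectorized version, as in the answer
d_vec <- sqrt((Txy[, 1] - Exy[k, 2])^2 + (Txy[, 2] - Exy[k, 1])^2)

same_distances <- isTRUE(all.equal(d_loop, d_vec))
same_index <- which.min(d_loop) == which.min(d_vec)
```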

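Notes:

  • I vectorized "myfunction", which should make it more efficient. Hopefully it is almost right.
  • I also modified "myfunction" to return a vector of indices into "T" to reduce the amount of data sent back to the master.
  • The splitRows function from the parallel package is used to split the first two columns of "E" into a list of submatrices.
  • splitRows isn't exported by parallel, so I used ':::'. If this offends you, use the splitRows function from snow, which is exported.
  • The first two columns of "T" are exported to each of the workers, since each task requires the entire first two columns.
  • clusterApply is used rather than parLapply, since we need to iterate over the submatrices of "E".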
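
If you would rather avoid ':::' without depending on snow, one possible alternative (my assumption, not part of the answer above) is to build the submatrices from parallel's exported splitIndices function. A sketch with made-up data:

```r
library(parallel)

# Illustrative stand-in for the first two columns of "E"
Exy <- matrix(seq_len(20), nrow = 10)

# splitIndices() is exported by parallel; it partitions 1..n into chunks
idx <- splitIndices(nrow(Exy), 3)

# Build one submatrix per chunk; drop = FALSE keeps single-row chunks as matrices
ExyList <- lapply(idx, function(i) Exy[i, , drop = FALSE])
```

The resulting list can be passed to clusterApply in place of the output of splitRows.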
