mclapply encounters errors depending on core id?

Problem description

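I have a set of genes for which I need to compute some coefficients in parallel. The coefficients are computed inside GeneTo_GeneCoeffs_filtered, which takes a gene name as input and returns a list of 2 data frames.

With a gene_array of length 100, I ran this command with different numbers of cores (5, 6 and 7):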

Coeffslist=mclapply(gene_array,GeneTo_GeneCoeffs_filtered,mc.cores = no_cores)

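I get errors for different gene names depending on the number of cores assigned to mclapply.

The indices of the genes for which GeneTo_GeneCoeffs_filtered fails to return its list of data frames follow a pattern. With 7 cores assigned to mclapply, they are elements 4, 11, 18, 25, ..., 95 of gene_array (every 7th); when R works with 6 cores, the indices are 2, 8, 14, ..., 98 (every 6th); and likewise with 5 cores, every 5th.

Most importantly, the failing indices differ between these runs, which means the problem does not lie with specific genes.

I suspect there may be a "broken" core that cannot run my function properly, and that only this core produces the error. Is there a way to trace its ID and exclude it from the list of cores R can use?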

Solution

A close reading of mclapply's manpage reveals that this behavior is by design and that it arises as a result of the interaction between:

(a) "the input X is split into as many parts as there are cores (currently the values are spread across the cores sequentially, i.e. first value to core 1, second to core 2, ... (core + 1)-th value to core 1 etc.) and then one process is forked to each core and the results are collected."

(b) "a "try-error" object will be returned for all the values involved in the failure, even if not all of them failed."

In your case, by virtue of (a), your gene_array is spread "round-robin" style across the cores (with a gap of mc.cores between the indices of successive elements), and by virtue of (b), if any gene_array element raises an error, you get back an error for every gene_array element sent to that core (again with a gap of mc.cores between the indices of those elements). So there is no broken core: a single failing gene is enough to mark its whole batch as failed, and which batch it falls into depends on mc.cores.
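The interaction of (a) and (b) can be reproduced with a toy function. The sketch below is illustrative only: f is a hypothetical stand-in for GeneTo_GeneCoeffs_filtered that fails for a single input, and it assumes a Unix-like system, since mclapply relies on forking.

```r
library(parallel)

# Hypothetical stand-in for GeneTo_GeneCoeffs_filtered: only input 4 fails.
f <- function(i) {
  if (i == 4) stop("bad element")
  i * 2
}

# Default mc.preschedule = TRUE: inputs are dealt out round-robin, so
# elements 4, 11 and 18 all land in the same forked process.
# mclapply also warns about the failed job; the warning is silenced here.
res <- suppressWarnings(mclapply(1:20, f, mc.cores = 7))

# Indices whose result is a "try-error" object
failed <- which(vapply(res, inherits, logical(1), what = "try-error"))
print(failed)  # expected: 4 11 18
```

Only element 4 actually raised an error, yet every element that shared its process is reported as failed, which matches the "every 7th index" pattern in the question.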

I refreshed my understanding of this in an exchange yesterday with Simon Urbanek (https://stat.ethz.ch/pipermail/r-sig-hpc/2019-September/002098.html), in which I also provide an error-handling approach that yields errors only for the indices that actually generate an error.
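One common way to confine errors to the failing indices is to catch them inside the worker function. The following is a minimal sketch of that idea and is not necessarily the approach from the linked post; f and safe_f are hypothetical names used only for illustration.

```r
library(parallel)

# Hypothetical stand-in for GeneTo_GeneCoeffs_filtered: only input 4 fails.
f <- function(i) {
  if (i == 4) stop("bad element")
  i * 2
}

# Catch the error inside the worker and return the condition object as
# that element's result, so the rest of the batch is unaffected.
safe_f <- function(i) tryCatch(f(i), error = function(e) e)

res <- mclapply(1:20, safe_f, mc.cores = 7)

# Indices whose result is an error condition
failed <- which(vapply(res, inherits, logical(1), what = "error"))
print(failed)                      # expected: 4
print(conditionMessage(res[[4]]))  # expected: "bad element"
```

Because the error never escapes the wrapped call, elements 11 and 18 keep their computed values even though they run in the same forked process as element 4.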

You can also get errors only for the indices that generate an error by passing mc.preschedule=FALSE.
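As a rough illustration of the mc.preschedule route, the sketch below reuses a hypothetical failing function f in place of GeneTo_GeneCoeffs_filtered. With prescheduling turned off, one job is forked per element, so a failure affects only its own index.

```r
library(parallel)

# Hypothetical stand-in for GeneTo_GeneCoeffs_filtered: only input 4 fails.
f <- function(i) {
  if (i == 4) stop("bad element")
  i * 2
}

# One fork per element instead of one batch per core.
# Any warning mclapply emits about the failed call is silenced here.
res <- suppressWarnings(
  mclapply(1:20, f, mc.cores = 7, mc.preschedule = FALSE)
)

failed <- which(vapply(res, inherits, logical(1), what = "try-error"))
print(failed)  # expected: 4
```

The trade-off is overhead: forking once per element costs more than forking once per core, so this setting suits moderately sized inputs with long-running calls better than very many short ones.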
