Parallel processing XML nodes with R
Question
I'm trying to process an XML document in parallel in R, using the xml2 package and the foreach function, but I get "Error in node_attrs(x$node, nsMap = ns) : external pointer is not valid". I also tried exporting the tree to the workers with clusterExport.
Sample code:
library(xml2)
library(foreach)
library(doParallel)
x <- read_xml("<x> node <yy>1</yy><yy>2</yy></x>")
nCores <- detectCores()
cl <- makeCluster(nCores)
clusterExport(cl, varlist = "x")
registerDoParallel(cl)
foreach(yy = xml_find_all(x, "/x/yy")) %dopar%
  yy
stopCluster(cl)
So I don't understand how to avoid this error.
Recommended answer
xml2 objects (here passed via yy) cannot be exported to other R processes because they hold "external pointers" that are only valid in the R process where they were created (the main R session). If exported, those external pointers are useless on the background R processes (the workers), i.e. they are "not valid".
You can read more about this in the 'A Future for R: Common Issues with Solutions' vignette.
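The failure can be reproduced without a cluster. The following is a minimal sketch, assuming that shipping an object to a worker amounts to a serialize/unserialize round trip; the name x_copy is introduced here only for illustration.

```r
library(xml2)

x <- read_xml("<x> node <yy>1</yy><yy>2</yy></x>")

# Simulate what happens when an object is sent to a worker:
# it is serialised in one place and unserialised in another.
x_copy <- unserialize(serialize(x, NULL))

# The copy keeps its xml2 classes, but the external pointer inside it no
# longer refers to a live document, so xml2 calls on it raise an error.
result <- tryCatch(xml_name(x_copy), error = function(e) e)
inherits(result, "error")
```

The original object x remains usable in the main session; only the copy is broken.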
The only parallel solution I am aware of is to keep all xml2 processing local to each worker, e.g.
res <- foreach(file = files) %dopar% {
  x <- read_xml(file)
  lapply(xml_find_all(x, "/x/yy"), ...)
}
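A runnable version of this idea might look like the sketch below. It is an illustration under assumptions, not the answerer's code: the temporary files a.xml and b.xml, the two-worker cluster, and the names res2 and node_strings are all made up for the example. The second loop shows a variant for a single document, where each node is turned into a plain string with as.character() in the main session and parsed again on the worker.

```r
library(xml2)
library(foreach)
library(doParallel)

# Hypothetical input: two small XML files in a temporary directory
files <- file.path(tempdir(), c("a.xml", "b.xml"))
writeLines("<x><yy>1</yy><yy>2</yy></x>", files[1])
writeLines("<x><yy>3</yy></x>", files[2])

cl <- makeCluster(2)  # two workers are enough for this illustration
registerDoParallel(cl)

# Each worker parses its own file, so no external pointer crosses a process
# boundary; only plain character vectors are returned to the main session.
res <- foreach(file = files, .packages = "xml2") %dopar% {
  doc <- read_xml(file)
  xml_text(xml_find_all(doc, "/x/yy"))
}

# Variant for a single document: serialise each node to a string in the main
# session and re-parse that string on the worker.
x <- read_xml("<x> node <yy>1</yy><yy>2</yy></x>")
node_strings <- as.character(xml_find_all(x, "/x/yy"))
res2 <- foreach(s = node_strings, .packages = "xml2", .combine = c) %dopar% {
  xml_text(read_xml(s))
}

stopCluster(cl)
```

In both loops the workers return ordinary R values (character vectors) rather than xml2 objects, since an xml2 object sent back to the main session would suffer from the same invalid-pointer problem. Re-parsing strings adds overhead, so the string variant is likely to pay off only when the per-node work is substantial.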