Memory leaks parsing XML in R
Question
Memory leaks when using the XML package in R are nothing new. This subject has already been discussed:
- Serious Memory Leak When Iteratively Parsing XML Files
- http://www.omegahat.org/RSXML/MemoryManagement.html
- http://r.789695.n4.nabble.com/memory-leak-using-XML-readHTMLTable-td4643332.html
However, after reading all these documents, I still do not know a solution for my particular case. Consider the following code:
library(XML)

GetHref = function(x) {
  subDoc = xmlChildren(x)
  hrefs = ifelse(is.null(subDoc$a), NA, xmlGetAttr(subDoc$a, 'href'))
  rm(subDoc)
  return(hrefs)
}
url = 'http://www.atpworldtour.com/Share/Event-Draws.aspx?e=338&y=2013'
parse = htmlParse(url)
print(.Call("R_getXMLRefCount", parse)) #prints 1
NodeList = xpathSApply(parse, "//td[@class='col_1']/div/div/div[@class='player']")
print(.Call("R_getXMLRefCount", parse)) #prints 33
PlNames = sapply(NodeList, xmlValue, trim = T)
print(.Call("R_getXMLRefCount", parse)) #prints 33
hrefs = sapply(NodeList, GetHref)
print(.Call("R_getXMLRefCount", parse)) #prints 157
rm(NodeList)
gc()
print(.Call("R_getXMLRefCount", parse)) #prints 157
It seems that internal XML nodes created during the post-processing do not get deleted. What would be a solution in this case?
Session Info:
R version 3.0.2 (2013-09-25)
Platform: i386-w64-mingw32/i386 (32-bit)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] XML_3.98-1.1
loaded via a namespace (and not attached):
[1] tools_3.0.2
Answer
I succeeded in fixing a problem very similar to yours.
My document is a simple XML doc:
doc = xmlParse(file_path)
I applied the advice from Duncan Temple Lang about bypassing the memory management when collecting subnodes. For that purpose, I first gather the subnodes with getNodeSet, with the finalizer deactivated:
nodeset = getNodeSet(doc, xml_path, addFinalizer = FALSE)
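As a self-contained sketch of the addFinalizer = FALSE idiom (the inline XML string and XPath below are illustrative, not from the original question):

```r
library(XML)

# Small in-memory document standing in for the real file
doc = xmlParse('<root><item><a href="x.html">X</a></item></root>')

# Collect subnodes without registering R-side finalizers,
# so their lifetime is managed manually rather than by the GC
nodeset = getNodeSet(doc, "//item", addFinalizer = FALSE)
length(nodeset)  # 1
```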
From this set, I can build a subdocument and free it without any memory leak:
subxml = subdoc(nodeset[[1]])
# ... do plenty of sapply
free(subxml)
At the end, I force the objects to be released, in that order:
free(doc)
rm(nodeset)
With all of this, I no longer have any memory leak. Hope it helps!