Creating large XML Trees in R


Problem description

I'm trying to create a large XML tree in R. Here's a simplified version of the code:

library(XML)
N = 100000  # in practice N is much larger: 10^8 to 10^9
seq = newXMLNode("sequence")
pars = as.character(1:N)
for(i in 1:N)
    newXMLNode("Parameter", parent=seq, attrs=c(id=pars[i]))

When N is about 10^6 this takes about a minute; 10^7 takes about forty minutes. Is there any way to speed this up?

Using the paste command:

par_tmp = paste('<Parameter id="', pars, '"/>', sep="")

takes less than a second.

Solution

I would recommend profiling the function using Rprof or the profr package. This will show you where your bottleneck is, and then you can think about ways to either optimize the function or change the way that you're using it.
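As a rough sketch of the Rprof route, the loop from the question can be wrapped between Rprof() and Rprof(NULL) and then summarised with summaryRprof(). The smaller N, the sampling interval, the temporary file, and the variable name root (used instead of seq so that base::seq is not masked) are all assumptions made for illustration.

```r
library(XML)

N <- 20000                          # assumed smaller N so profiling finishes quickly
pars <- as.character(seq_len(N))
root <- newXMLNode("sequence")      # called `root` here to avoid masking base::seq

prof_file <- tempfile(fileext = ".out")   # illustrative location for the profile data
Rprof(prof_file, interval = 0.01)         # start sampling the call stack
for (i in seq_len(N))
    newXMLNode("Parameter", parent = root, attrs = c(id = pars[i]))
Rprof(NULL)                               # stop profiling

prof <- summaryRprof(prof_file)
head(prof$by.self)                  # functions ranked by time spent in their own code
```

The by.self table is usually the quickest place to look for a bottleneck; by.total shows the same samples attributed to callers as well.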

Your paste example is much faster in part because it's vectorized. For a fairer comparison, loop over paste one element at a time, as you are currently doing with newXMLNode, and compare the timings.
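One way to set up that comparison, using only base R, might look like the following. The variable names (par_vec, par_loop, t_vec, t_loop) are made up for this sketch, and the timings will vary by machine.

```r
N <- 100000
pars <- as.character(seq_len(N))

# Vectorized: a single paste() call builds all N strings.
t_vec <- system.time(
    par_vec <- paste('<Parameter id="', pars, '"/>', sep = "")
)

# Looped: one paste() call per element, mirroring the newXMLNode loop.
par_loop <- character(N)            # preallocate the result vector
t_loop <- system.time(
    for (i in seq_len(N))
        par_loop[i] <- paste('<Parameter id="', pars[i], '"/>', sep = "")
)

identical(par_vec, par_loop)        # both approaches build the same strings
c(vectorized = t_vec[["elapsed"]], looped = t_loop[["elapsed"]])
```

The looped version is the like-for-like baseline: whatever gap remains between it and the newXMLNode loop is the cost of node creation itself rather than of looping in R.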

Edit:

Here is the output from profiling your loop with profr.

library(profr)
xml.prof <- profr(for(i in 1:N) 
    newXMLNode("Parameter", parent=seq, attrs=c(id=pars[i])))
plot(xml.prof)

There is nothing especially obvious in here about places that you can improve this. I see that it spends a reasonable amount of time in the %in% function, so improving that would reduce the overall time somewhat (although you still need to iterate over this repeatedly, so it won't make a huge difference). The best solution would be to rewrite newXMLNode as a vectorized function so you can skip the for loop entirely.
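Rewriting newXMLNode itself is beyond a short example, but a related workaround is to build the markup with the vectorized paste call from the question and parse the resulting text once. This is a different technique from a vectorized newXMLNode, and it is only a sketch: it assumes the attribute values need no XML escaping (true for the numeric ids here) and that the whole document fits in memory as a single string.

```r
library(XML)

N <- 100000
pars <- as.character(seq_len(N))

# Build every child element in one vectorized paste() call.
children <- paste('<Parameter id="', pars, '"/>', sep = "")

# Wrap the children in the root element and parse the text a single time.
xml_text <- paste0("<sequence>", paste(children, collapse = ""), "</sequence>")
doc <- xmlParse(xml_text, asText = TRUE)
root <- xmlRoot(doc)

length(xmlChildren(root))           # number of Parameter nodes in the tree
xmlGetAttr(root[[1]], "id")         # id attribute of the first child
```

If the tree is only needed as a file, writing xml_text out directly would avoid parsing altogether.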
