r - xpathApply on XMLNodeSet (with XML package)

Problem description

I am trying to use the xpathApply function from the XML package in R to extract certain data from an HTML file. However, after I call xpathApply on some parent nodes of the HTML document, the resulting object has class XMLNodeSet, and I cannot call xpathApply on it again; I get this error message: "Error in UseMethod("xpathApply") : no applicable method for 'xpathApply' applied to an object of class "XMLNodeSet""

Here is an R script that reproduces my problem (the example is just a simple table; I know I could use the readHTMLTable function, but I need lower-level functions because my actual HTML is more complicated than this simple table):

library(XML)
y <- htmlParse(htmlfile)
x <- xpathApply(y, "//table/tr")
z <- xpathApply(x, "/td")

Here is the "htmlfile":

<table>
<tr>
<td> Test1.1 </td> <td> Test1.2 </td>
</tr>
<tr>
<td> Test1.3 </td> <td> Test1.4 </td>
</tr>
</table>

Is there a way to keep working on the nodes after using xpathApply? Or are there other good alternatives for working with the data in the nodes?

Solution

Although writing a single correct XPath expression is probably the better solution, you can also do this:

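library(XML)
y <- htmlParse(htmlfile)
x <- getNodeSet(y, "//table/tr")
z <- lapply(x, function(node) {
  # Copy the <tr> node into its own document, so that <tr> becomes the root
  subDoc <- xmlDoc(node)
  # Query the sub-document (not the node) and pull out the cell text
  # while the sub-document is still alive
  r <- xpathApply(subDoc, "/tr/td", xmlValue)
  free(subDoc) # not sure if necessary
  return(r)
})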
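
As a rough sketch of the "right XPath" alternative mentioned above: xpathApply returns a plain list of nodes (class XMLNodeSet), and it is each element of that list, not the list itself, that can be queried again. The example below assumes the table from the question is available as an inline string (the name html_text is made up for illustration) and shows two options: one XPath expression that reaches the cells directly, and a per-row query with a relative path.

```r
library(XML)

# Inline copy of the table from the question (html_text is a hypothetical name),
# so the sketch runs without an external file.
html_text <- "<table>
<tr><td> Test1.1 </td> <td> Test1.2 </td></tr>
<tr><td> Test1.3 </td> <td> Test1.4 </td></tr>
</table>"

doc <- htmlParse(html_text, asText = TRUE)

# Option 1: a single XPath expression that goes straight to the cells.
all_cells <- trimws(xpathSApply(doc, "//table/tr/td", xmlValue))

# Option 2: keep the two-step structure, but query each <tr> node individually.
# "./td" is relative to the current node; "/td" would start at the document
# root and match nothing.
rows <- getNodeSet(doc, "//table/tr")
cells_by_row <- lapply(rows, function(row) {
  trimws(xpathSApply(row, "./td", xmlValue))
})
```

Option 2 keeps the row grouping (one character vector per <tr>), which may be useful when the real HTML is more irregular than this table; Option 1 is shorter when a flat vector of cell values is enough.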
