XPath within R using XML package


Question

I am new to XPath, but I can see how powerful it is. I am looking at the source code of this link and simply want to extract the content and the username from the following two pieces of the page, which for simplicity's sake are located near the top of the source code.

content="[Archive] Simburgur's Live Stream [Offline] Gears of War 3"

<div class="username">Simburgur</div>

Here is my code within R:

library(XML)  # provides htmlParse() and xpathSApply()
doc <- htmlParse("http://forums.epicgames.com/archive/index.php/t-672775.html")
xpathSApply(doc, "//head/meta[@name=\"description\"]")

which returns

[[1]]
<meta name="description" content="[Archive]  Simburgur's Live Stream [Offline] Gears of War 3" /> 

Obviously, in this example, all I want is what is inside the quotes of content=, but I am stuck and cannot seem to get my expression to return the string I want.

I repeat. I am new to XPath. :)

Solution

Use:

/*/head/meta[@name='description']/@content

This still selects an attribute node, but there is probably an easy way in your programming language to get the string value of the attribute.
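In R, one way to get from the attribute node to a plain string could look like the sketch below. It is a minimal illustration, not the answer author's code: `html_text` is a hypothetical stand-in for the forum page (so the example runs without network access), and the variable names are assumptions. It relies on `xpathSApply()` returning attribute nodes as a named character vector, and on `xmlGetAttr()` and `xmlValue()` from the XML package.

```r
library(XML)

# Hypothetical stand-in for the forum page, reduced to the two pieces of interest
html_text <- '<html><head>
<meta name="description" content="[Archive] Simburgur\'s Live Stream [Offline] Gears of War 3" />
</head><body><div class="username">Simburgur</div></body></html>'

doc <- htmlParse(html_text, asText = TRUE)

# Option 1: select the attribute node; the result is a named character vector,
# so unname() leaves just the string
content_attr <- xpathSApply(doc, "/*/head/meta[@name='description']/@content")
description <- unname(content_attr)

# Option 2: select the element and pull the attribute out with xmlGetAttr()
description_alt <- xpathSApply(doc, "/*/head/meta[@name='description']",
                               xmlGetAttr, "content")

# The username is element text rather than an attribute, so xmlValue() applies
username <- xpathSApply(doc, "//div[@class='username']", xmlValue)

print(description)
print(username)
```

Option 2 keeps the XPath expression focused on locating the element and leaves the attribute lookup to R, which can be easier to read when several attributes of the same element are needed.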

To get just the string value, use:

string(/*/head/meta[@name='description']/@content)
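The `string()` form can also be evaluated from R. The sketch below assumes that `getNodeSet()` in the XML package hands back a scalar XPath result (string, number, or boolean) as a plain R value rather than a node set; that behaviour is worth confirming against the installed package version. `html_text` is again a hypothetical stand-in for the real page.

```r
library(XML)

# Hypothetical stand-in for the forum page
html_text <- '<html><head>
<meta name="description" content="[Archive] Simburgur\'s Live Stream [Offline] Gears of War 3" />
</head><body></body></html>'

doc <- htmlParse(html_text, asText = TRUE)

# string() makes the XPath expression itself evaluate to a string,
# so no post-processing of nodes should be needed on the R side
description <- getNodeSet(doc, "string(/*/head/meta[@name='description']/@content)")

print(description)
```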

Note: using the // abbreviation may result in very slow evaluation of the XPath expression, because it may cause a linear traversal of a whole (sub)tree.

Avoid using // whenever the structure of the XML document is statically known.
