Getting the text of an <a> with XPath when it's buried in another tag, e.g. <strong>
Problem description
The following XPath is usually sufficient for matching all anchors whose text contains a certain string:
//a[contains(text(), 'SENIOR ASSOCIATES')]
Given a case like this though:
<a href="http://www.freshminds.net/job/senior-associate/"><strong>
SENIOR ASSOCIATES <br>
</strong></a>
The text is wrapped in a <strong>, and there is also a <br> before the anchor closes, so the above XPath returns nothing.
How can the XPath be adapted so that it allows for the <a> containing additional tags such as <strong>, <i>, <b>, <br> etc. while still working in the standard case?
Don't use text(). Test the context node (.) instead:
//a[contains(., 'SENIOR ASSOCIATES')]
Contrary to what you might think, text() does not give you the text of an element.
It is a node test, i.e. an expression that selects a list of actual nodes (!), namely the text node children of an element.
Here:
<a href="http://www.freshminds.net/job/senior-associate/"><strong>
SENIOR ASSOCIATES <br>
</strong></a>
there are no text node children of a. All the text nodes are children of strong. So text() gives you zero nodes.
Here:
<a href="http://www.freshminds.net/job/senior-associate/"> <strong>
SENIOR ASSOCIATES <br>
</strong></a>
there is one text node child of a. It's empty (as in "whitespace only").
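The difference between the two samples can be checked outside of XPath. The following is a minimal sketch that uses Python's standard xml.dom.minidom rather than an XPath engine; the helper text_children is a hypothetical name that only mimics what the text() step selects, and <br> is written as <br/> on the assumption that the markup has to pass an XML parser.

```python
from xml.dom import minidom
from xml.dom.minidom import Node

# The two samples from above. <br> is written as <br/> so that an XML
# parser accepts them (an assumption; the original markup is HTML).
FIRST = ('<a href="http://www.freshminds.net/job/senior-associate/"><strong>\n'
         'SENIOR ASSOCIATES <br/>\n'
         '</strong></a>')
SECOND = ('<a href="http://www.freshminds.net/job/senior-associate/"> <strong>\n'
          'SENIOR ASSOCIATES <br/>\n'
          '</strong></a>')


def text_children(element):
    """Return the data of the text nodes that are direct children of element.

    Hypothetical helper: it mirrors what the XPath step text() selects
    when evaluated with `element` as the context node.
    """
    return [child.data for child in element.childNodes
            if child.nodeType == Node.TEXT_NODE]


first_a = minidom.parseString(FIRST).documentElement
second_a = minidom.parseString(SECOND).documentElement

print(text_children(first_a))    # no text node children at all
print(text_children(second_a))   # one whitespace-only text node
```

In both cases the string 'SENIOR ASSOCIATES' is nowhere among the direct text children of the <a>, which is why a predicate built on text() cannot match.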
The expression . on the other hand selects only one node (the context node, the <a> itself).
Now, contains() expects strings as its arguments. If one argument is not a string, a conversion to string is done first.
Converting a node set (consisting of 1 or more nodes) to string is done by concatenating all text node descendants of the first node in the set, in document order (*). Therefore using . (or its more explicit equivalent string(.)) gives you SENIOR ASSOCIATES surrounded by a bunch of whitespace, because there is a bunch of whitespace in your XML.
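To make the conversion rule concrete, here is a rough emulation in Python using the standard xml.dom.minidom module. It is not an XPath implementation: string_value is a hypothetical helper that approximates what string(.) returns for an element, and the sample uses <br/> instead of <br> so that it parses as XML.

```python
from xml.dom import minidom
from xml.dom.minidom import Node

# First sample from above, with <br/> so that it is well-formed XML
# (an assumption made only for this sketch).
SAMPLE = ('<a href="http://www.freshminds.net/job/senior-associate/"><strong>\n'
          'SENIOR ASSOCIATES <br/>\n'
          '</strong></a>')


def string_value(node):
    """Concatenate the data of all descendant text nodes in document order.

    Hypothetical helper approximating the XPath string-value of an element.
    """
    if node.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
        return node.data
    # Elements recurse into their children; comments and processing
    # instructions have no children and therefore contribute nothing.
    return "".join(string_value(child) for child in node.childNodes)


a = minidom.parseString(SAMPLE).documentElement
value = string_value(a)

print(repr(value))                   # the text plus its surrounding whitespace
print("SENIOR ASSOCIATES" in value)  # the substring test that contains() performs
```

The text inside <strong> becomes part of the string value of the <a>, so the substring test succeeds even though the <a> has no text node children of its own.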
To get rid of that whitespace, use the normalize-space() function:
//a[contains(normalize-space(.), 'SENIOR ASSOCIATES')]
or, shorter, because "the current node" is the default for this function:
//a[contains(normalize-space(), 'SENIOR ASSOCIATES')]
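What normalize-space() does to such a string can be sketched in a few lines of Python. This is an approximation under the assumption that XPath 1.0 whitespace means space, tab, carriage return and line feed; normalize_space is a hypothetical helper and the second input string is made up for illustration.

```python
import re


def normalize_space(value):
    """Approximate XPath 1.0 normalize-space().

    Collapses every run of space, tab, CR or LF into one space and then
    removes the leading and trailing space.
    """
    return re.sub(r"[ \t\r\n]+", " ", value).strip(" ")


# String value of the <a> in the samples above.
raw = "\nSENIOR ASSOCIATES \n"

# Hypothetical variant in which the markup breaks the line between the words.
wrapped = "\nSENIOR\n   ASSOCIATES \n"

print(repr(normalize_space(raw)))
print(repr(normalize_space(wrapped)))

# Without normalization the wrapped variant fails a plain substring test.
print("SENIOR ASSOCIATES" in wrapped)
print("SENIOR ASSOCIATES" in normalize_space(wrapped))
```

Normalizing before the substring test makes the match independent of how the source markup happens to be indented or wrapped.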
(*) That's the reason why using //a[contains(.//text(), 'SENIOR ASSOCIATES')] would work in the first of the two samples above but not in the second one.
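The first-node rule behind this footnote can be emulated as follows. This is a sketch with Python's standard xml.dom.minidom, not a real XPath evaluation: descendant_text_nodes and first_text_contains are hypothetical helpers, and the samples use <br/> so that they parse as XML.

```python
from xml.dom import minidom
from xml.dom.minidom import Node

# The two samples from above, with <br/> instead of <br> (an assumption
# needed for XML parsing).
FIRST = ('<a href="http://www.freshminds.net/job/senior-associate/"><strong>\n'
         'SENIOR ASSOCIATES <br/>\n'
         '</strong></a>')
SECOND = ('<a href="http://www.freshminds.net/job/senior-associate/"> <strong>\n'
          'SENIOR ASSOCIATES <br/>\n'
          '</strong></a>')


def descendant_text_nodes(node):
    """Yield the data of all descendant text nodes in document order,
    roughly the node set that .//text() selects."""
    for child in node.childNodes:
        if child.nodeType == Node.TEXT_NODE:
            yield child.data
        else:
            yield from descendant_text_nodes(child)


def first_text_contains(xml, needle):
    """Emulate contains(.//text(), needle) on the root element.

    Only the FIRST node of the set is converted to a string; an empty
    set converts to the empty string.
    """
    root = minidom.parseString(xml).documentElement
    first = next(descendant_text_nodes(root), "")
    return needle in first


print(first_text_contains(FIRST, "SENIOR ASSOCIATES"))
print(first_text_contains(SECOND, "SENIOR ASSOCIATES"))
```

In the second sample the first descendant text node is the single space before <strong>, so the conversion never reaches the node that holds the actual text.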