Getting the text of an <a> with XPath when it's buried in another tag e.g. <strong>
The following XPath is usually sufficient for matching all anchors whose text contains a certain string:
//a[contains(text(), 'SENIOR ASSOCIATES')]
Given a case like this though:
<a href="http://www.freshminds.net/job/senior-associate/"><strong>
SENIOR ASSOCIATES <br>
</strong></a>
The text is wrapped in a <strong>, and there's also a <br> before the anchor closes, so the above XPath returns nothing.
How can the XPath be adapted so that it allows for the <a> containing additional tags such as <strong>, <i>, <b>, <br> etc. while still working in the standard case?
Don't use text().
//a[contains(., 'SENIOR ASSOCIATES')]
Contrary to what you might think, text() does not give you the text of an element.
It is a node test, i.e. an expression that selects a list of actual nodes (!), namely the text node children of an element.
Here:
<a href="http://www.freshminds.net/job/senior-associate/"><strong>
SENIOR ASSOCIATES <br>
</strong></a>
there are no text node children of a. All the text nodes are children of strong. So text() gives you zero nodes.
Here:
<a href="http://www.freshminds.net/job/senior-associate/"> <strong>
SENIOR ASSOCIATES <br>
</strong></a>
there is one text node child of a. It's empty (as in "whitespace only").
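The difference between the two samples can be checked with a small sketch. It uses Python's standard-library ElementTree rather than a real XPath engine, so text_node_children is a hypothetical helper that only approximates what text() selects, and <br> is written as <br/> so that the XML parser accepts the markup (both are assumptions, not part of the original question).

```python
import xml.etree.ElementTree as ET

# The two samples from above; <br> is written as <br/> (assumption)
# because ElementTree only parses well-formed XML.
FIRST = """<a href="http://www.freshminds.net/job/senior-associate/"><strong>
SENIOR ASSOCIATES <br/>
</strong></a>"""
SECOND = """<a href="http://www.freshminds.net/job/senior-associate/"> <strong>
SENIOR ASSOCIATES <br/>
</strong></a>"""


def text_node_children(element):
    """Hypothetical helper: the text directly under `element`,
    which is roughly what the text() node test selects."""
    candidates = [element.text] + [child.tail for child in element]
    return [t for t in candidates if t]


first_children = text_node_children(ET.fromstring(FIRST))
second_children = text_node_children(ET.fromstring(SECOND))

print(first_children)   # [] : no text node children at all
print(second_children)  # [' '] : one whitespace-only text node
```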
On the other hand, . selects only one node (the context node, the <a> itself).
Now, contains() expects strings as its arguments. If one argument is not a string, a conversion to string is done first.
Converting a node set (consisting of 1 or more nodes) to string is done by concatenating, in document order, all text node descendants of the first node in the set(*); an empty node set becomes the empty string. Therefore using . (or its more explicit equivalent string(.)) gives you SENIOR ASSOCIATES surrounded by a bunch of whitespace, because there is a bunch of whitespace in your XML.
To get rid of that whitespace, use the normalize-space() function:
//a[contains(normalize-space(.), 'SENIOR ASSOCIATES')]
or, shorter, because "the current node" is the default for this function:
//a[contains(normalize-space(), 'SENIOR ASSOCIATES')]
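What normalize-space() does to the string value can be sketched in plain Python. The normalize_space function below is a hand-written approximation of the XPath 1.0 rule (trim leading and trailing whitespace, collapse each inner run of whitespace to a single space), and the markup is a hypothetical variant of the sample in which the two words are split across a line break, to show a case where the normalization actually changes the result.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical variant of the sample: the words are separated by a
# line break and indentation; <br/> replaces <br> (assumption).
VARIANT = """<a href="http://www.freshminds.net/job/senior-associate/"><strong>
SENIOR
    ASSOCIATES <br/>
</strong></a>"""


def normalize_space(text):
    """Approximation of XPath 1.0 normalize-space(): collapse runs of
    space, tab, CR and LF to one space, then trim both ends."""
    return re.sub(r"[ \t\r\n]+", " ", text).strip(" ")


raw = "".join(ET.fromstring(VARIANT).itertext())
normalized = normalize_space(raw)

print("SENIOR ASSOCIATES" in raw)         # False
print("SENIOR ASSOCIATES" in normalized)  # True
```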
(*) That's the reason why using //a[contains(.//text(), 'SENIOR ASSOCIATES')] would work in the first of the two samples above but not in the second one.
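The footnote can be reproduced with the same ElementTree approximation: when the node set from .//text() is converted to a string, only its first node counts, which corresponds to the first item itertext() yields. first_text_descendant is a hypothetical helper, and <br/> stands in for <br> so the samples parse as XML.

```python
import xml.etree.ElementTree as ET

# The two samples from above, with <br/> instead of <br> (assumption).
FIRST = """<a href="http://www.freshminds.net/job/senior-associate/"><strong>
SENIOR ASSOCIATES <br/>
</strong></a>"""
SECOND = """<a href="http://www.freshminds.net/job/senior-associate/"> <strong>
SENIOR ASSOCIATES <br/>
</strong></a>"""


def first_text_descendant(element):
    """Hypothetical helper: the first text node below `element` in
    document order, i.e. what string(.//text()) would be based on."""
    return next(element.itertext(), "")


first_hit = "SENIOR ASSOCIATES" in first_text_descendant(ET.fromstring(FIRST))
second_hit = "SENIOR ASSOCIATES" in first_text_descendant(ET.fromstring(SECOND))

print(first_hit)   # True: the first text node is the one inside <strong>
print(second_hit)  # False: the first text node is the single space in <a>
```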