Getting the text of an <a> with XPath when it's buried in another tag, e.g. <strong>


Problem description


The following XPath is usually sufficient for matching all anchors whose text contains a certain string:

//a[contains(text(), 'SENIOR ASSOCIATES')]

Given a case like this though:

<a href="http://www.freshminds.net/job/senior-associate/"><strong>
                        SENIOR ASSOCIATES <br> 
                        </strong></a>

The text is wrapped in a <strong>, and there is also a <br> before the anchor closes, so the above XPath returns nothing.

How can the XPath be adapted so that it allows for the <a> containing additional tags such as <strong>, <i>, <b>, <br> etc. while still working in the standard case?

Solution

Don't use text().

//a[contains(., 'SENIOR ASSOCIATES')]


Contrary to what you might think, text() does not give you the text of an element.

It is a node test, i.e. an expression that selects a list of actual nodes (!), namely the text node children of an element.

Here:

<a href="http://www.freshminds.net/job/senior-associate/"><strong>
                    SENIOR ASSOCIATES <br> 
                    </strong></a>

there are no text node children of a. All the text nodes are children of strong. So text() gives you zero nodes.

Here:

<a href="http://www.freshminds.net/job/senior-associate/"> <strong>
                    SENIOR ASSOCIATES <br> 
                    </strong></a>

there is one text node child of a. It's empty (as in "whitespace only").
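The difference between the two samples can be checked with a small Python sketch. It is only an analogue: the standard library's ElementTree does not expose text nodes directly, so the hypothetical helper text_node_children() approximates what text() would select by collecting an element's .text and its children's .tail values. It also assumes the <br> is written as <br/> so that the snippet parses as XML.

```python
import xml.etree.ElementTree as ET

# <br> is written as <br/> so the stdlib XML parser accepts the snippet.
SAMPLE_1 = (
    '<a href="http://www.freshminds.net/job/senior-associate/"><strong>\n'
    '    SENIOR ASSOCIATES <br/>\n'
    '    </strong></a>'
)
# Same markup, but with a single space between <a ...> and <strong>.
SAMPLE_2 = SAMPLE_1.replace("><strong>", "> <strong>")


def text_node_children(elem):
    """Hypothetical helper: text sitting directly inside elem,
    roughly what the XPath node test text() would select."""
    candidates = [elem.text] + [child.tail for child in elem]
    return [t for t in candidates if t]


first = text_node_children(ET.fromstring(SAMPLE_1))
second = text_node_children(ET.fromstring(SAMPLE_2))
print(first)   # no direct text inside <a> in the first sample
print(second)  # one whitespace-only string in the second sample
```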


The expression . (a single dot), on the other hand, selects exactly one node: the context node, here the <a> itself.

Now, contains() expects strings as its arguments. If one argument is not a string, a conversion to string is done first.

In XPath 1.0, converting a node set (consisting of 1 or more nodes) to a string is done by concatenating all text node descendants of the first node in the set(*). Therefore using . (or its more explicit equivalent string(.)) gives you SENIOR ASSOCIATES surrounded by a bunch of whitespace, because there is a bunch of whitespace in your XML.
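As a rough illustration of that string conversion, the following Python sketch concatenates all descendant text of the <a> element, which is what the string value of . amounts to. It uses the standard library's ElementTree as a stand-in for an XPath engine and assumes <br> is written as <br/> so the sample is well-formed XML.

```python
import xml.etree.ElementTree as ET

# <br> is written as <br/> so the stdlib XML parser accepts the snippet.
SAMPLE = (
    '<a href="http://www.freshminds.net/job/senior-associate/"><strong>\n'
    '    SENIOR ASSOCIATES <br/>\n'
    '    </strong></a>'
)

a = ET.fromstring(SAMPLE)

# itertext() walks every piece of descendant text in document order;
# joining the pieces mirrors the string value that contains(., ...) sees.
string_value = "".join(a.itertext())

print(repr(string_value))                   # the text plus surrounding whitespace
print("SENIOR ASSOCIATES" in string_value)  # the substring test succeeds
```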

To get rid of that whitespace, use the normalize-space() function:

//a[contains(normalize-space(.), 'SENIOR ASSOCIATES')]

or, shorter, because "the current node" is the default for this function:

//a[contains(normalize-space(), 'SENIOR ASSOCIATES')]
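The combined effect of contains() and normalize-space() can be imitated in plain Python. This is a sketch under assumptions, not an XPath implementation: normalize_space() and anchors_containing() are hypothetical helpers, the href values are made up, and str.split() treats a somewhat wider set of characters as whitespace than XPath does.

```python
import xml.etree.ElementTree as ET


def normalize_space(text):
    """Approximate XPath normalize-space(): trim the ends and
    collapse each run of whitespace to a single space."""
    return " ".join(text.split())


def anchors_containing(root, needle):
    """Hypothetical helper mimicking //a[contains(normalize-space(), needle)]."""
    return [
        a for a in root.iter("a")
        if needle in normalize_space("".join(a.itertext()))
    ]


# Illustrative document: a plain anchor, a nested one whose text is
# split across lines, and one that should not match.
DOC = (
    '<div>'
    '<a href="/plain">SENIOR ASSOCIATES</a>'
    '<a href="/nested"><strong>\n    SENIOR\n    ASSOCIATES <br/>\n    </strong></a>'
    '<a href="/other"><i>Junior analysts</i></a>'
    '</div>'
)

hits = anchors_containing(ET.fromstring(DOC), "SENIOR ASSOCIATES")
print([a.get("href") for a in hits])
```

Both the standard case and the nested case match here. The nested anchor would be missed without normalization, because a line break separates the two words in its raw text.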


(*) That's the reason why using //a[contains(.//text(), 'SENIOR ASSOCIATES')] would work in the first of the two samples above but not in the second one.
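The footnote's "first node only" behaviour can be mimicked in Python by looking at just the first piece of descendant text instead of all of it. Again this is an analogue built on the standard library's ElementTree, with <br> written as <br/>; first_text_descendant() is a hypothetical helper standing in for the first node of .//text().

```python
import xml.etree.ElementTree as ET

# <br> is written as <br/> so the stdlib XML parser accepts the snippet.
SAMPLE_1 = (
    '<a href="http://www.freshminds.net/job/senior-associate/"><strong>\n'
    '    SENIOR ASSOCIATES <br/>\n'
    '    </strong></a>'
)
# Same markup, but with a single space between <a ...> and <strong>.
SAMPLE_2 = SAMPLE_1.replace("><strong>", "> <strong>")


def first_text_descendant(elem):
    """Hypothetical helper: the first piece of descendant text, standing in
    for the first node of .//text() after conversion to a string."""
    return next(elem.itertext(), "")


results = [
    "SENIOR ASSOCIATES" in first_text_descendant(ET.fromstring(sample))
    for sample in (SAMPLE_1, SAMPLE_2)
]
print(results)  # first sample matches; in the second, the first text is only a space
```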
