XPath to select preceding element with optional intervening whitespace-only text node

Question

Given an element as context, I want to select the preceding sibling element and check whether it has a particular name. The caveat is that I do not want to select it if there is an intervening text node that has non-whitespace content.

For example, given this XML document…

<r>
  <a>a1</a><a>a2</a>
   b
  <a>a3</a>
    <a>a4</a>
  <b/>
  <a>a5</a>
</r>

…then:

  • For "a1" there should be no match (there is no <a> sibling element immediately preceding it)
  • For "a2" then "a1" should be matched (there is no intervening text node)
  • For "a3" there should be no match (there is an intervening text node with non-whitespace contents)
  • For "a4" then "a3" should be matched (the intervening text node is only whitespace)
  • For "a5" there should be no match (the preceding sibling element is not an <a>).

I can use preceding-sibling::*[1][name()="a"] to check that the preceding element is an <a>.

However, I can't figure out how to say "select the following sibling node, whether it is an element or text, and check that it is either not a text node or has normalize-space(.)=''". My best guess was this:

preceding-sibling::*[1][name()="a"][following-sibling::node()[1][not(text()) or normalize-space(.)=""]]

…but that appears to have no effect.

Here's my test Ruby file:

require 'nokogiri'

xpath = 'preceding-sibling::*[1][name()="a"][following-sibling::node()[1][not(text()) or normalize-space(.)=""]]'
fragment = Nokogiri::XML.fragment '<a>a1</a><a>a2</a> b <a>a3</a> <a>a4</a> <b/> <a>a5</a>'    

fragment.css('a').each{ |a| p [a.text,a.xpath(xpath).to_s] }
#=> ["a1", ""]
#=> ["a2", ""]
#=> ["a3", "<a>a2</a>"]
#=> ["a4", "<a>a3</a>"]
#=> ["a5", ""]

The results for "a2" and "a3" are wrong, and they confuse me. The expression finds the preceding <a> correctly, but then does not correctly verify that the first following sibling of that element is either not text (which should allow "a2" to find "a1") or whitespace-only text (which should prevent "a3" from finding "a2").

Edit: Here's the XPath I was writing, and what I intended it to do:

  • preceding-sibling::*[1][name()="a"]… - find the first preceding element, and ensure that it is an <a>. This appears to be working as desired.

  • [following-sibling::node()[1][…]] - ensure that the first following node (of the found preceding <a>) matches some conditions

  • not(text()) or normalize-space(.)="" - ensure that this following node is either not a text node, or that the normalized space of it is empty

Accepted answer

Use:

/*/a/preceding-sibling::node()
       [not(self::text()[not(normalize-space())])]
            [1]
              [self::a]

XSLT-based verification:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="/">
     <xsl:copy-of select=
       "/*/a
          /preceding-sibling::node()
                      [not(self::text()[not(normalize-space())])]
                                        [1]
                                         [self::a]
    "/>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied to the provided XML document:

<r>
  <a>a1</a><a>a2</a>
   b
  <a>a3</a>
    <a>a4</a>
  <b/>
  <a>a5</a>
</r>

the XPath expression is evaluated, and the nodes it selects are copied to the output:

<a>a1</a>
<a>a3</a>
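
As a cross-check outside XSLT, the rule this expression encodes (take the nearest preceding sibling that is not a whitespace-only text node, and keep it only if it is an <a>) can be sketched in plain Ruby. This is only an illustration: it walks the tree with REXML, which ships with standard Ruby distributions, rather than evaluating the XPath itself, and the helper names whitespace_only_text? and preceding_a are made up for this example.

```ruby
require 'rexml/document'

# True for text nodes that contain nothing but XML whitespace.
def whitespace_only_text?(node)
  node.is_a?(REXML::Text) && node.value.match?(/\A[ \t\r\n]*\z/)
end

# Mirrors preceding-sibling::node()[not whitespace-only text][1][self::a]:
# find the nearest preceding sibling that is not whitespace-only text,
# then return it only if it is an <a> element (nil otherwise).
def preceding_a(node)
  siblings = node.parent.children
  position = siblings.index { |n| n.equal?(node) }
  nearest  = siblings[0...position].reverse.find { |n| !whitespace_only_text?(n) }
  nearest if nearest.is_a?(REXML::Element) && nearest.name == 'a'
end

xml = '<r><a>a1</a><a>a2</a> b <a>a3</a> <a>a4</a> <b/> <a>a5</a></r>'
doc = REXML::Document.new(xml)

# Every <a> child of the root, in document order.
anchors = doc.root.children.select { |n| n.is_a?(REXML::Element) && n.name == 'a' }

# Map each <a> to the text of the <a> it matches, or nil for no match.
results = anchors.to_h { |a| [a.text, preceding_a(a)&.text] }
results.each { |name, match| puts "#{name}: #{match.inspect}" }
```

For the sample document this should report a1 for a2 and a3 for a4, and nil for a1, a3 and a5, which agrees with the expectations listed in the question.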

Update:

What is wrong with the XPath expression in the question?

The problem is here:

[not(text()) or normalize-space(.)='']

In XPath, text() is short for child::text(), so this tests whether the context node has no text-node child.

But the OP wants to test whether the context node itself is a text node.

Solution:

Replace the above with:

[not(self::text()) or normalize-space(.)='']

XSLT-based verification:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="/*/a">
     <xsl:copy-of select=
     "preceding-sibling::*[1]
                      [name()='a']
                         [following-sibling::node()[1]
                                    [not(self::text()) or normalize-space(.)='']
                       ]"/>
 </xsl:template>
 <xsl:template match="text()"/>
</xsl:stylesheet>

Now this transformation produces the wanted result:

<a>a1</a>
<a>a3</a>
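
To see why swapping text() for self::text() changes the outcome, the two predicates can be imitated by hand on the two nodes that went wrong in the question: the <a>a2</a> element (the first node after a1) and the " b " text node (the first node after a2). The sketch below is a rough stand-in written against REXML, not an XPath engine; the helper names are hypothetical, and blank? only approximates normalize-space(.)="" for elements that hold at most one direct text child, as in this sample.

```ruby
require 'rexml/document'

# text(): the text-node children of the context node. A text node has none.
def child_text_nodes(node)
  node.is_a?(REXML::Element) ? node.children.grep(REXML::Text) : []
end

# Approximation of normalize-space(.) = "" for this simple sample.
def blank?(node)
  value = node.is_a?(REXML::Text) ? node.value : node.text.to_s
  value.strip.empty?
end

# [not(text()) or normalize-space(.)=""]  (the predicate from the question)
def original_predicate(node)
  child_text_nodes(node).empty? || blank?(node)
end

# [not(self::text()) or normalize-space(.)=""]  (the corrected predicate)
def fixed_predicate(node)
  !node.is_a?(REXML::Text) || blank?(node)
end

doc = REXML::Document.new('<r><a>a1</a><a>a2</a> b <a>a3</a></r>')
_a1, a2, b_text, _a3 = doc.root.children

checks = {
  'element <a>a2</a>' => a2,     # first node after a1
  'text " b "'        => b_text  # first node after a2
}

verdicts = checks.transform_values do |node|
  { original: original_predicate(node), fixed: fixed_predicate(node) }
end

verdicts.each do |label, v|
  puts "#{label}: original=#{v[:original]} fixed=#{v[:fixed]}"
end
```

Under these assumptions the original predicate rejects the <a>a2</a> element (it has a text child and is not blank) and accepts the " b " text node (a text node never has text children), which is the reverse of what the question wanted. The self::text() version accepts the element and rejects the non-blank text node.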
