PHP XPath with text(): SimpleXMLElement->xpath() not in line with expected XPath results


Problem description

我正在尝试获取/td/span 的所有文本节点.

我正在尝试使用 xpath/td/span/text()

问题是它返回每个文本元素的所有文本节点(这里有两个,193"和120",它返回193120"两次,而不是在单独的元素中返回 193 和 120).

我在任何在线工具上都尝试过完全相同的 xpath,它运行良好,在 php 中,结果完全不同.

使用 SimpleXMLElement

$xhtmlSnippet = '19310

66

195<;/span>.3424212064</span></td>';$xml = new SimpleXMLElement($xhtmlSnippet);$xresult = $xml->xpath('/td/span/text()');foreach($xresult 作为 $xnode){echo "<br/><br/>NodeValue: ".$xnode;}

给我:

<块引用>

节点值:193120

节点值:193120

以下是通过在线工具正常工作的示例(所有其他在线工具也提供预期输出):

在线测试器中的工作示例

使用 DOMDocument + DOMXPath,它似​​乎按预期工作:

 $dom = new DOMDocument;$dom->loadXML($xhtmlSnippet);$xpath = new DOMXPath($dom);foreach ($xpath->query('/td/span/text()) as $textNode) {echo "\n\nTextNode: ".$textNode->nodeValue;}

给出:

<块引用>

文本节点:193

文本节点:120

解决方案

SimpleXMLElement 只能表示元素和属性,无论是单独的还是相同类型的兄弟元素的集合.->xpath() 方法 返回一个SimpleXMLElement 对象的数组,这允许它们是非兄弟节点,但不允许任何其他节点类型.

因此,表达式 /td/span/text() 匹配两个文本节点,但将它们作为表示其父元素的对象返回,在这种情况下恰好是相同的 <span> 元素,为您提供两次相同对象的数组.

谜题的剩余部分是,当您将 SimpleXML 元素转换为字符串时,它将所有直接后代文本和 CDATA 节点合并为一个字符串,因此 193120 粘在一起.

因此输出是193120,两次.

(这绝对是不直观的行为,尽管很难完全了解 SimpleXML 在这种情况下应该做什么;如果 XPath 表达式解析为元素或属性以外​​的其他内容,则生成错误可能会更好).

<小时>

由于 DOM API 为每种可能存在于 XML 中的节点提供了对象,而且 PHP 包含该 API 的完整实现,因此 XPath 表达式将在那里按预期工作.更重要的是,SimpleXML 和 DOM 对象实际上都是围绕相同内部内存结构的包装器,因此您可以使用 dom_import_simplexml()simplexml_import_dom() 编写将两者结合起来的操作.

举一个不太优雅的例子,如果您想在已经使用 SimpleXML 遍历到的元素的上下文中运行 XPath 表达式,您可以执行以下操作:

$dom_node = dom_import_simplexml($simplexml_node);$dom_xpath = new DOMXPath($dom_node->ownerDocument);$dom_xpath_result = $dom_xpath->query('span/text()', $dom_node);foreach($dom_xpath_result 作为 $xnode){echo "<br/><br/>NodeValue: ".$xnode->nodeValue;}

显然,您可以根据需要将其包装成一个函数.另请注意,由于您的表达式从文档根(前导 /)开始,因此实际上下文无关紧要,这就是为什么我在上面使用了略有不同的表达式.

I'm trying to get all text nodes of /td/span.

I'm using the XPath expression /td/span/text()

The problem is that it returns all the text nodes joined together for every match: there are two text nodes here, "193" and "120", but instead of getting 193 and 120 as separate results, I get "193120" twice.

I've tried the exact same XPath in several online tools and it works fine there; in PHP I get completely different results.

Using SimpleXMLElement:

$xhtmlSnippet = '<td><span>193<span>10</span><span></span><div>66</div><span>195</span><span>.</span><span>34</span><span>242</span><span></span>120<span>64</span></span></td>';

$xml = new SimpleXMLElement($xhtmlSnippet);

$xresult = $xml->xpath('/td/span/text()');    

foreach($xresult as $xnode){
    echo "<br /><br />NodeValue: " . $xnode;
}

Gives me:

NodeValue: 193120

NodeValue: 193120

Here is an example of it working properly in an online tool (all of the other online tools I tried give the expected output as well):

Working example in online tester

EDIT:

Using DOMDocument + DOMXPath, it seems to work as expected:

    $dom = new DOMDocument;
    $dom->loadXML($xhtmlSnippet);

    $xpath = new DOMXPath($dom);
    
    foreach ($xpath->query('/td/span/text()') as $textNode) {
        echo "\n\nTextNode: " . $textNode->nodeValue;
    }

Gives:

TextNode: 193

TextNode: 120

Solution

A SimpleXMLElement can only represent elements and attributes, either individually or a collection of siblings of the same type. The ->xpath() method returns an array of SimpleXMLElement objects, which allows them to be non-siblings, but does not allow for any other node type.

Consequently, the expression /td/span/text() matches the two text nodes, but returns each of them as an object representing its parent element. In this case both parents are the same <span> element, so you get an array containing that same element twice.
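Here is a minimal sketch to check this, reusing the question's snippet (the variable names are ours). Each entry returned by ->xpath() reports the element name span instead of being a text node. Converting both entries to DOM nodes with dom_import_simplexml() shows that they wrap the same underlying element.

```php
<?php
// The snippet from the question: the outer <span> has two direct text
// children, "193" and "120", interleaved with child elements.
$xhtmlSnippet = '<td><span>193<span>10</span><span></span><div>66</div>'
    . '<span>195</span><span>.</span><span>34</span><span>242</span>'
    . '<span></span>120<span>64</span></span></td>';

$xml = new SimpleXMLElement($xhtmlSnippet);
$result = $xml->xpath('/td/span/text()');

// Two matches, one per text node...
echo count($result), "\n";

foreach ($result as $node) {
    // ...but each match is represented by its parent element.
    echo $node->getName(), "\n";
}

// Importing both results into DOM shows they point at the same <span>.
$first = dom_import_simplexml($result[0]);
$second = dom_import_simplexml($result[1]);
var_dump($first->isSameNode($second));
```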

The remaining part of the puzzle is that when you cast a SimpleXML element to string, it concatenates all of its direct child text and CDATA nodes into one string, so the 193 and 120 get stuck together.
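The sketch below illustrates that string conversion on its own, without XPath. The small `<a>x<b>y</b>z</a>` document is invented for the example. Text inside child elements is skipped, and the element's own text nodes are joined with nothing in between. The question's outer `<span>` behaves the same way.

```php
<?php
// Hypothetical mixed-content element: two direct text nodes ("x", "z")
// separated by a child element whose text is "y".
$el = new SimpleXMLElement('<a>x<b>y</b>z</a>');

// Casting joins only the direct child text nodes, so "y" is left out.
$direct = (string) $el;
echo $direct, "\n"; // xz

// The child element's text is reachable only through the child itself.
$child = (string) $el->b;
echo $child, "\n"; // y

// The question's snippet: casting the outer <span> merges "193" and "120".
$xml = new SimpleXMLElement('<td><span>193<span>10</span><span></span><div>66</div>'
    . '<span>195</span><span>.</span><span>34</span><span>242</span>'
    . '<span></span>120<span>64</span></span></td>');
$fromSnippet = (string) $xml->span;
echo $fromSnippet, "\n"; // 193120
```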

Thus the output is 193120, twice.

(This is definitely unintuitive behaviour, although it's hard to know quite what SimpleXML should do in this situation; perhaps it would be better to produce an error if the XPath expression resolves to something other than elements or attributes).


Since the DOM API has objects for every kind of node that can possibly exist in XML, and PHP includes a full implementation of that API, the XPath expression will work as expected there. What's more, the SimpleXML and DOM objects are actually both wrappers around the same internal memory structures, so you can write operations combining the two using dom_import_simplexml() and simplexml_import_dom().
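The next sketch shows that both APIs wrap the same in-memory tree. The document and element names are invented for illustration. A node appended through DOM is immediately visible through the original SimpleXMLElement, and simplexml_import_dom() goes back the other way.

```php
<?php
// Start from a small SimpleXML document (hypothetical content).
$sx = new SimpleXMLElement('<root><item>a</item></root>');

// Get a DOMElement view of the same root node; no copy is made.
$domRoot = dom_import_simplexml($sx);

// Modify the tree through the DOM API...
$domRoot->appendChild($domRoot->ownerDocument->createElement('item', 'b'));

// ...and the change shows up through SimpleXML.
echo count($sx->item), "\n";      // 2
echo (string) $sx->item[1], "\n"; // b

// The reverse direction: wrap the DOM node as a SimpleXMLElement again.
$again = simplexml_import_dom($domRoot);
echo $again->getName(), "\n";     // root
```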

As a slightly inelegant example, if you wanted to run an XPath expression in the context of an element you'd already traversed to with SimpleXML, you could do something like this:

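// $simplexml_node is any SimpleXMLElement you have already navigated to;
// with the question's snippet, $xml (the <td> root element) works here.
$dom_node = dom_import_simplexml($simplexml_node);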
$dom_xpath = new DOMXPath($dom_node->ownerDocument);
$dom_xpath_result = $dom_xpath->query('span/text()', $dom_node);

foreach($dom_xpath_result as $xnode){
    echo "<br /><br />NodeValue: " . $xnode->nodeValue;
}

Obviously, you could wrap this up into a function as desired. Also note that since your expression starts at the document root (leading /) the actual context is irrelevant, which is why I've used a slightly different expression above.
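Below is one way that wrapping might look. It is a sketch only: the function name xpath_text_values() is hypothetical and not part of PHP. The function returns plain strings, so each text node stays a separate entry. The context node is passed as the second argument of DOMXPath::query(), so relative expressions are evaluated from the element you already hold.

```php
<?php
/**
 * Hypothetical helper: run an XPath expression against the tree behind a
 * SimpleXMLElement using DOMXPath, and return the string value of every
 * matched node (text nodes included) as a separate array entry.
 */
function xpath_text_values(SimpleXMLElement $context, string $expression): array
{
    $domContext = dom_import_simplexml($context);
    $xpath = new DOMXPath($domContext->ownerDocument);

    $nodes = $xpath->query($expression, $domContext);
    if ($nodes === false) {
        // query() returns false (and emits a warning) for a malformed expression.
        throw new InvalidArgumentException("Invalid XPath: $expression");
    }

    $values = [];
    foreach ($nodes as $node) {
        $values[] = $node->nodeValue;
    }
    return $values;
}

// Usage with the snippet from the question.
$xhtmlSnippet = '<td><span>193<span>10</span><span></span><div>66</div>'
    . '<span>195</span><span>.</span><span>34</span><span>242</span>'
    . '<span></span>120<span>64</span></span></td>';
$xml = new SimpleXMLElement($xhtmlSnippet);

// Relative to the <td> root element, as in the answer's example.
$texts = xpath_text_values($xml, 'span/text()');
print_r($texts); // $texts is ['193', '120']
```

Because the helper returns strings instead of SimpleXMLElement objects, it sidesteps the parent-element substitution described above. The trade-off is that you lose the ability to keep navigating from the results with SimpleXML.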
