How to traverse SimpleXML to edit text nodes?

Question

I need to implement the following algorithm with SimpleXML:

  1. put an XML fragment string into a SimpleXML object;
  2. traverse all the nodes, selecting the text nodes;
  3. edit each text node (for example, convert it to upper case);
  4. return the XML as a string.

PROBLEMS:

  • How do I load XML that contains named entities (e.g. &nbsp;)?

  • How do I traverse the XML to get only the text nodes? With $sx->xpath('//text()') I cannot edit the nodes, so how can I select text nodes for editing?
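On the first problem: &nbsp; is an HTML entity, not one of the five entities XML predefines (&lt;, &gt;, &amp;, &apos;, &quot;), so simplexml_load_string() rejects it as undefined and returns false. A minimal sketch of one common workaround, assuming the input is trusted: declare the entity in an internal DTD subset and pass LIBXML_NOENT so that libxml substitutes it while parsing. The helper name loadWithNbsp() and the <root> wrapper are hypothetical, added only so that a fragment with several top-level elements still forms a well-formed document.

```php
<?php
// Sketch: let libxml accept &nbsp; by declaring it in an internal DTD subset.
// Assumption: the input is trusted. LIBXML_NOENT substitutes all entities,
// including external ones, which is unsafe for untrusted XML.

// Hypothetical helper; <root> is an assumed wrapper so that a fragment with
// several top-level elements still forms a well-formed document.
function loadWithNbsp(string $fragment): SimpleXMLElement
{
    $xml = '<!DOCTYPE root [<!ENTITY nbsp "&#160;">]>'
         . '<root>' . $fragment . '</root>';

    // LIBXML_NOENT replaces each &nbsp; reference with U+00A0 in the tree
    $sx = simplexml_load_string($xml, 'SimpleXMLElement', LIBXML_NOENT);
    if ($sx === false) {
        throw new RuntimeException('Could not parse the fragment');
    }
    return $sx;
}

$sx = loadWithNbsp('<b>foo&nbsp;bar</b>');
// The text now contains a real non-breaking space (UTF-8 bytes C2 A0)
var_dump((string) $sx->b === "foo\u{00A0}bar");
```

Declaring only the entities you actually expect keeps the DTD small; a longer list of HTML entities can be added to the same internal subset in the same way.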

Solution

You can override the text content of a node returned by a SimpleXML XPath query by assigning to $node[0], e.g.

foreach ( $sx->xpath('//text()') as $text_node )
{
    $text_node[0] = 'Hello';
}

However, beware that SimpleXML does not really have a representation of a text node per se, so this kind of loop will behave oddly if there are both child elements and text within an element.

For instance, given the XML <a><b>foo<c />bar</b><b>baz quux</b></a>, the two text nodes containing foo and bar will both be represented in SimpleXML by the first <b> element, whose entire contents will be replaced by 'Hello', twice over, as shown below. Using a counter variable in the substituted text makes it clear what is happening: the desired output would be <a><b>Hello 1<c />Hello 2</b><b>Hello 3</b></a>, but the actual result is <a><b>Hello 2</b><b>Hello 3</b></a>.

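$sx = simplexml_load_string('<a><b>foo<c />bar</b><b>baz quux</b></a>');

$counter = 1;
foreach ( $sx->xpath('//text()') as $text_node )
{
     // $text_node is the parent element, so this replaces ALL of its content
     $text_node[0] = 'Hello ' . $counter++;
}

// Prints <a><b>Hello 2</b><b>Hello 3</b></a> after the XML declaration
echo $sx->asXML();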
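To see why the first <b> is overwritten twice, one can inspect what xpath() actually hands back. A small sketch using the same XML: for a text() match, SimpleXML returns the text node's parent element, so getName() reports b for all three results, and casting the first two to a string gives the same concatenated direct text of the first <b> (the empty <c /> contributes nothing).

```php
<?php
// Sketch: inspect what SimpleXMLElement::xpath() returns for text() matches.
$sx = simplexml_load_string('<a><b>foo<c />bar</b><b>baz quux</b></a>');

$names = [];
$texts = [];
foreach ($sx->xpath('//text()') as $text_node) {
    $names[] = $text_node->getName(); // name of the parent element, not a text node
    $texts[] = (string) $text_node;   // all direct text children, concatenated
}

print_r($names); // ['b', 'b', 'b']
print_r($texts); // ['foobar', 'foobar', 'baz quux']
```

Because the first two results are the same element, there is no way to address foo and bar separately through SimpleXML alone.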

This kind of manipulation, at least as you've framed the problem (finding text nodes, rather than iterating, possibly recursively, over a particular set of elements), is much better suited to the DOM API than to SimpleXML. Bear in mind that there is no performance difference between the two (they are both wrappers around the same XML library, libxml2), and that you can combine operations from the two APIs on the same document by using simplexml_import_dom() and dom_import_simplexml(), again without additional overhead, because the document doesn't need to be re-parsed.

Here is the above example fixed by using a mixture of SimpleXML and DOM. If this were the whole code, you could just parse with DOM directly, but it demonstrates how easy the two are to mix if you already have other code manipulating this document with SimpleXML. Note that at the end we output the XML using the original SimpleXML object: we don't need to run simplexml_import_dom($dom), because both objects refer to the same parsed document in memory.

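$sx = simplexml_load_string('<a><b>foo<c />bar</b><b>baz quux</b></a>');
// A DOMElement view of the same in-memory document (no re-parsing)
$dom = dom_import_simplexml($sx);

$counter = 1;
$xpath = new DOMXPath($dom->ownerDocument);
// DOM returns genuine DOMText nodes, so each one can be edited individually
foreach ( $xpath->query('//text()') as $text_node )
{
     $text_node->nodeValue = 'Hello ' . $counter++;
}

// Prints <a><b>Hello 1<c/>Hello 2</b><b>Hello 3</b></a> after the XML declaration
echo $sx->asXML();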
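Putting the four steps from the question together, a minimal sketch using DOM alone might look like this. Assumptions: the fragment becomes well-formed once wrapped in a hypothetical <root> element, the function name uppercaseTextNodes() is illustrative, and strtoupper() is acceptable (it only changes ASCII letters; mb_strtoupper() from the mbstring extension would be needed for other scripts). The wrapper's children are serialized one by one with DOMDocument::saveXML() so that <root> does not appear in the returned string.

```php
<?php
// Sketch of the question's four steps with DOM; <root> is an assumed wrapper.
function uppercaseTextNodes(string $fragment): string
{
    // 1. Load the fragment, wrapped so it has a single document element
    $doc = new DOMDocument();
    if (!$doc->loadXML('<root>' . $fragment . '</root>')) {
        throw new RuntimeException('Could not parse the fragment');
    }

    // 2. Select every text node in the document
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//text()') as $text_node) {
        // 3. Edit each text node in place; sibling elements such as <c/> survive
        $text_node->nodeValue = strtoupper($text_node->nodeValue);
    }

    // 4. Serialize the wrapper's children back into a string
    $out = '';
    foreach ($doc->documentElement->childNodes as $child) {
        $out .= $doc->saveXML($child);
    }
    return $out;
}

echo uppercaseTextNodes('<b>foo<c />bar</b><b>baz quux</b>');
```

For this input the sketch should print <b>FOO<c/>BAR</b><b>BAZ QUUX</b>; libxml writes the empty element as <c/> rather than <c />.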
