Convert spaces between PRE tags, via DOM parser

Problem description

Regex was my original idea for a solution, although it soon became apparent that a DOM parser would be more appropriate. I'd like to convert spaces to &nbsp; between PRE tags within a string of HTML text. For example:

<table atrr="zxzx"><tr>
<td>adfa a   adfadfaf></td><td><br /> dfa  dfa</td>
</tr></table>
<pre class="abc" id="abc">
abc 123
<span class="abc">abc 123</span>
</pre>
<pre>123 123</pre>

into (note the space in the span tag attribute is preserved):

<table atrr="zxzx"><tr>
<td>adfa a   adfadfaf></td><td><br /> dfa  dfa</td>
</tr></table>
<pre class="abc" id="abc">
abc&nbsp;123
<span class="abc">abc&nbsp;123</span>
</pre>
<pre>123&nbsp;123</pre>

The result needs to be serialised back into string format, for use elsewhere.

Recommended answer

This is somewhat tricky when you want to insert &nbsp; entities without the DOM converting the ampersand to &amp;, because entities are nodes while spaces are just character data. Here is how to do it:

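$dom = new DOMDocument;
$dom->loadHTML($html);
$xp = new DOMXPath($dom);
// Select every text node that has a pre element among its ancestors.
foreach ($xp->query('//text()[ancestor::pre]') as $textNode)
{
    $remaining = $textNode;
    // Keep splitting until the trailing text node contains no more spaces.
    while (($nextSpace = strpos($remaining->wholeText, ' ')) !== false) {
        // Split at the space; the new node starts with that space.
        $remaining = $remaining->splitText($nextSpace);
        // Drop the leading space from the split-off node.
        $remaining->nodeValue = substr($remaining->nodeValue, 1);
        // Put an nbsp entity reference where the space used to be.
        $remaining->parentNode->insertBefore(
            $dom->createEntityReference('nbsp'),
            $remaining
        );
    }
}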

Fetching all the pre elements and working with their nodeValues doesn't work here, because the nodeValue property contains the combined DOMText values of all descendants, e.g. it includes the text of the span children. Setting nodeValue on the pre element would delete those child elements.
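
To see the problem in isolation, here is a minimal sketch. The one-line input string and the variable names are made up for illustration; it only shows that reading nodeValue flattens the span's text into the result and that writing it back discards the span element.

```php
<?php
$dom = new DOMDocument;
$dom->loadHTML('<pre>abc 123 <span class="abc">abc 123</span></pre>');
$pre = $dom->getElementsByTagName('pre')->item(0);

// Reading nodeValue concatenates the text of all descendants,
// so the span's text is part of the string.
$combined = $pre->nodeValue;

// Writing nodeValue replaces all children with a single text node,
// which removes the span element.
$pre->nodeValue = str_replace(' ', '_', $combined);
$spansLeft = $pre->getElementsByTagName('span')->length;

echo $combined, "\n";   // abc 123 abc 123
echo $spansLeft, "\n";  // 0
```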

So instead of fetching the pre elements, we fetch all the DOMText nodes that have a pre element somewhere among their ancestors:

DOMElement pre
    DOMText "abc 123"         <-- picking this
    DOMElement span
       DOMText "abc 123"      <-- and this one
DOMElement pre
    DOMText "123 123"         <-- and this one
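
The selection can be checked on its own. The sketch below uses a shortened, made-up input (with an extra p element that must not be picked) and collects the value of every text node the XPath expression returns:

```php
<?php
$html = '<p>not me</p>'
      . '<pre class="abc">abc 123<span class="abc">abc 123</span></pre>'
      . '<pre>123 123</pre>';

$dom = new DOMDocument;
$dom->loadHTML($html);
$xp = new DOMXPath($dom);

// Collect the text of every text node that sits below a pre element,
// in document order. The text inside the p element is not matched.
$picked = [];
foreach ($xp->query('//text()[ancestor::pre]') as $textNode) {
    $picked[] = $textNode->nodeValue;
}

print_r($picked);
```

The text inside the span is returned as its own node, which is what lets the span element survive the later rewrite.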

We then go through each of those DOMText nodes and split it into separate DOMText nodes at each space. We remove the space and insert an nbsp entity reference node before the split-off node, so in the end you get a tree like:

DOMElement pre
    DOMText "abc"
    DOMEntityReference nbsp
    DOMText "123"
    DOMElement span
       DOMText "abc"
       DOMEntityReference nbsp
       DOMText "123"
DOMElement pre
    DOMText "123"
    DOMEntityReference nbsp
    DOMText "123"
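
A single split step, written out by hand, may make the tree easier to follow. This is only a sketch on a made-up one-space input; the offset 3 is hard-coded here, whereas the loop in the answer finds it with strpos.

```php
<?php
$dom = new DOMDocument;
$dom->loadHTML('<pre>abc 123</pre>');
$pre  = $dom->getElementsByTagName('pre')->item(0);
$text = $pre->firstChild;                        // DOMText "abc 123"

// After the split, $text holds "abc" and $tail holds " 123".
$tail = $text->splitText(3);
// Remove the leading space from the split-off node.
$tail->nodeValue = substr($tail->nodeValue, 1);
// Insert the entity reference between the two text nodes.
$pre->insertBefore($dom->createEntityReference('nbsp'), $tail);

// List the node classes that now make up the pre element's children.
$types = [];
foreach ($pre->childNodes as $child) {
    $types[] = get_class($child);
}
$out = $dom->saveHTML($pre);

print_r($types);
echo $out, "\n";
```

Because the entity is a node of its own rather than text, the serializer writes it out as &nbsp; and does not escape the ampersand.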

Because we only worked with the DOMText nodes, any DOMElements are left untouched and so it will preserve the span elements inside the pre element.

Caveat:

Your snippet is not valid because it doesn't have a root element. When using loadHTML, libxml adds any missing structure to the DOM, which means you will get your snippet back wrapped in a DOCTYPE and html and body tags.
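
The added structure is easy to observe. In this small sketch (the input string is made up), a bare pre element goes in and a full document comes out:

```php
<?php
$dom = new DOMDocument;
$dom->loadHTML('<pre>123 123</pre>');

// Serializing the whole document includes everything libxml added.
$full = $dom->saveHTML();

$hasDoctype = strpos($full, '<!DOCTYPE') !== false;
$hasHtml    = strpos($full, '<html>') !== false;
$hasBody    = strpos($full, '<body>') !== false;

echo $full;
```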

If you want the original snippet back, you have to fetch the body element with getElementsByTagName and serialize all of its children to get the innerHTML. Unfortunately, there is no innerHTML function or property in PHP's DOM implementation, so we have to do that manually:

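$innerHtml = '';
// Serialize each child of body on its own, which leaves out the wrapper tags.
foreach ($dom->getElementsByTagName('body')->item(0)->childNodes as $child) {
    // Copy the child (deep) into a fresh document and dump that document.
    $tmp_doc = new DOMDocument();
    $tmp_doc->appendChild($tmp_doc->importNode($child, true));
    $innerHtml .= $tmp_doc->saveHTML();
}
echo $innerHtml;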
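
Putting both steps together, one possible wrapper looks like the sketch below. The function name nbspInsidePre and the sample input are hypothetical. It also swaps the temporary document for saveHTML($child), which accepts a node argument since PHP 5.3.6, and it assumes the input is non-empty so that a body element exists.

```php
<?php
// Hypothetical helper: returns $html with spaces inside pre elements
// replaced by &nbsp; entity references, without the added wrapper tags.
function nbspInsidePre(string $html): string
{
    $dom = new DOMDocument;
    // The @ silences warnings libxml raises for markup it considers invalid.
    @$dom->loadHTML($html);
    $xp = new DOMXPath($dom);

    foreach ($xp->query('//text()[ancestor::pre]') as $textNode) {
        $remaining = $textNode;
        while (($pos = strpos($remaining->wholeText, ' ')) !== false) {
            $remaining = $remaining->splitText($pos);
            $remaining->nodeValue = substr($remaining->nodeValue, 1);
            $remaining->parentNode->insertBefore(
                $dom->createEntityReference('nbsp'),
                $remaining
            );
        }
    }

    // Serialize the children of body one by one (manual innerHTML).
    $out = '';
    foreach ($dom->getElementsByTagName('body')->item(0)->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

$result = nbspInsidePre('<pre>abc 123 <span class="abc">abc 123</span></pre>');
echo $result, "\n";
```

The space inside the span's class attribute is never touched, because attribute values are not text nodes and so are not matched by the XPath expression.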

See also

  • How to get innerHTML of DOMNode?
  • DOMDocument in php
  • https://stackoverflow.com/search?q=user%3A208809+dom
