Why do these two DOMDocument functions behave differently?

Question

There are two approaches to getting the outer HTML of a DOMDocument node suggested here: How to return outer html of DOMDocument?

I'm interested in why they seem to treat HTML entities differently.

Example:

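function outerHTML($node) {
    // Copy the node into a fresh document, then serialize that whole document.
    $doc = new DOMDocument();
    $doc->appendChild($doc->importNode($node, true));
    return $doc->saveHTML();
}

$html = '<p>ACME&rsquo;s 27&rdquo; Monitor is $200.</p>';
$dom = new DOMDocument();
@$dom->loadHTML($html);
$el = $dom->getElementsByTagName('p')->item(0);
// Approach 1: serialize only the node, via its owner document.
echo $el->ownerDocument->saveHTML($el) . PHP_EOL;
// Approach 2: serialize a new document containing only the imported node.
echo outerHTML($el) . PHP_EOL;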

Output:

<p>ACME’s 27” Monitor is $200.</p>
<p>ACME&rsquo;s 27&rdquo; Monitor is $200.</p>

Both methods use saveHTML(), but for some reason the outerHTML() function preserves HTML entities in the final output, while calling saveHTML() directly with a node context does not. Can anyone explain why, preferably with some kind of authoritative reference?

Recommended answer

What this comes down to is even simpler than your test case above:

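<?php
$html = '<p>ACME&rsquo;s 27&rdquo; Monitor is $200.</p>';
$dom = new DOMDocument();
// NOIMPLIED and NODEFDTD keep libxml2 from adding html/body wrappers and a doctype.
@$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

// Node context: serialize only the root element.
echo $dom->saveHTML($dom->documentElement) . PHP_EOL;
// No argument: serialize the whole document.
echo $dom->saveHTML() . PHP_EOL;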

So the question becomes: why does DOMDocument::saveHTML behave differently when saving an entire document instead of just a specific node?

Taking a peek at the PHP source, we find a check for whether it's working with a single node or a whole document. For the former, the htmlNodeDumpFormatOutput function is called with the encoding explicitly set to null. For the latter, the htmlDocDumpMemoryFormat function is used; it does not take an encoding argument.

Both of these functions are from the libxml2 library. Looking at that source, we can see that htmlDocDumpMemoryFormat tries to detect the document encoding, and explicitly sets it to ASCII/HTML if it can't find one.
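
One way to probe this is to give the document a detectable encoding and compare the whole-document output with and without it. The sketch below is an experiment under stated assumptions, not something taken from the PHP or libxml2 documentation: the helper name dumpWholeDocument is hypothetical, the Content-Type meta tag is assumed to be the declaration that the detection step looks for, and the result may vary with the libxml2 version in use, so no expected output is shown.

```php
<?php
// Hypothetical helper (not part of DOMDocument): parse a full HTML
// document and return the result of a whole-document saveHTML() call.
function dumpWholeDocument(string $html): string
{
    $dom = new DOMDocument();
    // Suppress parser warnings, as the examples above do with @.
    @$dom->loadHTML($html);
    return $dom->saveHTML();
}

$body = '<body><p>ACME&rsquo;s 27&rdquo; Monitor is $200.</p></body>';

// No encoding declared: per the explanation above, libxml2 is expected
// to fall back to its ASCII/HTML output handling.
$undeclared = dumpWholeDocument('<html><head></head>' . $body . '</html>');

// Encoding declared through a Content-Type meta tag (assumption: this is
// the declaration that htmlDocDumpMemoryFormat detects).
$meta = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">';
$declared = dumpWholeDocument('<html><head>' . $meta . '</head>' . $body . '</html>');

echo $undeclared . PHP_EOL;
echo $declared . PHP_EOL;
```

If the two printed paragraphs differ only in whether the quotes appear as entities or as literal characters, that is consistent with the encoding detection described above.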

Both functions end up calling htmlNodeListDumpOutput, passing it the encoding that's been determined: either null, which results in no re-encoding (characters are written as-is), or ASCII/HTML, which encodes non-ASCII characters using HTML entities.
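
If the goal is simply to get the same string from both calls, the entity-encoded form can be decoded after the fact. This is a minimal sketch, assuming UTF-8 output is wanted; normalizeEntities is a hypothetical helper name, not part of any library. Note that html_entity_decode() also decodes entities that must stay escaped in valid markup (such as &lt; or &amp;), so this is only suitable for comparing or displaying text like the sample here.

```php
<?php
// Hypothetical helper: decode named and numeric entities to UTF-8 so the
// whole-document and single-node outputs can be compared directly.
function normalizeEntities(string $html): string
{
    return trim(html_entity_decode($html, ENT_QUOTES | ENT_HTML5, 'UTF-8'));
}

$html = '<p>ACME&rsquo;s 27&rdquo; Monitor is $200.</p>';
$dom = new DOMDocument();
@$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

$nodeOutput = normalizeEntities($dom->saveHTML($dom->documentElement));
$docOutput  = normalizeEntities($dom->saveHTML());

// Both calls serialize the same tree, so after decoding they should match.
var_dump($nodeOutput === $docOutput);
```

Decoding after serialization leaves the choice of saveHTML() call untouched, which keeps the sketch independent of how a given libxml2 build picks its output encoding.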

My guess is that, for a document fragment or single node, encoding is considered less important than for a full document.
