PHP DOMDocument saveHTML not encoding Cyrillic correctly
Question
I use DOMDocument to manipulate HTML in PHP 7. The problem is that the text (Cyrillic) displays fine on the page, but when I view the HTML page source, it does not look right. It shows up like this:
Здесь осн
What might be wrong? The <meta> charset is utf-8. My code:
$dom = new DOMDocument();
// Wrap the fragment in a <div> so it can be loaded without implied <html>/<body> tags
if (@$dom->loadHTML(mb_convert_encoding("<div>$body</div>", 'HTML-ENTITIES', 'UTF-8'), LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD)) {
    // https://stackoverflow.com/questions/29493678/loadhtml-libxml-html-noimplied-on-an-html-fragment-generates-incorrect-tags
    $container = $dom->getElementsByTagName('div')->item(0);
    $container = $container->parentNode->removeChild($container);
    // Empty the document, then move the wrapper's children to the top level
    while ($dom->firstChild) {
        $dom->removeChild($dom->firstChild);
    }
    while ($container->firstChild) {
        $dom->appendChild($container->firstChild);
    }
    $xpath = new DOMXPath($dom);
    $headlines = $xpath->query("//h2");
    // some code..
    return $dom->saveHTML();
}
Recommended answer
The problem is with $dom->saveHTML(); you need to pass the root node as a parameter, like this:
return $dom->saveHTML((new \DOMXPath($dom))->query('/')->item(0));
Then it suddenly renders the page differently, with substitution applied. If it does not, double-check the values of $dom->encoding and $dom->substituteEntities; they should read UTF-8 and TRUE.
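To make the difference concrete, here is a minimal sketch that serializes the same document both ways. Several things in it are assumptions, not part of the original post: the sample $body string is made up, mb_encode_numericentity() is swapped in for the question's mb_convert_encoding(..., 'HTML-ENTITIES', ...) call (that target is deprecated as of PHP 8.2), the wrapper <div> is kept so that $dom->documentElement can serve as the node argument instead of the XPath '/' query, and the exact output may vary with the libxml2 version PHP is built against.

```php
<?php
// Hypothetical sample input; the question does not show the real $body.
$body = '<h2>Здесь</h2><p>основной текст</p>';

// Convert non-ASCII characters to numeric entities before loading
// (assumed stand-in for the question's 'HTML-ENTITIES' conversion).
$encoded = mb_encode_numericentity("<div>$body</div>", [0x80, 0x10FFFF, 0, ~0], 'UTF-8');

$dom = new DOMDocument();
$dom->loadHTML($encoded, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

// No argument: the whole document is dumped; with no declared encoding,
// non-ASCII characters typically come out as numeric entities.
$wholeDoc = $dom->saveHTML();

// Node argument: only that node is dumped, and the Cyrillic text is
// expected to be written as plain UTF-8.
$fromNode = $dom->saveHTML($dom->documentElement);

echo $wholeDoc, PHP_EOL;
echo $fromNode, PHP_EOL;
```

If both lines still show entities, inspecting $dom->encoding and $dom->substituteEntities, as suggested above, is a reasonable next step.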