PHP DOMDocument saveHTML not encoding Cyrillic correctly

Problem description

I use DOMDocument to manipulate HTML with PHP 7. The text displays correctly on the page (Cyrillic), but when I go to "See HTML page source", it does not. It shows like this: Здесь осн

What might be wrong? The <meta> charset is utf-8. My code:

$dom = new DOMDocument();
if (@$dom->loadHTML(mb_convert_encoding("<div>$body</div>", 'HTML-ENTITIES', 'UTF-8'), LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD)) {

    // https://stackoverflow.com/questions/29493678/loadhtml-libxml-html-noimplied-on-an-html-fragment-generates-incorrect-tags

    $container = $dom->getElementsByTagName('div')->item(0);
    $container = $container->parentNode->removeChild($container);

    while ($dom->firstChild)
        $dom->removeChild($dom->firstChild);

    while ($container->firstChild )
        $dom->appendChild($container->firstChild);

    $xpath = new DOMXPath($dom); 
    $headlines = $xpath->query("//h2");
    // some code..

    return $dom->saveHTML();
}

Recommended answer

The problem is with $dom->saveHTML(): you need to pass the root node as a parameter, like this:

return $dom->saveHTML((new \DOMXPath($dom))->query('/')->item(0));

Then it suddenly renders the page differently, with substitution. If it does not, double-check the values of $dom->encoding and $dom->substituteEntities; they should read UTF-8 and TRUE.
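
To see the difference for yourself, here is a minimal sketch that serializes the same document both ways. The sample $body is hypothetical (the question does not show its input), and mb_encode_numericentity is used here only as a stand-in for the question's mb_convert_encoding(..., 'HTML-ENTITIES', 'UTF-8') call, which newer PHP versions flag as deprecated. The exact output of the no-argument call may vary with your PHP and libxml versions, so compare the two printed results rather than relying on a fixed expectation.

```php
<?php
// Hypothetical sample input; the question's real $body is not shown.
$body = '<h2>Здесь</h2>';

// Assumption: stand-in for mb_convert_encoding(..., 'HTML-ENTITIES', 'UTF-8').
// Encodes every code point from 0x80 upward as a numeric entity.
$html = mb_encode_numericentity(
    "<div>$body</div>",
    [0x80, 0x10FFFF, 0, 0x1FFFFF],
    'UTF-8'
);

$dom = new DOMDocument();
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

// Whole-document serialization, as in the question.
$withoutNode = $dom->saveHTML();

// Serialization with the root node passed explicitly, as in the answer.
$root = (new DOMXPath($dom))->query('/')->item(0);
$withNode = $dom->saveHTML($root);

echo "saveHTML():      ", trim($withoutNode), PHP_EOL;
echo "saveHTML(\$root): ", trim($withNode), PHP_EOL;

// The two properties the answer suggests checking.
var_dump($dom->encoding, $dom->substituteEntities);
```

If the second line shows literal Cyrillic while the first shows entities, passing the node is what changes the output in your environment.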
