Why does PHP DOM parsing affect charset?

Question

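// Hypothetical wrapper name: the snippet's `return` implies it sits inside a function
function stripSpans($string)
{
    $dom = new DOMDocument();
    // Must be set before loading; it has no effect on an already parsed document
    $dom->preserveWhiteSpace = false;
    $dom->loadHTML($string);
    // getElementsByTagName() returns a live list, so collect the spans first
    $spans = array();
    foreach ($dom->getElementsByTagName('span') as $span) {
        $spans[] = $span;
    }
    foreach ($spans as $span) {
        $span->parentNode->removeChild($span);
    }
    return $dom->saveHTML();
    //return $string;
}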

When I use this code to parse a string, it changes the encoding, and symbols are not displayed the same way as when return $string; is uncommented. Why is that, and how can I avoid the charset change?

Ile

Recommended answer

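Unfortunately, it seems that DOMDocument will automatically convert all non-ASCII characters to HTML entities unless it knows the encoding of the original document. Without a charset declaration, libxml's HTML parser (which DOMDocument uses) assumes ISO-8859-1, so each byte of a multi-byte UTF-8 character is read as a separate Latin-1 character, and saveHTML() writes each of those out as its own entity.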
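
To make the misreading concrete, here is a small simulation, assuming the parser falls back to ISO-8859-1 when no charset is declared (libxml's long-standing default). It does not call DOMDocument at all: htmlentities() with an 'ISO-8859-1' charset stands in for the serializer, so the exact output of saveHTML() may differ between libxml versions.

```php
<?php
// Simulation only: no DOMDocument involved. htmlentities() stands in for
// the serializer to show what misreading UTF-8 as ISO-8859-1 produces.
$utf8 = "\xC3\xA9"; // 'é' encoded as UTF-8: two bytes

// Interpreted as ISO-8859-1, 0xC3 is 'Ã' and 0xA9 is '©',
// so one character turns into two separate named entities.
$misread = htmlentities($utf8, ENT_QUOTES | ENT_HTML401, 'ISO-8859-1');

echo bin2hex($utf8), PHP_EOL; // c3a9
echo $misread, PHP_EOL;       // &Atilde;&copy;
```

Mapping those entities back to single ISO-8859-1 bytes reassembles 0xC3 0xA9, which is the idea behind the manual decoding further down.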

Apparently, one option is to prepend a <meta> tag declaring the content type and charset to the original string, but this means it will be present in the output as well. Removing it might not be so easy.
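
A minimal sketch of the <meta> workaround, assuming the input is a UTF-8 HTML fragment with no charset declaration of its own; removeSpansUtf8() is a hypothetical helper, not part of the question. One way to keep the injected tag out of the result is to serialize only the children of <body>, passing each node to saveHTML() (supported since PHP 5.3.6), because libxml places the <meta> in an implied <head>.

```php
<?php
// Sketch of the <meta> workaround. removeSpansUtf8() is a hypothetical helper;
// it assumes $fragment is UTF-8 and carries no charset declaration of its own.
function removeSpansUtf8(string $fragment): string
{
    $dom = new DOMDocument();
    // Declaring the charset makes libxml decode the input as UTF-8
    // instead of falling back to ISO-8859-1.
    $meta = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">';
    $dom->loadHTML($meta . $fragment);

    // Collect first, then remove: the node list is live.
    $spans = [];
    foreach ($dom->getElementsByTagName('span') as $span) {
        $spans[] = $span;
    }
    foreach ($spans as $span) {
        $span->parentNode->removeChild($span);
    }

    // The injected <meta> ends up in the implied <head>, so serializing only
    // the children of <body> keeps it out of the result.
    $body = $dom->getElementsByTagName('body')->item(0);
    $html = '';
    if ($body !== null) {
        foreach ($body->childNodes as $child) {
            $html .= $dom->saveHTML($child);
        }
    }
    return $html;
}

echo removeSpansUtf8('<p>Café <span>(draft)</span>naïve</p>'), PHP_EOL;
```

The trade-off is that anything outside <body> in the input (a <title>, say) is dropped, so this suits fragments rather than full documents.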

Another option I can think of is manually decoding the HTML entities, using code like this:

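// Map entity names back to single ISO-8859-1 bytes; since PHP 5.4 this table
// defaults to UTF-8, which would not reassemble the original UTF-8 bytes
$trans = array_flip(get_html_translation_table(HTML_ENTITIES, ENT_QUOTES | ENT_HTML401, 'ISO-8859-1'));
// Keep the escapes that matter for markup
unset($trans["&quot;"], $trans["&#039;"], $trans["&lt;"], $trans["&gt;"], $trans["&amp;"]);
echo strtr($dom->saveHTML(), $trans);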

This is a seriously ugly solution, but I can't think of anything else, other than using a different HTML parser. :(
