php DOMDocument - manipulating and encoding

Question

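$dom = new DOMDocument('1.0', 'UTF-8');
$dom->loadHTML($content);
// getElementsByTagName() returns a live list; copy it into an array so that
// removing a node does not make the loop skip the following <div>.
$divs = iterator_to_array($dom->getElementsByTagName("div"));
foreach ( $divs as $div ) {
    if ( $class = $div->attributes->getNamedItem("class") ) {
        if ( $class->nodeValue == "simplegalleryholder" ) {
            $div->parentNode->removeChild( $div );
        }
    }
}
$content = $dom->saveHTML();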

This simple code should help me with removing

<div class="simplegalleryholder"> .... </div> 

from the document. The only problem is that $content contains UTF-8 encoded special characters (ąęść etc.), which are destroyed by the process (I get iÄ™ Å‚ ż instead).

How should I approach this issue to get correct result?

Recommended answer

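Specifying UTF-8 in the constructor doesn't make the underlying XML processing library (libxml2) parse the input as UTF-8. The following workaround is really hacky, but it works reasonably well: prepend a meta tag that declares the charset, so the parser decodes the input as UTF-8.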

// The meta tag tells libxml2 which charset the input bytes are in.
$encodingHint = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">';
$dom->loadHTML($encodingHint . $content);

https://bugs.php.net/bug.php?id=32547
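Putting the question's removal loop together with this workaround gives roughly the following. This is a minimal sketch, not a definitive implementation. The function name `removeGalleryHolders` and the sample input are hypothetical. Suppressing libxml warnings with `libxml_use_internal_errors()` is an addition of mine for messy real-world HTML.

```php
<?php
// Sketch: remove <div class="simplegalleryholder"> elements from an HTML
// fragment without mangling UTF-8 text, using the meta-tag encoding hint.
// removeGalleryHolders() is a hypothetical helper name.

function removeGalleryHolders(string $content): string
{
    $encodingHint = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">';

    $dom = new DOMDocument('1.0', 'UTF-8');

    // Collect libxml parse warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // The prepended meta tag makes libxml2 decode the bytes as UTF-8.
    $dom->loadHTML($encodingHint . $content);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Copy the live DOMNodeList into an array before removing nodes,
    // otherwise adjacent matching <div>s would be skipped.
    foreach (iterator_to_array($dom->getElementsByTagName('div')) as $div) {
        if ($div->getAttribute('class') === 'simplegalleryholder') {
            $div->parentNode->removeChild($div);
        }
    }

    return $dom->saveHTML();
}

// Hypothetical sample input.
$html = '<p>ąęść</p><div class="simplegalleryholder"><img src="a.jpg"></div>';
echo removeGalleryHolders($html);
```

The returned string is a full document, with `<html>`, `<body>` and the hint's `<meta>` added by the parser. If you only want the fragment back, you still have to extract it yourself.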

If you're viewing the output in a web browser, send a real HTTP header rather than relying on an http-equiv meta tag. The header only affects how the browser displays the page; processing with DOMDocument specifically needs the meta tag.

header('content-type: text/html; charset=utf-8');