nodeValue from DomDocument returning weird characters in PHP

Views: 167 · Posted: 2016/11/19 15:38:13 · Tags: php character-encoding domdocument nodevalue


Question

So I'm trying to parse HTML pages and look for paragraphs (<p>) using get_elements_by_tag_name('p');
The problem is that when I use $element->nodeValue, it returns weird characters. The document is first loaded into $html using curl and then loaded into a DomDocument.
I'm sure it has to do with charsets.
Here's an example of a response: "aujourdÃ¢Â€Â™hui".
Thanks in advance.
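
The sample above is consistent with UTF-8 text being decoded with the wrong charset. The sketch below shows one plausible way such a string can arise; it is not a diagnosis of the asker's actual page. It assumes the original word was "aujourd’hui" with a typographic apostrophe (U+2019) and that the mbstring extension is available. The variable names are illustrative.

```php
<?php
// Assumed original text: UTF-8 encoded, with U+2019 (right single quotation mark).
$original = "aujourd\u{2019}hui";

// Step 1: the UTF-8 bytes are misread as ISO-8859-1 and re-encoded as UTF-8.
// Each byte of the three-byte apostrophe becomes its own character.
$misread = mb_convert_encoding($original, 'UTF-8', 'ISO-8859-1');

// Step 2: those bytes are then displayed by something that assumes Windows-1252,
// which turns the six bytes into six visible characters.
$displayed = mb_convert_encoding($misread, 'UTF-8', 'Windows-1252');

echo $displayed, PHP_EOL;
```

If this is what happened, the fix belongs at step 1: the parser has to be told that the input is UTF-8 before it builds the DOM.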

Recommended answer

I had the same issue and noticed that loadHTML() no longer takes two parameters, so I had to find a different solution. Using the following function in my DOM library, I was able to remove the funky characters from my HTML content.
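// Method of a DOM helper class; drop "private static" to use it as a plain function.
private static function load_html($html)
{
    $doc = new DOMDocument;
    // The XML declaration prefix tells the parser to read the markup as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);

    // Remove the injected processing instruction again. Copy the child list first:
    // childNodes is live, so removing nodes while iterating it can skip entries.
    foreach (iterator_to_array($doc->childNodes) as $node) {
        if ($node->nodeType === XML_PI_NODE) {
            $doc->removeChild($node);
        }
    }

    // Serialize as UTF-8 when the document is saved.
    $doc->encoding = 'UTF-8';

    return $doc;
}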
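
To connect the answer back to the question, here is a minimal usage sketch. It assumes a standalone function (the name `load_html_utf8` and the sample markup are illustrative, not from the original answer) and uses `getElementsByTagName`, the DOMDocument counterpart of the call mentioned in the question. Behavior of the encoding prefix can vary with the libxml2 version PHP is built against, so treat this as a starting point rather than a guaranteed fix.

```php
<?php
// Standalone variant of the loader above (hypothetical name).
function load_html_utf8(string $html): DOMDocument
{
    $doc = new DOMDocument();

    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Drop the injected processing instruction, iterating over a copy
    // because childNodes is a live list.
    foreach (iterator_to_array($doc->childNodes) as $node) {
        if ($node->nodeType === XML_PI_NODE) {
            $doc->removeChild($node);
        }
    }

    $doc->encoding = 'UTF-8';

    return $doc;
}

// Illustrative input: in the question, $html would come from curl.
$html = "<p>aujourd\u{2019}hui</p><p>demain</p>";

$doc = load_html_utf8($html);

// Print the text of every paragraph.
foreach ($doc->getElementsByTagName('p') as $p) {
    echo $p->nodeValue, PHP_EOL;
}
```

This only helps when the fetched page really is UTF-8. If the server sends another charset, the string would need converting to UTF-8 before it is passed in.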

