Symfony2 Crawler - Use UTF-8 with XPATH

Question

I am using the Symfony2 Crawler to run XPath queries. Everything works fine except the encoding.

I would like to use UTF-8 encoding, but the Crawler is somehow not using it. I noticed this because every &nbsp; is converted to Â&nbsp;, which is a known issue: UTF-8 Encoding Issue (https://stackoverflow.com/questions/3597105/why-cant-i-get-rid-of-this-nbsp)

My question is: how can I force the Symfony Crawler to use UTF-8 encoding?

Here is the code I am using:

use Symfony\Component\DomCrawler\Crawler;

$dom_input = new \DOMDocument("1.0", "UTF-8");
$dom_input->encoding = "UTF-8";
$dom_input->formatOutput = true;

// $myFile is the path to the HTML file being parsed
$dom_input->loadHTMLFile($myFile);

$crawler = new Crawler($dom_input);
$paragraphs = $crawler->filterXPath('descendant-or-self::p');

Now, when I run:

foreach($paragraphs as $paragraph) {
    var_dump($paragraph->nodeValue);
}

As soon as there is a &nbsp; in a paragraph, I get Â&nbsp;.
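
The stray Â is consistent with UTF-8 bytes being read as a single-byte encoding such as ISO-8859-1 and then re-encoded as UTF-8. The sketch below reproduces that effect on a bare string, without the Crawler. It assumes the input file is UTF-8 and that the `iconv` extension is available; the variable names are illustrative and not taken from the question.

```php
<?php
// A non-breaking space encoded as UTF-8 is the two bytes C2 A0.
$nbspUtf8 = "\xC2\xA0";

// Misread those bytes as ISO-8859-1 (one character per byte) and
// convert the result to UTF-8: C2 becomes "Â", A0 becomes the nbsp.
$misread = iconv('ISO-8859-1', 'UTF-8', $nbspUtf8);

echo bin2hex($nbspUtf8), "\n"; // c2a0
echo bin2hex($misread), "\n";  // c382c2a0: "Â" followed by a non-breaking space
```

Whether the HTML loader actually falls back to a single-byte encoding for a given file may depend on the libxml2 version and on whether the markup declares a charset.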

Thanks a lot.

Recommended answer

Thanks to @halfer, I found a workaround:

Instead of using

$crawler = new Crawler($dom_input);

I used:

$crawler = new Crawler();
$crawler->addHtmlContent(utf8_decode($dom_input->saveXML()));
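
One possible reading of why this works, offered as an assumption rather than something the answer states: `saveXML()` emits the already double-encoded text as UTF-8, `utf8_decode()` converts it to ISO-8859-1, which restores the original UTF-8 bytes, and `addHtmlContent()` then treats the content as UTF-8 (its default charset). Note that `utf8_decode()` is deprecated as of PHP 8.2. The sketch below shows the same round trip on a bare string with `iconv()` and, as an alternative, declares the charset in the markup so that `DOMDocument` decodes it as UTF-8 from the start. The sample markup and the helper name `paragraphText` are hypothetical.

```php
<?php
// Part 1: undo the double encoding on a bare string.
// "Â" + nbsp as UTF-8 is C3 82 C2 A0; converting to ISO-8859-1
// yields C2 A0, which is a correctly encoded UTF-8 nbsp again.
$doubleEncoded = "\xC3\x82\xC2\xA0";
$restored = iconv('UTF-8', 'ISO-8859-1', $doubleEncoded);
echo bin2hex($restored), "\n"; // c2a0

// Part 2: avoid the problem at load time by declaring the charset.
// paragraphText() is a hypothetical helper for this sketch.
function paragraphText(string $html): string
{
    libxml_use_internal_errors(true); // collect parser warnings silently
    $dom = new DOMDocument();
    $dom->loadHTML($html);

    // Text content of the first <p>, returned as UTF-8.
    return $dom->getElementsByTagName('p')->item(0)->nodeValue;
}

$html = '<html><head>'
    . '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
    . '</head><body>'
    . "<p>a\xC2\xA0b</p>"
    . '</body></html>';

echo bin2hex(paragraphText($html)), "\n"; // 61c2a062: "a", nbsp, "b"
```

The `iconv()` round trip can fail for characters that have no ISO-8859-1 equivalent, so declaring the charset in the source markup is likely the more robust of the two options when the input can be modified.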
