Symfony2 Crawler - Use UTF-8 with XPATH
Question
I am using the Symfony2 Crawler to run XPath queries. Everything works fine except the encoding.
I would like to use UTF-8 encoding, but the Crawler is somehow not using it. I noticed this because every &nbsp; is converted to Â&nbsp;, which is a known issue: UTF-8 Encoding Issue (https://stackoverflow.com/questions/3597105/why-cant-i-get-rid-of-this-nbsp).
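The stray Â most likely appears because the UTF-8 bytes of the non-breaking space are read as ISO-8859-1 and then encoded to UTF-8 a second time. A minimal sketch of that mechanism, assuming the mbstring extension is available (the variable names are illustrative and not part of the code in question):

```php
<?php
// U+00A0 (non-breaking space) is the two bytes 0xC2 0xA0 in UTF-8.
$nbsp = "\xC2\xA0";

// Treat those two bytes as ISO-8859-1 characters and re-encode them as UTF-8.
// 0xC2 becomes "Â" (U+00C2) and 0xA0 becomes a second non-breaking space.
$garbled = mb_convert_encoding($nbsp, 'UTF-8', 'ISO-8859-1');

var_dump(bin2hex($garbled)); // c382c2a0, i.e. "Â" followed by a non-breaking space
```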
My question is: how can I force the Symfony Crawler to use UTF-8 encoding?
Here is the code I am using:
$dom_input = new \DOMDocument("1.0","UTF-8");
$dom_input->encoding = "UTF-8";
$dom_input->formatOutput = true;
$dom_input->loadHTMLFile($myFile);
$crawler = new Crawler($dom_input);
$paragraphs = $crawler->filterXPath('descendant-or-self::p');
Now I loop over the paragraphs:
foreach($paragraphs as $paragraph) {
var_dump($paragraph->nodeValue);
}
As soon as there is a &nbsp; in a paragraph, I get Â&nbsp;.
Thanks a lot.
Recommended answer
Thanks to @halfer, I found a workaround:
Instead of using
$crawler = new Crawler($dom_input);
I used:
$crawler = new Crawler();
$crawler->addHtmlContent(utf8_decode($dom_input->saveXML()));
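As far as I can tell, this works because saveXML() emits the text that has already been encoded twice, and utf8_decode() converts UTF-8 to ISO-8859-1, which undoes one layer of that encoding. utf8_decode() is deprecated as of PHP 8.2, so the sketch below uses mb_convert_encoding() as an equivalent. It operates on a plain string rather than on the Crawler (an assumption made to keep it self-contained), so it only illustrates the byte-level effect:

```php
<?php
// Double-encoded input: "Â" (0xC3 0x82) followed by a non-breaking space (0xC2 0xA0).
$garbled = "\xC3\x82\xC2\xA0";

// Converting UTF-8 to ISO-8859-1 maps each character back to a single byte,
// which restores the original UTF-8 byte sequence of the non-breaking space.
$repaired = mb_convert_encoding($garbled, 'ISO-8859-1', 'UTF-8');

var_dump($repaired === "\xC2\xA0"); // bool(true)
```

Crawler::addHtmlContent() also takes a charset as its second argument, with UTF-8 as the default, so the repaired string can be passed to it as in the workaround above.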