DOMDocument character set issue
Question
This is the video from which I want to get the og:title:
http://www.youtube.com/watch?feature=player_embedded&v=A683kmvRH_8
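PHP code:
// Fetch a URL with cURL and return the response body as a string.
function file_get_contents_curl($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_HEADER, 0);          // do not include response headers in the output
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);  // return the body instead of printing it
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);  // follow redirects
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}

// $pageurl holds the video URL shown above.
$html = file_get_contents_curl($pageurl);
$doc = new DOMDocument();
@$doc->loadHTML($html);

// Keep the <title> text as a fallback.
$nodes = $doc->getElementsByTagName('title');
$titleBackUp = $nodes->item(0)->nodeValue;

// Look for the title among the <meta> tags.
$title = null;
$metas = $doc->getElementsByTagName('meta');
for ($i = 0; $i < $metas->length; $i++) {
    $meta = $metas->item($i);
    if ($meta->getAttribute('name') == 'title') {
        $title = $meta->getAttribute('content');
    }
}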
The title is Мастило-Връцететиенай-добре [HQ], but I get:
ÐаÑÑило-ÐÑÑÑеÑеÑиенай-доб Ñе [HQ]
I also tried
curl_setopt( $ch, CURLOPT_ENCODING, "UTF-8" );
but that did not help.
I also tried html_entity_decode, but that does not work either.
Recommended answer
This can happen if the document itself doesn't contain a <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> tag. Without such a declaration, libxml's HTML parser typically assumes ISO-8859-1, so multi-byte UTF-8 characters come out garbled.
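To see why a missing charset declaration produces this kind of output, here is a minimal sketch that reproduces the garbling by hand. It assumes the parser falls back to ISO-8859-1, that this file is saved as UTF-8, and that the mbstring extension is available; the sample string is just the first word of the title.

```php
<?php
// Assumption: this source file is saved as UTF-8 and mbstring is loaded.
$original = 'Мастило';

// Treat every UTF-8 byte as one ISO-8859-1 character, then re-encode as UTF-8.
// This mimics a parser that reads UTF-8 input as ISO-8859-1.
$garbled = mb_convert_encoding($original, 'UTF-8', 'ISO-8859-1');

echo $garbled, PHP_EOL;                       // mojibake of the kind shown in the question
echo strlen($original), PHP_EOL;              // 14 (seven Cyrillic letters, two bytes each)
echo mb_strlen($garbled, 'UTF-8'), PHP_EOL;   // 14 (one character per original byte)
```

Each Cyrillic letter occupies two bytes in UTF-8, so the misread string has twice as many characters as the original, which matches the "Ð" and "Ñ" pairs in the broken title.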
You can try either of the following:
- Let DOMDocument load the HTML directly from the server (i.e. use ->loadHTMLFile()).
- Prefix the document with the aforementioned meta tag before running it through ->loadHTML().
For example, you could do this:
libxml_use_internal_errors(true);
$doc->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />' . $html);
libxml_clear_errors();
It's a hack to let libxml know it's supposed to read UTF-8 data; it's not possible to pass that encoding via ->loadHTML() itself.
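Putting the pieces together, the following is a rough sketch of the prefix approach applied to the og:title lookup from the question. The helper name extract_og_title and the sample markup are made up for illustration, and it assumes og:title is carried in the property attribute of a meta tag, as Open Graph tags normally are.

```php
<?php
// Hypothetical helper (not from the original post): returns the og:title of a
// UTF-8 encoded HTML string, or null when no such meta tag is present.
function extract_og_title(string $html): ?string
{
    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    // Prepend a charset declaration so libxml decodes the bytes as UTF-8.
    $doc->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />' . $html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('meta') as $meta) {
        if ($meta->getAttribute('property') === 'og:title') {
            return $meta->getAttribute('content');
        }
    }
    return null;
}

// Made-up sample page that has no charset declaration of its own.
$sample = '<html><head><meta property="og:title" content="Мастило [HQ]"></head><body></body></html>';
echo extract_og_title($sample), PHP_EOL;
```

With real pages, $sample would be the string returned by file_get_contents_curl(). Restoring the previous libxml_use_internal_errors() setting keeps the helper from changing error handling for the rest of the script.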