php-curl - Garbled Chinese characters when fetching a web page with PHP cURL
Problem description
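Fetching the page works fine, but there seems to be a problem with the character encoding: the Chinese text comes out garbled.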
<?php
//header( "Content-type:text/html;Charset=utf-8" );
$urls = [
    'http://jobs.51job.com/'
];
$array = [
    // 'user-agent:Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.101 Safari/537.36;'
    // 'accept-language:zh-CN,zh;q=0.8,zh-TW;q=0.6;
    'Content-Type:text/html; charset=utf-8'
];
var_dump($urls);
foreach ($urls as $url) {
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => 10,
        CURLOPT_TIMEOUT => 30,
        CURLOPT_BINARYTRANSFER => true,
        CURLOPT_ENCODING => 'gzip,deflate',
        CURLOPT_HTTPHEADER => $array
    ]);
    $output = curl_exec($ch);
    $info = curl_getinfo($ch);
    curl_close($ch);
    var_dump($info);
    mb_convert_encoding($output, 'utf-8', 'GBK,UTF-8,ASCII');
    echo $output;
    // file_put_contents('str.txt' , $output,FILE_APPEND);
}
A side question: when I fetch content from Lagou, the response only ever shows "页面加载中..." ("Page loading..."):
<br><html><head><meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1"><meta name="renderer" content="webkit"><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><script type="text/javascript" src="https://www.lagou.com/utrack/trackMid.js?version=1.0.0.3&t=1503291026"></script><body><input type="hidden" id="KEY" value="rsagIwk3yl2hnrkI98FuQACf9eerWodYa0dPJ"/><script type="text/javascript">kfGNYOsx();</script>页面加载中...<script type="text/javascript" src="https://www.lagou.com/upload/oss.js"></script></body></html>
Solution
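51job serves its pages in gb2312, so convert the fetched content to UTF-8. Note that mb_convert_encoding returns the converted string rather than modifying its argument, so the result has to be assigned (the call in the question discards it):
$contents = mb_convert_encoding($contents, 'utf-8', 'gb2312');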
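To make the conversion less dependent on one site, here is a minimal sketch that reads the declared charset from the Content-Type response header (or a <meta> tag) and converts only when needed. The helper names detect_charset(), to_utf8() and fetch_utf8() are hypothetical and not from the original answer. Treating a "gb2312" label as GBK is an assumption that tends to hold for Chinese sites but is not guaranteed. The sketch also drops the Content-Type request header from the question, since a request header does not influence how the response is encoded, and CURLOPT_BINARYTRANSFER, which has had no effect since PHP 5.1.3.

```php
<?php
// Sketch only: detect_charset(), to_utf8() and fetch_utf8() are hypothetical
// helper names, not part of the original question or answer.

/**
 * Guess the page charset from the Content-Type response header, falling back
 * to a <meta> declaration in the HTML. Returns null when nothing is declared.
 */
function detect_charset(string $body, ?string $contentType): ?string
{
    // 1. HTTP header, e.g. "text/html; charset=gb2312"
    if ($contentType !== null
        && preg_match('/charset\s*=\s*["\']?([\w-]+)/i', $contentType, $m)) {
        return strtoupper($m[1]);
    }
    // 2. <meta charset="..."> or <meta http-equiv="Content-Type" content="...; charset=...">
    if (preg_match('/<meta[^>]+charset\s*=\s*["\']?([\w-]+)/i', $body, $m)) {
        return strtoupper($m[1]);
    }
    return null;
}

/**
 * Convert a fetched body to UTF-8 based on its declared charset.
 * An encoding name that mbstring does not know will raise an error;
 * this sketch does not guard against that.
 */
function to_utf8(string $body, ?string $contentType = null): string
{
    // Assumption: a page that declares nothing is treated as UTF-8.
    $charset = detect_charset($body, $contentType) ?? 'UTF-8';
    if ($charset === 'UTF-8' || $charset === 'UTF8') {
        return $body; // already UTF-8, nothing to do
    }
    // Assumption: pages labelled GB2312 often contain GBK-only characters,
    // so the label is treated as GBK.
    if ($charset === 'GB2312') {
        $charset = 'GBK';
    }
    // mb_convert_encoding() RETURNS the converted string; $body is not modified.
    return mb_convert_encoding($body, 'UTF-8', $charset);
}

/** Fetch a URL with cURL and return its body as UTF-8. */
function fetch_utf8(string $url): string
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true, // boolean switch, not a count
        CURLOPT_MAXREDIRS      => 10,   // the redirect limit belongs here
        CURLOPT_TIMEOUT        => 30,
        CURLOPT_ENCODING       => '',   // accept every compression curl supports
    ]);
    $body = curl_exec($ch);
    $contentType = curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
    if ($body === false) {
        throw new RuntimeException("Request failed: $url");
    }
    // The handle is released when $ch goes out of scope.
    return to_utf8($body, is_string($contentType) ? $contentType : null);
}
```

With these helpers, the loop in the question could be reduced to echo fetch_utf8($url); for each URL. The header is checked before the <meta> tag because that is the order browsers generally use, although some servers send a header that disagrees with the page, so the result is worth spot-checking.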